Token Cost Engineering: How PEtFiSh Saves 20% on Every Long Session

May 2026 · Research Report

If you run AI coding agents on real tasks, you've noticed: long sessions get expensive. Not because the model is slow, but because compaction — the mechanism that summarizes conversation history when context fills up — is shockingly costly. Each compaction event burns 50K-80K tokens in overhead.

We ran two controlled experiments to understand and reduce this cost. The results surprised us.

-20% total tokens · -50% compaction frequency · 0% quality loss

The dominant cost driver in AI agent sessions isn't prompt size or response length; it's how often compaction fires.

The Problem: Why v0.11.0 Regressed 37%

PEtFiSh v0.11.0 introduced a tiered architecture for agent rules: instead of one 1037-line inline file, rules were split into a 57-line entry point plus 7 on-demand sub-files. Cleaner, more maintainable.

But A/B testing revealed a 36.6% token regression. The reason: dynamically loaded rules land in uncached conversation context. They accumulate with each tool call, so the context window fills faster and compaction fires more often (3 times instead of 2), with each event costing 50-80K tokens.

The fix wasn't "go back to inline." It was understanding where rules live in the LLM's memory architecture.
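
The regression mechanism can be illustrated with a toy simulation. This is a sketch, not PEtFiSh's or OpenCode's actual accounting: the threshold, the per-message growth rates, and the post-compaction size are made-up numbers, chosen only to show how faster growth of uncached context turns into an extra compaction over a 21-message session.

```python
def count_compactions(growth_per_message: int,
                      messages: int = 21,
                      threshold: int = 150_000,
                      size_after_compaction: int = 20_000) -> int:
    """Count how often uncached context crosses the compaction threshold.

    All default values are illustrative assumptions, not measured figures.
    """
    context = 0
    compactions = 0
    for _ in range(messages):
        context += growth_per_message  # tool output plus any rules loaded on demand
        if context >= threshold:
            compactions += 1
            context = size_after_compaction  # history is replaced by a summary
    return compactions


# Hypothetical growth rates: the second adds 7K tokens of on-demand rules per message.
inline_rules = count_compactions(growth_per_message=15_000)
on_demand_rules = count_compactions(growth_per_message=22_000)

# Overhead per compaction as quoted in this post: 50K to 80K tokens.
extra = on_demand_rules - inline_rules
print(inline_rules, on_demand_rules, (extra * 50_000, extra * 80_000))
# prints: 2 3 (50000, 80000)
```

In this model a modest increase in per-message growth is enough to add a whole compaction, which is where the cost jumps.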

Experiment 1: System Prompt Injection

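We built two plugins using OpenCode's experimental.chat.system.transform hook to move rules back into the cached system prompt prefix: All-rules, which injects the full rule set into the system prompt, and Smart-rules, which selects rules by keyword matching against a registry of mappings.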

Results (21 messages, 3 topics, claude-sonnet-4)

| Metric | Baseline (v0.10.x) | All-Rules Plugin | Delta |
|---|---|---|---|
| Total tokens | 586,917 | 475,039 | -19.1% |
| Input tokens | 455,533 | 327,834 | -28.0% |
| Compactions | 2 | 1 | -50% |
| Peak context | 152,990 | 145,530 | -4.9% |
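
For readers who want to check the Delta column, here is a small sketch that recomputes it from the raw counts in the results table. The numbers are copied from the table; the function and variable names are ours.

```python
# Values copied from the results table: (baseline, all-rules plugin).
results = {
    "Total tokens": (586_917, 475_039),
    "Input tokens": (455_533, 327_834),
    "Compactions": (2, 1),
    "Peak context": (152_990, 145_530),
}


def percent_delta(baseline: float, treatment: float) -> float:
    """Relative change from baseline, in percent, rounded to one decimal."""
    return round((treatment - baseline) / baseline * 100, 1)


deltas = {metric: percent_delta(b, t) for metric, (b, t) in results.items()}
for metric, delta in deltas.items():
    print(f"{metric}: {delta:+.1f}%")
# Total tokens: -19.1%
# Input tokens: -28.0%
# Compactions: -50.0%
# Peak context: -4.9%
```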

Smart-rules achieved 12.3% savings but proved fragile — silent failures on missing mappings, false-positive keyword matching, manual maintenance burden. For rule sets under 30K tokens, all-rules wins on every dimension.
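
The two failure modes are easy to reproduce in miniature. The sketch below is not the plugin code: the rule names, the registry, and both functions are hypothetical stand-ins that only mimic keyword-based selection versus inject-everything.

```python
# Hypothetical rule files and keyword registry; none of these names come from PEtFiSh.
RULE_FILES = {
    "testing": "Rules for writing and running tests.",
    "git": "Rules for commits and branches.",
}
KEYWORD_REGISTRY = {
    "test": "testing",
    "commit": "git",
    "deploy": "deployment",  # stale mapping: no such rule file exists
}


def all_rules(message: str) -> list[str]:
    """Zero-config strategy: every rule file, regardless of the message."""
    return sorted(RULE_FILES)


def smart_rules(message: str) -> list[str]:
    """Keyword strategy: only rule files whose keyword appears in the message."""
    text = message.lower()
    selected = []
    for keyword, rule_name in KEYWORD_REGISTRY.items():
        # A mapping that points at a missing file is skipped without any error.
        if keyword in text and rule_name in RULE_FILES:
            selected.append(rule_name)
    return selected


false_positive = smart_rules("Summarize the latest release notes")  # "latest" contains "test"
silent_miss = smart_rules("Deploy the service to staging")  # keyword hits, file missing
print(false_positive, silent_miss, all_rules("Deploy the service to staging"))
# prints: ['testing'] [] ['git', 'testing']
```

The inject-everything path has no registry to drift out of date, which is the reliability argument in a nutshell.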

Key Insight

The 20-token overhead of injecting all rules into the system prompt is negligible. What matters is that cached prefix content doesn't accumulate toward the compaction threshold. One fewer compaction = 50-80K tokens saved. The economics are overwhelming.
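
As a rough cross-check, the sketch below compares the 50-80K estimate for one avoided compaction against the total saving measured in Experiment 1. The token counts come from the results table; how the remaining saving splits across other effects is not something this calculation can tell us.

```python
baseline_total = 586_917  # total tokens, baseline run
plugin_total = 475_039    # total tokens, all-rules plugin run
compactions_avoided = 2 - 1

observed_saving = baseline_total - plugin_total

# Per-compaction overhead range quoted in this post.
low = compactions_avoided * 50_000
high = compactions_avoided * 80_000

share_low = round(low / observed_saving * 100, 1)
share_high = round(high / observed_saving * 100, 1)
print(observed_saving, share_low, share_high)
# prints: 111878 44.7 71.5
```

Under that estimate, the single avoided compaction accounts for roughly 45% to 72% of the tokens saved.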

><(((^>

Experiment 2: Topic-Aware Compaction

A separate study asked: when compaction does fire, can PEtFiSh's topic management make it smarter?

The fish-trail topic system already tracks what you're working on — which topics are active, their relationships, their summaries. We built a Phase 2 plugin that restructures the compaction prompt using this topic data, telling the model: "here are 3 topics, compress each separately, prioritize the active one."
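
To make the idea concrete, here is a minimal sketch of how a topic-structured compaction prompt could be assembled. The `Topic` class, its fields, and the prompt wording are assumptions for illustration; they are not the fish-trail data model or the plugin's real prompt.

```python
from dataclasses import dataclass


@dataclass
class Topic:
    """Hypothetical stand-in for a tracked topic."""
    name: str
    summary: str
    active: bool = False


def build_compaction_prompt(topics: list[Topic]) -> str:
    """Build instructions that ask for per-topic compression, active topic first."""
    # sorted() is stable, so inactive topics keep their original relative order.
    ordered = sorted(topics, key=lambda t: not t.active)
    lines = [
        f"Here are {len(topics)} topics. Compress each separately; "
        "prioritize the active one."
    ]
    for topic in ordered:
        marker = "ACTIVE" if topic.active else "background"
        lines.append(f"- [{marker}] {topic.name}: {topic.summary}")
    return "\n".join(lines)


topics = [
    Topic("auth refactor", "Moving session checks into middleware."),
    Topic("CI flakiness", "Integration tests time out on cold runners.", active=True),
    Topic("docs", "README install section is outdated."),
]
prompt = build_compaction_prompt(topics)
print(prompt)
```

The point of the structure is that the model sees explicit topic boundaries and a priority order instead of one undifferentiated history.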

Results (21 messages, 3 interleaved topics, claude-sonnet-4)

| Metric | Baseline | Topic Plugin | Delta |
|---|---|---|---|
| Total tokens | 857,115 | 683,522 | -20.3% |
| API calls | 140 | 89 | -36.4% |
| Wall time | 49 min | 30 min | -39.4% |
| Cache reads | 10.6M | 5.3M | -49.9% |
| Recall quality | Pass | Pass | No loss |

The Surprise: Behavioral Change

We expected savings from better compression ratios. That's not what happened.

The primary mechanism is behavioral change. When the model receives topic-structured context, it produces more focused responses: fewer intermediate tool calls (4.2/msg vs 6.7/msg) and more consolidated answers. This cascades: fewer API calls → fewer cache reads → shorter wall time.
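
The per-message figures line up with the API-call counts in the results table when divided by the 21 messages in the session. Treating API calls as a proxy for tool calls is our reading of the numbers, so take this as a consistency check rather than a definition.

```python
MESSAGES = 21  # session length used in both runs

# API-call counts from the results table.
api_calls = {"baseline": 140, "topic_plugin": 89}

calls_per_message = {run: round(n / MESSAGES, 1) for run, n in api_calls.items()}
print(calls_per_message)
# prints: {'baseline': 6.7, 'topic_plugin': 4.2}
```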

This is why we shelved Phase 3 (pre-computed summaries that skip the LLM): it can't trigger this behavioral effect. The model needs to process topic-structured context during compaction, not just receive a pre-built summary.

><(((^>

What We Learned

  1. Compaction frequency dominates token cost. Everything else — prompt size, output length, caching strategy — is secondary. Reduce compactions and costs drop dramatically.
  2. Cached prefix is free real estate. Rules in the system prompt cost almost nothing (cache reads are ~10x cheaper than input tokens). Rules in conversation context are a ticking time bomb toward the next compaction.
  3. Topic structure changes model behavior. Not just compression quality — the model actually becomes more efficient when it has structured context about what it's doing.
  4. Simple beats clever. All-rules (71 lines, zero config) beat Smart-rules (131 lines, registry dependency) on both cost and reliability. Don't optimize what doesn't need optimizing.
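
The second lesson can be put in numbers with a back-of-the-envelope model. The rule-set size and request count below are hypothetical, the 10x ratio is the approximate figure from the list above, and the model ignores cache-write cost and any effect on compaction, so it understates the full difference.

```python
CACHE_READ_DISCOUNT = 10  # "~10x cheaper" per the post; real pricing varies by provider


def relative_cost(rule_tokens: int, requests: int, cached: bool) -> float:
    """Cost, in uncached-input-token equivalents, of resending rules on every request."""
    full_price = rule_tokens * requests
    return full_price / CACHE_READ_DISCOUNT if cached else full_price


# Hypothetical: a 10K-token rule set resent on 100 requests.
uncached_cost = relative_cost(10_000, 100, cached=False)
cached_cost = relative_cost(10_000, 100, cached=True)
print(uncached_cost, cached_cost)
# prints: 1000000 100000.0
```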

Limitations

Try It

Both plugins ship with PEtFiSh. The system prompt plugin is included in the companion pack. The topic-aware compaction plugin is included in the context pack (fish-trail).

# Install PEtFiSh with both plugins
curl -fsSL https://raw.githubusercontent.com/kylecui/petfish.ai/master/remote-install.sh \
  | bash -s -- --pack companion,context --detect

Full research data, A/B test harness, and raw results are in the GitHub repo.

All experiments ran on claude-sonnet-4 via the github-copilot provider in OpenCode.

><(((^>