Co-authored with Claude. Final editorial judgment and accountability are Alex's.
Pricing as of April 2026. All figures use Opus 4.6 API rates, warm cache unless noted. Subscription plans (Pro, Max) package this spend into usage allocations — the mechanics still apply.
Part 3 of 3. The thread running through all of it: most of what makes Claude Code expensive isn't the prompts you write — it's the habits you don't think about.
Part 1 was about when to stop and when to start fresh. Part 2 was about where to put the work. This one is about something even more invisible: the weight of everything that lands in your context and never leaves — and what happens when the system tries to fix it.
Tool Output That Lives Forever
When Claude runs a tool — a test suite, a build, a grep across the codebase — the full output lands in the conversation. Not a summary. The raw output. Every line, every stack trace, every passing test message.
That output stays in context for the rest of the session. It gets reprocessed on every subsequent turn at cache-read rates. A 10,000-token test log from turn 8 is still sitting there at turn 40, still costing $0.005 per turn to carry. You ran the test once. You looked at the result once. You're paying for it thirty-two more times.
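The per-turn arithmetic is worth seeing once. A quick sketch of the carrying cost at the cache-read rate used throughout this series:

```shell
# carrying cost of one 10K-token dump reprocessed at cache-read rates ($0.50/MTok)
awk 'BEGIN {
  tokens   = 10000
  rate     = 0.50 / 1e6                # dollars per token, cache read
  per_turn = tokens * rate
  printf "per turn: $%.4f; carried over turns 9-40: $%.2f\n", per_turn, per_turn * 32
}'
```

Half a cent per turn sounds like nothing, which is exactly why it compounds unnoticed.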
What I Was Doing
Three flavors, all common.
The build log. I'd ask Claude to run a build. Builds are verbose — a typical npm or Drupal build dumps 5,000–20,000 tokens. Most of it is progress messages, deprecation warnings, dependency resolution noise. The two lines that matter — success or failure and maybe an error — are buried in thousands of tokens I'll never reference again.
The test suite. Same pattern, worse magnitude. Running a full test suite dumps every passing test, every assertion, every setup/teardown message. A 100-test PHPUnit run can produce 15,000–30,000 tokens. I need the failures. I don't need the 95 passes. But all of it is in context now, forever.
The exploratory grep. "Search the codebase for all uses of this function." Claude runs grep, finds 47 matches across 30 files. The full output — file paths, line numbers, surrounding context — lands as a single tool result. Maybe 8,000 tokens. I scan the list, identify the three files I actually need, and move on. The other 44 matches ride along for the rest of the session.
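The gap between the two grep shapes is easy to demonstrate. A sketch against a throwaway mini-codebase (the filenames and the function name are hypothetical):

```shell
# hypothetical mini-codebase to search (stand-in for a real repo)
mkdir -p /tmp/grep-demo/src
printf 'loadConfig();\n' > /tmp/grep-demo/src/app.js
printf 'render();\n'     > /tmp/grep-demo/src/ui.js
printf 'loadConfig();\n' > /tmp/grep-demo/src/boot.js

# unfiltered: every match -- path, line number, content -- lands in context
grep -rn 'loadConfig' /tmp/grep-demo/src

# lean: file paths only, a fraction of the tokens for the same decision
grep -rl 'loadConfig' /tmp/grep-demo/src
```

On a real codebase the `-l` form turns an 8,000-token dump into a few dozen tokens, and you can always ask for full context on the two or three files that matter.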
The Numbers
A developer who generates two moderate tool dumps per day (a build and a test run, 10K tokens each), reprocessed on 20 subsequent turns: 400K tokens of reprocessing per day — 100M per year at ~250 working days.
At cache-read rates ($0.50/MTok): ~$50/year. Mix in cold-cache turns from Part 1 and the effective rate climbs toward $5/MTok, pushing the figure toward $500. Larger dumps push it higher. A developer routinely pasting 30K build logs: $150–$1,500/year depending on cache discipline.
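For the skeptical, the back-of-envelope math, assuming ~250 working days a year:

```shell
# the annual figures from the text, reproduced in awk
awk 'BEGIN {
  daily  = 2 * 10000 * 20             # two 10K dumps, each reprocessed on 20 turns
  yearly = daily * 250                # ~250 working days
  printf "daily: %dK tokens, yearly: %dM tokens\n", daily / 1e3, yearly / 1e6
  printf "warm cache ($0.50/MTok): $%d/yr\n", yearly * 0.50 / 1e6
  printf "mixed cache (~$5/MTok effective): $%d/yr\n", yearly * 5 / 1e6
}'
```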
The Sync Tool Call Trap
There's a nastier version that connects back to Part 1. When Claude kicks off a long synchronous tool call — a test suite, a build — it waits for the result. While it waits, the cache is aging. If the tool takes more than five minutes, the cache expires. That single turn — where Claude processes the tool result and continues — is the most expensive turn of the session. Full re-cache of the entire conversation, plus the new tool output on top.
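One way out is to take the wait off the conversation's clock entirely. A sketch, with a `sleep` standing in for the real build: background the long job so the turn returns before the cache TTL lapses, then read the artifact in a later, warm-cache turn.

```shell
# stand-in for a long build: run it in the background so the turn
# returns immediately instead of holding the cache open past its TTL
nohup sh -c 'sleep 2; echo "BUILD OK"' > /tmp/build.log 2>&1 &
echo "build running in background (pid $!)"

# later, in a fresh warm-cache turn, read the artifact -- one short
# file read instead of a turn that sat waiting for minutes
wait    # demo only; in practice the next turn just reads the file
tail -n 1 /tmp/build.log
```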
Tool output is the most insidious cost in the series. The idle tax (Part 1) is visible if you know to look. Context rot correlates with session length, which you can feel. File bloat and delegation results (Part 2) are tied to actions you took. But tool output carrying cost is completely invisible — background radiation, always adding up, never flagged in any single turn.
The Compact Gamble
At some point in a long session, Claude Code will auto-compact. It summarizes the conversation, drops the original messages, and continues with a shorter version. This is supposed to save you money and keep the session workable. Sometimes it does. Sometimes it drops the one constraint you needed and the next ten turns go sideways.
You don't get to review the summary before it replaces your conversation. You don't get to undo it. And it fires at the worst possible moment.
How It Works
Auto-compact triggers when context approaches the model's window limit — which means it fires precisely when the session is at its longest, most complex, most nuanced. The summarizer compresses everything into a fraction of its original length, deciding what's important without knowing what you'll need next.
The summary keeps the obvious stuff: what files were changed, the current task, the last decision. It drops what seems redundant: a constraint mentioned at turn 3, a failed approach you want to avoid, a specific number from a requirements discussion.
The Lost Constraint
I was working on a Drupal migration. At turn 8, I told Claude that the target database had a specific column length limitation — varchar(128), not varchar(255). Twenty turns later, context was massive, auto-compact fired, and the summary dropped that constraint. It was one sentence in a long session. Claude proceeded with varchar(255) mappings for the next ten turns. I didn't catch it until the migration failed in staging.
Those ten turns weren't just wasted time. They were wasted tokens — every turn carrying growing context of implementation built on a dropped constraint. Then the undo and redo cost another fifteen turns. Roughly $3–4 of wasted work from a single dropped sentence.
Compaction sounds like a feature saving you money. At 80–90% fidelity, it is — shedding dead weight while keeping what matters. Below 60%, you're likely to lose something important. You never know which number you got until something goes wrong. There's no undo. The pre-compact conversation is gone.
The Helpers: Make the Weight Visible
Two hooks from claude-cost-helpers cover both sides. watching-cost tracks tool output size — warning when a dump is heavy, when cumulative output is climbing, and when a verbose command should go to a file or CI instead. compact-safety writes a snapshot before auto-compact fires and reminds Claude to verify constraints survived the compression. Both warn but never block. Install details in the helper guide.
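For a feel of the mechanics, a warn-never-block check looks roughly like the sketch below. This is illustrative only, not the actual watching-cost code; the 40KB threshold and the ~4-bytes-per-token ratio are assumptions, and the real hooks live in zivtech/claude-cost-helpers.

```shell
# Illustrative sketch only -- NOT the watching-cost implementation.
# A PostToolUse-style check that warns (never blocks) on heavy tool output.
warn_if_heavy() {
  threshold=40000                          # ~10K tokens at ~4 bytes/token (assumed)
  size=$(wc -c | tr -d ' ')                # hook payload arrives on stdin
  if [ "$size" -gt "$threshold" ]; then
    echo "warning: ~$((size / 4))-token tool result; filter it, redirect to a file, or move it to CI" >&2
  fi
  return 0                                 # always allow the turn to continue
}

head -c 50000 /dev/zero | warn_if_heavy    # a 50KB dump triggers the warning
```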
What Changes
The warning changed how I prompt. Before the hook, "run the tests" was my default. Now when the hook flags a 15K-token test dump, I think about it differently next time: "run the tests and show me only failures." Instead of "search for X," I try "search for X and return only the file paths." Claude is good at filtering if you ask — the unfiltered dump is the default only because nobody asked for concise.
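The savings from asking for failures only are easy to simulate. The suite below is a stand-in, not real PHPUnit output:

```shell
# simulated noisy suite: 95 passes, 5 failures (stand-in for a real test run)
seq 1 100 | awk '{ print ($0 % 20 == 0 ? "FAIL" : "PASS") " test_" $1 }' > /tmp/suite.log

wc -l < /tmp/suite.log        # unfiltered: 100 lines into context

grep '^FAIL' /tmp/suite.log   # filtered: 5 lines, the ones that matter
```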
I'm getting better at redirecting output to files instead of letting it stream into the conversation. And for the heavy stuff — full test suites, builds, deployments — the warning is a reminder that those might belong in CI. CI runs the job; Claude reads the result artifact. One file read instead of thousands of tokens permanently in context.
The compact snapshot is the safety net, but the biggest change came from upstream. The just-one-more-turn helper from Part 1 warns when sessions are getting long, and saving with /save-session before hitting the compact threshold is the one practice I've almost nailed. My curated handoff is higher fidelity than auto-compact's summary because I choose what to keep: constraints, decisions, the things a cold reader would need. When compact fires despite that — and it still does, especially when context starts heavier than I expected — the snapshot gives me something to check against immediately, instead of discovering ten turns later that a constraint was dropped.
Across a Team
If you're managing this across a team, start with two questions: which sessions are dragging around huge piles of tool output, and how often is compact firing? Sessions with high output-to-input ratios are easy to spot in any cost dashboard. Compact frequency is the tell for everything upstream: when events are rare, people have internalized the session-timing and delegation habits from Parts 1 and 2; when they're frequent, those fixes aren't landing, and the workflow problem will show up in the dashboard before it shows up in the bill.
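A first-pass query for that ratio can be one awk line. The per-session export format here is hypothetical:

```shell
# hypothetical usage export: session_id,input_tokens,tool_output_tokens
printf 's1,12000,4000\ns2,9000,31000\ns3,15000,6000\n' > /tmp/usage.csv

# flag sessions whose carried tool output dwarfs the prompts (ratio > 2)
awk -F, '$3 / $2 > 2 { printf "%s ratio=%.1f\n", $1, $3 / $2 }' /tmp/usage.csv
```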
The Rule
If you use it: Install both hooks. When watching-cost warns about a big dump, filter the next call, redirect to a file, or run it in CI. When compact fires, check the snapshot before continuing.
If you manage it: Sessions with high output-to-input ratios and frequent compact events are the ones to investigate first. If compact is firing often across the team, the upstream habits from Parts 1 and 2 aren't landing.
If you pay for it: Tool-output carrying cost is the most invisible line item in Claude Code spend. Every token that lands in context stays there until the session ends — or until compaction drops it and you have to guess what went missing.
The Series
Three posts, six patterns, six helpers:
- Session Timing — Gaps kill the cache. Sessions that go too long get expensive. The hooks warn you when you're cold and when you're carrying too much.
- The Agent Tradeoff — Delegation keeps files out of your context, but agent results flow back in. The hooks show you both sides.
- What You Carry — Tool output is permanent. Compaction is lossy. The hooks track what each dump costs and give you a safety net when the system summarizes.
All six helpers are in claude-cost-helpers. Install whichever ones match your workflow, or install all of them — they're independent hooks that don't conflict.
The common thread across all six: the expensive thing is never the prompt you write. It's the context you carry. Every helper in this series points at what you're carrying and what it costs. Keep the context lean, keep the cache warm, and when a session has served its purpose, let it go.
Visuals: tool dump slider at econ-mini-05-output-in-context.html, sync tool call at econ-v3d-toolcall.html, compaction fidelity at econ-mini-04-compact-loss.html — embeddable via iframe, dark-mode aware, color-blind safe.
Code: zivtech/claude-cost-helpers on GitHub. GPL-3.0-or-later licensed.