Co-authored with Claude. Final editorial judgment and accountability are Alex's.
Pricing as of April 2026. All figures use Opus 4.6 API rates, warm cache unless noted. Subscription plans (Pro, Max) package this spend into usage allocations — the mechanics still apply.
Part 2 of 3. Part 1 covered the session lifecycle — gaps that kill the cache, sessions that run too long, when to start fresh. This one is about a different question: not when to do the work, but where you put it.
I needed Claude to survey a codebase. Two hundred files across a Drupal project, looking for a specific pattern in how hooks were being used. The kind of task Claude is perfect for. So I asked it to do the work in my main session.
Forty minutes later the answer was good. The session was ruined.
Two Sides of the Same Context
Delegation in Claude Code has two cost surfaces, and they're mirror images of each other. Understanding both is the difference between delegation as a cost optimization and delegation as a cost trap.
Side One: Files That Never Leave
When Claude reads a file in your session, the entire contents become part of the conversation. Every subsequent message rereads all of it. Not a summary. Not a reference. The full text, reprocessed at cache-read rates on every turn.
Read one file? Maybe 2,000–3,000 tokens. No big deal. Read 50 for a focused survey? That's 150,000 tokens of file contents sitting in your context, reprocessed on every turn for the rest of the session. Try to read 200 and you'll hit auto-compact (more on that in Part 3) well before you're done.
At Opus 4.6 cache-read rates ($0.50/MTok), 150K tokens of file contents cost $0.075 to reprocess on every single turn. Twenty more turns of actual work after the survey? That's $1.50 of pure overhead — just from carrying the survey results.
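That arithmetic is worth having as a one-liner. A back-of-envelope sketch, with the rates above hard-coded (adjust for your model and plan):

```python
# Cost of carrying file contents in the parent context, per the figures above.
CACHE_READ_PER_MTOK = 0.50  # $/MTok, Opus 4.6 warm-cache read rate

def carry_cost(carried_tokens: int, turns: int) -> float:
    """Dollars spent reprocessing `carried_tokens` on each of `turns` turns."""
    return carried_tokens / 1_000_000 * CACHE_READ_PER_MTOK * turns

per_turn = carry_cost(150_000, 1)   # $0.075 per turn
overhead = carry_cost(150_000, 20)  # $1.50 over 20 more turns of work
```

The linear scaling is the point: the overhead grows with both what you carry and how long you carry it.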
The fix is subagents — separate Claude instances you spawn to handle a focused task. Not because they're faster (sometimes they are, sometimes they aren't), but because they're a context firewall.
When you spawn a subagent with a focused brief — "read all module files and tell me which ones implement hook_form_alter" (a Drupal extension point — the specifics don't matter; what matters is that it meant reading 200 files to find 15 matches) — it gets its own conversation context. It reads 200 files in its window. It synthesizes. It returns a 2,000-token summary to the parent session.
Those files never touch the parent's context. The parent gets the answer without carrying 150K tokens for the rest of the session. Per-turn cost stays flat.
The key insight: subagents don't shrink total spend. They move where the spend lands. The subagent still pays to read 200 files. But it pays once, in an isolated context that gets thrown away when it's done. The parent — the session that's going to run for another 30 turns of implementation — stays lean.
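The move-the-spend tradeoff can be sketched in the same back-of-envelope style. This assumes the subagent pays uncached input rates for its one-time read (real cache behavior varies) and uses the 50-file survey numbers from above:

```python
# Delegation moves spend, it doesn't erase it: the subagent pays once to read,
# the parent pays every turn to carry whatever lands in its context.
INPUT_PER_MTOK = 5.00       # $/MTok, uncached input
CACHE_READ_PER_MTOK = 0.50  # $/MTok, warm-cache read

def read_once(tokens):   # subagent: one-time read, context then discarded
    return tokens / 1_000_000 * INPUT_PER_MTOK

def carry(tokens, turns):  # parent: reprocessed on every turn
    return tokens / 1_000_000 * CACHE_READ_PER_MTOK * turns

in_parent = carry(150_000, 30)                      # survey carried 30 turns: $2.25
delegated = read_once(150_000) + carry(2_000, 30)   # $0.75 + $0.03 = $0.78
```

And the context-weight difference — 150K tokens carried versus a 2K summary — is the 75× figure that shows up in the rule at the end.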
Side Two: Results That Come Back
I internalized that lesson. I delegate aggressively now — research agents, critic pipelines, parallel explorers. The multi-agent pattern is genuinely powerful. And then I noticed a new cost.
When a subagent finishes, its result lands in the parent session. That result — the summary, the findings, the review — sits in your context for the rest of the session. It gets reprocessed on every subsequent turn. The subagent's own context is disposable. Its result is permanent.
I was optimizing the wrong side. Keeping files out of my session (good) while letting agent results pile up in it (bad). The delegation habit that fixed one cost was creating another — one that scales with how enthusiastically you delegate.
Three patterns, all common in my workflow:
The research swarm. I launch 3–5 agents in parallel when I need to understand a problem from multiple angles. One explores the codebase, one reads documentation, one checks test coverage, one reviews git history. Each returns 3–8K tokens of findings. Combined: 15–40K tokens now sitting in the parent. I read them once to synthesize. They ride along on every turn for the rest of the session.
The critic pipeline. My standard review workflow: proposal-critic evaluates the plan, then react-critic, a11y-critic, and drupal-critic review in parallel. Each returns 4–8K tokens of structured feedback. By the time I'm ready to act, the parent is carrying 20–30K tokens of review output. That output was useful for one decision. It's dead weight for the next forty turns of implementation.
The background agent that finished. I use run_in_background (a flag that tells Claude Code to spawn an agent without blocking your session) a lot — it's the right pattern for long-running research while I keep working. But when the agent finishes, its result arrives in an already-heavy context. One more 5K-token result adding to a stack that's already expensive to carry.
The Numbers: Invoice vs. Tax
The delegation tax has two components. Most people only see the first.
The invoice is what the subagent spent: its own input and output tokens. A typical research agent processes ~30K input tokens and generates ~5K output. At Opus 4.6 rates ($5/MTok input, $25/MTok output), that's about $0.28 per agent. Five agents: ~$1.40.
The tax is what the parent pays to carry the results. Five agents returning 25K tokens combined, reprocessed over 20 subsequent turns at cache-read rates ($0.50/MTok): ~$0.25. Mix in cold-cache turns from Part 1 — where the rate jumps to $5/MTok — and that $0.25 becomes $2.50. And that's one delegation round. My sessions regularly have 2–3.
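The invoice/tax split is easy to verify with the same rates. A sketch of the round described above (one round, five agents, 20 subsequent turns):

```python
# Invoice vs. tax for one delegation round, using the article's Opus 4.6 rates.
INPUT, OUTPUT, CACHE_READ, COLD = 5.00, 25.00, 0.50, 5.00  # $/MTok

def invoice(in_tok, out_tok):
    """What the subagent itself spends: its input plus its output tokens."""
    return in_tok / 1_000_000 * INPUT + out_tok / 1_000_000 * OUTPUT

def tax(result_tok, turns, rate):
    """What the parent pays to carry the results over subsequent turns."""
    return result_tok / 1_000_000 * rate * turns

one_agent   = invoice(30_000, 5_000)        # ~$0.28
five_agents = 5 * one_agent                 # ~$1.40
warm_tax    = tax(25_000, 20, CACHE_READ)   # $0.25 if the cache stays warm
cold_tax    = tax(25_000, 20, COLD)         # $2.50 if every turn re-caches
```

The worst case in `cold_tax` assumes all 20 turns hit the cold rate; real sessions land somewhere between the two.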
The Cache Gotcha
There's a catch that connects both sides back to Part 1. When you spawn agents that take more than five minutes, the parent's cache expires while it waits. Your next turn is a full re-cache — the 12.5× penalty.
The worst pattern: spawning three agents, tabbing away for ten minutes, coming back to a cold parent with 100K+ tokens of context. You pay the idle tax on top of whatever the agents cost.
The Helpers: Make Both Sides Visible
Two hooks from claude-cost-helpers cover both sides. subagent-isolation warns when file reads cross the threshold where delegation would be cheaper. delegation-cost tracks agent-result weight landing in the parent — per-result, cumulatively, and whether the parent's cache went cold while agents ran. A companion hook nudges when agent prompts lack output constraints. All warn but never block. Install details in the helper guide.
What Changes
The file-read warning was what finally made me separate exploration and implementation into different sessions. I run planners and research agents first, then start fresh for implementation with just the summary. That separation happened naturally once the hook made the cost of mixing them visible.
The per-result warning on the delegation side was the eye-opener. The first time the hook told me an agent had dropped 12K tokens into my context, I realized I'd never once thought about what agents were returning — only about what I was sending them. Now when the hook flags a big result, I add output constraints to the next agent prompt: "report in under 200 words" or "return only file paths and line numbers."
The cumulative warning triggers session splits. When the hook says I'm past 50K tokens of accumulated agent results, I save and start fresh. The synthesis is in my head or in the handoff note. The raw results are dead weight.
File-based handoff — having agents write findings to a file instead of returning them inline — is a pattern the hook keeps making a case for every time I see a 10K result land. Fifty tokens for a file path versus five thousand for the findings. The habit hasn't caught up to the hook's advice yet.
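For what that pattern looks like in practice, here's a minimal sketch. The path and prompt wording are illustrative assumptions, not a fixed Claude Code API — the point is only that the brief asks for findings on disk, so a file path lands in the parent instead of the findings themselves:

```python
# File-based handoff, sketched. FINDINGS_PATH and the brief text are
# hypothetical; adapt to your own agent-spawning setup.
FINDINGS_PATH = "/tmp/survey-findings.md"

agent_brief = (
    "Survey all module files for hook_form_alter implementations. "
    f"Write the full findings to {FINDINGS_PATH}. "
    "Return ONLY the file path and a one-line summary."
)

def load_findings(path=FINDINGS_PATH):
    """Parent-side: read the detail on demand instead of carrying it every turn."""
    with open(path) as f:
        return f.read()
```

Fifty-odd tokens of path and summary in the parent, with the full findings a single read away when you actually need them.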
Across a Team
If you're managing this across a team, two questions matter. Are people doing exploration in the parent session or in subagents? And when they delegate, are they constraining what comes back? A parent spending $0.30/turn on file-read overhead looks very different from one that delegated and spends $0.01/turn. Sessions with high tax-to-invoice ratios — agents that cost $0.28 to run but return 15K tokens — create more drag in the parent than agents that cost twice as much to run but return 2K tokens. Most cost dashboards show you the invoice. The tax is the part you usually have to go looking for.
The Rule
If you use it: Delegate exploration — constrain what comes back. The hooks show you both sides: when files are bloating your session, and when agent results are piling up. Keep the parent lean on both ends.
If you manage it: Check whether your team defaults to exploration-in-parent or delegation. The cost difference is 75× in context weight for the same research.
If you pay for it: The invoice (what agents cost to run) is visible in any dashboard. The tax (what their results cost the parent to carry) usually isn't. Ask about the tax.
Coming Up
- Part 3: What You Carry — tool output that lives forever, and the compaction gamble that tries to fix it
Visuals: parent vs subagent at econ-mini-03-parent-vs-subagent.html, agent results stacking at econ-mini-08-delegation-tax.html, invoice-vs-tax at econ-v6-delegation-carrying.html, agent spawning timeline at econ-v3c-spawn.html — embeddable via iframe, dark-mode aware, color-blind safe.
Code: zivtech/claude-cost-helpers on GitHub. GPL-3.0-or-later licensed.