The Economics of Claude Code — Part 1: Session Timing

By: Alex Urevick-Ackelsberg, Zivtech

Co-authored with Claude. Final editorial judgment and accountability are Alex's.

Pricing as of April 2026. All figures use Opus 4.6 API rates, warm cache unless noted. Subscription plans (Pro, Max) package this spend into usage allocations — the mechanics still apply.

Part 1 of 3. This series is for anyone who uses Claude Code, manages a team that does, or signs off on the bill. The thread running through all of it: most of what makes Claude Code expensive isn't the prompts you write — it's the habits you don't think about.


I've been using Claude Code daily for about a year. It's the most productive tool I've ever picked up. It also has costs that are almost entirely invisible unless you go looking for them — and they're not where most people think they are.

This series is about that gap: between what Claude Code costs and what you think it costs. Not the pricing page (you can read that yourself) but the habits, the rhythms, the "I didn't realize that was expensive" moments that add up to real money over a normal workweek.

The One Thing You Need to Know About Pricing

Quick primer, then we move on. Claude Code charges per token — chunks of text going in and out. If you're on the API, you pay these rates directly. If you're on a Pro or Max subscription, the same token mechanics determine how fast you burn through your usage allocation — you're spending tokens either way, just packaged differently. But there's a critical detail: prompt caching. Claude caches the beginning of your conversation so it doesn't reprocess it every turn. Cache reads cost 90% less than processing those same tokens fresh.

That cache has a 5-minute TTL (time-to-live). No activity for 5 minutes and it evaporates. The next message reprocesses everything from scratch — roughly 12.5× more expensive than the same message with a warm cache. Same tokens, same conversation, same work. Just a different price because the gap was too long.

There's no warning. The cache quietly dies.
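The 12.5× figure falls straight out of the cache multipliers. A cold turn re-writes the cache, billed at the cache-write rate; a warm turn reads it back at the cache-read rate. A minimal sketch, assuming the standard multipliers (writes at 1.25× base input, reads at 0.1× — the "90% less" above):

```python
# Where the 12.5x comes from: a cold turn re-writes the conversation
# into the cache at the cache-write rate; a warm turn reads it back
# at the cache-read rate. Multipliers assumed: 1.25x and 0.1x of base.
BASE_INPUT = 5.00 / 1_000_000   # $/token, Opus 4.6 input rate

def history_cost(tokens: int, warm: bool) -> float:
    """Dollar cost to reprocess `tokens` of conversation history on one turn."""
    mult = 0.10 if warm else 1.25
    return tokens * BASE_INPUT * mult

context = 100_000                          # a mid-sized session's history
warm = history_cost(context, warm=True)    # ~$0.05
cold = history_cost(context, warm=False)   # ~$0.625
print(round(cold / warm, 1))               # 12.5 -- same tokens, different price
```

Same tokens, same conversation; the only variable is whether the gap stayed under the TTL.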

Six Patterns, Three Posts

Over months of running Claude Code across two businesses (Zivtech, a dev agency, and Milk Jawn, an ice cream manufacturer), I've found six patterns that drive most of the avoidable spend. None of them are about writing better prompts. They fall into three categories:

Session timing — Gaps that kill the cache, and sessions that go too long. Two failure modes of the same session lifecycle. (This post.)

The agent tradeoff — Delegation keeps files out of your context, but agent results flow back in. Two sides of the same coin. (Part 2.)

What you carry — Tool output that lives forever in context, and the compaction gamble that tries to fix it. (Part 3.)

The common thread: every one of these is invisible while it's happening. You only see it in the bill — and even then only if you know what to look for.

Each post comes with a pair of helpers from claude-cost-helpers — open-source hooks and slash commands that make these cost mechanics visible at the moment they happen.


The Idle Tax

Here's the tension nobody talks about: some tasks in Claude Code take real time. Reading files, running tests, working through a complex implementation — you can't just sit and watch. If I did, I'd be slower than I was before AI. The whole point is that Claude works while I do other things.

So I multitask. And that's where the money goes.

What I Was Doing

Two flavors, and I do both.

Flavor one: Claude is working and I get pulled into something else. Mid-session on a Drupal client thing. Claude is reading files and working through an approach. I'm not going to sit and watch — so I check Slack, answer an email, take a call from our production manager at Milk Jawn. Fifteen minutes gone. Come back, type the next prompt. Cold cache. About 12×.

Flavor two — the one that actually surprised me: I'm running multiple sessions because that's the only way to stay productive. I usually have multiple Claude sessions going across two laptops — different projects, different contexts. I rotate through them every couple of minutes, which keeps them all warm. Then one session hits a moment that demands actual thinking: a careful prompt, files I need to read myself, a problem I have to sit with for fifteen minutes. While I'm sunk into that session, the others are aging out — WARM, then COOLING, then COLD. The next message I send to any of them pays the penalty.

Six or seven of those a day, multiplied across the week, and the math gets ugly fast. The worst part: the work in any one session was efficient. I wasn't using Claude badly. I was doing exactly what you're supposed to do — use the tool and keep working while it works.

Drag the idle slider in the interactive cost calculator past 5 minutes and watch the price of the next message jump. The session being open isn't the same as the session being warm — the cache doesn't care that the window is still there.

What Changed

My workflow wasn't always this expensive. Before March 2026, the prompt cache TTL was one hour, not five minutes. I could walk away for 20 minutes, take a call, rotate through four sessions at a leisurely pace, and the cache was still warm when I came back. I built my entire workflow around that.

Then three things changed in the same month:

  1. Around March 6, Anthropic dropped the cache TTL from 1 hour to 5 minutes. This was intentional and server-side. A detailed analysis on GitHub tracked it day by day across months of session data — 1-hour TTL through February, 5-minute TTL from March 8 onward. One analysis found a 17–26% cost increase and subscription users started hitting their 5-hour quota limits for the first time.

  2. Around March 15, a client-side cache bug (GitHub #34629) broke caching in resumed sessions — only the system prompt was cached, while the full conversation was re-uploaded from scratch on every message. This was fixed in v2.1.97.

  3. On March 27, the off-peak 2× usage promotion expired and Anthropic added peak-hour throttling the same week — quota drains faster during business hours now.

Then a month later, a fourth change:

  4. On April 16, Opus 4.7 shipped with xhigh as its default effort level — a level that didn't exist on 4.6. Higher effort means more thinking tokens per turn, on every turn. From the model config docs: "When you first run Opus 4.7, Claude Code applies xhigh even if you previously set a different effort level." The override is silent. The statusline quietly reads "with xhigh effort" and you keep working. The only mechanism that survives: the CLAUDE_CODE_EFFORT_LEVEL environment variable, set in ~/.claude/settings.json's env block.
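For reference, that env-block override might look like this in ~/.claude/settings.json — a sketch only; the variable name comes from the passage above, and the value shown ("high") is just an example:

```json
{
  "env": {
    "CLAUDE_CODE_EFFORT_LEVEL": "high"
  }
}
```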

Anthropic later published an April 23 postmortem explaining that some of the March and April weirdness came from product regressions, including a stale-session bug that caused extra cache misses and faster usage drain after long idle periods. That's important context. Some of what people felt was a bug, not the baseline economics.

But the lesson survives the distinction. I didn't suddenly start working wrong. The economics became visible before my habits caught up. Whether the trigger is a product regression, a limit change, a model update, a cache policy change, or plain old pricing, the same rule applies: don't spend money, energy, or model compute when you don't need to.


Context Rot

The idle tax is about gaps. This problem is the opposite: sessions that never stop.

You're in a flow. Claude is writing good code, tests are passing, you're knocking things off the list. Every turn feels productive, so you keep going. Turn 20. Turn 40. Turn 60. Why would you stop? The work is working.

Because every turn is getting more expensive than the last, and you can't feel it happening.

The Mechanic

Every file Claude reads, every tool result, every code block it writes back — all of it becomes permanent context. Claude rereads the entire conversation on every single turn. Turn 5 rereads five turns of history. Turn 50 rereads fifty.

The cost per turn isn't fixed. It's a function of everything that's come before.
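If every turn adds roughly the same amount of history, the total tokens reprocessed over a session grow quadratically, not linearly. A quick sketch — the ~3K-tokens-per-turn growth rate is an assumption for illustration:

```python
# Turn t rereads ~t * PER_TURN tokens of history, so the running total
# across a session is PER_TURN * t*(t+1)/2 -- quadratic in turn count.
PER_TURN = 3_000  # assumed tokens of new context added per turn

def total_reread(turns: int) -> int:
    """Total tokens reprocessed across the whole session."""
    return sum(t * PER_TURN for t in range(1, turns + 1))

print(f"{total_reread(5):,}")    # 45,000 tokens
print(f"{total_reread(50):,}")   # 3,825,000 tokens
```

Ten times the turns, eighty-five times the tokens. That's the shape hiding under "every turn feels the same."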

What I Was Doing

The honest marathon. A big refactor across six files. Claude reads the files, plans the approach, starts implementing. I'm giving feedback, it's iterating, we're making progress. By turn 50, we've touched all six files, run the tests twice, read a config, read a migration. Every one of those artifacts is still sitting in context. The session feels the same as turn 10. But the meter is three or four times what it was at the start.

The session that should have been two. I finish the refactor at turn 35 and think "while I'm here, let me also fix that bug in the auth module." Different task, different files, different mental model. But starting a new session feels like overhead — I'd have to explain the project again, re-read the CLAUDE.md (the project-level instructions file Claude Code loads on startup), lose the momentum. So I keep going. Now I'm carrying 35 turns of refactor context that's completely irrelevant to the auth fix, paying to reprocess all of it on every turn.

This is the trap. The marginal cost of "just one more turn" is invisible. It rises like a slow tide.

The knee lands around turn 50. Past that, you're paying principal-engineer rates for junior-engineer context management.

The Numbers

At Opus 4.6 ($5/MTok input, $0.50/MTok cache read):

  • Turn 20 (~60K context): about $0.80 cumulative. Each turn costs roughly $0.03. Fine.
  • Turn 50 (~150K context): about $1.90 cumulative. Each turn costs roughly $0.15. Getting expensive — approaching the auto-compact threshold.
  • Turn 80 (if you dodge compact): about $9.30 cumulative. Each turn costs over $0.33. You're paying more per turn than the entire first 20 turns combined.

Roll that up: one session per workday, 250 days a year. A developer who routinely rides past the knee spends ~$2,300/year. One who splits at the knee spends ~$420/year. The difference is nearly $1,900 per developer, per year — on context accumulation alone.
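The roll-up is straightforward to check. The $9.30/day figure is the turn-80 bullet above; the $1.68/day figure for a knee-splitter is back-derived from the ~$420/year number and is an assumption, not a published rate:

```python
# Annual roll-up: one session per workday, 250 workdays a year.
WORKDAYS = 250
ride_past_knee = 9.30   # $/day: cumulative cost of an ~80-turn session
split_at_knee = 1.68    # $/day: assumed cost when splitting into fresh sessions

print(round(ride_past_knee * WORKDAYS))                    # ~$2,325/yr
print(round(split_at_knee * WORKDAYS))                     # ~$420/yr
print(round((ride_past_knee - split_at_knee) * WORKDAYS))  # ~$1,905 difference
```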

Two failure modes look completely different on the meter. The bloated session creeps. The cache-miss session spikes. Knowing which one you're in tells you what to fix.

The Helpers: Make the Timing Visible

Two hooks from claude-cost-helpers cover both sides. idle-tax warns when you return to a cold cache — a real choice between continuing (sometimes right) and starting fresh (often cheaper). just-one-more-turn tracks turn count and context size, warning when the session is getting heavy. Both warn but never block. Install details in the helper guide.
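To make the mechanic concrete, here's a minimal sketch of how an idle-tax-style check can work: record a timestamp on every turn and classify the gap since the last one. The real claude-cost-helpers implementation differs; the WARM/COOLING/COLD labels match the hook's statuses, but the 60%-of-TTL "COOLING" threshold and the state-file location are assumptions:

```python
# Sketch of an idle-tax-style check: classify cache temperature from
# the gap since the last recorded turn, then record the current turn.
import time
from pathlib import Path

TTL_SECONDS = 5 * 60  # the 5-minute prompt-cache TTL

def cache_status(state_file: Path, ttl: float = TTL_SECONDS) -> str:
    """Classify the gap since the last turn, then stamp this one."""
    now = time.time()
    status = "WARM"
    if state_file.exists():
        gap = now - float(state_file.read_text())
        if gap > ttl:
            status = "COLD"      # cache expired: next message pays ~12.5x
        elif gap > ttl * 0.6:
            status = "COOLING"   # assumed warning threshold
    state_file.write_text(str(now))
    return status

if __name__ == "__main__":
    # e.g. run on every prompt, with a per-session state file
    state = Path.home() / ".claude" / "idle-tax-last-turn"  # hypothetical path
    state.parent.mkdir(parents=True, exist_ok=True)
    print(f"cache: {cache_status(state)}")
```

The point isn't the threshold logic — it's that the signal fires at the moment you can still choose between continuing and starting fresh.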

What Changes

The idle-tax hook turned something invisible into something I could react to.

I come back to a session, start typing, and the hook tells me COLD. So instead of continuing and paying the 12.5× penalty on every remaining message, I /save-session (a slash command that captures the current state into a handoff note), close the window, and open fresh. The first message in the new session pays the same setup cost I would've paid anyway. Every message after that rides a warm cache.

The handoff cycle — save, close, reopen with notes — is the one habit from all this that actually stuck. It doesn't always capture everything perfectly, but it's been consistently useful. And it only happened because the hook made the cost visible at the moment I could do something about it.

I also try to keep my active session count down. I used to run four sessions across two laptops without thinking about it. Now when I switch back and find them all COLD after a focused stretch, I save and close instead of continuing. I don't always catch it in time — but the hook catches what I miss.

The turn counter changed how I think about sessions too. Before the hook, turn 20 and turn 60 felt the same — Claude was still responsive, still doing good work. Now I see the number climbing and the cost estimate next to it. When the hook shows me I'm at turn 35 with 180K context and I'm about to pivot to a different problem, the economics are right there. A 5,000-token handoff in a new session versus 180,000 tokens of irrelevant history riding along on every remaining turn. Once that tradeoff is visible, the decision is obvious.
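That pivot tradeoff is easy to put in dollars. A rough sketch at the cache-read rate — the 180K and 5K token figures come from the scenario above, and the 20 remaining turns is an assumption:

```python
# The turn-35 pivot, in Opus 4.6 cache-read dollars. Assumes a warm
# cache throughout and ~20 more turns on the new task.
CACHE_READ = 0.50 / 1_000_000           # $/token

carry_per_turn = 180_000 * CACHE_READ   # drag the refactor history along
fresh_per_turn = 5_000 * CACHE_READ     # start from a handoff note
remaining_turns = 20                    # assumption

saved = (carry_per_turn - fresh_per_turn) * remaining_turns
print(f"${saved:.2f}")                  # $1.75, before counting new-task growth
```

Small per turn, but it compounds the same way every other pattern in this series does.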

I still run long sessions when the task demands it. The difference is that now I know what it costs.

Across a team

If you're managing ten developers, each juggling multiple sessions, you need the same signal everywhere. The question isn't "is this session too long?" — it's "which developers are routinely riding past the knee, and what's it costing?" Sessions with high input-to-output ratios and frequent cold-cache restarts are easy to spot in any cost dashboard. Making it legible is often enough to change behavior.

The Rule

If you use it: Install both hooks. When one says COLD, save and start fresh. When the other shows you the cost is climbing, stop carrying dead context into the next task.

If you manage it: Watch for developers routinely riding past turn 50 or returning to cold sessions. Those are the two highest-leverage habits to change.

If you pay for it: Session discipline is the single biggest cost lever. A team that splits at the knee and avoids cold starts spends roughly a quarter to a fifth as much on input tokens as one that doesn't.

Coming Up

Part 2 digs into the agent tradeoff: delegation keeps files out of your context, but agent results flow back in. Part 3 covers what you carry: tool output that lives forever in context, and the compaction gamble that tries to fix it.

Visuals: interactive cost calculator at econ-cost-calculator.html, cumulative cost curve at econ-mini-02-turns-vs-cost.html, three trajectories at econ-d2-cost-trajectory.html — embeddable via iframe, dark-mode aware, color-blind safe.

Code: zivtech/claude-cost-helpers on GitHub. GPL-3.0-or-later licensed.