The Economics of Claude Code: Where the Money Actually Goes

By: Alex Urevick-Ackelsberg, Zivtech

Co-authored with Claude. Final editorial judgment and accountability are Alex's.

Pricing as of April 2026. All figures use Opus 4.6 API rates, warm cache unless noted. Subscription plans (Pro, Max) package the same token spend into usage allocations — the underlying mechanics still apply.

Claude Code isn't free, and the costs aren't where most people think they are. Whether you're the one typing the prompts, managing a team that does, or signing off on the bill — the economics below apply to you.

The short version: Claude Code is a cab with the meter running. The meter runs whether the cab is moving or parked. The rate changes depending on which driver you called and how much luggage you loaded. If you step out for more than five minutes, getting back in costs 12.5× more than if you'd stayed in the car. Six patterns drive most of the avoidable spend, covered across three posts:

  1. Session Timing — Gaps kill the cache. Sessions that go too long get expensive. The hooks warn you when you're cold and when you're carrying too much.
  2. The Agent Tradeoff — Delegation keeps files out of your context, but agent results flow back in. The hooks show you both sides.
  3. What You Carry — Tool output is permanent. Compaction is lossy. The hooks track what each dump costs and give you a safety net when the system summarizes.

Each post walks through one habit in detail — the story, the math, the fix, and a free open-source helper that makes the cost visible. This page is the primer: how pricing works and what a healthy Claude rhythm looks like. For the full list of cost drivers and benchmarks, see the reference page.


I've spent a lot of time looking at usage patterns, digging into Anthropic's pricing, and figuring out where spend is efficient and where it's being lit on fire. Claude (my co-author, and yes, the irony is noted) dug into the cache mechanics. This post is about the economics of the tool, not our workflow at Zivtech or anyone's in particular. Plenty of people use Claude Code in ways that look nothing like ours. The underlying economics apply to all of them.

Laura Bratton's piece in The Information makes the team-scale version painfully concrete: Uber's heavy adoption of Claude Code and Cursor reportedly maxed out its full-year AI budget just a few months into 2026. "I'm back to the drawing board because the budget I thought I would need is blown away already," CTO Praveen Neppalli Naga said. That's the problem this series is about: the tools can be genuinely useful and still expensive enough to force a rethink.

I'll be honest: the past month has been genuinely painful. The wild swings in how I'm able to work efficiently (pricing changes, model updates, shifting cache behavior) have forced me to rethink habits I'd barely finished forming. Anthropic's April 23 postmortem explains some of that pain as product regressions, not just pricing or cache policy. That distinction matters, but it doesn't change the larger point: whatever exposed it, the meter was always there.

I feel a strong obligation to get this right, on two fronts. The first is moral: these models consume real electricity, and wasting tokens isn't just wasting money. It's wasting energy. The second is practical: today's token prices are heavily subsidized. They will not stay this cheap. The habits you build now at subsidized rates are the habits you'll be stuck with when the real prices arrive. I'd rather be forced to develop good habits now than be caught flat-footed later.

The goal is to give you a mental model for why some activities cost what they cost, a list of the cost drivers with guidance on how to manage and monitor each one, a clear picture of the rhythm of Claude work (which is where most of the hidden spend lives), and some rough benchmarks you can use to check whether something is off in how you're working.

How Claude Code Pricing Actually Works

Quick primer. Claude Code charges per token, the chunks of text that go in and out of the model. If you're on the API, you pay these rates directly. If you're on a Pro or Max subscription, the same token mechanics determine how fast you burn through your usage allocation — you're spending tokens either way, just packaged differently. You pay for input tokens (everything Claude reads) and output tokens (everything Claude writes back).

But there's a critical detail that changes the math: prompt caching. Claude caches the beginning of your conversation so it doesn't have to reprocess it every time you send a message. Reading from the cache costs 90% less than processing those same tokens fresh. Writing to the cache costs 25% more than processing them as normal uncached input, but it pays for itself by the second message.

That cache has a five-minute time-to-live (TTL). If no message is sent for five minutes, the cache expires and Claude reprocesses your entire conversation context from scratch on the next turn. The next message after a cache miss costs 12.5x more than the same message with a warm cache. Same tokens, same conversation, same work — just a different price because the gap between messages was too long.

There's no native warning. The cache just quietly dies.

Here's the reference table most people don't have in their heads:

| Operation             | Sonnet 4.6  | Opus 4.6    | Haiku 4.5  |
|-----------------------|-------------|-------------|------------|
| Base input (no cache) | $3.00/MTok  | $5.00/MTok  | $1.00/MTok |
| 5-minute cache write  | $3.75/MTok  | $6.25/MTok  | $1.25/MTok |
| Cache read (hit)      | $0.30/MTok  | $0.50/MTok  | $0.10/MTok |
| Output                | $15.00/MTok | $25.00/MTok | $5.00/MTok |

A 100K-token Opus session costs about $0.05 per turn on a cache hit, about $0.63 per turn on a cache miss. Over a long day, that gap compounds into real money.
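If you want to sanity-check those figures, here's a minimal sketch using the Opus 4.6 rates from the table above. It only prices the existing conversation context being re-read (warm) or re-cached (cold); the new message's own input tokens and Claude's output are extra on top.

```python
# Per-turn input cost for a 100K-token Opus 4.6 conversation, using the
# rates from the table above. Simplification: only the cached context is
# priced; the new message's own input and Claude's output are ignored.

OPUS_CACHE_READ = 0.50 / 1_000_000    # $/token on a cache hit
OPUS_CACHE_WRITE = 6.25 / 1_000_000   # $/token when re-caching after expiry

context_tokens = 100_000

warm_turn = context_tokens * OPUS_CACHE_READ    # cache still alive
cold_turn = context_tokens * OPUS_CACHE_WRITE   # cache expired, full re-cache

print(f"warm cache: ${warm_turn:.3f} per turn")              # $0.050
print(f"cold cache: ${cold_turn:.3f} per turn")              # $0.625
print(f"cold-cache penalty: {cold_turn / warm_turn:.1f}x")   # 12.5x
```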

The Rhythm of the Work

Before the cost drivers, it helps to name the rhythm of a Claude Code session, because most of the expensive surprises come from drifting into the wrong rhythm without noticing.

A session has a natural tempo. There are three recognizable modes.

Fast execution. Claude reads files, writes code, runs commands, iterates on errors. Turns are five to 30 seconds apart. The cache stays hot. Each message is small and focused because Claude is doing concrete work, not deliberating. This is the cheapest mode and the one where the token-per-dollar ratio of useful output is highest.

Deliberation. Claude stops to ask a question. It presents options. It asks "should I proceed?" You read, think, and answer. Claude reprocesses the full conversation to act on your response. Two or three round trips per real decision. Still cheap per turn if the gaps are short, but this is the mode that most easily drifts into the expensive one.

Wait-and-see. Claude kicks off a long-running task (a build, a test suite, a deep analysis) and waits. You wait with it, or you walk away. Minutes pass. The cache dies. When the result finally arrives, that single message may be the most expensive one in the session because it has to rehydrate the whole conversation.

Most real work mixes all three. The economic question is whether you notice which mode you're in and nudge toward fast execution whenever you can.

The worst shape for a session is repeated wait-and-see cycles with long gaps, because each wake-up is a full re-cache. The best shape is dense fast-execution runs with short deliberation checkpoints and long-running work offloaded to infrastructure that doesn't make Claude wait.
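To put rough numbers on that, here's a toy model using the same Opus 4.6 cache rates. The session shape and sizes (a 150K-token context, 40 turns, 10 cold wake-ups) are illustrative assumptions, not measurements; the point is the multiplier, not the exact dollars.

```python
# Toy comparison of the two session shapes, using the Opus 4.6 cache rates
# from the pricing table. The numbers (150K-token context, 40 turns, 10 cold
# wake-ups) are assumptions for illustration, not measurements.

CACHE_READ = 0.50 / 1_000_000    # $/token, warm turn
CACHE_WRITE = 6.25 / 1_000_000   # $/token, turn after the cache expired

def session_input_cost(context_tokens: int, turns: int, cold_turns: int) -> float:
    """Input-side cost only; output tokens and context growth are ignored."""
    warm = (turns - cold_turns) * context_tokens * CACHE_READ
    cold = cold_turns * context_tokens * CACHE_WRITE
    return warm + cold

dense = session_input_cost(150_000, turns=40, cold_turns=0)   # stayed in the cab
gappy = session_input_cost(150_000, turns=40, cold_turns=10)  # ten cold wake-ups

print(f"dense fast-execution session: ${dense:.2f}")   # $3.00
print(f"repeated wait-and-see:        ${gappy:.2f}")   # ~$11.6, nearly 4x the dense run
```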

Rhythm is invisible to any billing report. You have to feel it. If you find yourself leaving long stretches between messages, your meter is probably running hotter than you think.

Going Deeper

Ten specific cost drivers, each with management and monitoring guidance, plus benchmarks for checking whether your spend looks healthy: Cost Driver Reference.

All six helpers with install instructions, ecosystem tools, and configuration details: Helper Install Guide.

The Companion Helpers

I built a set of local helpers — hooks and slash commands that run in your ~/.claude/ config — to make the cost mechanics described in this series visible at the moment they happen. No platform dependency. Just bash and python3.

They cover the failure modes I kept hitting myself: cache expiry sneaking up on you, sessions that should have been split ten turns ago, subagent context bloat, lossy compaction with no safety net, tool output silently inflating every future turn, and delegation results piling up in the parent session. Each helper warns you when the cost is about to spike and gives you a one-command way to avoid it.
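To give a flavor of how simple these are, here's a stripped-down sketch of the cache-expiry idea (not the actual helper): remember when the last turn happened, and warn when the gap has outlived the five-minute TTL. The state-file path and the way you invoke it are assumptions for illustration; the real helpers and install steps live in the helper guide.

```python
#!/usr/bin/env python3
# Stripped-down sketch of the cache-expiry warning idea, not the actual helper.
# It remembers when the last turn happened and complains if the gap has
# outlived the five-minute cache TTL. The state-file path is a made-up
# example; how you invoke it (hook, shell alias, prompt wrapper) is up to you.
import sys
import time
from pathlib import Path

TTL_SECONDS = 5 * 60
STATE = Path.home() / ".claude" / "last-turn-timestamp"  # hypothetical location

now = time.time()
if STATE.exists():
    gap = now - float(STATE.read_text().strip() or 0)
    if gap > TTL_SECONDS:
        print(
            f"cache likely cold: {gap / 60:.1f} min since the last turn; "
            "the next message pays the ~12.5x re-cache penalty",
            file=sys.stderr,
        )

STATE.parent.mkdir(parents=True, exist_ok=True)
STATE.write_text(str(now))
```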

I built them for myself first, then for my team at Zivtech, then for the clients and organizations I work with (mostly through Joyus AI), and then for the broader open-source communities I care about. They're free, GPL-3.0-or-later licensed, and on GitHub. Pull requests are welcome. Full install details are in the helper guide.

The Point

That's the whole point of the cab metaphor: the meter is always running, but it doesn't always run at the same speed. The rate changes depending on which driver you called, how much luggage you loaded, and whether you let the cache go cold.

The ten cost drivers are where the money moves. Most of them are manageable once you can see them. A few are workflow problems more than settings problems. One or two aren't really controllable at all. They're just the price of doing business.

The rhythm piece matters more than any single driver. Sessions that stay in the fast-execution mode are cheap almost regardless of length. Sessions that drift into wait-and-see cost more than most people expect, even short ones. The biggest single improvement most teams can make isn't tuning a config — it's noticing which rhythm they're in and nudging toward the cheap one.

If you're finding expensive mistakes in your own usage, we'd like to hear about them.