Claude Code Hitting Usage Limits Too Fast? A 9-Point Checklist to Fix It

By Neo Zino - builder of ClockedCode | Updated July 12, 2026 | 8 min read

Burning through your Claude Code usage limits way faster than expected? Work through this 9-point checklist to cut wasted tokens before you pay for a bigger plan.

If Claude Code keeps hitting its usage limit far sooner than you expect, the cause is almost always wasted tokens, not a plan that is too small. A handful of default habits quietly inflate every request: the wrong model, context you never cleared, a bloated config, and redundant tools. Fix those first and most people stop hitting limits without spending a cent more.

TL;DR: Before you upgrade your plan, check these nine things: your model choice, whether you clear context between tasks, how big your CLAUDE.md is, how many MCP tools you load, how you scope prompts, whether you re-read large files, your use of /compact, parallel-agent sprawl, and running multiple sessions against one shared limit.

Why am I hitting Claude Code limits so fast?

Claude Code usage is measured in tokens, and tokens are consumed by everything in the context window on every turn, not just your latest message. Each prompt re-sends the system instructions, your CLAUDE.md, the running conversation, every tool definition, and every file already read. A long session with a heavy config can spend several times more tokens per turn than a lean one doing the same work. So two people on the same plan can have wildly different limits in practice. The fix is to shrink what rides along on every turn.
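To see how the ride-along cost compounds, here is a minimal sketch of the arithmetic. Every number in it is made up for illustration, and the model is deliberately crude: it counts only input tokens, assumes the full context is re-sent on every turn, and ignores caching and output tokens entirely.

```python
def session_tokens(fixed_overhead: int, tokens_per_turn: int, turns: int) -> int:
    """Total input tokens sent over a session under a simple re-send model.

    fixed_overhead: tokens that ride along on every turn (system prompt,
                    CLAUDE.md, tool definitions). Hypothetical figure.
    tokens_per_turn: new conversation tokens added on each turn.
    """
    total = 0
    history = 0
    for _ in range(turns):
        # Each turn sends the fixed overhead, all prior history, and the new message.
        total += fixed_overhead + history + tokens_per_turn
        history += tokens_per_turn
    return total


# Same 20-turn task, two hypothetical setups.
lean = session_tokens(fixed_overhead=5_000, tokens_per_turn=1_000, turns=20)
heavy = session_tokens(fixed_overhead=30_000, tokens_per_turn=1_000, turns=20)
print(lean, heavy, round(heavy / lean, 2))
```

With these invented inputs the heavy setup sends about 2.6 times the tokens of the lean one for identical work. The exact ratio depends on your own config, but the shape of the problem is the same: overhead is paid once per turn, not once per session.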

The 9-point checklist

Work through these in order. The first four fix the biggest leaks.

1. You are on the wrong model for the task. Running the most powerful model for routine edits spends your budget fast. Use a lighter model for straightforward work and reserve the heavyweight model (or a plan-with-the-big-model, build-with-the-light-model split) for genuinely hard reasoning. This single change is usually the largest saving.
2. You never clear context between tasks. Every message in a long session is re-sent on the next turn. When you finish one task and start an unrelated one, run /clear to wipe the slate. Carrying a finished task's context into a new one is pure waste.
3. Your CLAUDE.md is too big. Your global and project CLAUDE.md files are re-read on every single turn, so a 600-line config taxes every request for the whole session. Keep it tight: rules you actually rely on, not an essay. If a section only matters occasionally, move it into a skill that loads on demand instead.
4. You load MCP tools you never use. Every connected MCP server injects its tool definitions into the context on every turn, whether you use it or not. Ten servers you "might need someday" cost tokens constantly. Curate down to the few you actually use, and let the rest load only when needed.
5. Your prompts are vague, so Claude over-explores. A vague request makes Claude read more files, try more approaches, and write more before it lands. State the goal, the constraints, and what to leave alone. Tight scope means fewer tokens to the same result.
6. You let Claude re-read large files repeatedly. Pointing Claude at huge files or whole directories pulls all of it into context. Reference the specific file and lines you mean, and it reads far less.
7. You wait until the context is full to /compact. Compacting at 95 percent full means you already paid for a giant context many times over. Compact earlier, around the point where the current task's essentials are settled, so later turns ride on a smaller summary.
8. You spin up parallel agents for small work. Subagents are powerful for big, independent, parallel tasks, but each one is a fresh context that re-derives everything. For a quick change you could do inline, the cold-start cost is not worth it. Delegate when the work is genuinely large and parallel, not by reflex.
9. You forget the limit is shared across sessions. Your usage limit is tied to your account, not a single window. Two terminals, a second project, and the desktop app all draw from the same bucket at once. If you are blocked sooner than expected, check whether another session is quietly burning the same allowance.
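The point about compacting earlier is easier to believe with numbers. Below is a rough sketch that compares compacting late against compacting around the halfway mark. The window size, per-turn growth, and summary size are all assumptions picked for illustration, and the model ignores whatever the compaction step itself costs.

```python
def total_tokens_sent(turns: int, tokens_per_turn: int,
                      compact_at_tokens: int, summary_tokens: int) -> int:
    """Sum the context size sent on each turn, compacting at a threshold.

    Assumes the whole context is re-sent every turn and that compaction
    replaces it with a summary of summary_tokens. All figures hypothetical.
    """
    total = 0
    context = 0
    for _ in range(turns):
        context += tokens_per_turn      # the conversation grows
        total += context                # and the whole thing is sent
        if context >= compact_at_tokens:
            context = summary_tokens    # compaction shrinks what rides along
    return total


# A hypothetical 60-turn session in a 200,000-token window.
late = total_tokens_sent(60, 5_000, compact_at_tokens=190_000, summary_tokens=10_000)
early = total_tokens_sent(60, 5_000, compact_at_tokens=100_000, summary_tokens=10_000)
print(late, early)
```

In this toy model the early-compacting session sends noticeably fewer tokens over the same 60 turns. Setting the threshold to a very small value approximates running /clear between tasks, which is why that habit saves even more when the old context is no longer needed.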

Which of these is costing me the most?

Use this to find your biggest leak fast.

| Symptom | Most likely cause | First fix |
| --- | --- | --- |
| Blocked after only a few tasks | Wrong model + no `/clear` | Switch to a lighter model; clear context between tasks |
| Every session feels expensive from the start | Bloated `CLAUDE.md` or too many MCP tools | Trim the config; disconnect unused servers |
| Long sessions die suddenly | Context filled before compacting | `/compact` earlier, or `/clear` and restart the task |
| Limit hit while you were barely working | A second session sharing the limit | Close the other terminal or session |

Are Claude Code rate limits the same as usage limits?

People say "rate limits" for both, but they are two different systems, and knowing which one you are hitting changes the fix.

On a Pro or Max subscription, Claude Code has usage limits: a rolling 5-hour window that opens with your first message, plus weekly caps on top. Everything on your account draws from the same pool - Claude Code, claude.ai, the desktop app, the IDE extension. When you run out, you are blocked until the window resets, and the message tells you the reset time. (Anthropic doubled the 5-hour allowance in May 2026 and dropped the peak-hour throttling, so if you have not hit a session cap in a while, that is why.)

On an API key, you get actual rate limits: requests per minute and input/output tokens per minute, set by your usage tier. Exceed one and you get a 429 error that clears in seconds - the SDKs retry automatically. There is no weekly lockout; you just pay for every token.

The quick tell: a reset time hours away means the 5-hour window, days away means the weekly cap. A 429 that clears on retry means you are on API billing, and the fix is pacing, not waiting.
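If you call the API from your own scripts, pacing usually means retrying with a growing delay. The official SDKs already do this for you, so the sketch below is only meant to show the idea, using exponential backoff. `RateLimited` and `send` are hypothetical stand-ins, not real SDK names.

```python
import time


class RateLimited(Exception):
    """Hypothetical stand-in for an HTTP 429 response."""


def call_with_backoff(send, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call send(), retrying on RateLimited with exponentially growing delays.

    The delay doubles after each failed attempt: base_delay, 2x, 4x, ...
    The last failure is re-raised so the caller can see it.
    """
    for attempt in range(max_attempts):
        try:
            return send()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)


# Usage: a fake request that is rate-limited twice, then succeeds.
attempts = {"count": 0}

def fake_request():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RateLimited()
    return "ok"

waits = []
result = call_with_backoff(fake_request, sleep=waits.append)  # record delays instead of sleeping
print(result, waits)
```

None of this helps with a subscription usage limit. There, retrying does nothing until the window resets.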

How do Claude Code weekly limits work?

Weekly limits sit on top of the 5-hour window and have existed since August 2025. Pro has one weekly cap covering all models. Max plans carry two: an overall cap plus a separate one for heavier model usage. Anthropic's own help pages currently disagree on exactly how that second cap is split by model, so trust the Usage screen over any blog post, including this one.

Two things people get wrong about them. First, the reset is a fixed weekly time assigned to your account, not seven days after you started - Settings > Usage on claude.ai shows your exact day and hour. Second, they are genuinely hard to hit: Anthropic said at launch that fewer than 5 percent of subscribers would ever touch them. If you are in that 5 percent every single week, either you run heavy parallel workloads for real, or something in the checklist above is bleeding tokens.
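The "fixed weekly time" point is easy to get wrong, so here is a small sketch of what it means. It computes the next reset from a weekday and hour you would read off your own Usage screen. The function name and parameters are mine, and it ignores time zones, so treat it as an illustration of the rule rather than a description of how Anthropic computes anything.

```python
from datetime import datetime, timedelta


def next_weekly_reset(now: datetime, reset_weekday: int, reset_hour: int) -> datetime:
    """Next occurrence of a fixed weekly reset slot.

    reset_weekday follows datetime.weekday(): Monday=0 ... Sunday=6.
    The result depends only on the fixed slot, not on when you started working.
    """
    candidate = now.replace(hour=reset_hour, minute=0, second=0, microsecond=0)
    candidate += timedelta(days=(reset_weekday - now.weekday()) % 7)
    if candidate <= now:
        # This week's slot has already passed, so the next one is a week later.
        candidate += timedelta(days=7)
    return candidate


# Hypothetical account that resets on Wednesdays at 15:00.
print(next_weekly_reset(datetime(2024, 1, 1, 9, 0), reset_weekday=2, reset_hour=15))
```

Whether you first hit the cap on Monday or on Tuesday night, the answer for that week is the same Wednesday slot.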

Also worth knowing: since June 2026, headless runs, the Agent SDK, and Claude Code GitHub Actions no longer draw from your subscription pool - they run on a separate monthly credit allotment. Your CI is not eating your week anymore.

How do I see my Claude Code token usage?

Claude Code ships with the tracking built in. Most people just never run the commands.

- /usage is the main one. It shows the session's token counts, and on a subscription it adds your 5-hour and weekly bars with reset times, plus a breakdown attributing recent usage to skills, subagents, and individual MCP servers. That breakdown is checklist item 4 made visible: if a server you never use tops the list, you have found your leak.
- /context shows what is riding along in the context window right now - your CLAUDE.md, tool definitions, files already read, conversation history, and how full the window is. It is the fastest way to see the "everything is re-sent every turn" problem with your own eyes.
- /cost shows the session in dollars. Useful on an API key, where it maps to real billing; on Pro or Max the number is an estimate and your plan covers it anyway.

For history beyond one session, ccusage (run npx ccusage) reads Claude Code's local logs and gives daily, weekly, and per-session reports, including how each 5-hour window filled up. And if you want a verdict on the setup itself rather than the usage it produces, UsageCut is my free scanner that flags what in your config is silently burning tokens on every turn.

When should I actually upgrade my plan?

Upgrade only after you have done the checklist and still genuinely run out, on real work, with a lean setup. Most people who feel "rate-limited constantly" are paying for waste, not capacity. The honest order is: fix the leaks first, measure your usage for a week, and only then decide whether you need more headroom. Paying more to keep wasting tokens is the most expensive way to use Claude Code.

If hand-tuning all nine of these sounds like work, that is exactly what ClockedCode packages: a tuned global CLAUDE.md, a curated set of MCP tools instead of a pile of them, and the model and context discipline baked into a setup you paste in once. It is the checklist above, already done.
