Two developers use Claude Code on the same project. One gets clean, working code on the first try. The other fights through hallucinations, wrong file edits, and off-target suggestions. The difference isn't their prompts — it's their context.
Context engineering has replaced prompt engineering as the highest-leverage skill for working with AI coding tools. While prompt engineering focuses on how you phrase a request, context engineering focuses on what information the model has access to when it generates a response. In Claude Code, this means deliberately curating your CLAUDE.md files, managing session length, delegating to subagents, and structuring your requests so Claude sees exactly what it needs — nothing more, nothing less.
This guide covers every context engineering technique available in Claude Code today, with concrete examples you can apply immediately.
Claude Code's context window is currently 1 million tokens on Opus 4.6 and Sonnet 4.6. That sounds enormous, but it fills up faster than you'd expect. Every file Claude reads, every command it runs, every tool definition from MCP servers — all of it consumes tokens from that window. And as the window fills, performance degrades.
Here's what actually happens inside a Claude Code session:
| Context Source | Token Cost | Impact |
|---|---|---|
| System prompt + tool definitions | ~15,000-30,000 | Fixed overhead every session |
| CLAUDE.md files (all levels) | ~500-4,000 | Loaded at session start |
| Each file read | ~200-10,000+ | Depends on file size |
| Each command output | ~100-5,000+ | Depends on verbosity |
| MCP server tool definitions | ~1,000-5,000 per server | Adds up with multiple servers |
| Conversation history | Grows continuously | The biggest consumer over time |
The problem isn't running out of tokens — Claude Code auto-compacts at roughly 83.5% capacity. The problem is that a bloated context makes Claude less accurate. Important details get diluted by irrelevant information. The model struggles to find the signal in the noise.
Context engineering is the practice of maximizing signal-to-noise ratio in that window.
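To make the table concrete, here is a rough budget calculation in Python. The per-item averages and the session shape (20 file reads, 30 commands, 3 MCP servers) are assumptions picked from inside the ranges above, not measurements, and the growing conversation history is deliberately left out.

```python
# Rough context-budget arithmetic. Every figure below is an illustrative
# assumption taken from the ranges in the table, not a measured value.

WINDOW = 1_000_000   # context window size quoted in this guide
COMPACT_AT = 0.835   # approximate auto-compaction threshold quoted in this guide


def session_tokens(file_reads: int, commands: int, mcp_servers: int,
                   claude_md: int = 1_000) -> int:
    """Estimate tokens consumed by a hypothetical session, excluding chat history."""
    fixed = 20_000                 # system prompt + built-in tools (assumed)
    files = file_reads * 3_000     # assumed average per file read
    output = commands * 1_000      # assumed average per command output
    mcp = mcp_servers * 3_000      # assumed average per MCP server
    return fixed + claude_md + files + output + mcp


used = session_tokens(file_reads=20, commands=30, mcp_servers=3)
threshold = round(WINDOW * COMPACT_AT)

print(f"{used:,} tokens used, {used / WINDOW:.1%} of the window")
print(f"auto-compaction at about {threshold:,} tokens")
```

Under these assumptions a moderately busy session has spent about 12% of the window before any conversation history is counted, which is why the history row tends to dominate over time.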
CLAUDE.md files are the most cost-effective context you can provide. They load automatically at session start and persist across restarts. They still occupy space in the window, but that cost is small and fixed rather than growing with the conversation, which is exactly why they need to stay lean.
Claude Code reads CLAUDE.md files from three locations, each serving a different purpose:
~/.claude/CLAUDE.md # Global — applies to every project
./CLAUDE.md # Project root — applies to this repo
./src/components/CLAUDE.md # Directory-level — applies when working here
Global (~/.claude/CLAUDE.md): Your personal preferences that apply everywhere. Coding style, preferred testing frameworks, communication preferences. Keep this under 100 tokens.
Project root (./CLAUDE.md): The most important one. Project stack, build commands, testing commands, architecture decisions, key conventions. This is where most of your context budget should go.
Directory-level: Module-specific rules. If your src/api/ directory has conventions that differ from the rest of the project, put a CLAUDE.md there instead of cluttering the root file.
A focused 400-token CLAUDE.md outperforms a sprawling 4,000-token one. Here's the framework:
Always include:
Never include:
Here's a CLAUDE.md that hits the sweet spot:
# Project: Acme Dashboard
Next.js 15 app with App Router, Supabase (auth + DB), Tailwind CSS, shadcn/ui.
## Commands
- `npm run dev` — start dev server (port 3000)
- `npm run test` — run vitest
- `npm run test:e2e` — run playwright (requires dev server running)
- `npm run lint` — eslint + prettier check
- `npm run typecheck` — tsc --noEmit
## Architecture
- Server components by default. Client components only when interactivity is required.
- All DB access through Supabase client in `src/lib/supabase/`.
- Auth uses Supabase SSR helpers — never use `createClient()` directly.
- API routes in `src/app/api/` return NextResponse, not plain Response.
## Gotchas
- Supabase RLS policies are on — test queries must use service role key.
- `next/headers` and `cookies()` can only be called in server components/actions.
That's roughly 200 tokens. Claude now knows the stack, can run your test suite, and avoids the mistakes that newcomers to this codebase most often make.
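A quick offline sanity check for a figure like this is the common rule of thumb of about four characters per token for English text. That ratio is an assumption, not Claude's actual tokenizer, so treat the result as a ballpark only. The helper name below is hypothetical.

```python
# Heuristic token estimate for a CLAUDE.md file.
# Assumption: roughly 4 characters per token for English text. This is a
# rule of thumb, not Claude's real tokenizer.
import math

CHARS_PER_TOKEN = 4  # assumed average


def estimate_tokens(text: str) -> int:
    """Return a ballpark token count for the given text."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


sample = """# Project: Acme Dashboard
Next.js 15 app with App Router, Supabase (auth + DB), Tailwind CSS, shadcn/ui.
## Commands
- `npm run typecheck`
"""

print(f"about {estimate_tokens(sample)} tokens")
```

To check a real file, pass in its contents, for example `estimate_tokens(open("CLAUDE.md").read())`.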
Run /context in Claude Code to see exactly how many tokens your CLAUDE.md files consume. If they're over 1,000 tokens combined, audit ruthlessly:
Every message in your conversation adds to the context window. After 30-40 exchanges, you're carrying a lot of history that's no longer relevant — early exploration, dead-end approaches, superseded decisions.
Claude Code handles this automatically through compaction: when you approach ~83.5% of the context window, it summarizes the conversation history and replaces it with that summary. This frees up space but it's lossy — details get dropped.
Don't wait for auto-compaction. When you notice the token usage bar climbing past 50%, you have three options:
Option 1: Manual compaction. Run /compact to trigger a summary on your terms. You can even add a focus hint:
/compact focus on the database migration work and the auth bug fix
This tells Claude what to prioritize when summarizing, so the details that matter survive the compression.
Option 2: Start a new session. For genuinely unrelated work, exit and restart. Your CLAUDE.md context loads fresh, and you get a full clean window. Session memory and auto memory persist across sessions, so Claude still remembers important decisions.
Option 3: Delegate to a subagent. If you're about to do exploratory work that generates lots of intermediate output, spawn it in a subagent so the exploration doesn't pollute your main context.
| Scenario | Best Approach |
|---|---|
| Switching from feature work to bug fix | New session |
| Continuing implementation from yesterday | --continue or --resume |
| About to read 20 files to understand a module | Subagent |
| Hit 80% token usage, still on same task | /compact with focus hint |
| Hit 80% token usage, task is almost done | Push through, compaction is fine |
Subagents are the most powerful context engineering tool in Claude Code. Each subagent runs in its own context window with its own system prompt and tool access. It does its work, synthesizes the results, and sends back only the final answer — keeping all the intermediate noise out of your main conversation.
The rule of thumb: if a task will generate a lot of intermediate output you won't need again, it belongs in a subagent.
Great subagent tasks:
- "… src/api/ and summarize the endpoint structure"
- "… fetchUser function"
- "… auth.test.ts is failing"
Bad subagent tasks:
You can tell Claude to use subagents directly in your prompt:
Use a subagent to read all files in src/services/ and create a dependency
map showing which services call which other services. Report back just
the map, not the file contents.
The key phrase is "report back just the map." This tells Claude what to extract from the subagent's work, keeping your main context clean.
If you keep delegating the same type of work, create a custom agent in .claude/agents/:
# .claude/agents/code-reviewer.md
You are a code review agent. When given a file or diff:
1. Check for security issues (injection, XSS, auth bypasses)
2. Check for performance problems (N+1 queries, missing indexes, memory leaks)
3. Check for correctness (edge cases, error handling, race conditions)
Report ONLY issues found. If the code is clean, say "No issues found."
Do not suggest style changes or refactors.
Now you can invoke it from any session:
Run the code-reviewer agent on the changes in src/api/payments.ts
The agent runs in isolation, reads the files it needs, and returns only its findings.
One of the biggest context engineering mistakes is front-loading too much information. Instead of pasting entire files into your prompt, use references and let Claude load what it needs.
Instead of:
Here's the content of auth.ts: [500 lines of code]
Here's the content of middleware.ts: [300 lines of code]
Fix the authentication bug where tokens expire prematurely.
Do this:
There's a bug in src/auth/auth.ts where JWT tokens expire earlier than
the configured TTL. The token creation is around line 45 and the
validation is in src/middleware/auth-middleware.ts around line 20.
Fix the expiration logic.
Claude reads exactly what it needs, right when it needs it. The first approach burns 800 lines of context immediately. The second approach burns only the relevant sections when Claude opens those files.
Vague requests like "look at the codebase and fix the auth bug" force Claude to explore broadly, consuming tokens on irrelevant files. Specific requests like "the auth bug is in src/auth/session.ts around line 120 where the session TTL calculation uses milliseconds instead of seconds" let Claude go straight to the problem.
The more precisely you point Claude at the right files and functions, the less context gets wasted on exploration.
The /context command shows you exactly where your tokens are going:
System prompt: 12,847 tokens (1.3%)
Tools: 18,234 tokens (1.8%)
CLAUDE.md files: 847 tokens (0.1%)
Conversation: 284,923 tokens (28.5%)
Available: 683,149 tokens (68.3%)
If "Tools" is eating 50,000+ tokens, you probably have too many MCP servers enabled. Disable the ones you're not using for this task.
If "Conversation" is above 400,000, it's time to compact or start fresh.
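If you paste that report into a script, the two thresholds can be checked mechanically. The sketch below assumes the output keeps the line shape shown in the sample (label, colon, token count); the real format may differ between versions, and the function names are hypothetical.

```python
import re

# Assumed line shape: "<label>: <number> tokens (...)", as in the sample report.
LINE = re.compile(r"^\s*(?P<label>[^:]+):\s*(?P<tokens>[\d,]+)\s+tokens")


def parse_context(report: str) -> dict[str, int]:
    """Map each label in a /context-style report to its token count."""
    usage = {}
    for line in report.splitlines():
        match = LINE.match(line)
        if match:
            usage[match["label"].strip()] = int(match["tokens"].replace(",", ""))
    return usage


def advice(usage: dict[str, int]) -> list[str]:
    """Apply the two thresholds described in the text."""
    notes = []
    if usage.get("Tools", 0) >= 50_000:
        notes.append("Tools: disable MCP servers you are not using")
    if usage.get("Conversation", 0) > 400_000:
        notes.append("Conversation: compact or start a fresh session")
    return notes


SAMPLE = """\
System prompt: 12,847 tokens (1.3%)
Tools: 18,234 tokens (1.8%)
CLAUDE.md files: 847 tokens (0.1%)
Conversation: 284,923 tokens (28.5%)
Available: 683,149 tokens (68.3%)
"""

usage = parse_context(SAMPLE)
print(usage)
print(advice(usage) or "no action needed")
```

For the sample report neither threshold is crossed, so no advice is returned.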
How you structure a request directly affects how much context Claude wastes figuring out what you want.
The most effective prompt structure for Claude Code follows this pattern:
[TASK] What you want done
[CONTEXT] What Claude needs to know that it can't infer
[CONSTRAINTS] Boundaries and requirements
Example:
Add rate limiting to the POST /api/comments endpoint.
Context: We use Express with Redis already configured in src/lib/redis.ts.
The rate limiter should use a sliding window algorithm.
Constraints:
- 10 requests per minute per authenticated user
- 3 requests per minute for unauthenticated users
- Return 429 with a Retry-After header
- Add tests to src/api/__tests__/comments.test.ts
This gives Claude everything it needs to implement correctly on the first try, without reading extra files to figure out the stack or guessing at requirements.
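For readers unfamiliar with the term used in that prompt, a sliding window limiter remembers the timestamps of recent requests per user and rejects a request when the last 60 seconds already contain the allowed number. The in-memory Python sketch below only illustrates the algorithm; the project in the example would implement it in Express backed by Redis, and the class and method names here are hypothetical.

```python
import math
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """In-memory sliding-window log limiter (illustrative, not production code)."""

    def __init__(self, limit: int, window_seconds: float = 60.0):
        self.limit = limit
        self.window = window_seconds
        self.hits = defaultdict(deque)  # key -> timestamps of accepted requests

    def check(self, key: str, now: float):
        """Return (allowed, retry_after_seconds) for a request made at `now`."""
        timestamps = self.hits[key]
        # Drop timestamps that have slid out of the window.
        while timestamps and timestamps[0] <= now - self.window:
            timestamps.popleft()
        if len(timestamps) < self.limit:
            timestamps.append(now)
            return True, 0
        # The oldest accepted request leaves the window at timestamps[0] + window.
        return False, math.ceil(timestamps[0] + self.window - now)


# Unauthenticated limit from the example: 3 requests per minute.
limiter = SlidingWindowLimiter(limit=3)
for t in (0, 1, 2, 3, 60):
    print(t, limiter.check("anonymous", now=t))
```

The second element of the returned tuple is the value a handler would place in the Retry-After header of a 429 response.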
Including the expected outcome is the single highest-leverage thing you can do. Tests serve as both specification and self-check:
Refactor the date formatting utility in src/utils/dates.ts.
The existing tests in src/utils/__tests__/dates.test.ts should all
still pass. Add these additional test cases:
- formatRelative(new Date('2026-01-15'), new Date('2026-01-16')) → "yesterday"
- formatRelative(new Date('2026-01-10'), new Date('2026-01-16')) → "6 days ago"
- formatRelative(new Date('2025-06-15'), new Date('2026-01-16')) → "Jun 2025"
Now Claude can implement the refactor AND verify it works by running the tests — all without you checking intermediate output.
For complex tasks where you're not sure what context Claude needs, flip the dynamic:
I want to add real-time notifications to the dashboard. Before
implementing anything, interview me about the requirements. Ask about
technical constraints, UX expectations, and edge cases I might not
have considered.
Claude will ask targeted questions using the AskUserQuestion tool, surfacing requirements you didn't think to specify. This is more efficient than writing a massive prompt upfront, because Claude asks only about what's ambiguous.
Claude Code respects .claudeignore files with the same syntax as .gitignore. This prevents Claude from reading files that would only waste context:
# .claudeignore
# Build artifacts
dist/
build/
.next/
# Large data files
*.csv
*.sql.gz
fixtures/large-dataset.json
# Generated code
src/generated/
*.pb.ts
# Dependencies
node_modules/
This is especially important for repositories with large generated files, fixture data, or vendored dependencies. Every file Claude doesn't read is context saved for actual work.
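One way to decide what belongs on the exclusion list is to look for the largest files in the repository. The helper below is a hypothetical sketch, not part of Claude Code. It walks a directory, skips folders that are assumed to be excluded already, and reports the biggest files by size. The demo runs against a throwaway directory so it works anywhere.

```python
import os
import tempfile


def largest_files(root, top=5, skip_dirs=(".git", "node_modules")):
    """Return the `top` largest files under root as (bytes, relative_path)."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk never descends into skipped directories.
        dirnames[:] = [d for d in dirnames if d not in skip_dirs]
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path):  # ignores broken symlinks
                found.append((os.path.getsize(path), os.path.relpath(path, root)))
    return sorted(found, reverse=True)[:top]


# Demo on a temporary directory with made-up files.
with tempfile.TemporaryDirectory() as demo:
    os.makedirs(os.path.join(demo, "node_modules"))
    for rel, size in [("app.ts", 200), ("export.csv", 5000),
                      (os.path.join("node_modules", "lib.js"), 90000)]:
        with open(os.path.join(demo, rel), "w") as fh:
            fh.write("x" * size)
    demo_result = largest_files(demo, top=2)

for size, rel in demo_result:
    print(f"{size:>8} bytes  {rel}")
```

Pointing `largest_files(".")` at a real repository gives a shortlist of candidates to review. Size is only a proxy for token cost, so the final call on what to exclude is still a judgment about relevance.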
Each MCP server you enable adds its tool definitions to the system prompt. A server with 10 tools might consume 3,000-5,000 tokens of your context window — before you've even started working.
Run /context to see the tool token overhead. If you're not using a particular MCP server for your current task, disable it:
claude config set mcpServers.slack.disabled true
Re-enable it when you need it:
claude config set mcpServers.slack.disabled false
For projects that use multiple MCP servers for different workflows, consider creating separate configuration profiles or documenting which servers to enable for which tasks in your CLAUDE.md.
For large features that span multiple sessions, you need a strategy for maintaining context continuity without carrying stale history.
Before implementation, have Claude create a plan document:
Analyze the requirements for the new billing system and create a
plan in docs/billing-plan.md. Include:
- Architecture decisions
- File changes needed
- Implementation order
- Open questions
Don't implement anything yet.
In subsequent sessions, reference the plan:
Continue implementing the billing system from docs/billing-plan.md.
We completed steps 1-3 in the last session. Start with step 4:
the webhook handler.
The plan document acts as compressed context — it captures decisions and structure without carrying the full conversation history.
Claude Code's memory system provides automatic context continuity:
These systems work in the background, but you can amplify them by being explicit about important decisions:
Important: We decided to use event sourcing for the billing system
instead of direct DB writes. This is a hard requirement — don't
suggest alternatives.
Explicit statements like this are more likely to be captured by auto memory and survive compaction.
Here's a practical checklist to audit your context engineering:
Before starting a session:
- … .claudeignore excluding large/generated files?
During a session:
- … /context?
For complex tasks:
Between sessions:
Context engineering isn't theoretical. Developers who apply these techniques consistently report:
The compound effect is significant. A 10-minute investment in CLAUDE.md plus consistent session hygiene saves hours of debugging and re-prompting over the course of a project. The developers getting "10x results" from Claude Code aren't using magic prompts — they're engineering their context.