Claude Code Context Engineering: Get 10x Better Results

Two developers use Claude Code on the same project. One gets clean, working code on the first try. The other fights through hallucinations, wrong file edits, and off-target suggestions. The difference isn't their prompts — it's their context.

Context engineering has replaced prompt engineering as the highest-leverage skill for working with AI coding tools. While prompt engineering focuses on how you phrase a request, context engineering focuses on what information the model has access to when it generates a response. In Claude Code, this means deliberately curating your CLAUDE.md files, managing session length, delegating to subagents, and structuring your requests so Claude sees exactly what it needs — nothing more, nothing less.

This guide covers the main context engineering techniques available in Claude Code today, with concrete examples you can apply immediately.

Why Context Matters More Than Prompts

Claude Code's context window is currently 1 million tokens on Opus 4.6 and Sonnet 4.6. That sounds enormous, but it fills up faster than you'd expect. Every file Claude reads, every command it runs, every tool definition from MCP servers — all of it consumes tokens from that window. And as the window fills, performance degrades.

Here's what actually happens inside a Claude Code session:

Context Source	Token Cost	Impact
System prompt + tool definitions	~15,000-30,000	Fixed overhead every session
CLAUDE.md files (all levels)	~500-4,000	Loaded at session start
Each file read	~200-10,000+	Depends on file size
Each command output	~100-5,000+	Depends on verbosity
MCP server tool definitions	~1,000-5,000 per server	Adds up with multiple servers
Conversation history	Grows continuously	The biggest consumer over time

The problem isn't running out of tokens — Claude Code auto-compacts at roughly 83.5% capacity. The problem is that a bloated context makes Claude less accurate. Important details get diluted by irrelevant information. The model struggles to find the signal in the noise.

Context engineering is the practice of maximizing signal-to-noise ratio in that window.
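To make the budget concrete, here is a rough Python sketch that adds up the fixed costs for a hypothetical session. The per-source figures are taken from inside the ranges in the table above, and the session mix (two MCP servers) is an assumption for illustration, not a measurement.

```python
# Rough context-budget arithmetic. All figures are illustrative assumptions
# picked from the ranges in the table above, not measured values.
WINDOW_TOKENS = 1_000_000        # context window size stated in the article
AUTO_COMPACT_PER_MILLE = 835     # roughly 83.5%, kept as an integer ratio


def fixed_overhead(system_and_tools: int, claude_md: int,
                   mcp_servers: int, tokens_per_server: int) -> int:
    """Tokens consumed before the first message is sent."""
    return system_and_tools + claude_md + mcp_servers * tokens_per_server


def remaining_before_compaction(used: int) -> int:
    """Tokens left until the approximate auto-compaction threshold."""
    return WINDOW_TOKENS * AUTO_COMPACT_PER_MILLE // 1000 - used


overhead = fixed_overhead(system_and_tools=22_500, claude_md=2_000,
                          mcp_servers=2, tokens_per_server=3_000)
print(overhead)                               # 30500
print(remaining_before_compaction(overhead))  # 804500
```

The headroom looks generous, which is the point of the section: the limit you feel in practice is dilution, not exhaustion.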

The CLAUDE.md Hierarchy: Your Free Context

CLAUDE.md files are the most cost-effective context you can provide. They load automatically at session start and persist across restarts. They add nothing to your conversation history, but they still occupy the window in every session, so they need to be lean.

The Three-Level Structure

Claude Code reads CLAUDE.md files from three locations, each serving a different purpose:

~/.claude/CLAUDE.md              # Global — applies to every project
./CLAUDE.md                       # Project root — applies to this repo
./src/components/CLAUDE.md        # Directory-level — applies when working here

Global (~/.claude/CLAUDE.md): Your personal preferences that apply everywhere. Coding style, preferred testing frameworks, communication preferences. Keep this under 100 tokens.

Project root (./CLAUDE.md): The most important one. Project stack, build commands, testing commands, architecture decisions, key conventions. This is where most of your context budget should go.

Directory-level: Module-specific rules. If your src/api/ directory has conventions that differ from the rest of the project, put a CLAUDE.md there instead of cluttering the root file.

What to Include (and What to Cut)

A focused 400-token CLAUDE.md outperforms a sprawling 4,000-token one. Here's the framework:

Always include:

One-liner project description ("Next.js 15 e-commerce app with Stripe and Supabase")
Build, test, lint, and deploy commands (Claude uses these verbatim)
Architecture gotchas that would trip up a new developer
Naming conventions only if they're non-obvious

Never include:

Code style rules that a linter already enforces
Generic best practices ("write clean code", "use meaningful variable names")
Step-by-step tutorials for common workflows
Information that only matters for specific tasks (put these in directory-level files instead)

A Real-World Example

Here's a CLAUDE.md that hits the sweet spot:

# Project: Acme Dashboard

Next.js 15 app with App Router, Supabase (auth + DB), Tailwind CSS, shadcn/ui.

## Commands
- `npm run dev` — start dev server (port 3000)
- `npm run test` — run vitest
- `npm run test:e2e` — run playwright (requires dev server running)
- `npm run lint` — eslint + prettier check
- `npm run typecheck` — tsc --noEmit

## Architecture
- Server components by default. Client components only when interactivity is required.
- All DB access through Supabase client in `src/lib/supabase/`.
- Auth uses Supabase SSR helpers — never use `createClient()` directly.
- API routes in `src/app/api/` return NextResponse, not plain Response.

## Gotchas
- Supabase RLS policies are on — test queries must use service role key.
- `next/headers` and `cookies()` can only be called in server components/actions.

That's roughly 200 tokens. Claude now knows the stack, can run your test suite, and avoids the mistakes new developers most often make on this codebase.

Auditing Your CLAUDE.md

Run /context in Claude Code to see exactly how many tokens your CLAUDE.md files consume. If they're over 1,000 tokens combined, audit ruthlessly:

Delete anything Claude already does correctly without the instruction
Convert enforceable rules to hooks or linter rules
Move task-specific instructions to directory-level files
Replace paragraphs with bullet points
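If you want a quick estimate outside a session, a small script can approximate the combined size of your CLAUDE.md files. This is a minimal sketch: the 4-characters-per-token ratio is a common rule of thumb rather than Claude's tokenizer, and the function names are hypothetical. The figure /context reports remains the authoritative one.

```python
# Rough size check for CLAUDE.md files. The characters-per-token ratio is a
# coarse heuristic for English text, not Claude's actual tokenizer.
from pathlib import Path

CHARS_PER_TOKEN = 4  # assumption


def estimate_tokens(text: str) -> int:
    """Approximate token count from character length, rounded up."""
    return (len(text) + CHARS_PER_TOKEN - 1) // CHARS_PER_TOKEN


def audit_claude_md(root: Path) -> tuple[int, dict[str, int]]:
    """Return the combined estimate and per-file estimates under root."""
    per_file = {
        str(path.relative_to(root)): estimate_tokens(
            path.read_text(encoding="utf-8"))
        for path in sorted(root.rglob("CLAUDE.md"))
    }
    return sum(per_file.values()), per_file


print(estimate_tokens("a" * 10))  # 3
```

Calling `audit_claude_md(Path("."))` from the repository root returns the combined estimate, which you can compare against the 1,000-token guideline above.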

Session Management: The Compaction Problem

Every message in your conversation adds to the context window. After 30-40 exchanges, you're carrying a lot of history that's no longer relevant — early exploration, dead-end approaches, superseded decisions.

Claude Code handles this automatically through compaction: when you approach ~83.5% of the context window, it summarizes the conversation history and replaces it with that summary. This frees up space but it's lossy — details get dropped.
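As a back-of-the-envelope illustration of how quickly history accumulates, the sketch below estimates how many exchanges fit before a given share of the window is used. The fixed overhead and the average cost per exchange are assumed figures; real exchanges vary widely with file reads and command output.

```python
# Back-of-the-envelope estimate of exchanges until a usage threshold.
# Overhead and per-exchange cost are assumptions, not measurements.
WINDOW_TOKENS = 1_000_000


def exchanges_until(threshold_per_mille: int, fixed_overhead: int,
                    tokens_per_exchange: int) -> int:
    """Whole exchanges that fit before usage reaches the threshold."""
    limit = WINDOW_TOKENS * threshold_per_mille // 1000
    return max(0, (limit - fixed_overhead) // tokens_per_exchange)


# Assume 30,000 tokens of overhead and 12,000 tokens per exchange
# (a prompt, a few file reads, some command output, and the reply).
print(exchanges_until(500, 30_000, 12_000))  # 39
print(exchanges_until(835, 30_000, 12_000))  # 67
```

Under these assumptions, half the window is gone after about 39 exchanges, and auto-compaction arrives around exchange 67.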

The 80% Rule

Don't wait for auto-compaction. Treat 50% on the token usage bar as the point to start planning and 80% as the point to act. Either way, you have three options:

Option 1: Manual compaction. Run /compact to trigger a summary on your terms. You can even add a focus hint:

/compact focus on the database migration work and the auth bug fix

This tells Claude what to prioritize when summarizing, so the details that matter survive the compression.

Option 2: Start a new session. For genuinely unrelated work, exit and restart. Your CLAUDE.md context loads fresh, and you get a full clean window. Session memory and auto memory persist across sessions, so Claude still remembers important decisions.

Option 3: Delegate to a subagent. If you're about to do exploratory work that generates lots of intermediate output, spawn it in a subagent so the exploration doesn't pollute your main context.

When to Start Fresh vs. Continue

Scenario	Best Approach
Switching from feature work to bug fix	New session
Continuing implementation from yesterday	`--continue` or `--resume`
About to read 20 files to understand a module	Subagent
Hit 80% token usage, still on same task	`/compact` with focus hint
Hit 80% token usage, task is almost done	Push through, compaction is fine
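The table can also be read as a small decision procedure. The sketch below is one possible encoding; the function name, its parameters, and the order in which the checks are applied are illustrative choices, not part of Claude Code.

```python
# One possible encoding of the table above. Names and check order are
# illustrative assumptions.
def next_step(usage: float, same_task: bool, *, resuming: bool = False,
              read_heavy: bool = False, nearly_done: bool = False) -> str:
    """Suggest an approach given usage (0.0 to 1.0) and the situation."""
    if not same_task:
        return "new session"
    if resuming:
        return "--continue or --resume"
    if read_heavy:
        return "subagent"
    if usage >= 0.80:
        return "push through" if nearly_done else "/compact with focus hint"
    return "keep working"


print(next_step(0.35, same_task=False))                   # new session
print(next_step(0.82, same_task=True))                    # /compact with focus hint
print(next_step(0.82, same_task=True, nearly_done=True))  # push through
```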

Subagents: Your Context Firewall

Subagents are the most powerful context engineering tool in Claude Code. Each subagent runs in its own context window with its own system prompt and tool access. It does its work, synthesizes the results, and sends back only the final answer — keeping all the intermediate noise out of your main conversation.

When to Use Subagents

The rule of thumb: if a task will generate a lot of intermediate output you won't need again, it belongs in a subagent.

Great subagent tasks:

"Read all files in src/api/ and summarize the endpoint structure"
"Search the codebase for all uses of the deprecated fetchUser function"
"Investigate why the test in auth.test.ts is failing"
"Review this PR diff and list potential issues"

Bad subagent tasks:

"Implement the new payment flow" (you need the output in your main context)
"Fix this one-line bug" (overhead isn't worth it)
Tasks where you'll need to iteratively discuss the results

Explicit Subagent Delegation

You can tell Claude to use subagents directly in your prompt:

Use a subagent to read all files in src/services/ and create a dependency
map showing which services call which other services. Report back just
the map, not the file contents.

The key phrase is "report back just the map." This tells Claude what to extract from the subagent's work, keeping your main context clean.

Custom Agents for Repeated Patterns

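If you keep delegating the same type of work, create a custom agent: a Markdown file in .claude/agents/ with YAML frontmatter that names and describes it, followed by the system prompt. For example, .claude/agents/code-reviewer.md:

---
name: code-reviewer
description: Reviews a file or diff for security, performance, and correctness issues.
---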

You are a code review agent. When given a file or diff:

1. Check for security issues (injection, XSS, auth bypasses)
2. Check for performance problems (N+1 queries, missing indexes, memory leaks)
3. Check for correctness (edge cases, error handling, race conditions)

Report ONLY issues found. If the code is clean, say "No issues found."
Do not suggest style changes or refactors.

Now you can invoke it from any session:

Run the code-reviewer agent on the changes in src/api/payments.ts

The agent runs in isolation, reads the files it needs, and returns only its findings.

Just-in-Time Context: Load What You Need, When You Need It

One of the biggest context engineering mistakes is front-loading too much information. Instead of pasting entire files into your prompt, use references and let Claude load what it needs.

Reference-Based Prompting

Instead of:

Here's the content of auth.ts: [500 lines of code]
Here's the content of middleware.ts: [300 lines of code]
Fix the authentication bug where tokens expire prematurely.

Do this:

There's a bug in src/auth/auth.ts where JWT tokens expire earlier than
the configured TTL. The token creation is around line 45 and the
validation is in src/middleware/auth-middleware.ts around line 20.
Fix the expiration logic.

Claude reads exactly what it needs, right when it needs it. The first approach burns 800 lines of context immediately. The second approach burns only the relevant sections when Claude opens those files.

Be Specific About What to Look At

Vague requests like "look at the codebase and fix the auth bug" force Claude to explore broadly, consuming tokens on irrelevant files. Specific requests like "the auth bug is in src/auth/session.ts around line 120 where the session TTL calculation uses milliseconds instead of seconds" let Claude go straight to the problem.

The more precisely you point Claude at the right files and functions, the less context gets wasted on exploration.

Using /context to Monitor

The /context command shows where your tokens are going. The breakdown looks something like this (figures are illustrative):

System prompt:     12,847 tokens (1.3%)
Tools:             18,234 tokens (1.8%)
CLAUDE.md files:      847 tokens (0.1%)
Conversation:     284,923 tokens (28.5%)
Available:        683,149 tokens (68.3%)

If "Tools" is eating 50,000+ tokens, you probably have too many MCP servers enabled. Disable the ones you're not using for this task.

If "Conversation" is above 400,000, it's time to compact or start fresh.
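If you paste such a breakdown into a script, the two thresholds are easy to check mechanically. The sketch below assumes the line format of the sample in this article, which may differ from what your version of Claude Code prints; the function names are hypothetical.

```python
# Parse a /context-style breakdown and flag the two conditions described
# above. The line format is assumed from the sample in this article.
import re

LINE = re.compile(r"^(?P<label>[^:]+):\s+(?P<tokens>[\d,]+) tokens")


def parse_breakdown(text: str) -> dict[str, int]:
    """Map each label to its token count."""
    usage = {}
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if match:
            usage[match["label"].strip()] = int(
                match["tokens"].replace(",", ""))
    return usage


def flag_issues(usage: dict[str, int]) -> list[str]:
    """Apply the 50,000-token tool and 400,000-token conversation limits."""
    notes = []
    if usage.get("Tools", 0) >= 50_000:
        notes.append("Tools overhead is high: disable unused MCP servers")
    if usage.get("Conversation", 0) > 400_000:
        notes.append("Conversation is large: compact or start fresh")
    return notes


SAMPLE = """\
System prompt:     12,847 tokens (1.3%)
Tools:             18,234 tokens (1.8%)
CLAUDE.md files:      847 tokens (0.1%)
Conversation:     284,923 tokens (28.5%)
Available:        683,149 tokens (68.3%)
"""

usage = parse_breakdown(SAMPLE)
print(sum(usage.values()))  # 1000000
print(flag_issues(usage))   # []
```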

Structuring Requests for Maximum Signal

How you structure a request directly affects how much context Claude wastes figuring out what you want.

The Task-Context-Constraints Pattern

The most effective prompt structure for Claude Code follows this pattern:

[TASK] What you want done
[CONTEXT] What Claude needs to know that it can't infer
[CONSTRAINTS] Boundaries and requirements

Example:

Add rate limiting to the POST /api/comments endpoint.

Context: We use Express with Redis already configured in src/lib/redis.ts.
The rate limiter should use a sliding window algorithm.

Constraints:
- 10 requests per minute per authenticated user
- 3 requests per minute for unauthenticated users
- Return 429 with a Retry-After header
- Add tests to src/api/__tests__/comments.test.ts

This gives Claude everything it needs to implement correctly on the first try, without reading extra files to figure out the stack or guessing at requirements.
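For reference, the sliding-window behavior those constraints pin down looks roughly like this. It is an in-memory Python sketch for illustration only: the prompt above targets Express and Redis, and the class name, method name, and key format here are hypothetical.

```python
# In-memory sliding-window log limiter (single process, illustrative).
import math
from collections import defaultdict, deque


class SlidingWindowLimiter:
    def __init__(self, window_seconds: float = 60.0):
        self.window = window_seconds
        self.hits = defaultdict(deque)  # key -> timestamps of accepted hits

    def check(self, key: str, limit: int, now: float) -> tuple[int, int]:
        """Return (status, retry_after_seconds) for a request at `now`."""
        log = self.hits[key]
        # Drop timestamps that have slid out of the window.
        while log and log[0] <= now - self.window:
            log.popleft()
        if len(log) >= limit:
            # The oldest hit leaves the window at log[0] + window.
            retry_after = math.ceil(log[0] + self.window - now)
            return 429, max(1, retry_after)
        log.append(now)
        return 200, 0


limiter = SlidingWindowLimiter()
# Unauthenticated callers get 3 requests per minute in the example prompt.
results = [limiter.check("ip:203.0.113.7", limit=3, now=t)
           for t in (0, 10, 20, 30)]
print(results)  # [(200, 0), (200, 0), (200, 0), (429, 30)]
print(limiter.check("ip:203.0.113.7", limit=3, now=60))  # (200, 0)
```

Each constraint in the prompt maps to something checkable here (the limit, the 429 status, the Retry-After value), which is what makes the request easy to get right on the first try.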

Include Tests or Expected Output

Including the expected outcome is the single highest-leverage thing you can do. Tests serve as both specification and self-check:

Refactor the date formatting utility in src/utils/dates.ts.

The existing tests in src/utils/__tests__/dates.test.ts should all
still pass. Add these additional test cases:

- formatRelative(new Date('2026-01-15'), new Date('2026-01-16')) → "yesterday"
- formatRelative(new Date('2026-01-10'), new Date('2026-01-16')) → "6 days ago"
- formatRelative(new Date('2025-06-15'), new Date('2026-01-16')) → "Jun 2025"

Now Claude can implement the refactor AND verify it works by running the tests — all without you checking intermediate output.
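One way to read the three cases is as a small rule set: one day back is "yesterday", recent dates are counted in days, and older dates collapse to month and year. The Python sketch below encodes that reading. The real utility is TypeScript; the snake_case name, the "today" branch, and the 30-day cut-off are assumptions, since the cases only pin down gaps of 1 day, 6 days, and about 7 months.

```python
# Illustrative reading of the expected outputs, not the project's code.
from datetime import date

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def format_relative(then: date, now: date) -> str:
    """Describe `then` relative to `now` (past dates only)."""
    days = (now - then).days
    if days == 0:
        return "today"          # assumption: not covered by the listed cases
    if days == 1:
        return "yesterday"
    if days < 30:               # assumption: the cut-off is not specified
        return f"{days} days ago"
    return f"{MONTHS[then.month - 1]} {then.year}"


print(format_relative(date(2026, 1, 15), date(2026, 1, 16)))  # yesterday
print(format_relative(date(2026, 1, 10), date(2026, 1, 16)))  # 6 days ago
print(format_relative(date(2025, 6, 15), date(2026, 1, 16)))  # Jun 2025
```

Writing the cases down exposes the gaps (same-day dates, the days-to-months boundary), which are good candidates for additional test cases in the request.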

Ask Claude to Interview You

For complex tasks where you're not sure what context Claude needs, flip the dynamic:

I want to add real-time notifications to the dashboard. Before
implementing anything, interview me about the requirements. Ask about
technical constraints, UX expectations, and edge cases I might not
have considered.

Claude will ask targeted questions using the AskUserQuestion tool, surfacing requirements you didn't think to specify. This is more efficient than writing a massive prompt upfront, because Claude asks only about what's ambiguous.

The .claudeignore File: Excluding Noise

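A .claudeignore file uses the same syntax as .gitignore and keeps Claude from reading files that would only waste context. Support may vary by version, so confirm that yours honors it; Read deny rules under permissions in .claude/settings.json are another way to exclude paths. A typical file: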

# .claudeignore

# Build artifacts
dist/
build/
.next/

# Large data files
*.csv
*.sql.gz
fixtures/large-dataset.json

# Generated code
src/generated/
*.pb.ts

# Dependencies
node_modules/

This is especially important for repositories with large generated files, fixture data, or vendored dependencies. Every file Claude doesn't read is context saved for actual work.
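To preview which paths a set of gitignore-style patterns would exclude, a simplified matcher is enough. The sketch below is not how Claude Code evaluates the file; it handles only the three pattern shapes used above (directory patterns, basename globs, and explicit paths) and skips full .gitignore semantics such as negation with "!". The function names are hypothetical.

```python
# Simplified matcher for gitignore-style patterns (illustrative only).
from fnmatch import fnmatch
from pathlib import PurePosixPath


def load_patterns(text: str) -> list[str]:
    """Keep non-empty lines that are not comments."""
    lines = (line.strip() for line in text.splitlines())
    return [line for line in lines if line and not line.startswith("#")]


def is_ignored(path: str, patterns: list[str]) -> bool:
    parts = PurePosixPath(path).parts
    for pattern in patterns:
        if pattern.endswith("/"):
            # Directory pattern: match it as a run of leading or middle
            # components, never as the final file name.
            prefix = PurePosixPath(pattern).parts
            n = len(prefix)
            if any(parts[i:i + n] == prefix
                   for i in range(len(parts) - n)):
                return True
        elif "/" in pattern:
            if fnmatch(path, pattern):   # explicit path, matched in full
                return True
        elif any(fnmatch(part, pattern) for part in parts):
            return True                  # basename-style glob
    return False


PATTERNS = load_patterns("""
# Build artifacts
dist/
*.csv
src/generated/
fixtures/large-dataset.json
node_modules/
""")
print(is_ignored("dist/app.js", PATTERNS))       # True
print(is_ignored("src/app/page.tsx", PATTERNS))  # False
```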

MCP Server Context Budget

Each MCP server you enable adds its tool definitions to the system prompt. A server with 10 tools might consume 3,000-5,000 tokens of your context window — before you've even started working.

Audit and Disable

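Run /context to see the tool token overhead. If you're not using a particular MCP server for your current task, disable it. The command form below may not match your Claude Code version; if it is rejected, manage servers through the /mcp menu or the claude mcp subcommands instead: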

claude config set mcpServers.slack.disabled true

Re-enable it when you need it:

claude config set mcpServers.slack.disabled false

For projects that use multiple MCP servers for different workflows, consider creating separate configuration profiles or documenting which servers to enable for which tasks in your CLAUDE.md.

Advanced: Multi-Session Context Strategies

For large features that span multiple sessions, you need a strategy for maintaining context continuity without carrying stale history.

The Planning Document Pattern

Before implementation, have Claude create a plan document:

Analyze the requirements for the new billing system and create a
plan in docs/billing-plan.md. Include:
- Architecture decisions
- File changes needed
- Implementation order
- Open questions

Don't implement anything yet.

In subsequent sessions, reference the plan:

Continue implementing the billing system from docs/billing-plan.md.
We completed steps 1-3 in the last session. Start with step 4:
the webhook handler.

The plan document acts as compressed context — it captures decisions and structure without carrying the full conversation history.

Session Memory and Auto Memory

Claude Code's memory system provides automatic context continuity:

Auto Memory records patterns, corrections, and decisions as you work
Session Memory tracks what you discussed in specific sessions
Auto Dream consolidates memories between sessions to prevent decay

These systems work in the background, but you can amplify them by being explicit about important decisions:

Important: We decided to use event sourcing for the billing system
instead of direct DB writes. This is a hard requirement — don't
suggest alternatives.

Explicit statements like this are more likely to be captured by auto memory and survive compaction.

The Context Engineering Checklist

Here's a practical checklist to audit your context engineering:

Before starting a session:

Is your CLAUDE.md under 500 tokens and up to date?
Are unused MCP servers disabled?
Do you have a .claudeignore excluding large/generated files?
For continuing work, do you have a plan document or clear starting point?

During a session:

Are you pointing Claude at specific files instead of asking it to explore?
Are you using subagents for exploratory or read-heavy tasks?
Are you monitoring token usage with /context?
Are you compacting proactively, somewhere between 50% and 80% usage, rather than waiting for auto-compaction?

For complex tasks:

Did you include tests or expected output in your request?
Did you use the task-context-constraints pattern?
For ambiguous requirements, did you ask Claude to interview you?

Between sessions:

Is auto memory enabled to capture cross-session patterns?
Do you have plan documents for multi-session features?
Are you starting fresh sessions for unrelated work?

Real Results

Context engineering isn't theoretical. Developers who apply these techniques consistently report:

Fewer iterations. Tasks that used to take 3-4 rounds of correction complete in one shot when Claude has precise context from the start.
Less hallucination. Claude hallucinates when it lacks information and has to guess. Better context leaves less to guess.
Longer productive sessions. Proactive compaction and subagent delegation keep the context window clean, so sessions stay productive for 50+ exchanges instead of degrading after 20.
Better code quality. When Claude knows your architecture, conventions, and constraints, it generates code that fits your codebase instead of generic solutions.

The compound effect is significant. A 10-minute investment in CLAUDE.md plus consistent session hygiene saves hours of debugging and re-prompting over the course of a project. The developers getting "10x results" from Claude Code aren't using magic prompts — they're engineering their context.