Claude Code Compaction: How Context Compression Actually Works
There's a hidden mechanic in Claude Code that most people don't think about until it costs them an hour of work. Compaction is the automatic compression of your conversation context when you approach the token limit. I ran into it when my always-on agent "forgot" a critical rule from the beginning of a session after 200 minutes of continuous operation. Here's what I learned by digging into how it works.
What is compaction and why it exists
Claude Code runs in your terminal and maintains long sessions. Unlike ChatGPT where each message is a separate request, Claude Code accumulates context: files it read, commands it executed, grep results, diffs, error messages. All of this stacks into one massive prompt.
The problem: models have finite context windows. Claude models typically ship with a 200K-token window, with 1M available in beta on some models. But even a million tokens runs out when your agent is actively working through a codebase. Compaction kicks in when context reaches roughly 95% of the limit.
What happens: Claude Code takes the entire conversation history, sends it to a separate model call with a prompt like "compress this into key facts," and replaces the full history with a condensed summary. You don't see this happening — at some point, the agent starts working from "memory" of past interactions rather than the full transcript.
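The mechanism can be pictured with a small sketch. This is conceptual pseudologic, not Claude Code's actual implementation — the names (`estimate_tokens`, `maybe_compact`) and the 4-characters-per-token heuristic are illustrative assumptions:

```python
COMPACTION_THRESHOLD = 0.95  # fire at ~95% of the context window


def estimate_tokens(messages):
    # Rough heuristic: ~4 characters per token for English text.
    return sum(len(m["content"]) for m in messages) // 4


def maybe_compact(messages, context_limit, summarize):
    """Replace the full history with a summary when near the limit.

    `summarize` stands in for the separate model call that Claude Code
    makes with the compaction prompt.
    """
    if estimate_tokens(messages) < context_limit * COMPACTION_THRESHOLD:
        return messages  # plenty of room left, keep the full transcript
    summary = summarize(messages)
    # The condensed summary replaces everything that came before.
    return [{"role": "system", "content": summary}]
```

The key consequence: everything the summarizer drops is gone for good, which is why the choice of summarization prompt (covered below) matters so much.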
What survives compaction, what doesn't
From my experience running agents daily, compaction reliably preserves:
- The current task and its immediate context
- File names that were recently modified
- Recent errors and their solutions
- General project architecture
What consistently gets lost:
- Instructions from the start of the session — "don't touch this file," "use this format"
- Intermediate decisions — why you chose approach A over B
- Specific code snippets discussed 50 messages ago
- Subtle style rules — "no emoji," "no Co-Authored-By in commits"
This makes sense: compaction optimizes for "what to do next," not "why we did what we did." Decision context is the first casualty of compression.
Custom compaction prompts
Claude Code lets you override the compaction prompt. This might be the most underrated setting. In settings.json (at ~/.claude/settings.json), there's a parameter:
```json
{
  "compactPrompt": "Preserve ALL rules from CLAUDE.md verbatim. Keep file paths, error messages, and architectural decisions. Summarize tool outputs but keep their conclusions."
}
```

I've tested multiple variations. The default prompt is a generic "summarize the conversation." It works fine for short sessions, but for agents running for hours, you need more precision.
My current compaction prompt:
```json
{
  "compactPrompt": "When compacting, you MUST preserve:\n1. All rules and constraints from CLAUDE.md and system prompt — copy them verbatim\n2. Current task context and progress\n3. File paths that were modified\n4. Specific error messages and their solutions\n5. User corrections and preferences stated during the session\n\nYou MAY summarize:\n- Tool call outputs (keep conclusions, drop raw output)\n- File contents that were read (keep what was learned, drop the text)\n- Exploratory steps that led nowhere"
}
```

The difference is noticeable. After switching to this custom prompt, my agents stopped "forgetting" CLAUDE.md rules during long sessions.
CLAUDE.md vs compaction: where to put what
The key insight from three months of running Claude Code agents: never rely on compaction for critical rules. Everything the agent must always remember should live in CLAUDE.md.
CLAUDE.md loads as part of the system prompt at the start of every session. Compaction doesn't touch it — it exists outside the conversation history. It's the only place that's guaranteed to survive any compression.
My strategy:
- CLAUDE.md — architectural rules, code style, prohibitions, credential paths, naming conventions. Anything that applies to every session
- Conversation context — current task, intermediate results, discussions. It's OK to lose this during compaction
- Project files — TODO.md, ARCHITECTURE.md, notes. The agent can re-read these if compaction erased the context
Rule of thumb: if you catch yourself repeating the same instruction to the agent — move it to CLAUDE.md. I covered CLAUDE.md configuration in detail in my Claude Code setup guide.
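To make the split concrete, here's a hypothetical CLAUDE.md following that strategy. The rules themselves are made-up examples, not recommendations for any particular project:

```markdown
# Project rules (always loaded — survives compaction)

- Never modify files under migrations/
- No emoji in output; no Co-Authored-By lines in commits
- Credentials live in .env — never hardcode secrets elsewhere
- Naming: snake_case for Python modules, kebab-case for CLI flags

# Pointers to re-readable context (external memory)

- Current plans: TODO.md
- Architecture notes: ARCHITECTURE.md
```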
How to tell when compaction happened
Claude Code shows a [context compacted] indicator in the terminal. But if you're multitasking, it's easy to miss. A more reliable signal: watch the cost counter in the bottom-right corner. A sudden cost spike (compaction is a separate API call) plus a context counter reset means compaction just fired.
Another telltale sign: the agent starts re-asking things you already discussed. "Which file has the config?" — even though you edited that file 20 minutes ago. That's a clear signal that context was compressed and details were lost.
When compaction causes real problems
A concrete case from my work. I run a server-agent — Claude Code running 24/7 on a VPS, accessible via Telegram. The agent receives a message, processes it, waits for the next one. Sessions can last hours.
The problem: after compaction, the agent would "forget" access control rules. I have a permission matrix — which files can be read from which chat context. These rules were in the initial prompt, but after compression the model lost them and started responding to requests it should have blocked.
The fix: I moved all security rules into CLAUDE.local.md (which loads on top of the base CLAUDE.md). Now they're in the system prompt, not in the conversation history — compaction can't touch them.
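For illustration, the access-control rules might look something like this in CLAUDE.local.md. The chat contexts and paths here are invented for the example:

```markdown
# Security rules (must survive compaction — hence system prompt, not history)

## Permission matrix

| Chat context | May read           | Must block         |
|--------------|--------------------|--------------------|
| Admin chat   | any project file   | nothing            |
| Group chat   | docs/, README.md   | .env, credentials/ |

- If a request falls outside the matrix, refuse and explain why.
```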
Practical tips for managing compaction
Customize your compactPrompt. Tailor it to your workflow. Working with data? Ask it to preserve SQL queries. Writing articles? Ask it to keep style rules. The default is too generic for specialized work.
Keep sessions short when possible. A fresh session means clean context plus full CLAUDE.md. Sometimes that's better than a long session with two compaction events. I've found that breaking work into focused 30-60 minute sessions produces better results than marathon 4-hour runs.
Use files as external memory. Ask the agent to write key decisions to a TODO.md or NOTES.md file in your project. After compaction, it can re-read them. This pattern is especially important for Obsidian-based workflows where the vault acts as long-term memory.
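As a minimal sketch of this pattern, a helper like the following appends dated decisions to a notes file. The name NOTES.md and the entry format are just conventions for the example, nothing Claude Code requires:

```python
from datetime import date
from pathlib import Path


def log_decision(note: str, path: str = "NOTES.md") -> None:
    """Append a dated decision so a fresh or compacted session can recover it."""
    p = Path(path)
    entry = f"- {date.today().isoformat()}: {note}\n"
    # Create the file with a heading on first use, then append.
    existing = p.read_text() if p.exists() else "# Decisions\n\n"
    p.write_text(existing + entry)
```

In practice you don't call this yourself — you tell the agent "log every non-obvious decision to NOTES.md" and let it maintain the file; the point is that a plain file on disk outlives any compression of the conversation.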
Put everything critical in CLAUDE.md. Rules that must always apply go there, nowhere else. Don't hope that compaction will preserve them — assume it won't.
Test compaction intentionally. Run a long session, wait for compression, then check — does the agent still remember the rules from the beginning? This will save you hours of debugging later.
Compaction isn't a bug — it's a necessary compromise. Context windows are growing, but as long as they're finite, compression will be part of working with AI agents. Better to understand how it works than to be surprised when your agent "forgets" a critical rule at the 200-minute mark.
Follow me on Twitter/X @danokhlopkov for more on AI agents and automation.