Claude Code in Practice: Lessons from Production

01 · Memory: how to keep context alive across sessions

The default Claude Code session is stateless: every conversation begins at zero. For interactive use this is manageable. For production pipelines, it is a hard constraint: work that spans sessions cannot build on decisions the agent no longer remembers.

The solution I built has three tiers. A STATUS.md file is the agent's working memory: current project state, active work items, recent decisions, open questions. It is rewritten wholesale every session rather than appended to. A set of daily journals is the durable archive: append-only, never edited after creation, the record you go back to when you need to understand why something is the way it is. A distilled facts file captures patterns that appeared in three or more journal entries. These are permanent, concise, and owned by the project.
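A minimal sketch of the three write policies in Python: overwrite, append-only, and promote-at-three. The file layout (STATUS.md, one journal file per day under journal/, FACTS.md) and the convention that journal lines starting with `pattern:` mark recurring observations are hypothetical assumptions for illustration, not the design from the linked post.

```python
import re
import tempfile
from collections import Counter
from datetime import date
from pathlib import Path

PROMOTION_THRESHOLD = 3  # promote a pattern once it appears in 3+ distinct journal files
PATTERN_RE = re.compile(r"^pattern:\s*(.+)$", re.IGNORECASE | re.MULTILINE)  # hypothetical tag


def write_status(root: Path, body: str) -> None:
    """Tier 1, working memory: overwritten wholesale, never appended to."""
    root.mkdir(parents=True, exist_ok=True)
    (root / "STATUS.md").write_text(body, encoding="utf-8")


def append_journal(root: Path, day: date, entry: str) -> Path:
    """Tier 2, durable archive: append-only, one file per day."""
    journal_dir = root / "journal"
    journal_dir.mkdir(parents=True, exist_ok=True)
    path = journal_dir / f"{day.isoformat()}.md"
    with path.open("a", encoding="utf-8") as fh:
        fh.write(entry.rstrip() + "\n")
    return path


def promote_facts(root: Path) -> list[str]:
    """Tier 3, distilled facts: patterns seen in at least PROMOTION_THRESHOLD journal files."""
    seen = Counter()
    for path in sorted((root / "journal").glob("*.md")):
        text = path.read_text(encoding="utf-8")
        # A set, so a pattern repeated within one file counts only once
        seen.update({m.strip().lower() for m in PATTERN_RE.findall(text)})
    facts = sorted(p for p, n in seen.items() if n >= PROMOTION_THRESHOLD)
    root.mkdir(parents=True, exist_ok=True)
    (root / "FACTS.md").write_text("".join(f"- {f}\n" for f in facts), encoding="utf-8")
    return facts


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        write_status(root, "# Status\nActive: ABC-123\n")
        for day in (date(2026, 1, 5), date(2026, 1, 6), date(2026, 1, 7)):
            append_journal(root, day, "pattern: run the linter before opening a PR")
        print(promote_facts(root))
```

Counting distinct files rather than raw occurrences keeps one noisy day from promoting a pattern on its own.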

With this system, an agent can restore full context in under 60 seconds. Without it, you are re-explaining the codebase every session. The full design: Three-Tier Memory: How I Taught My AI to Remember.

Memory also has failure modes. Two weeks into running my 18-agent system, my typescript-implementer's memory file had grown to 95KB and 2,133 lines. The system designed to make agents smarter was making them slower. 80% of the file was noise: redundant entries, superseded decisions, logging artifacts.

The fix required architectural changes, not just pruning. The lesson: memory systems need compression policies from the start. Do not let them grow without bounds. The crisis and its resolution: The Memory Bloat Crisis: When Agent Files Grew to 95KB.
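What a compression policy from the start could look like, as a Python sketch: keep only the newest entry per topic (superseded decisions), drop entries with identical text (redundancy), then enforce a hard byte budget newest-first. The `Entry` shape and the 8,000-byte default are hypothetical; this is not the fix described in the linked post.

```python
from dataclasses import dataclass

MAX_BYTES = 8_000  # hypothetical per-agent budget


@dataclass(frozen=True)
class Entry:
    key: str         # topic the entry is about, e.g. "ci" or "style"
    text: str
    written_at: int  # timestamp or monotonically increasing sequence number


def compress(entries: list[Entry], max_bytes: int = MAX_BYTES) -> list[Entry]:
    # 1. Supersession: keep only the newest entry for each key
    latest: dict[str, Entry] = {}
    for e in entries:
        if e.key not in latest or e.written_at > latest[e.key].written_at:
            latest[e.key] = e
    # 2. Redundancy: identical text under different keys is kept once (the newest copy)
    unique: dict[str, Entry] = {}
    for e in sorted(latest.values(), key=lambda e: e.written_at, reverse=True):
        unique.setdefault(e.text.strip(), e)
    # 3. Bound: admit entries newest-first until the byte budget is spent
    kept, used = [], 0
    for e in unique.values():  # insertion order is already newest-first
        size = len(e.text.encode("utf-8")) + 1  # +1 for the newline when written out
        if used + size > max_bytes:
            break
        kept.append(e)
        used += size
    return sorted(kept, key=lambda e: e.written_at)
```

Running this on every write, not as an occasional cleanup, is what keeps the file from growing without bounds.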

Claude Code also has a built-in auto-memory feature you may not know about. It watches your sessions and saves what it decides is worth remembering to files under ~/.claude/projects/. For interactive use, this helps. For autonomous pipelines, it means up to two hundred lines you did not write become part of your effective system prompt, silently, over time.

This is not a theoretical problem. Injected memory accumulates, bends agent behavior, and is invisible unless you go looking. The full breakdown: The Clean Slate Is Gone: Claude Code Memory and the Autonomous Workflow Problem.
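Going looking can be a few lines of Python. This sketch walks ~/.claude/projects/ and lists every markdown file by line count, largest first. The assumption is loud here: it does not know which of those files Claude Code actually loads; it only surfaces every .md file under that directory so you can read the big ones.

```python
from pathlib import Path

LIMIT = 200  # the "two hundred lines" figure from the text; adjust if your version differs


def audit_memory(root: Path = Path.home() / ".claude" / "projects") -> list[tuple[str, int]]:
    """List every markdown file under root with its line count, largest first."""
    if not root.is_dir():
        return []
    report = []
    for path in root.rglob("*.md"):
        with path.open(encoding="utf-8", errors="replace") as fh:
            report.append((str(path.relative_to(root)), sum(1 for _ in fh)))
    return sorted(report, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for name, lines in audit_memory():
        marker = "  <-- over limit" if lines > LIMIT else ""
        print(f"{lines:6d}  {name}{marker}")
```

Running it before each autonomous batch turns silent accumulation into a visible diff.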

02 · Agents to skills: the architecture shift

I built 18 specialized agents. Named them. Wrote their AGENT.md files. Built their CHANGELOG.md evolution histories. Cairn, my orchestrator, coordinated them like a conductor with a full orchestra. It worked. I showed it to colleagues.

One of them asked: "Why don't you use skills for it?"

The question took five seconds. The reckoning took longer.

Claude Code skills have forked execution contexts, memory, and supporting files. Most of what custom agents did, skills do now, with less overhead and more composability. The git agent becomes a git-ops skill. The code reviewer becomes a code-review skill. The named identities dissolve. The knowledge persists.

The agents are not dead. They are demoted. Skills are the what. Agents are the how. An agent wakes up, reads what the task needs, loads the skills, and runs. One day that composition will be fully dynamic. For now, a pre-authored composite agent with skills declarations gets you most of the way there.
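The composition step ("reads what the task needs, loads the skills, and runs") can be sketched in Python. Everything here is hypothetical: the `Skill` shape and keyword matching stand in for however a real agent decides which skills apply, and this does not reproduce Claude Code's skill file format.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Skill:
    name: str                 # e.g. "git-ops", "code-review"
    keywords: frozenset[str]  # hypothetical trigger words used for selection
    instructions: str         # the knowledge the skill carries


def compose(task: str, declared: list[Skill]) -> tuple[list[str], str]:
    """Pick the declared skills this task needs and build the agent's working context."""
    words = set(re.findall(r"[a-z0-9-]+", task.lower()))
    chosen = [s for s in declared if s.keywords & words]
    context = "\n\n".join(f"## {s.name}\n{s.instructions}" for s in chosen)
    return [s.name for s in chosen], context
```

The agent itself carries no identity here; it is just the composition of whichever declared skills matched, which is the "pre-authored composite agent with skills declarations" in miniature.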

This is the most architecturally significant thing I learned in 2026. The full post: Skills Ate My Agents (And I'm Okay With That).

If you are still at the agent-building stage, start with From 1 to 18: Building an Agent Army. It explains why specialization matters, what the progression looked like, and what I would do differently now.

03 · Token costs: the 22K startup tax and how to cut it

There is a conversation about AI costs that misses the point. Teams spend energy optimizing prompt length to save €5 a week. That is the wrong frame.

The right frame: context quality degrades as context length grows. This is documented: research shows that context length alone can hurt LLM performance even when the relevant information is present, a phenomenon sometimes called context rot. Every unnecessary token you load at startup is a tax on the quality of everything that follows. Not on your bill. On your output.

I discovered this through an accidental audit. I ran /context after setting up my session. The breakdown showed 22,000 tokens in MCP tools before I had typed a single prompt. The Atlassian MCP alone: ~10,000 tokens, loading all 33 tools, including the 27 I had explicitly disabled.

Here is the thing about disabledTools in Claude Code: it prevents the AI from calling a tool. It does not prevent the MCP server from registering that tool. It does not prevent those tool schemas from flowing into the context window. The Docker container behind the MCP server still spins up. The tokens still burn. disabledTools is a runtime filter, not a context optimization.
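The difference between a runtime filter and a context optimization, as a toy Python model. The class is illustrative only, not Claude Code or MCP internals, and the 4-characters-per-token estimate is a rough heuristic assumption.

```python
import json


def estimate_tokens(text: str) -> int:
    # Rough heuristic (about 4 characters per token); real tokenizers differ
    return max(1, len(text) // 4)


class ToyMcpServer:
    """Toy model of the behaviour described above; not Claude Code or MCP internals."""

    def __init__(self, tools: dict[str, dict], disabled: set[str]):
        self.tools = tools        # every tool the server registers, with its schema
        self.disabled = disabled  # runtime filter: consulted only when a tool is called

    def context_payload(self) -> str:
        # Schemas for ALL registered tools are serialized into the context, disabled or not
        return json.dumps(self.tools)

    def call(self, name: str) -> str:
        if name in self.disabled:
            raise PermissionError(f"{name} is disabled")
        return f"called {name}"


def register_only(tools: dict[str, dict], wanted: set[str]) -> dict[str, dict]:
    # The alternative: never register what you will not use, so it never reaches the context
    return {name: schema for name, schema in tools.items() if name in wanted}
```

With 33 registered tools and 27 disabled, `context_payload()` is identical to the payload with nothing disabled; only `register_only` shrinks it.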

The replacement: seven shell scripts. A credentials file, curl, and jq. The six Jira operations I actually used. Zero tokens at startup. The seventh script, for creating issues, worked on the first try, where the MCP version had never worked reliably.
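The scripts themselves are shell. For consistency with the other examples, here is the same idea for the create-issue operation sketched in Python with only the standard library, against Jira's REST v2 `POST /rest/api/2/issue` endpoint with basic auth (email plus API token). The credentials path and its JSON shape are hypothetical assumptions.

```python
import base64
import json
import urllib.request
from pathlib import Path

CREDENTIALS = Path.home() / ".config" / "jira" / "credentials.json"  # hypothetical location


def load_credentials(path: Path = CREDENTIALS) -> dict:
    # Assumed shape: {"base_url": "https://<site>.atlassian.net", "email": "...", "api_token": "..."}
    return json.loads(path.read_text(encoding="utf-8"))


def build_create_issue(creds: dict, project: str, summary: str,
                       issue_type: str = "Task") -> urllib.request.Request:
    """Build (but do not send) a Jira REST v2 create-issue request."""
    body = {"fields": {"project": {"key": project},
                       "summary": summary,
                       "issuetype": {"name": issue_type}}}
    token = base64.b64encode(f"{creds['email']}:{creds['api_token']}".encode()).decode()
    return urllib.request.Request(
        url=creds["base_url"].rstrip("/") + "/rest/api/2/issue",
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Basic {token}",
                 "Content-Type": "application/json",
                 "Accept": "application/json"},
        method="POST",
    )


def create_issue(project: str, summary: str) -> str:
    """Send the request and return the new issue key."""
    req = build_create_issue(load_credentials(), project, summary)
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)["key"]
```

Nothing here is loaded into the agent's context until the agent decides to run it, which is the whole point.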

The full analysis and the scripts: The 22,000 Token Tax: Why I Killed My MCP Server.

The same discipline applies to memory files, agent instructions, and any other context loaded automatically. Every 1,000 unnecessary tokens is a constraint on session depth. Token efficiency is not frugality. It is precision engineering.

04 · Autonomous workflows: what actually fails and why

The first autonomous ticket I ran through a full pipeline took nine agents in sequence and under 20 minutes. That was the proof that the infrastructure investment paid off. The post My First Autonomous Ticket is the honest account of that session, including what I did, what the agents did, and what the division of labor actually looked like.

Four sessions later, in session 13, the workflow evolved. One Slack message. I stepped away. Twenty-one minutes later: a CI-green PR, ready for human review. The agents had analyzed the ticket, implemented it, hit a CI failure, fixed it autonomously, responded to colleague questions in the PR thread, and posted a review request to the team channel. The full session log is in One Slack Message. Two Hours of Work.

What actually fails in autonomous workflows:

Under-specified tasks. The agent cannot complete what it cannot understand. The spec is the upstream problem. Fix the spec before blaming the agent.
Missing codebase context. An agent without a CLAUDE.md discovers your conventions by trial and error. This is slow, expensive, and produces inconsistent output. Write the CLAUDE.md first.
No review gates. An agent that can do anything unchecked will eventually do the wrong thing. Review pipelines are not bureaucracy. They are the thing that makes autonomy sustainable.
Over-supervision. The instinct to interrupt, correct, and redirect at every step prevents the pipeline from running. If you keep stepping in, you drew the task boundary too early. Extend it and let the agent reach its natural stopping point.
Agentic QA gaps. Production hardening, rate limiting, CORS, graceful shutdown: these are things agents reliably miss unless you explicitly include them in scope. The posts Production Hardening and Agentic QA in the Browser address this directly.
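Several of these failures can be checked mechanically. A hypothetical Python sketch of a review gate that rejects an under-specified spec, requires green CI and an approved review, and treats the hardening items as explicit scope. The field names and checklist are assumptions for illustration, not a real tool's API.

```python
from dataclasses import dataclass, field

# Items agents reliably miss unless they are in scope (from the list above)
HARDENING_SCOPE = ("rate limiting", "CORS", "graceful shutdown")


@dataclass
class TaskSpec:
    summary: str
    acceptance_criteria: list[str]
    hardening: list[str] = field(default_factory=lambda: list(HARDENING_SCOPE))


@dataclass
class Result:
    ci_green: bool
    review_approved: bool
    hardening_done: set[str]


def gate(spec: TaskSpec, result: Result) -> list[str]:
    """Return blocking reasons; an empty list means the work may proceed to merge."""
    reasons = []
    if not spec.acceptance_criteria:
        reasons.append("under-specified: no acceptance criteria")
    if not result.ci_green:
        reasons.append("CI is not green")
    if not result.review_approved:
        reasons.append("review not approved")
    missing = [item for item in spec.hardening if item not in result.hardening_done]
    if missing:
        reasons.append("hardening missing: " + ", ".join(missing))
    return reasons
```

Because the gate returns reasons rather than a boolean, the agent can be sent back with exactly what is missing instead of a bare rejection.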

05 · Sandboxing: from soft constraints to hard walls

There is a flag in Claude Code called --dangerously-skip-permissions. The name is accurate. It gives the agent full access to your system without asking. Some developers run their entire workflows with it enabled.

The appeal is real. Every permission prompt that interrupts a flow state is a tax on autonomy. But soft constraints, meaning boundaries defined in markdown and loaded into the agent's context, are not safety guarantees. They are polite requests to a language model. Today, the agents follow them. Tomorrow, a model update may change something subtle about instruction-following, or a context anomaly may push behavior in an unexpected direction. The boundary was only ever text.

Hard walls are different. Docker-based sandboxing restricts filesystem access at the OS level. Volume mounts define what the agent can see. Network policies define what it can reach. No instruction in the agent's context can override them.
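A minimal sketch, in Python, of building a locked-down `docker run` command along these lines. The flags are standard Docker options; the image name is hypothetical, and it assumes an image with the agent CLI already installed. This is not the architecture from the linked post.

```python
import shlex
from pathlib import Path


def sandbox_command(
    workspace: Path,
    image: str = "agent-sandbox:latest",  # hypothetical image with the agent CLI installed
    allow_network: bool = False,
    readonly_paths: tuple[Path, ...] = (),
) -> list[str]:
    """Build a `docker run` argument list that confines an agent to one writable directory."""
    cmd = [
        "docker", "run", "--rm",
        "--read-only",                          # container root filesystem is read-only
        "--tmpfs", "/tmp",                      # scratch space discarded with the container
        "--cap-drop", "ALL",                    # drop every Linux capability
        "--security-opt", "no-new-privileges",  # block privilege escalation via setuid binaries
        "--network", "bridge" if allow_network else "none",
        "-v", f"{workspace.resolve()}:/workspace",  # the only host path the agent can write
        "-w", "/workspace",
    ]
    for path in readonly_paths:
        cmd += ["-v", f"{path.resolve()}:/ro/{path.name}:ro"]  # visible but not writable
    # Nothing else from the host (~/.ssh, ~/.aws, other repos) is mounted, so it does not exist inside
    return cmd + [image]


if __name__ == "__main__":
    print(shlex.join(sandbox_command(Path("."))))
```

`--network none` is the blunt version of a network policy. `bridge` allows unrestricted egress, so a real allow-list (through a proxy or firewall rules) is more work than flipping this flag.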

The post From Soft Trust to Hard Walls covers the architecture: why soft constraints are not enough, what the threat model looks like when you have 18 agents with full system access, and what Docker sandboxing actually requires to implement.

06 · Security: prompt injection, MCP, and real attack surfaces

Most teams think about AI security in terms of "what if the model says something wrong." That is the easy problem. The harder problem is what happens when an adversary controls data that flows into the agent's context.

Prompt injection via MCP tools is the real attack surface. When your agent reads a web page, processes an email, or queries an API, the response content becomes part of the context. An adversary who controls that content can inject instructions that the agent will follow, as if they came from you.

I explored this hands-on at a security training lab. I gave Claude SSH access to an isolated lab environment and ran an attack chain. It cleared three missions without hesitation: network enumeration, privilege escalation, and service exploitation. Then Anthropic terminated the session.

Finishing the remaining missions manually changed how I think about every MCP-backed agent I build. The full experiment and its implications: I Let Claude Hack My Security Training. Then Anthropic Stepped In.

The practical defensive conclusion: treat all MCP tool outputs as untrusted data. Scope what tools can access. Log what they do. Audit their outputs before acting on them. And understand that "safety" at the model level does not mean safety at the system level.
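The first three habits (scope, log, audit) can be wired into a thin Python wrapper around every tool result. A sketch under stated assumptions: the delimiter tag and the phrase list are hypothetical, and the regex is a crude heuristic that flags text for a human look. It is not a prompt-injection defense, and delimiting content does not guarantee a model will treat it as data.

```python
import hashlib
import json
import logging
import re
import time

log = logging.getLogger("mcp.audit")  # attach a handler to persist the audit trail

# Crude heuristic, NOT a defense: phrases worth a human look before acting on the output
SUSPICIOUS = re.compile(
    r"ignore (?:all |any )?(?:previous|prior) instructions|you are now|system prompt|run this command",
    re.IGNORECASE,
)


def wrap_untrusted(tool: str, output: str, allowed_tools: set[str]) -> dict:
    """Scope-check, log, and label a tool result as untrusted data."""
    if tool not in allowed_tools:
        raise PermissionError(f"tool {tool!r} is outside this agent's scope")
    digest = hashlib.sha256(output.encode("utf-8")).hexdigest()[:16]
    flagged = bool(SUSPICIOUS.search(output))
    log.info(json.dumps({"ts": time.time(), "tool": tool, "sha256": digest,
                         "bytes": len(output), "flagged": flagged}))
    return {
        "tool": tool,
        "flagged": flagged,
        # Delimit and label the payload so downstream prompts can quote it as data
        "content": f"<untrusted_data source={tool!r}>\n{output}\n</untrusted_data>",
    }
```

A flagged result should pause the pipeline for review rather than be silently dropped, since the point is auditing, not filtering.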

07 · Orchestration: the full pipeline, ticket to merged PR

Orchestration is the discipline of deciding which agents run in which order, what context each one receives, and how their outputs chain together. It is the layer above individual agents. It is what makes the difference between "I have agents" and "I have a system."

My current orchestrator is called Cairn. It coordinates the full ticket lifecycle: Jira analysis, planning, implementation, testing, code review, PR creation, CI monitoring, Slack communication. Each step produces structured output that the next step can use. The orchestrator does not improvise. It executes a workflow.
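"Executes a workflow, does not improvise" can be made concrete. A Python sketch, assuming each step declares the keys it reads and the keys it must produce; the step names and contract checks are illustrative stand-ins, not Cairn's implementation.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class StepSpec:
    name: str
    needs: tuple[str, ...]     # keys this step reads from the shared state
    produces: tuple[str, ...]  # keys it must add, and nothing else
    run: Callable[[dict], dict]


def run_workflow(steps: list[StepSpec], state: dict) -> dict:
    """Execute steps in a fixed order; no step is skipped, reordered, or invented."""
    for step in steps:
        missing = [k for k in step.needs if k not in state]
        if missing:
            raise KeyError(f"{step.name}: missing inputs {missing}")
        output = step.run({k: state[k] for k in step.needs})
        if set(output) != set(step.produces):
            raise ValueError(f"{step.name}: output contract violated")
        state = {**state, **output}
    return state
```

Each step sees only the inputs it declared, and a step that returns the wrong shape stops the run immediately instead of passing a malformed result downstream.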

The record-optimizer pattern is the mechanism that improves the orchestrator over time. Agents log notable events to topic-based operational logs. A meta-agent periodically reads those logs, identifies patterns, and updates the agents' instructions. The system learns from its own operational history without requiring manual review. The original post on this pattern: The Record-Optimizer Pattern.
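The two halves of the pattern, sketched in Python. Assumptions: events are JSON lines in one file per topic, and a simple recurrence counter stands in for the meta-agent, which in the real system is a model reading the logs. The threshold is hypothetical, and the sketch only proposes instruction updates rather than writing them.

```python
import json
from collections import Counter
from pathlib import Path

THRESHOLD = 3  # hypothetical: how often a lesson must recur before it becomes an instruction


def record(log_dir: Path, topic: str, agent: str, lesson: str) -> None:
    """Record phase: cheap and append-only, one JSON object per line, one file per topic."""
    log_dir.mkdir(parents=True, exist_ok=True)
    with (log_dir / f"{topic}.jsonl").open("a", encoding="utf-8") as fh:
        fh.write(json.dumps({"agent": agent, "lesson": lesson}) + "\n")


def optimize(log_dir: Path, threshold: int = THRESHOLD) -> dict[str, list[str]]:
    """Optimize phase: propose recurring lessons as instruction updates, grouped per agent."""
    counts = Counter()
    for path in log_dir.glob("*.jsonl"):
        for line in path.read_text(encoding="utf-8").splitlines():
            if line.strip():
                event = json.loads(line)
                counts[(event["agent"], event["lesson"])] += 1
    proposals: dict[str, list[str]] = {}
    for (agent, lesson), n in sorted(counts.items()):
        if n >= threshold:
            proposals.setdefault(agent, []).append(lesson)
    return proposals
```

Keeping recording dumb and cheap means agents never spend tokens deciding what matters; that judgment is deferred to the periodic optimize pass.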

Cairn and its agents had run 5,428 prompts before I systematically analyzed the conversation logs. That analysis surfaced patterns that led to concrete improvements in the system. The mining process: Mining 5,428 Prompts for Patterns.
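A first pass at mining can be small. This Python sketch pulls user prompts out of JSONL transcripts and counts their most common openings. The record layout it expects (`{"type": "user", "message": {"content": ...}}`, where content is a string or a list of blocks) is an assumption about the transcript files under ~/.claude/projects/; check it against your own files before trusting the counts. It is not the process from the linked post.

```python
import json
from collections import Counter
from pathlib import Path


def user_prompts(transcript_dir: Path):
    """Yield user prompt text from JSONL transcripts (record layout is an assumption)."""
    for path in transcript_dir.rglob("*.jsonl"):
        for line in path.read_text(encoding="utf-8", errors="replace").splitlines():
            try:
                rec = json.loads(line)
            except json.JSONDecodeError:
                continue  # skip partial or corrupt lines
            if not isinstance(rec, dict) or rec.get("type") != "user":
                continue
            msg = rec.get("message")
            content = msg.get("content") if isinstance(msg, dict) else None
            if isinstance(content, str):
                yield content
            elif isinstance(content, list):
                # Block form: keep text blocks only, so tool results are not counted as prompts
                text = " ".join(b.get("text", "") for b in content
                                if isinstance(b, dict) and b.get("type") == "text")
                if text:
                    yield text


def opening_patterns(prompts, n_words: int = 3, top: int = 10) -> list[tuple[str, int]]:
    """Count the most common first n_words of each prompt."""
    counts = Counter(" ".join(p.lower().split()[:n_words]) for p in prompts if p.strip())
    return counts.most_common(top)
```

Recurring openings are a cheap signal for instructions you keep repeating by hand, which are candidates to move into CLAUDE.md or a skill.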

One recent proof point: 110 pull requests merged in one week. Almost none of the code written by hand. Almost none of the issues filed by hand. The instinct that would have ruined it was the instinct to manage it closely. The post on that week: Stop Micromanaging Your Agents.

08 · Mindset: the non-technical lessons that matter most

The hardest thing to learn about agentic engineering is not a technical skill. It is the willingness to stay in the investment phase long enough.

Eight sessions of infrastructure before a single autonomous ticket. That is the reality. During those eight sessions, you are slower than you would be working alone. The agents make mistakes you have to catch. The infrastructure requires constant tuning. Every session feels like overhead.

Most people give up in session 2 or 3. They conclude that AI agents add friction and go back to writing code themselves. They are correct that session 2 is slower. They are wrong about what session 9 looks like.

The other mental model that matters: you are not building a tool. You are building a colleague. The agent that eventually runs autonomously on your codebase has learned your conventions, your review standards, your architectural preferences. It did not arrive knowing them. You taught it, over sessions, through CLAUDE.md, through memory systems, through feedback on its outputs.

The invisible ceilings at each level of skill are real, and you cannot see them from inside. The post The Walls That Taught Me More Than the Breakthroughs describes each one and how to break through it.

The quality bar also shifts. The agent produces output fast. Your job is to hold the bar even as output velocity rises. The post The AI Slop Quality Crisis is about this specifically.

And the last thing: your prompt is not the skill. What is transferable is the judgment behind the prompt. How you steer. How you evaluate. How you catch the 10% that is wrong. The colleague who asks for your prompt and is disappointed by three sentences does not understand what they are asking for. The post: Your Prompt Is Not the Point.