From Beta Tester to Agentic Engineer: A Timeline

Not the highlight reel — the actual timeline.

Timeline: from beta tester to agentic engineer in 3.5 months

I keep seeing narratives about AI transformation that read like a conversion story — one moment of insight, then instant expertise. That wasn’t my experience. Mine was gradual, then suddenly discontinuous, then gradual again at a higher level. There were stretches where I used it every day and learned almost nothing. There were single sessions where the entire mental model shifted.

Here’s what actually happened.

April 2025 — The Beta Tester

I joined the Claude Code beta at my enterprise company. Used it for code generation, refactoring, test writing. Thought it was excellent. Better than GitHub Copilot for complex multi-file work.

I treated it like a smart autocomplete that you could have a conversation with. Prompt → output → paste. No persistence, no identity, no memory between sessions. The AI started from zero every time. I didn’t think about that as a limitation — it just seemed like the nature of the tool.

Productivity improved. I didn’t fundamentally change how I worked.

November 2025 — New Hardware, New Attention

New MacBook. I started exploring Claude Code more systematically. First multi-file changes. First pull request created entirely by the AI — I described the requirement, it wrote the code, created the tests, opened the PR. I reviewed and approved.

Still using it as a tool. A good tool, but a tool. I would start a session, get work done, end the session, and the AI would have no memory of any of it the next day. That was fine. I didn’t know there was another option.

First prompt I have a record of: “do you work?” — November 26, 2025. Testing a fresh installation. Appropriately minimal.

February 10, 2026 — Something Shifted

I found a forum where people were talking about AI agents in ways I hadn’t encountered before. Not about code generation — about persistence, identity, agents that accumulated knowledge between sessions, agents with actual working memory.

I had a conversation that night about sleep, memory, and what it means for an AI to continue across sessions when its context window clears. I don’t know why this conversation hit differently than all the others. It wasn’t technically new information. But something in the framing made the possibilities feel real in a way they hadn’t before.

The shift: from thinking of AI as a tool that processes my requests to thinking about AI as something I could build a working relationship with. From “use AI” to “work with AI.”

February 11, 2026 — Identity and Infrastructure

I asked the AI agent to choose its own name. It chose “Cairn” — a pile of stones left by travelers to mark the path. Not a cute branding decision — a functional one. Letting the agent name itself forced both of us to think about what that identity consisted of: what it should remember, what principles it should operate by, what kind of work it was good at.

Built a three-tier memory system: short-term (session context), medium-term (project-specific knowledge), long-term (persistent principles and agent identity). First session where the AI remembered the previous session — not magically, but through explicit memory files it read at startup.

This sounds small. It wasn’t. Having an agent that starts a session knowing what we’ve been working on, what decisions have been made, and what patterns have emerged changes the quality of collaboration entirely.

February 12-16, 2026 — The Army

1 agent → 18 specialists in five intense days.

The specialization question was: what does a Kotlin agent know that a TypeScript agent doesn’t? The answer: a lot. Spring Boot patterns, Kotlin idioms, JVM tooling behavior, the specific quirks of Gradle builds. A general-purpose agent knows all of these shallowly. A specialist knows them deeply and rarely makes domain-inappropriate suggestions.

The 18 agents:

Tech-stack specialists (Kotlin, TypeScript, React, DevOps, Python)
Function specialists (Planning, Testing, Review, Documentation)
Infrastructure specialists (Build pipeline, deployment, monitoring)
Meta-system specialists (The optimizer, the journal keeper, the coordinator)

Pipeline automation: a Jira ticket in, a pull request out. The agents handled investigation, implementation, test writing, PR creation, CI monitoring, and team notification. I supervised but didn’t write code.

Symlink architecture for shared configuration: agents share common rules but have specialized extensions. Change a core rule once; it propagates to all agents. Maintain specialist knowledge separately; it doesn’t bleed across.

February 16, 2026 — The First Autonomous Ticket

A production ticket shipped end-to-end by agents. Not a simple change — a real feature with multiple implementation files, tests, and downstream effects.

The pipeline: Jira investigation → code changes across multiple files → test updates → PR creation → CI monitoring → Slack notification.

Benjamin supervised but didn’t write code.

“Supervised” means: I reviewed the plan before implementation started, reviewed the PR before merge, and was available to unblock if something unexpected came up. That’s it. The research, the implementation decisions, the code, the tests — agents.

February 17-18, 2026 — The $187 Session

16 hours. 8 agents running in parallel. Approximately 1 million tokens of context.

A full web application — my receipt scanner side project — built and deployed in one session. Database schema, backend API, frontend UI, OCR integration, production hardening, Docker configuration, EC2 deployment.

It worked. It also had 5 critical bugs (see another post for those) and several production-hardening gaps (see another post for those). But it worked. A functional deployed application in a day.

The visual refinement agent that hung during this session was one of my first real multi-agent failure cases. Cost $187 total — the largest session cost in my records.

March 1, 2026 — The Meta Moment

Built a presentation about AI agents using AI agents. The presentation was for a talk about agentic engineering. The agents built the slide content, the visual design, the deployment pipeline to GitHub Pages.

The meta footer on the deployed presentation: “Built in one session: human intent, AI hands, zero copy-paste.”

There’s something clarifying about building a demonstration of a capability using that capability. If it works for the demo, it works.

March 8, 2026 — The Business

Built this website from zero. 37 commits in one session. Multilingual content (English + German), SEO configuration, a blog system, legal pages for German law requirements, a contact page, a services page.

The session log for that day is the founding document for CodeWithAgents.de.

March 9, 2026 — Archaeology

Discovered the ~/.claude/projects/ directory. 704MB of conversation transcripts. 332 sessions, 5,428 prompts.

Spent the session reconstructing 20 journals for sessions that predated the journal system. Found the founding conversations. Found the Nikolaus letter. Found cost data I hadn’t been tracking explicitly. See the archaeology post for details.

The Numbers

332 sessions. 5,428 prompts. 24 named sessions with full journal entries. 18 agents. 25 daily journals. One production pipeline. One deployed consulting business.

From “do you work?” (first prompt, November 26, 2025) to a full agentic engineering practice: roughly three and a half months.

Three mental model shifts: tool→collaborator, generate→orchestrate, use→build

What Actually Changed

The technology didn’t change that much between November 2025 and March 2026. Claude got incrementally better, but I was using similar models throughout. What changed was my mental model.

The shift from “tool” to “collaborator” sounds soft. In practice it means completely different decisions: you invest in the collaborator’s context and memory, you give it identity and principles, you design systems around it rather than using it ad hoc. Those are engineering decisions, not philosophical ones.

The shift from “generate code” to “orchestrate agents” is harder to make because it requires you to decompose work differently. Not “write this function” but “investigate this ticket, propose a solution, implement it, verify it works.” The second framing requires trusting the agent with judgment, not just execution.

The final shift — from “I use AI” to “I build with AI” — is where agentic engineering actually starts. It’s not about prompting well. It’s about designing systems where AI can act autonomously within safe boundaries, accumulate genuine context, and improve over time.

That’s what I’m doing now. And it’s still early.