$187 and 16 Hours: My First Million-Token Session

Two things landed in the same week: the 1 million token context window and the Claude Agentic Teams beta. One gave me room to think. The other gave me a way to parallelize. I did what any reasonable engineer would do: I immediately tried to break both with something too ambitious.

The session receipt: $187, 16 hours, 729 tests, 34.8% orchestrator context used

The plan: build a complete cashback campaign web application — backend, frontend, full test suite, containerized deployment — in a single session. One orchestrator. Eight specialized agents spawned as a team. Don’t stop until it’s live.

What actually happened is more interesting than either the successes or the failures on their own.

The Setup

The Agentic Teams feature was the key enabler. Instead of one agent doing everything sequentially, I had an orchestrator that spawned specialized subagents — each with its own fresh context window, each focused on one domain:

Backend implementer — Spring Boot service, API endpoints, business logic
Frontend implementer — React SPA wired to the backend
QA reviewer — running tests, flagging gaps, reviewing coverage
Deployment agent — Dockerfile, compose files, deployment configuration
Git agent — branches, commits, keeping the repo clean
PR handler — pull request creation, descriptions, review assignments
CI monitor — watching the pipeline, catching failures early
Slack notifier — status updates to the team channel

The combination of 1M context and teams changed the equation fundamentally. The orchestrator held the big picture — architecture, decisions, coordination — while each subagent got a fresh context dedicated entirely to its domain. No context pollution between concerns. The backend implementer’s window wasn’t cluttered with CSS decisions. The deployment agent didn’t carry the weight of test output.

That’s not a bigger notepad. That’s a qualitatively different way of working.

The Numbers

Let me give you the receipt before the narrative.

Metric	Value
Total cost	$186.92
Wall time	16 hours
API time	7 hours 42 minutes
Lines of code written	5,800+
Backend tests	649 (all passing)
End-to-end tests	80 (all passing)
Orchestrator context at completion	34.8% used

The gap between wall time and API time tells its own story. Nine-ish hours of waiting — for builds to complete, for containers to spin up, for CI pipelines to run, for me to review and redirect. The agent system was genuinely idle for more than half the clock time. Multi-agent work is often more about managing parallelism and wait states than it is about raw token throughput.

The context number needs explanation: 34.8% is the orchestrator’s context usage — the central agent coordinating everything. But here’s the thing about agentic teams: every subagent spawns with a fresh context window. The backend implementer burned through most of its own context writing 3,000+ lines of Spring Boot code. The frontend implementer filled a separate window with React components. The total tokens consumed across all agents was many multiples of what the orchestrator alone used.

The 1M window mattered for the orchestrator’s ability to hold the full project state — every architectural decision, every agent’s status, every failure and recovery — without summarization loss. The subagents benefited from fresh context dedicated entirely to their domain.

Orchestrator used 34.8% of 1M context — each subagent had its own fresh window on top

What We Built

A cashback campaign web application. Users register for campaigns, submit purchase verification, and receive cashback payouts. Backend exposes REST endpoints with full authentication, campaign management, submission handling, and payout processing. Frontend handles the user journey: campaign listing, submission form, status tracking, account management.

649 backend tests covering units, integration, and API contracts. 80 end-to-end tests exercising complete user flows against the deployed system. Both suites passing at the time of deployment.

Containerized with Docker, deployed to a demo server, accessible over HTTPS. The full stack was live — not prototype-live or local-dev-live, but actually deployed and running with a URL you could share.

In one session.

What Broke

Three things broke in ways worth documenting.

The UI refinement agent hung mid-session. About ten hours in, I spun up an additional agent to polish the frontend styling. It started working, then stopped producing output, then started again, then stopped permanently. The process was still running — consuming tokens, returning nothing meaningful. I had to force-kill it and redistribute its remaining tasks to the frontend implementer. Cause: unclear. Hypothesis: the context had accumulated enough ambiguous signal that the agent entered a local minimum and couldn’t exit without human intervention. I’d seen this behavior before in shorter sessions. At this scale it cost more time. (I wrote a full postmortem on this and three other multi-agent failures in The Agent That Hung.)

Docker configuration required multiple debug cycles. The deployment agent’s first three attempts at the Dockerfile produced images that built successfully and failed at runtime. The failure modes were different each time: wrong environment variable name, missing health check endpoint, volume mount path mismatch. None of these were hard problems — they were the kind of thing that takes ten minutes to fix once you know what’s wrong. But each cycle was 15-20 minutes of build time, which adds up. The agent wasn’t wrong in a systematic way; it was wrong in a random way, which is harder to diagnose.

CORS whitelisting was missing from the first live deployment. The backend deployed, the frontend deployed, we hit the first real URL from a browser — and got CORS errors. The frontend and backend were on different origins, and nobody had configured allowed origins in the API. This is the kind of thing that’s trivially obvious in hindsight and invisible when you’re thinking about everything else. We fixed it in twenty minutes, but the gap between “it works in tests” and “it works when you actually open a browser” is real and shouldn’t be understated.

The failures were recoverable. None of them were catastrophic. But they’re worth naming because the narrative of “multi-agent AI builds complete app in one session” can make it sound smoother than it is.

Was $187 Worth It?

This is the question everyone asks.

$186.92 for a complete, deployed, tested web application. The question is: compared to what?

My estimate for solo development of this system — evenings and weekends, the realistic mode for a side project — is two to three weeks. That’s probably 40-60 hours of actual coding time, spread across a month of calendar time. You don’t get it faster by working harder; you get it faster by having more hours available.

The session compressed that into one long stretch — starting the evening of the 17th and spanning into the early hours of the 18th. Not just in wall time, but in context. When you’re working across three weeks of evenings, you spend a non-trivial portion of each session re-establishing context. What did I build last time? Where did I leave off? Why did I make this architectural decision? The 1M context window meant that never happened. Every agent at every moment had access to the full state of the project.

That context compression is the value. The $187 isn’t paying for code generation — you can get code generation cheaply. It’s paying for unbroken continuity across an entire project, from empty repository to deployed application.

Is $187 a lot? It’s a dinner out. It’s less than an hour of consulting time. For what it produced, it’s laughably cheap if the output is usable — and in this case, the output was usable.

The ROI question gets harder when you ask: “Okay but I’m paying $187 per feature, how does that scale?” Fair. If you’re running sessions like this weekly, you’re spending $800-1000 a month on context. That’s not nothing. But you’re also compressing weeks of work into days, and the comparison baseline should be “what would I pay a contractor” rather than “what would I pay in compute.”

What 1 Million Tokens Actually Changes

The marketing around large context windows is often vague in ways that obscure the real value.

It’s not about fitting more files in. You could always load more files into multiple sessions. The point isn’t storage capacity.

It’s about the orchestrator holding the full picture. The 1M window lets the coordinating agent track every decision, every failure, every architectural choice across a 16-hour session without ever summarizing or losing nuance. When the backend agent reports a schema change, the orchestrator passes that context to the frontend agent accurately — not through a lossy summary, but through the actual decision with its reasoning intact.

Teams multiply the effective context. Eight agents, each with their own context window, means the system’s total working memory is far larger than 1M tokens. Each specialist gets a fresh window focused on its domain. The orchestrator’s 1M window coordinates between them. It’s not one big context — it’s an architecture of contexts.

It eliminates the summarization tax at the coordination layer. Shorter orchestrator windows mean you’re constantly summarizing: “here’s what I built, here’s the current state, here’s what’s failing.” Every summary introduces loss. With 1M tokens on the orchestrator, everything that happened across all eight agents was still trackable. No lossy handoffs.

It makes failures recoverable without restart. When the hung UI agent had to be killed, the orchestrator still had the complete context of what it had attempted. Spinning up a replacement agent with the right instructions was straightforward — the orchestrator knew exactly where the work had left off.

This is why I described it as a different way of building software. Not a bigger version of the old way. A different mode that becomes available when you combine a large orchestrator context with specialized parallel agents.

What I’d Do Differently

Not much — but a few things.

I’d add CORS configuration to a deployment checklist from the start. Not because it’s hard to add, but because it reliably gets forgotten and costs time. The pattern is consistent enough that it should be institutional knowledge.

I’d build in explicit agent health checks. The hung UI agent was running for over an hour before I noticed it wasn’t producing useful output. A simple “if no meaningful output in X minutes, flag for human review” rule would have caught it faster.

I’d be more aggressive about pre-splitting the frontend work. At the scale of a complete application, the frontend implementer had a lot of surface area. Splitting that into UI components and data integration from the start would have parallelized more work.

The Meta-Point

I did this in February 2026. The 1M token context window was new. The Agentic Teams beta was new. The multi-agent orchestration patterns were things I’d been building for months. Everything converged at once.

What struck me most wasn’t the output — it was the experience of building. For sixteen hours, I wasn’t typing. I wasn’t writing code. I was making decisions, reviewing outputs, redirecting agents, thinking about architecture. The implementation was handled. The thinking was mine.

That’s the mode I think agentic engineering is pointing toward: not “AI writes the code for me” but “I architect while AI implements, continuously, in real time.” The session wasn’t sixteen hours of watching progress bars. It was sixteen hours of directed creative work, at a level of abstraction above the code.

Whether that’s exciting or unsettling depends on where you stand. For me, it’s both, which is usually a sign that something real is happening.

The $187 was money well spent. The sixteen hours taught me more about multi-agent system design than any tutorial could. The receipt is right there in the API billing dashboard.

Now I know what a million tokens feels like from the inside. (What I learned next — making those systems production-ready — is in Production Hardening.)