The Cost of Speed
The Practices We’re Throwing Away — They Matter
Agentic coding has made writing software close to free. Claude Code, Codex, Gemini CLI — these tools generate prototypes, migrations, refactors, and tests at a pace that would have seemed absurd two years ago. Acceleration is real.
But a strange asymmetry has developed in the public conversation.
For a certain class of builder, it changed everything: the solo developer, the founder racing to validate an idea, the engineer building a personal tool over a weekend. In those environments, the cost of failure is bounded. A bug means annoyance, not a regulatory incident. A bad abstraction means refactoring next week, not a 3 AM escalation across three continents.
They are not wrong about the speed. They are describing a real change.
But they do not speak for capital-P Products [1].
There is another side: teams operating systems used by millions. They carry on-call rotations and responsibility for production databases, payment flows, healthcare records, identity systems, and logistics infrastructure. They know that the hard part of software was never the physical act of typing code. It was building enough shared understanding around the code that the system could survive change.
They know what happens when speed outruns comprehension because they have the scar tissue.
This matters because a solo developer working with an agent is a fundamentally different situation from an engineering team working with agents. When you build alone, the only person who needs to understand the code is you.
In a team, every agent-generated change must be comprehensible to the people who will review it, debug it at 3 AM, extend it six months later, and explain it when it breaks.
These days, proven engineering practices — shared understanding, disciplined review, architectural assessments — are increasingly dismissed as friction and overhead. The argument is that agents make these practices obsolete: why review code carefully when the agent can review it? Why invest in shared understanding when the agent can explain the code on demand?
These are fair questions, and I will examine them in what follows.
AI Code Hits Production
An analysis highlighted by Stack Overflow, based on CodeRabbit’s review of 470 repositories and pull requests, found that AI-generated code produced substantially more issues than human-written code overall. Even allowing for the incentives of a vendor-backed report, a pattern emerges: agents can generate plausible code faster than it can reliably be validated [2].
In March 2026, Amazon experienced a six-hour shopping outage attributed to a faulty AI-assisted deployment [3], followed by a thirteen-hour outage after an autonomous agent deleted and attempted to recreate a production environment [4].
Amazon’s response: a new requirement for senior engineer sign-off on all AI-assisted code changes.
The stories are anecdotal, but they are accumulating.
Structural Limits
A common response: the next model will fix it. Larger context windows, better training data, more capable reasoning. There are structural reasons to doubt this.
The Recall Problem
When an agent works in a codebase, it must find the relevant existing code before it can make a good decision. This is the recall problem: the agent’s ability to retrieve the right context from the existing codebase. The bigger the codebase, the lower the recall. Low recall means the agent misses existing code, duplicates functionality, and introduces inconsistencies.
Larger context windows will probably not solve this, for three reasons:
First, research on context degradation shows that models achieve 85 to 95 percent accuracy on information at the start and end of their context, but drop to 76 to 82 percent on information in the middle — the “lost in the middle” effect [5]. Cramming more code into the window makes the retrieval problem worse, not better.
Second, effective capacity is not advertised capacity: a model claiming 200,000 tokens typically becomes unreliable around 130,000, with sudden performance drops rather than gradual degradation.
Third, and most fundamentally, this is a search problem, not a storage problem. Even with a million-token window, the agent must identify which of thousands of files, functions, and conventions are relevant to the current change. Adding more hay does not help find the needle. 65 percent of enterprise AI failures in 2025 were attributed to context drift or memory loss during multi-step reasoning — not to running out of context [6].
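The search-versus-storage point can be made concrete with a toy simulation (not a model of any real retriever): give a retrieval step noisy relevance scores and a fixed top-k budget, and watch recall fall as the number of candidate files grows while the number of relevant files stays constant. The scoring model and all numbers here are illustrative assumptions, not measurements.

```python
import random

def recall_at_k(num_files: int, num_relevant: int, k: int, rng: random.Random) -> float:
    """Toy retrieval model: relevant files score higher on average,
    but distractors occasionally outscore them by chance."""
    scores = []
    for i in range(num_files):
        relevant = i < num_relevant
        base = 1.0 if relevant else 0.0          # assumed score gap
        scores.append((base + rng.gauss(0, 0.8), relevant))
    top_k = sorted(scores, reverse=True)[:k]
    found = sum(1 for _, rel in top_k if rel)
    return found / num_relevant

rng = random.Random(42)
for n in (100, 1_000, 10_000):
    avg = sum(recall_at_k(n, num_relevant=10, k=20, rng=rng) for _ in range(200)) / 200
    print(f"{n:>6} files: recall@20 ~ {avg:.2f}")
```

With a fixed retrieval budget, more files means more chances for a distractor to crowd out a relevant file: recall degrades as the haystack grows, even though nothing about the needles changed. That is the sense in which more context capacity does not, by itself, solve the problem.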
The result: an agent’s decisions are inescapably local. This leads to code duplication, abstractions for abstractions’ sake, and inconsistencies that compound silently.
It does not mean every generated abstraction is wrong. It means the system often cannot distinguish a pattern that is popular from one that is justified.
Cargo-Cult Architecture
Agents reproduce patterns that are statistically common in their training data, regardless of whether those patterns are appropriate for a given situation. This is cargo-cult programming at machine speed.
An ACM study on software reuse in the generative AI era explicitly identifies AI-assisted code generation as “a new form of cargo cult programming — inclusion of code that originates from external sources without consideration or adequate understanding of relevance or side-effects” [7]. Sonar’s analysis of AI-accelerated codebases shows consistently higher cyclomatic complexity — code that is structurally harder to understand and maintain [8].
An agent over-engineers because the training data is full of over-engineering, and it has no “taste” to distinguish signal from noise.
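Cyclomatic complexity, the metric Sonar’s analysis relies on, is simple enough to sketch: one plus the number of decision points in a function. Below is a rough McCabe-style count over Python ASTs; real tools count more node types and follow stricter conventions, so treat this as an approximation, and the two sample functions are invented for illustration.

```python
import ast

def cyclomatic_complexity(source: str) -> int:
    """McCabe-style estimate: 1 + number of decision points."""
    tree = ast.parse(source)
    decisions = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.BoolOp, ast.IfExp)
    count = 1
    for node in ast.walk(tree):
        if isinstance(node, decisions):
            # a BoolOp adds one branch per extra operand: `a and b and c` -> 2
            count += len(node.values) - 1 if isinstance(node, ast.BoolOp) else 1
    return count

flat = "def f(x):\n    return x * 2\n"
branchy = (
    "def f(x, mode):\n"
    "    if mode == 'a' and x > 0:\n"
    "        for i in range(x):\n"
    "            if i % 2:\n"
    "                x += i\n"
    "    return x\n"
)
print(cyclomatic_complexity(flat))     # → 1
print(cyclomatic_complexity(branchy))  # → 5
```

The point of the metric in this context: each extra branch is another path a reader must hold in their head, so consistently higher complexity in AI-accelerated codebases translates directly into higher comprehension cost.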
The Context Observability Gap
When agents make decisions, the reasoning behind those decisions is invisible. No one challenges the intermediate steps. The context that led to an architectural choice vanishes once generation is complete. In production systems, this means the “why” behind decisions is lost — not as a side effect, but as a structural property of how agents work [9].
Complexity Trap
Each improvement in code-generation speed gets consumed by the increased complexity of the software we choose to build [10].
Or in practical terms: “it’s easier to write software, so we write much more of it, so the complexity goes up and not down, which means things break in more interesting ways, which means more incidents, more on call… all the improvements in the tooling will be canceled by this ever-growing complexity” [11].
What Speed Destroys
The most insidious cost of acceleration is not bad code. It is lost understanding.
Cognitive debt is not technical debt. Technical debt lives in the code — messy implementations, poor abstractions, shortcuts that will need fixing later. Cognitive debt lives in developers' minds: the erosion of the shared mental model of what the software does, how it works, and why specific decisions were made.
AI does not create more confusion per unit of work than human programming does. It compresses the timeline. The shock is the speed, not a unique pathology of AI code.
Margaret-Anne Storey documented this dynamic in an educational setting: a student team building a software product was paralysed by week seven or eight [12]. They could no longer make even simple changes without breaking something unexpected. They initially blamed technical debt, but the real problem was that nobody on the team could explain why design decisions had been made. The shared theory of the system had fragmented.
Peter Naur observed in 1985 that a program is not its source code — it is a theory in the minds of its developers, capturing what the program does, how intentions are implemented, and how it can be changed [13].
This shared theory lets developers working in parallel make accurate guesses about each other’s work. When ten programmers work simultaneously, each making design decisions, the theory binding their work becomes less coherent with every addition — unless they share a common mental model.
“Clean code” is really about how easily the reader can build a coherent theory of the system. This is what accelerated AI coding destroys: the shared theory.
The Practices Under Threat — and Why They Existed
Code review is the primary mechanism by which teams build a shared mental model. When one human reviews another’s code, knowledge transfers. The reviewer absorbs context. The author clarifies intent. Both update their theory of the system. When an agent generates code and another agent reviews it, nobody learns anything. The code may be correct. The shared theory still erodes. And when the 3 AM incident happens, no human has the mental model to diagnose it.
Disciplined architecture is the foundation of building to last. Agents tend to overfit to short-term utility rather than long-term architecture and sustainability. This can be mitigated by an experienced developer controlling the agent, but it requires skill, consideration, and thinking time.
Human understanding also helps fix the recall problem. If you know the codebase — its conventions, its invariants, its sharp edges — you can guide the agent past its locality limits. You can, because you carry the theory.
Without that understanding, you are flying blind, at ten times the speed.
Taking time is itself an engineering practice. Armin Ronacher makes the case directly: “There’s a reason we have cooling-off periods for some important decisions in one’s life. We recognize that people need time to think about what they’re doing, and that doing something right once doesn’t mean much, because you need to be able to do it over a long period of time” [14].
Specifications are necessary but not sufficient. A common response to all of the above is: “the solution is better specifications — write a sufficiently detailed spec and the agent will produce correct code.”
Gabriel Gonzalez tested this claim rigorously [15]. His finding: even extremely detailed, well-known specifications — like the YAML spec, which is exhaustive and widely implemented — still produce unreliable implementations when given to agents. Specifications matter. They are indispensable. But they are lossy representations of intent, unwritten invariants, and operational realities.
To be fair, certain engineering disciplines in our trade really are compensating for human limitations and accumulated debt.
AI can genuinely reduce cognitive debt through enforced encapsulation, rapid large-scale refactoring, and targeted test suites [16]. As with any tool, it matters who is wielding it.
The answer is not to reject AI. It is to use it with the discipline the situation demands.
What the Best Teams Actually Do
The hiring signal from the leading AI organizations is revealing.
Anthropic hires the Bun creator, Jarred Sumner. OpenAI acquires Astral — the team behind uv, ruff, and ty.
These are the organizations with the most advanced agentic capabilities on earth, and they are hiring the best human domain experts they can find. They have the best agents. They still want the best humans. That tells you something about how much they trust agents alone.
Technical depth and execution speed used to correlate. They no longer do. You can now get high speed with low depth by delegating thinking to the tool. That can look senior on dashboards and in weekly updates. It is not senior if the output is brittle, unobservable, and expensive to maintain.
The best teams invest not in generating more code faster, but in the infrastructure that makes generated code trustworthy: deterministic test suites, clear module boundaries, co-located documentation, fast feedback loops. Their job is to design environments where iteration converges toward correctness instead of plausible nonsense. They treat specifications as first-class artifacts — while understanding that specs are inherently incomplete and human judgment must bridge the gap.
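One concrete meaning of “deterministic test suite” is removing hidden nondeterminism — global RNGs, wall clocks — from the code under test, so an agent’s change either passes or fails the same way every run. A minimal sketch, built around a hypothetical `pick_retry_delay` helper (the function and its numbers are invented for illustration):

```python
import random
import unittest

def pick_retry_delay(attempt: int, rng: random.Random) -> float:
    """Exponential backoff with jitter; the RNG is injected, never global."""
    base = min(2 ** attempt, 60)        # cap backoff at 60 seconds
    return base * rng.uniform(0.5, 1.5)

class DeterministicBackoffTest(unittest.TestCase):
    def test_same_seed_same_schedule(self):
        # An injected seed makes the "random" schedule reproducible in CI.
        a = [pick_retry_delay(i, random.Random(7)) for i in range(5)]
        b = [pick_retry_delay(i, random.Random(7)) for i in range(5)]
        self.assertEqual(a, b)

    def test_delays_are_bounded(self):
        rng = random.Random(7)
        for attempt in range(10):
            d = pick_retry_delay(attempt, rng)
            self.assertGreaterEqual(d, 0.5)  # smallest base * smallest jitter
            self.assertLessEqual(d, 90)      # 60-second cap * largest jitter

unittest.main(argv=["backoff"], exit=False)
```

The design choice is the dependency injection: because the RNG arrives as a parameter, a reviewer — human or agent — can assert exact behavior instead of re-running a flaky test and hoping. That is the kind of feedback loop that makes generated code checkable at all.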
The Bet
The best practices increasingly dismissed as friction were never about preserving the dignity of manual typing.
They exist because software is a coordination problem, not a code-generation problem.
They exist because the cost of wrong code scales with the number of people who depend on it.
They exist because teams need shared theory, not just output.
They exist because in production environments, the thing that fails is rarely syntax. It is understanding.
The acceleration is real and irreversible. Nobody is arguing for going back.
But there is a difference between moving fast with comprehension and moving fast without it.
We must not forget what a thirteen-hour production outage feels like. “The agent will fix it” is not (yet?) a reliable incident response plan; it is an expensive bet.
References
[1] D. Breunig, “Two Things I Believe About Coding Agents,” Feb. 2026. https://www.dbreunig.com/2026/02/25/two-things-i-believe-about-coding-agents.html
[2] D. Loker, “Are Bugs and Incidents Inevitable with AI Coding Agents?,” Stack Overflow Blog, Jan. 2026. https://stackoverflow.blog/2026/01/28/are-bugs-and-incidents-inevitable-with-ai-coding-agents/
[3] OECD.AI, “Amazon AI-Assisted Code Changes Cause Major Outages,” Mar. 2026. https://oecd.ai/en/incidents/2026-03-10-01aa
[4] Fortune, “An AI Agent Destroyed This Coder’s Entire Database. He’s Not the Only One with a Horror Story,” Mar. 2026. https://fortune.com/2026/03/18/ai-coding-risks-amazon-agents-enterprise/
[5] Chroma Research, “Context Rot: How Increasing Input Tokens Impacts LLM Performance.” https://research.trychroma.com/context-rot
[6] Zylos Research, “AI Agent Context Compression: Strategies for Long-Running Sessions,” Feb. 2026. https://zylos.ai/research/2026-02-28-ai-agent-context-compression-strategies
[7] ACM, “Software Reuse in the Generative AI Era: From Cargo Cult Towards Systematic Practices,” Proc. 16th Int. Conf. on Internetware, 2025. https://dl.acm.org/doi/10.1145/3755881.3755981
[8] Sonar, “The Inevitable Rise of Poor Code Quality in AI-Accelerated Codebases,” 2026. https://www.sonarsource.com/blog/the-inevitable-rise-of-poor-code-quality-in-ai-accelerated-codebases/
[9] rockoder, “The Context Observability Gap: Why ‘Magic’ Agents Fail in Production.” https://www.rockoder.com/blog/the-context-observability-gap/
[10] H. Sutter, “Software Taketh Away Faster than Hardware Giveth,” Dec. 2025. https://herbsutter.com/2025/12/30/software-taketh-away-faster-than-hardware-giveth/
[11] M. Zechner, “Thoughts on Slowing the Fuck Down,” Mar. 2026. https://mariozechner.at/posts/2026-03-25-thoughts-on-slowing-the-fuck-down/
[12] M.-A. Storey, “How Generative and Agentic AI Shift Concern from Technical Debt to Cognitive Debt,” Feb. 2026. https://margaretstorey.com/blog/2026/02/09/cognitive-debt/
[13] P. Naur, “Programming as Theory Building,” 1985. https://pages.cs.wisc.edu/~remzi/Naur.pdf
[14] A. Ronacher, “Some Things Just Take Time,” Mar. 2026. https://lucumr.pocoo.org/2026/3/20/some-things-just-take-time/
[15] G. Gonzalez, “A Sufficiently Detailed Spec Is Code,” Haskell for All, Mar. 2026. https://haskellforall.com/2026/03/a-sufficiently-detailed-spec-is-code
[16] N. Meyvis, “On Cognitive Debt.” https://www.natemeyvis.com/on-cognitive-debt/