AI Is Breaking and Remaking Software Teams
Execution Is Cheap, Judgment Is Scarce: The Real Shift in Software Engineering
The Fracture Point
Something fundamental is breaking in software engineering — not the tools, not the languages, but the organizational assumptions that have governed how teams build software for over two decades.
Scrum ceremonies, story points, sprint planning, and the careful separation of Product Owner, Developer, Tester, and Scrum Master were all designed for a world where writing code was the primary bottleneck.
AI has made code cheap. And in doing so, it has exposed everything else.
This shift is often mischaracterized as a simple productivity boost. Headlines focus on five-fold gains, smaller teams shipping faster, or AI “replacing” developers.
The reality is more nuanced. For every team experiencing dramatic acceleration, others are drowning in pull requests, review backlogs, and systems that no one truly understands.
The Immutable Value of Fundamentals
Counterintuitively, the AI era does not erase the need for engineering rigor; it aggressively rewards it. AI acts like a mirror: it amplifies whatever it ‘sees’.
Test-driven development, clean architecture, well-defined interfaces, and comprehensive documentation make AI agents exponentially more effective and their output more reliable. Teams with strong discipline see compounding returns from AI, while teams with poor discipline might drift into chaos.
Stripe’s internal “minions” system, for example, demonstrates that agent performance depends less on raw model intelligence (still very important) than on reproducible development environments, fast deterministic tests, and strict conventions embedded into tooling and workflows.
Where context is clean and constraints are explicit, agents perform reliably. Where they are not, failure modes multiply.
Amdahl’s Law and the Productivity Paradox
Making coding dramatically faster yields only a marginal improvement in overall delivery speed. If writing code historically constituted about 20% of the delivery cycle, Amdahl’s Law caps the total speedup at 1.25×, no matter how fast code generation becomes; optimizing that one stage to near-zero time simply exposes the true bottlenecks.
The friction now concentrates at the human-gated stages: requirements gathering, code review, rigorous testing, security audits, and deployment.
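The Amdahl bound is easy to check. A minimal sketch, assuming the illustrative 20% coding share from above and a 10× speedup on that stage:

```python
def amdahl_speedup(optimized_fraction: float, factor: float) -> float:
    """Overall speedup when only `optimized_fraction` of the work
    is accelerated by `factor` (Amdahl's Law)."""
    return 1.0 / ((1.0 - optimized_fraction) + optimized_fraction / factor)

# Coding is ~20% of the delivery cycle; make it 10x faster:
print(round(amdahl_speedup(0.20, 10), 3))   # 1.22
# Even infinitely fast coding caps the overall gain at 1 / 0.8 = 1.25x:
print(round(amdahl_speedup(0.20, 1e9), 3))  # 1.25
```

A 10× improvement on one fifth of the pipeline buys roughly 22% overall; the remaining 80% of the cycle dominates.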
The result is a familiar but intensified pathology. Senior engineers become review bottlenecks, spending their time debugging enormous AI-generated pull requests instead of designing systems. Review queues grow. Cognitive load explodes. The system jams.
The wrong answer is loosening review standards to unblock the pipeline. This simply translates artificial speed into production incidents.
The Acceleration of Cognitive Debt
Programming, as Peter Naur argued, is fundamentally an exercise in theory building. “Cognitive debt” is the erosion of that shared theory: how the system works, what invariants matter, and where the sharp edges are.
AI does not necessarily write worse code than humans, but it compresses time drastically.
Teams are hitting the natural complexity limits of their architecture in a matter of weeks instead of years. Without proper comprehension of the underlying design choices, a team eventually becomes paralyzed.
Cognitive debt will spike unless human comprehension is actively managed.
The Collapse of Predictability and the Story Point Fallacy
Agile methodologies rely on the assumption that effort is at least loosely predictable. Story points, velocity tracking, and sprint commitments all depend on this premise.
AI breaks it.
An agent can complete a complex, historically “five-point” feature in minutes, then spend hours looping on a trivial dependency issue that would be a “one-point” task for a human. Effort becomes non-linear and highly variable. Velocity ceases to be meaningful.
Handoffs Stop Making Sense
As code generation approaches zero marginal cost, the valuable work shifts decisively upstream and downstream: problem framing, system integration, operational reliability, and product judgment.
That work does not map cleanly onto siloed handoffs from product manager to designer to engineer to QA to SRE. Each translation step introduces latency and distortion.
The separation of responsibilities vanishes because rapid iteration requires tighter loops and fewer translation steps.
The Momentum of Change: Patterns from the Frontier
To gauge the trajectory of AI-native engineering, consider what high-performing teams are already doing:
- Agent-Friendly Infrastructure: At Stripe, unattended coding “minions” run in standardized dev environments, execute workflows orchestrated by deterministic blueprints, and integrate with internal tools via the Model Context Protocol (MCP). This demonstrates that agent performance relies less on raw model intelligence and more on engineering hygiene: reproducible environments, fast tests, and safe tooling boundaries.
- The Durable vs. Disposable Divide: Teams must explicitly split their codebases. In the disposable domain (prototypes, scripts), speed dominates and maintenance expectations are low. In the durable domain (systems of record, safety-critical services), trust is earned through tests, observability, and disciplined change control. As highlighted by Honeycomb, AI makes disposable code dramatically cheaper while raising the premium on rigor for durable code.
- Context Engineering: Developers are moving toward Spec-Driven Development. Specifications, repository rule files, and structural conventions are treated as first-class artifacts that agents consume.
Defining the Shift: A Spectrum of Maturity
For meaningful discussion, we need well-defined semantics:
- AI-Assisted: The organization gives developers coding assistants but leaves workflows, team structures, and ceremonies unchanged.
- AI-Augmented: AI is embedded into testing, documentation, and code review. Processes begin to adapt — sprints shorten, review standards adjust — but traditional team topologies and handoffs largely persist.
- AI-Native: The organization is redesigned from first principles. The operating model involves small teams, collapsed role boundaries, spec-driven development, and agent orchestration where humans direct and review far more than they implement.
Moving from assisted to augmented is a tooling change; moving from augmented to native is an organizational transformation that fundamentally alters the identity of a software engineer.
From SCRUM to AI-Native Cells
For existing organizations, transformation will be staged:
Stage 1: Stabilization and Gating (Must-Haves)
The objective is to integrate AI coding tools while aggressively protecting codebase durability.
- Leaders must invest in clearing non-code chokepoints: review bandwidth, testing velocity, deployment confidence, and decision latency.
- Begin documenting architectural decisions and domain context in machine-readable formats: create your foundational set of project invariants, style guides, and agent instructions.
- Maintain strict human review requirements: require that at least one human developer fully comprehends both the mechanics and the why behind every AI-generated commit before it merges.
- Trap to avoid: Do not get distracted by tooling or custom agent development at this stage; focus relentlessly on getting your context and testing foundations solid. MCP integrations are mostly a distraction here.
Stage 2: Structural Redesign and Spec-Driven Workflows
This stage dismantles legacy Agile and restructures teams to match AI’s operational cadence.
- Transition to Micro-Teams: Disband large departments into small, highly autonomous cells of cross-functional engineers (each also carrying product-owner responsibilities) who own an entire vertical slice of the product.
- Adopt Spec-Driven Development (SDD): Teams act as “context engineers,” writing dense, deterministic specifications and sign-off criteria (e.g., test cases) rather than writing the implementation code directly.
- Implement Stacked PRs: Replace massive, multi-day PRs with chains of tiny, independent diffs (under 200 lines) that humans can review in minutes.
- Shift to Cycle Time Metrics: Eradicate story points. Measure success by cycle time (idea conception to production deployment) and, more experimentally, revenue impact per engineer.
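Spec-driven development in miniature: the spec is a set of executable sign-off criteria, and an implementation merges only if it satisfies all of them. A sketch with hypothetical names throughout; the stand-in function below plays the role of agent output:

```python
# The spec comes first: deterministic (input, expected) sign-off criteria.
SPEC = [
    ("  Ada Lovelace ", "ada_lovelace"),
    ("GRACE", "grace"),
    ("linus_torvalds", "linus_torvalds"),
]

def normalize_username(raw: str) -> str:
    """Stand-in for an agent-produced implementation of the spec."""
    return raw.strip().lower().replace(" ", "_")

def sign_off(impl) -> bool:
    """Gate: True only if the implementation meets every criterion."""
    return all(impl(raw) == expected for raw, expected in SPEC)

print(sign_off(normalize_username))  # True
```

Humans own `SPEC` and the sign-off gate; the implementation itself becomes replaceable output that any agent run can regenerate.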
This is where the transformation becomes personally consequential. The traditional separation between product owner, developer, and tester dissolves. The Scrum Master role expands into a floating coach serving multiple cells. Engineers who thrive with broad ownership will flourish. Engineers who have built careers around deep single-discipline specialization may struggle. Leaders must provide genuine support — retraining, mentoring, and alternative career paths — rather than simply announcing the new structure.
Stage 3: Autonomous Orchestration (?)
The final frontier might involve designing infrastructure specifically for non-human entities. Deploy agent-friendly “devboxes” — isolated, predictable sandboxes where AI agents can iterate and fail safely without risking production data. Redefine the daily workflow along the lines of what Bloch calls the 10 AM rule: engineers spend mornings defining objective functions and aligning context, allowing agents to execute throughout the day and night.
But this is largely uncharted territory.
The Risks
The risks are real and demand honest acknowledgment:
- The Cognitive Debt Trap: Speed without comprehension is a fatal liability. Teams that ship faster without investing in shared understanding will build unmaintainable systems in weeks.
- The Talent Polarization Problem: AI-proficient engineers will pull ahead while others stagnate. Leaders have an ethical obligation to invest in retraining and provide genuine pathways for adaptation, not merely optimize for the highest performers.
- The Measurement Trap: Output metrics such as lines of code or PRs merged are actively harmful. Cycle time, defect escape rate, and revenue per engineer are imperfect but directionally better.
- The Incumbent’s Dilemma: Legacy systems and cultural habits slow change. And waiting for certainty is itself a risk.
- The Tooling Rabbit Hole: Don’t get lost in tooling, MCP integrations, and framework churn. Focus on first principles; focus on what actually gets into your agent’s context.
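The directionally-better metrics named above are cheap to compute once delivery events are recorded. A sketch with hypothetical field names and made-up numbers:

```python
from datetime import datetime
from statistics import median

# Hypothetical delivery log: when an idea was accepted and when it shipped.
deliveries = [
    {"accepted": datetime(2025, 3, 1), "deployed": datetime(2025, 3, 4)},
    {"accepted": datetime(2025, 3, 2), "deployed": datetime(2025, 3, 3)},
    {"accepted": datetime(2025, 3, 5), "deployed": datetime(2025, 3, 12)},
]

# Cycle time: idea conception to production deployment, in days.
cycle_times_days = [(d["deployed"] - d["accepted"]).days for d in deliveries]
print(median(cycle_times_days))  # 3

# Defect escape rate: defects found in production vs. all defects found.
defects_in_prod, defects_pre_prod = 2, 18
escape_rate = defects_in_prod / (defects_in_prod + defects_pre_prod)
print(f"{escape_rate:.0%}")  # 10%
```

Medians rather than averages keep the occasional week-long outlier from masking the typical experience, which matters precisely because AI makes effort so variable.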
Impact on Us
For us engineers, the value proposition is shifting rapidly. Generating syntax is no longer a specialized profession; it is becoming a general business competency.
Career longevity depends on mastering domains where AI struggles. Build on your expertise toward architectural design and the maintenance of Durable Code. While an AI can write a microservice from scratch, it lacks the contextual judgment to execute massive database migrations safely, conduct deep incident analysis during an outage, or design nuanced system observability.
The developers who thrive will not be the ones who can write their language of choice by heart; they will be the most rigorous editors, the best context engineers, and the clearest thinkers.
Conclusion
The transformation is profound and unavoidable and involves genuine risk to existing structures, career paths, and professional identities. It demands investment in engineering hygiene as much as in new LLM tooling.
Most importantly, it requires leaders who can deal with two truths simultaneously: the organizational change is profound and absolutely necessary, yet the human beings navigating this shift deserve support, retraining, and dignity throughout the process.
The future belongs to teams that are small, autonomous, spec-driven, and intensely focused on judgment over execution. Arriving there responsibly is the defining engineering challenge of our time.
References
- Building An Elite AI Engineering Culture In 2026 | Chris Roth
- How to Build AI-Native Engineering Teams | Gregor Ojstersek
- Minions: Stripe’s one-shot, end-to-end coding agents—Part 2 | Stripe Dot Dev Blog
- Disposable Code Is Here to Stay, but Durable Code Is What Runs the World | Charity Majors
- No Coding Before 10am | Michael Bloch
- Programming as Theory Building | Peter Naur
- Can AI really code? Study maps the roadblocks to autonomous software engineering | MIT News (July 16, 2025)
- Others I forgot to bookmark…