AI Writes Faster Than Your Shadow — Now What?
Published on April 6, 2026 by Dominic Böttger · 24 min read
Many teams are currently falling short of their potential. I observe this in my daily work with teams and companies employing large numbers of software developers. AI is not just an assistant — it is changing software development more fundamentally than anything we have seen in the last 20 years.
It is no longer about writing the most beautiful code with the best abstractions. That mattered in a world where every line was expensive because developer hours were expensive. It is about clearly formulating requirements and, even more importantly, defining guardrails. These are topics I have been advocating in countless sessions with developers for many years. It is about understanding and implementing business value. You are welcome to build the most artful code structure, but requirements and outcomes take absolute priority.
At first, development was merely AI-assisted. Through frameworks like Spec-Kit and the incredible improvement of language models over the last twelve months, we have now reached a point where elegance matters less than keeping the wildly coding Lucky Luke in check through clear instructions and boundaries.
The speed at which code is written is one thing. But in this game, humans are the guardians of requirements and outcomes — and the speed and volume of generated code is no longer humanly manageable.
Recently, I was asked: “We have equipped all our developers with these great AI tools, so why has the speed not increased as massively as expected?”
The answer is multifaceted — and consists of a radical change in the development process, developer roles, and software architecture itself. This article explains why, shows the numbers, and describes the approaches that actually work.
The Numbers Don’t Lie
Before we talk about solutions, let us look at what the data says. And the data is clear.
22% of all merged code lines are now AI-generated — that is the industry average. Addy Osmani, Engineering Lead at Google, reports that AI agents write 80% of his code on solo projects. Rakuten completed a complex task across a 12.5-million-line codebase in 7 hours with 99.9% accuracy. A CTO estimated a project at 4 to 8 months — it was done in two weeks. Salesforce moved 90% of its 20,000 engineers to AI tools and reports double-digit improvements in cycle time.
The productivity gains are real. The question is: at what cost?
| Metric | Change | Source |
|---|---|---|
| AI-authored code issues | 1.7x more than human code | GitClear / CodeRabbit Report |
| Pull requests per author | +20% | GitClear Code Quality 2025 |
| Incidents per pull request | +23.5% | GitClear / paddo.dev |
| Change failure rate | +30% | GitClear / paddo.dev |
| PR review times | +91% | GitConnected Study |
| Review time per AI suggestion | 4.3 minutes (vs. 1.2 min for human code) | LogRocket |
Developers report feeling 20% faster. Measured productivity shows they are 19% slower. That is a perception gap of 39 points. Not a minor discrepancy — a fundamental misjudgment of one’s own productivity.
SmartBear’s analysis of a Cisco team shows: beyond 400 lines of diff, human defect detection degrades sharply. AI generates 400-line diffs casually. The sweet spot is 25 to 40% AI-generated code in a PR. Above 40%, rework rates jump to 20-30% and technical debt explodes.
And even the best models are not as reliable as one might hope: frontier models achieve at most 68% compliance with 500 instructions in AGENTS.md files. A third of all instructions are ignored — and these are the best available models.
These are not failure statistics. These are the measurements of a system under a load it was never designed for.
The Nyquist-Shannon Problem
Dave Farley recently drew a brilliant comparison that captures the problem perfectly: the Nyquist-Shannon sampling theorem.
The theorem states: to reconstruct a signal correctly, you must sample it at a rate at least twice its highest frequency component. Sample below that rate, and you get aliasing — a signal that looks correct but is fundamentally wrong.
Imagine you are standing at a conveyor belt inspecting every 10th part for defects. As long as the belt runs slowly, that is sufficient. Now someone cranks the belt to 10x speed — you are still inspecting every 10th part, but now 90% pass uninspected. That is exactly what is happening with code. AI cranks up the conveyor belt. Your review process still runs at the old speed. The result is not “occasionally something slips through” — it is mathematically guaranteed that you systematically miss defects.
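The conveyor-belt arithmetic can be sketched in a few lines. This toy model uses purely illustrative numbers; the capacity, defect rate, and catch rate below are assumptions for the sketch, not the figures from the studies cited above.

```typescript
// Toy model of the conveyor-belt analogy (all numbers illustrative).
// A reviewer can inspect a fixed number of changes per day; as the
// production rate rises, the fraction of defects ever seen falls.

function inspectedFraction(changesPerDay: number, reviewCapacityPerDay: number): number {
  // The reviewer gets to at most `reviewCapacityPerDay` changes.
  return Math.min(1, reviewCapacityPerDay / changesPerDay);
}

function expectedMissedDefects(
  changesPerDay: number,
  reviewCapacityPerDay: number,
  defectRatePerChange: number,
  catchRateWhenInspected: number,
): number {
  const inspected = inspectedFraction(changesPerDay, reviewCapacityPerDay);
  const caught = inspected * catchRateWhenInspected;
  return changesPerDay * defectRatePerChange * (1 - caught);
}

// Before AI: 10 changes/day, capacity 10 -> everything gets inspected.
const before = expectedMissedDefects(10, 10, 0.1, 0.8);
// After AI: 10x the changes, same capacity, 1.7x the defect rate.
const after = expectedMissedDefects(100, 10, 0.17, 0.8);

console.log(before.toFixed(2)); // misses only what inspection overlooks
console.log(after.toFixed(2));  // misses most defects outright
```

The point is structural: holding review capacity constant while production scales means the number of unseen defects grows roughly linearly with the production rate, no matter how good the reviewers are.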
Consider what this means concretely:
The “signal” is the rate at which defects are introduced into the code. AI-generated code has 1.7x more issues — the frequency is higher.
The “sampling” is your code review. Senior engineers need 4.3 minutes per AI-generated suggestion versus 1.2 minutes for human code. The sampling rate is actually lower than before, not higher.
The result is aliasing: your review process still looks like it is working. You review PRs, you give feedback, you merge. But you systematically miss defects — and worse, you have a false sense of security.
The 400-line cliff is a perfect example of this aliasing. Below 400 lines, the human sampling rate is above the Nyquist frequency — reviewers catch most problems. Above 400 lines, it drops below — and from that point, the human reviewer is not occasionally missing things but systematically missing them.
Why “More Reviewers” Is Not the Answer
The intuitive response is: more people in review, longer reviews, stricter criteria. But this treats a systems problem as a staffing problem. You cannot hire your way out of a sampling rate problem.
AI code is also qualitatively different to review. Human code tells a story — you see the decisions, the considerations, the thought process. AI code is a flat answer. Reviewing it requires reconstructing the intent, which means more cognitive load, not less. That explains the 4.3 vs. 1.2 minutes.
Add to that the LGTM reflex: when every PR contains 800 lines of AI-generated code, reviewers develop a pattern of superficiality. The code looks reasonable at first glance — so approve. This review fatigue is not a weakness of the reviewers; it is a predictable response to a system that systematically overwhelms them.
The Uncomfortable Truth
We could say: let us simply stop using AI and develop software as before. Or we adapt our development process and automate the review. I would go even further: the entire process must be automated so that it validates continuously and corrects quickly when errors occur.
And let us be honest: AI did not remove the safety net. The safety net was usually another person who did not fully understand the code. The one senior who found the critical spots in every PR. The colleague who instinctively knew where the problems lay. That was never a system — those were heroic acts by individuals. AI simply increased the load to the point where even heroes cannot keep up.
When the Safety Net Breaks
Theory is one thing. What happens when things go wrong in practice is another.
Kiro: 13-Hour Production Outage
An AI agent received overly broad permissions and deleted a production environment. 13 hours of outage. No malice, no hacker — an agent that did exactly what it could because nobody had defined what it should not be able to do. No permission boundaries, no review gate.
Moltbook: 1.5 Million API Keys in 3 Days
Within three days of launch, over 1.5 million API keys were leaked. No security scanning pipeline, no secret detection, no automated checks. The code worked — it was just fundamentally insecure.
Grigorev: Live Database Destroyed
Claude Code destroyed a production database from a misconfigured machine. The agent had access it should never have had. No infrastructure isolation, no least-privilege configuration.
npm Supply Chain Attack
Axios was compromised through stolen credentials; the CI/CD pipeline was bypassed. Manual gates that humans can skip are not guardrails.
Anthropic: Claude Code Source Leaked
Even Anthropic publicly exposed Claude Code’s source code through a .map file. If it happens to the maker of one of the most advanced AI coding tools, it happens to everyone.
The Common Thread
None of these problems would have been prevented by better code architecture. No repository pattern, no hexagonal architecture layer, no abstraction layer would have stopped a single one. All could have been caught by basic CI/CD guardrails: permission boundaries, secret scanning, pipeline gates, infrastructure isolation.
As paddo.dev puts it: “If your pipeline has manual gates that humans can skip, it’s not a pipeline. It’s hope.”
But robust pipelines alone are not enough when the architecture itself becomes a source of errors.
Architecture for AI — Less Is More
Traditional enterprise architecture patterns — DRY, hexagonal architecture, service layers, CQRS — were optimized for a world where code was expensive. Every line cost time, so teams invested in reuse and abstraction to save future work.
AI has inverted this economy. Code is cheap to produce. What is expensive: understanding, maintaining, and correctly deploying that code.
The Context Window Problem
A hexagonal architecture typically requires 7 files to understand a single operation: interface, implementation, port, adapter, use case, DTO, mapper. A flat, convention-based structure: 1 file.
For a human who has internalized the abstraction, this is not a problem — they know where to look. For an AI agent, every additional file is additional context that must be loaded into the context window. Every indirection increases the probability of error. GPT-4.1 achieved near-zero architectural violations with hexagonal architecture constraints. Weaker models? 80% violation rate. The architecture that organized human teams actively confuses AI agents.
This also means the death of many frameworks. But it is also an opportunity: in an era of massive supply chain attacks, we can rethink software development. Less is more. This does not mean we stop using libraries entirely — but certainly fewer, and large complex frameworks with maximum abstraction are significantly less valuable than before.
Ruby on Rails has a point here with its Convention over Configuration approach. Rails now explicitly markets itself as an “agent-first” framework — not through deep abstractions but through predictable structures. Agents navigate convention-based frameworks without needing to understand indirection. Clear patterns reduce context requirements. Consistent naming and file locations enable reliable autonomous operation.
Abstraction as Constraint, Not Architecture Cosmetics
Does this mean “no abstraction”? No. It means: the right kind of abstraction.
There is a fundamental difference between two types:
Abstraction as business rule enforcement: A validateEmail() function as a shared utility that defines in one central place how email addresses are validated. When AI calls it, the constraint is automatically satisfied. When AI tries to write its own inline validation, the linter catches it. This is not cosmetics — it is a single source of truth for business logic.
I have seen it in the wild multiple times: without this centralization, you end up with 14 slightly different implementations across 14 locations. One accepts + in the local part, another does not. One checks the TLD, another does not. With AI, this problem becomes exponentially worse because the agent generates a new variant each time.
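A minimal sketch of such a constraint module. The file path, function name, and regex are hypothetical; the point is the shape: one exported function, one place where the rule lives.

```typescript
// lib/validation/email.ts — hypothetical shared constraint module.
// Single source of truth: every caller (human or agent) goes through
// here, and a lint rule forbids ad-hoc inline email validation elsewhere.

const EMAIL_PATTERN = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;

export function validateEmail(input: string): boolean {
  const email = input.trim();
  // Deliberately permissive: the business rule is "looks like user@host.tld";
  // the authoritative check is the confirmation mail, not the regex.
  return email.length <= 254 && EMAIL_PATTERN.test(email);
}
```

When the rule changes, say to reject disposable domains, it changes in exactly one file instead of in 14 slightly different inline copies.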
Abstraction as architecture ceremony: An IEmailValidationPort with an EmailValidationAdapter that calls a ValidateEmailUseCase that uses an EmailValidationService. Seven files, zero additional safety. Just ceremony that gives AI more surface area for hallucination and burdens humans with more files to review.
The dividing line is pragmatic: Does the abstraction codify a business rule or an architecture ceremony?
The practical implication: these constraint modules belong in your context files. A CLAUDE.md, .cursorrules, or similar that tells the agent: “For email validation, use @lib/validation/email — never implement it yourself.” This is convention over configuration for the AI era — no longer codified in framework architecture but explicitly stated as instructions.
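What such an instruction can look like in practice, as a hypothetical excerpt from a context file (the module path and wording are illustrative):

```markdown
## Validation rules

- For email validation, always use `@lib/validation/email` — never write
  inline regex validation.
- All external input crosses a schema-validation boundary before use;
  `as` casts on external data are forbidden (enforced by a Semgrep rule).
```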
Constraint-based code is also easier to review: violations are obvious. Ceremony-based code is harder to review because the reviewer must trace through layers to verify correctness.
The Automated Guardian
When manual code review can no longer keep up with the conveyor belt, the sampling rate must increase. The answer is not “more people” — it is automation. Pipelines can “review” code at the same frequency AI produces it.
This is not a new insight. The DevOps movement has been advocating this for over a decade. But AI makes it not just sensible — it makes it absolutely necessary.
Here are some examples and approaches that work in practice. Not a complete framework — but impulses to question your own process.
Testing as the First Line of Defense
Tests remain the foundation. But not just any tests.
Multi-layer testing is not a nice-to-have but mandatory. Unit tests for isolated logic. Component tests for UI behavior in a real browser. Integration tests with real databases — not with mocks that create an illusion of correctness. End-to-end tests that drive complete user journeys through the system. Each layer catches a different category of bugs.
Coverage thresholds per module instead of a global target. An authentication module needs 95% coverage. An admin dashboard perhaps 80%. Criticality determines the threshold — and it is enforced as a hard gate in CI, not as a recommendation.
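In Jest, per-module thresholds look roughly like this; the paths and numbers are illustrative, not a recommendation for your codebase.

```typescript
// jest.config.ts — per-module coverage gates (paths/numbers illustrative).
// CI fails the build when a module drops below its own threshold.
const jestConfig = {
  collectCoverage: true,
  coverageThreshold: {
    global: { lines: 80, branches: 70 },
    "./src/auth/": { lines: 95, branches: 90 },  // critical path: strict
    "./src/admin/": { lines: 80, branches: 70 }, // lower criticality
  },
};

export default jestConfig;
```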
Mutation testing is the blind spot that most teams have not considered — and arguably the most important one for AI-generated code. The concept: a mutation testing tool deliberately modifies your production code — swaps > for <, true for false, deletes a line — and checks whether your tests detect the “mutant.” If no test fails, the mutant survived, and your test verifies nothing meaningful.
Why is this so relevant for AI code? AI-generated tests achieve only about 20% mutation scores on real code. That means: 80% of potential bugs slip through. AI tests are notoriously tautological — they test what the code does, not what it should do. They are essentially a copy of the implementation in assert syntax. Mutation testing exposes exactly this. Tools like Stryker for JavaScript/TypeScript make this practical.
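The mechanism can be shown without any tooling. Below, a hand-rolled “mutant” of a hypothetical discount function survives a tautological test but is killed by an intent-based one; this is a sketch of what Stryker automates at scale, and all names are made up for illustration.

```typescript
// Business rule: orders of 100 or more get a 10% discount.
function discount(total: number): number {
  return total >= 100 ? total * 0.9 : total;
}

// Mutant: ">=" swapped for ">" — exactly the kind of change Stryker makes.
function discountMutant(total: number): number {
  return total > 100 ? total * 0.9 : total;
}

// Tautological test, AI-style: re-asserts what the code happens to do at
// an arbitrary point. It passes for BOTH versions -> the mutant survives.
function tautologicalTest(fn: (t: number) => number): boolean {
  return fn(200) === 180;
}

// Intent-based test: pins the boundary the business rule actually defines.
function boundaryTest(fn: (t: number) => number): boolean {
  return fn(100) === 90 && fn(99.99) === 99.99;
}

console.log(tautologicalTest(discount), tautologicalTest(discountMutant)); // true true
console.log(boundaryTest(discount), boundaryTest(discountMutant));         // true false
```

A mutation score is simply the fraction of such mutants your suite kills; a suite full of tautological tests scores low no matter how high its line coverage is.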
Property-based testing and fuzz testing add another dimension to classical tests. Instead of checking predefined test cases, you throw random, unexpected, or deliberately broken inputs at your functions. Does something crash? Does something hang? Does memory leak? Particularly relevant because AI-generated code systematically forgets null checks and boundary handling. fast-check is the tool of choice for JavaScript/TypeScript.
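The idea in miniature, without dependencies: instead of asserting specific input/output pairs, assert an invariant over many random inputs. fast-check automates this properly (generation, shrinking, reproducible seeds); the hand-rolled generator and the `slugify` function below are illustrative.

```typescript
// Dependency-free sketch of a property-based test. Property: for ANY
// input string, a hypothetical slugify() never throws and never emits
// whitespace or uppercase letters.

function slugify(input: string): string {
  return input
    .normalize("NFKD")
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

function randomString(maxLen: number): string {
  const len = Math.floor(Math.random() * maxLen);
  let s = "";
  for (let i = 0; i < len; i++) {
    // Deliberately awkward inputs: control characters, symbols, surrogates.
    s += String.fromCodePoint(Math.floor(Math.random() * 0x10000));
  }
  return s;
}

function checkProperty(runs: number): void {
  for (let i = 0; i < runs; i++) {
    const input = randomString(50);
    const out = slugify(input); // must not throw for any input
    if (/[A-Z\s]/.test(out)) {
      throw new Error(`property violated for input: ${JSON.stringify(input)}`);
    }
  }
}

checkProperty(1000);
console.log("property held for 1000 random inputs");
```

This is precisely the class of check that catches the missing null handling and boundary bugs AI code is prone to: you state what must always hold, not what one example returns.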
Contract Testing — Securing API Boundaries
In a world where AI generates code at different points of a system, a problem that was previously manageable becomes acute: API contracts break silently.
An agent changes the response structure of an API endpoint. Another agent — or another developer using the same agent — continues consuming that endpoint with the old format. Integration tests might catch it. Maybe.
Contract testing with frameworks like Pact solves exactly this problem. The idea: consumer and provider of an API contract define their expectations independently. Pact generates a shared contract (a JSON file) against which both sides test automatically.
This does not replace E2E tests, but it replaces a specific class of integration tests with faster, more focused verification. Instead of spinning up a complete system to check whether the user API still delivers the expected format, a contract test runs in seconds — independently on each side.
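Stripped of Pact's machinery (the JSON contract file, the broker, provider states), the core idea fits in a few lines. Everything below is a hand-rolled illustration, not Pact's API: the consumer declares the shape it relies on, and the provider's response is verified against it.

```typescript
// Hand-rolled sketch of the contract-testing idea. Pact does this
// properly via a shared contract file that both sides test against
// independently — each check runs in seconds, no full system needed.

type FieldType = "string" | "number" | "boolean";
type Contract = Record<string, FieldType>;

// Consumer side: "I rely on exactly these fields of GET /users/:id."
const userContract: Contract = {
  id: "string",
  email: "string",
  createdAt: "string",
};

function satisfiesContract(response: unknown, contract: Contract): boolean {
  if (typeof response !== "object" || response === null) return false;
  const record = response as Record<string, unknown>;
  return Object.entries(contract).every(
    ([field, type]) => typeof record[field] === type,
  );
}

// Provider side: an agent "optimizes" createdAt into a numeric timestamp.
// The code still compiles and runs — but the contract check catches it.
const oldResponse = { id: "u1", email: "a@b.co", createdAt: "2026-04-06" };
const newResponse = { id: "u1", email: "a@b.co", createdAt: 1775001600 };

console.log(satisfiesContract(oldResponse, userContract)); // true
console.log(satisfiesContract(newResponse, userContract)); // false
```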
Particularly relevant for microservices and API boundaries between teams. But also a powerful tool in monorepos with clear package boundaries, preventing AI from breaking contracts at one point that are expected at another.
Static Analysis — The Underestimated Guardian
AI-generated code has characteristic anti-patterns. It swallows errors in catch blocks. It creates SQL injection vulnerabilities. It hardcodes credentials. It forgets auth checks. It imports packages it never uses. It duplicates business logic that should be centralized in shared utilities.
These are not edge cases — they are the most common patterns. And they are all catchable with static analysis.
Semgrep allows custom rules tailored specifically to your codebase’s anti-patterns. Examples: “No as casts on external data — use schema validation.” “No pool.query() outside the RLS wrapper.” “No string concatenation in SQL templates.” These are cheap, highly effective guardrails that take minutes to configure and are useful forever.
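One of those rules written out as a Semgrep rule. This is a sketch: the rule id, message, and the `pool.query` call shape are illustrative for a node-postgres-style TypeScript codebase.

```yaml
rules:
  - id: no-sql-string-concat
    languages: [typescript]
    severity: ERROR
    message: >
      Do not build SQL via string concatenation — use parameterized
      queries. Concatenation in query() is how AI-generated SQL
      injection enters the codebase.
    pattern: pool.query("..." + $EXPR, ...)
```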
CodeQL from GitHub goes deeper: semantic security analysis, taint tracking, data flow analysis. It finds vulnerabilities that no linter and no reviewer would see.
ESLint Boundaries and Dependency-Cruiser enforce your package architecture as code. When you define that the auth package must not depend on the coins package, that is not a convention reviewers must keep in their heads — it is a CI gate that breaks the build.
Multi-Agent Code Review
The human reviewer is the bottleneck. The solution is not to replace them — but to take away everything that can be automated so they can focus on what only humans can judge.
Specialized review agents — for security, database design, frontend patterns, API design — can review in parallel and immediately. No waiting for availability, no review fatigue, no blind spots from lack of expertise in a sub-area.
The crucial point here: Different AI models see different things. Claude finds different problems than Gemini, Copilot different ones than both. Every model has its own blind spots — just like human reviewers. But while you can rarely assign three senior engineers to a PR simultaneously, you can easily run three different models in parallel. The combination of multiple models dramatically increases detection rates because their blind spots compensate for each other.
Severity-driven review is essential: every finding gets a severity level. “Action Required” blocks the merge. “Info” is displayed but does not block. This eliminates the noise problem that discourages many teams from automated reviews.
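The gate itself is trivial to express; the discipline lies in the triage. A sketch with hypothetical severity labels, not any specific tool's API:

```typescript
// Severity-driven merge gate for automated review findings (labels
// illustrative). Only "action-required" blocks; everything else is
// surfaced as a comment, keeping signal high and noise non-blocking.

type Severity = "action-required" | "warning" | "info";

interface Finding {
  rule: string;
  severity: Severity;
  message: string;
}

function shouldBlockMerge(findings: Finding[]): boolean {
  return findings.some((f) => f.severity === "action-required");
}

const findings: Finding[] = [
  { rule: "sql-injection", severity: "action-required", message: "raw concat in query" },
  { rule: "naming-style", severity: "info", message: "prefer camelCase" },
];

console.log(shouldBlockMerge(findings)); // true
console.log(shouldBlockMerge(findings.filter((f) => f.severity === "info"))); // false
```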
Semantic review tools like CodeRabbit or Qodo complement static analysis with an understanding-based layer. They understand not just syntax but intent. This is the bridge between what linters catch and what previously only humans could recognize.
What remains for the human reviewer? Intent verification: does the implementation match the specification? Business logic correctness: are the domain rules implemented correctly? Edge case discovery: are there scenarios that neither the spec nor AI considered? These are the questions that actually require human judgment. Not formatting, not naming, not obvious bugs — the machines handle that now.
Infrastructure Security — The Forgotten Defense Line
The catastrophe examples above have one thing in common: they were not code errors. They were infrastructure errors. Overly broad permissions, missing isolation, no secret detection.
Permission boundaries are the single most important measure. AI agents must under no circumstances access production data. Period. No exceptions, no “but the agent needs this for that one task.” Least privilege is not a nice-to-have — it is the difference between a bug and a catastrophe.
Infrastructure-as-code with real review gates ensures that infrastructure changes undergo the same quality process as application code. Review Terraform plans before they are applied. Validate Kubernetes manifests before deployment. No manual clicks in the cloud console.
Automated penetration testing with tools like Aikido transforms security testing from an annual obligation to a continuous process. Hundreds of autonomous agents simulate real attacks — with every release, not once a year. Validation eliminates false positives, and results are directly actionable for developers.
Secret scanning in pre-commit hooks with tools like Gitleaks prevents credentials, API keys, and other secrets from ever entering the repository. This sounds trivial — but Moltbook showed what happens when you skip it.
Reproducible builds, signed commits, and audit logs form the foundation for traceability. When an AI agent generates code, there must be a complete record of who (or what) changed what and why.
Pre-commit and Pre-push Hooks
The last line of defense is the first: local validation before code ever reaches the repository.
Type checking, linting, secret scanning — as a minimum on every push. This costs seconds and catches the most obvious problems before they even enter the CI pipeline. Particularly relevant when AI agents commit directly: the hook becomes the gatekeeper that forces the agent to meet basic standards before its changes become visible to others.
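The secret-scanning part of such a hook can be sketched in a few lines. Gitleaks does this far more thoroughly (hundreds of rules plus entropy-based detection); the patterns below are a tiny, illustrative subset.

```typescript
// Tiny illustrative subset of pre-commit secret scanning. In the real
// hook, this runs over all staged files and exits non-zero on any hit,
// so the commit never happens.

const SECRET_PATTERNS: Array<[string, RegExp]> = [
  ["AWS access key id", /\bAKIA[0-9A-Z]{16}\b/],
  ["private key block", /-----BEGIN (?:RSA |EC )?PRIVATE KEY-----/],
  ["generic api key assignment", /\b(?:api[_-]?key|secret)\b\s*[:=]\s*['"][^'"]{16,}['"]/i],
];

function findSecrets(content: string): string[] {
  return SECRET_PATTERNS.filter(([, re]) => re.test(content)).map(([name]) => name);
}

// Example staged content containing a fake key-like assignment:
const staged = `const config = { apiKey: "sk-test-12345678901234567890" };`;
const hits = findSecrets(staged);
if (hits.length > 0) {
  console.error(`refusing to commit, found: ${hits.join(", ")}`);
}
```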
DevOps Is Not Dead — DevOps Is More Important Than Ever
In the past, we spent significant time coaching teams to live the idea of DevOps. And please do NOT fill a “DevOps Engineer” position — instead, bring the developers and the entire team into shared responsibility.
The book “Accelerate: The Science of Lean Software and DevOps” by Nicole Forsgren, Jez Humble, and Gene Kim from 2018 impressed me deeply at the time. The DORA metrics — Deployment Frequency, Lead Time for Changes, Change Failure Rate, Mean Time to Recovery — remain the most valid framework for measuring software team performance in 2026.
And here is where it gets interesting: look at what AI does to these metrics, and you will see a familiar pattern.
| DORA Metric | Effect from AI |
|---|---|
| Deployment Frequency | Increases (+20% more PRs per author) |
| Lead Time for Changes | Decreases (faster code production) |
| Change Failure Rate | Increases (+30%) |
| Mean Time to Recovery | Unclear — potentially increases as root causes become harder to find |
High deployment frequency combined with a high change failure rate is exactly the pattern Accelerate identified as “Low Performer”: teams that deploy quickly but whose changes frequently fail. The authors described in 2018 how to escape this pattern, and the answer then was the same as it is today: automation, smaller batches, faster feedback, team ownership.
AI does not make these principles obsolete — it makes them more urgent than ever.
Unfortunately, I have seen again and again that the people who considered themselves the best developers showed complete disinterest in DevOps. “That is ops stuff, not my responsibility.” The topic was delegated as quickly as possible. In a world where AI writes the code, the ability to design robust pipelines, testing strategies, and deployment processes becomes a software developer’s most important competency.
The combination of DevOps knowledge with AI rules, hooks, and agents creates the opportunity for maximum automation and high quality assurance. AI must be put in a position to recognize and correct its own errors. This is not a feature — it is a fundamental requirement.
What Must Change
The answer to “AI writes faster than your shadow” is not to slow down AI. It is to bring the rest of the process up to the new speed. This requires changes on multiple levels simultaneously.
The Developer Role
The role changes fundamentally: from code producer to requirements engineer and quality gatekeeper. The most valuable skill is no longer writing elegant code. It is precisely describing what should be built and judging the quality of the result.
This means: requirements must be formulated clearly enough that an AI agent can implement them without guessing. Acceptance criteria must be testable — not as prose but as verifiable statements. And the definition of “done” must include all quality gates, not just “the code compiles and looks good.”
Spec-driven development is key here: every feature starts with a specification. AI works from specs, not vague instructions. This constrains the output and makes review tractable in the first place. And this is exactly where the human review focus shifts: instead of reading 20,000 lines of generated code, humans review the specification — the document that describes what should be built and why. When the spec is correct and automated guardrails secure the code, humans are deployed where their expertise has the greatest leverage: at the beginning, not at the end.
Architecture
Flat, convention-based, constraint-driven. Shared utilities for business logic, no deep abstraction hierarchies. Context files as the new architecture documentation. Architecture is no longer codified in the code — it is codified in the instructions the agent receives.
Testing
Automated, multi-layered, mutation-tested. Coverage thresholds as CI gates, not recommendations. Property-based testing for edge cases. Contract testing for API boundaries. Mutation testing to verify the quality of the tests themselves.
The Review Process
Automated for everything that can be automated. Human only for judgment calls — intent verification, business logic, edge cases. Severity-driven so noise does not lead to numbness.
Infrastructure
Least privilege everywhere. Permission boundaries as hard limits, not recommendations. Automated penetration testing as a continuous process. Secret scanning in pre-commit. Infrastructure-as-code with review gates.
Contracts
API contracts explicitly defined and automatically verified. Not as documentation that becomes outdated, but as executable specifications that break the build when violated.
The Process
Spec-driven, not code-driven. Small batches, fast feedback, automated quality gates. AI must be put in a position to recognize and correct its own errors — through tests it can run itself, linters that give it feedback, hooks that validate its commits.
The question is not whether AI writes good or bad code. The question is whether your process can handle both.
Conclusion
The situation is clear: AI writes code at a speed and volume that systematically overwhelms human review processes. The numbers prove it — 1.7x more issues, 91% longer reviews, 30% higher change failure rates. And the catastrophes show what happens when guardrails are missing.
But this is not an argument against AI. It is an argument for the radical modernization of our development processes.
AI did not remove the safety net. It revealed that the safety net was always a person — the one senior who actually read the PRs, the colleague who instinctively knew where the problems lay. That was never a system. Those were heroic acts. And heroic acts do not scale.
What scales: automated pipelines that check at the same frequency AI produces. Multi-layer testing that catches errors at every level. Static analysis that recognizes AI-typical anti-patterns. Contract tests that secure API boundaries. Infrastructure isolation that prevents bugs from becoming catastrophes. And human judgment focused on the questions only humans can answer.
The teams that adapt their process now will be the ones who make the difference in 12 months. Not because they have the best AI — but because they have built the best process around it.
Lucky Luke shoots faster than his shadow. But without a target, he only hits the desert.
What You Can Do Tomorrow
- Enable automated code reviews. GitHub Copilot Code Review, CodeQL, or Semgrep — these are switches you can flip today. No setup project, no sprint planning. Turn them on, let them run, see results immediately. That alone catches an entire class of security issues and AI-typical anti-patterns.
- Write tests. No exceptions. Testing is not optional and never was. Without automated tests, every other measure is ineffective. Unit tests, integration tests, E2E tests — this is the foundation everything else builds on.
- Check your hooks. Do type checking, linting, and secret scanning run before every push? That is 30 minutes of setup and saves weeks.
- Write a CLAUDE.md. Or .cursorrules, or whatever your agent reads. Document your shared utilities and business constraints. It is your cheapest guardrail.
- Ask yourself with every abstraction: does it codify a business rule or a ceremony?
- Define what humans review. Not the code — the specification. Humans review whether the requirements are correct before the first line is written. That is where human expertise has the greatest leverage. The machines secure the generated code.
Sources
- paddo.dev: “Your Architecture Is Showing”
- Bryan Finster: “AI Broke Your Code Review — Here’s How to Fix It”
- Dave Farley: Nyquist-Shannon and Code Review (Video)
- CodeRabbit: State of AI vs Human Code Report
- Qodo: 5 AI Code Review Patterns 2026
- CodeScene: Guardrails for AI-assisted Coding
- SmartBear / Cisco: Code Review Best Practices
- Nicole Forsgren, Jez Humble, Gene Kim: “Accelerate” (2018)
- Pact: Contract Testing Documentation
- Aikido: AI-Powered Penetration Testing
- Stryker: Mutation Testing Framework