The Review Queue Nobody Warned You About

AI coding assistants made writing code 60% faster. They also made your pull request queue 5x slower to process. Here is what the numbers actually say - and what to do about the mismatch.

Everyone celebrated when AI made code generation fast. Output per engineer up 60% year over year. Task throughput up 33.7%. PRs merged per developer up 16.2%. The dashboards looked great.

Nobody measured what happened to the people on the other side of those pull requests.

Faros AI just published their 2026 Engineering Report, drawn from two years of telemetry across 22,000 developers and 4,000+ teams. The headline productivity numbers are real. So is the quietly catastrophic footnote buried a few sections down: median time a PR spends in review is up 441.5%. Time to first review is up 156.6%. Average time spent in code review across all reviewers is up 199.6%.

The engineers writing code got faster. The engineers reading it did not.

The asymmetry nobody modeled

There is a useful way to think about an engineering team as a system: code generation is the producer, code review is the consumer. For the system to work, consumption has to keep pace with production. When you accelerate one without changing the other, you get queue buildup.

This is not a metaphor. It is what is actually happening.

When a developer using an AI assistant generates code 60% faster, they produce substantially more pull requests per unit time than before. Those PRs still need to be reviewed by humans - humans whose reading and judgment speed has not changed. The backlog grows. Review latency rises. And because nobody redesigned the review process when they adopted the AI writing tools, the bottleneck just quietly shifted from "writing" to "waiting."
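Here is a minimal fluid-queue sketch in Python that shows why the backlog keeps growing instead of settling. The arrival and review rates are hypothetical and do not come from the Faros report. In the example, a team used to open 10 PRs a day against a review capacity of 12. It now opens 16 (roughly 60% more) with the same capacity.

```python
def simulate_backlog(arrivals_per_day: int, reviews_per_day: int, days: int) -> list[int]:
    """Deterministic fluid model of a review queue.

    Each day, new PRs arrive and reviewers clear as many as their capacity allows.
    Returns the number of PRs still waiting at the end of each day.
    """
    backlog = 0
    history = []
    for _ in range(days):
        backlog += arrivals_per_day               # new PRs land in the queue
        backlog -= min(backlog, reviews_per_day)  # reviewers clear what they can
        history.append(backlog)
    return history


# Hypothetical rates: review capacity stays fixed at 12 PRs/day.
before_ai = simulate_backlog(arrivals_per_day=10, reviews_per_day=12, days=5)
after_ai = simulate_backlog(arrivals_per_day=16, reviews_per_day=12, days=5)
print("before:", before_ai)  # before: [0, 0, 0, 0, 0]
print("after: ", after_ai)   # after:  [4, 8, 12, 16, 20]
```

The model ignores variability, so it is optimistic. Real queues degrade before arrivals exceed capacity. In the textbook M/M/1 queue, for example, the average number of items in the system grows as ρ/(1-ρ), where ρ is utilization. Latency therefore climbs steeply as reviewers approach full load.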

The Faros data makes this concrete. Teams in the top quartile of AI adoption are merging 98% more pull requests per developer than low-adoption teams. Those same teams have review times that are 91% longer. The math is uncomfortable: twice the output volume, and almost twice the time to get each unit through the gate. You are generating faster, but every change waits nearly twice as long to ship, and the pace drops further if your reviewers burn out.
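Little's Law (L = λW: the average number of items in a system equals throughput times the average time each item spends there) gives a rough sense of what those two ratios mean together. The sketch treats the Faros percentages as ratios between steady-state teams, and that is an assumption. The report compares high- and low-adoption teams rather than following one team over time. Little's Law also strictly requires mean times, not medians.

```python
def relative_wip(throughput_ratio: float, latency_ratio: float) -> float:
    """Little's Law: L = lambda * W.

    For two steady-state systems, L_high / L_low equals
    (lambda_high / lambda_low) * (W_high / W_low).
    """
    return throughput_ratio * latency_ratio


# 98% more merged PRs per developer -> throughput ratio 1.98
# 91% longer review times           -> latency ratio 1.91
ratio = relative_wip(throughput_ratio=1.98, latency_ratio=1.91)
print(f"PRs sitting in review per developer: about {ratio:.2f}x")
```

Under those assumptions, each developer on a high-adoption team has nearly four times as many PRs waiting in review at any given moment. That is the queue buildup expressed as a single number.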

Code churn - lines deleted relative to lines added across a quarter - is up 861% under high AI adoption. That is not a typo. The working hypothesis from the Faros team is that developers, generating code quickly, are also throwing away and regenerating more quickly. More attempts, more revisions, more PRs that represent work-in-progress rather than considered output. Each one lands in a reviewer's queue anyway.
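The Faros report's exact churn formula is not reproduced here. This sketch assumes churn is total lines deleted divided by total lines added over a period, as the paragraph describes. The quarterly totals are hypothetical and chosen to show what an 861% increase implies: the ratio ends up about 9.6 times its baseline.

```python
def churn_ratio(changes: list[tuple[int, int]]) -> float:
    """Lines deleted relative to lines added, over a list of (added, deleted) pairs."""
    added = sum(a for a, _ in changes)
    deleted = sum(d for _, d in changes)
    if added == 0:
        raise ValueError("no lines added in this period; churn ratio is undefined")
    return deleted / added


def percent_change(before: float, after: float) -> float:
    return (after - before) / before * 100


# Hypothetical quarterly totals, one (added, deleted) pair per PR.
baseline = churn_ratio([(6_000, 600), (4_000, 400)])     # 1,000 / 10,000 = 0.1
high_ai = churn_ratio([(6_000, 5_766), (4_000, 3_844)])  # 9,610 / 10,000 = 0.961
print(f"churn up {percent_change(baseline, high_ai):.0f}%")
```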

Why AI code is harder to review, not easier

The volume problem would be bad enough on its own. The quality dimension makes it worse.

CodeRabbit analyzed 470 open-source pull requests to compare AI-co-authored PRs against human-only PRs. AI-assisted PRs contain approximately 1.7 times more issues overall. Broken down: logic and correctness issues are 75% more common. Security vulnerabilities are up to 2.74 times more frequent. Readability issues are 3 times more common. Error handling gaps are nearly twice as common.

That 1.7x multiplier is worth sitting with. It does not mean AI code is bad. It means AI code arrives with a systematically different defect profile than human code - one that requires more attention, not less, to catch before merge.

Sonar's 2025 State of Code survey puts numbers on the trust side: 96% of developers say they do not fully trust AI-generated code to be functionally correct. Yet 42% of the code those same developers commit is AI-assisted. Developers are not closing that gap by becoming more confident in AI output. It is being papered over by review standards that quietly drop under the weight of volume.

The Faros report captures the consequence directly: PRs merged without any review - human or automated - are up 31.3%. The most likely explanation is not process laziness. It is that reviewers cannot keep up with the volume of AI-generated code arriving for their attention, so some of it starts sliding through unchecked. Code is reaching production systems without oversight at a meaningfully higher rate than before high AI adoption.

The asymmetry in what "review" means now

Before AI coding assistants, code review was mostly a correctness and style pass. The author wrote the code, the reviewer verified it was doing what it claimed, flagged edge cases, suggested better variable names. The cognitive load was real but bounded. You were evaluating one person's intent expressed in code.

AI-assisted code review is a different job.

Addy Osmani put it clearly: you are no longer primarily validating correctness. You are judging necessity. Does this abstraction earn its weight? Is this defensive check worth the complexity it adds? Would the team want to maintain this pattern six months from now? AI code often looks plausible at a glance - clean structure, reasonable naming, passing tests - but hides subtle mismatches between what the prompt asked for and what the code actually does. A reviewer skimming for obvious bugs will miss the kind of issue AI code reliably generates: logic that is locally coherent but globally wrong.

The 2025 DORA report describes AI as an amplifier rather than a universal accelerator. Teams with strong engineering practices (good test coverage, small batches, clear ownership of components) convert AI productivity gains into actual delivery improvements. Teams with fragmented processes or weak review cultures see AI accelerate their technical debt creation. The DORA authors identified seven capabilities that determine whether AI helps or hurts. Among them are a clear AI policy, strong version control hygiene, working in small batches, and a healthy internal platform. Several of those capabilities, version control hygiene and small batches in particular, shape how code is reviewed and integrated rather than how it is generated.

What actually needs to change

The teams handling high AI adoption well are not reviewing more code faster through willpower. They have made structural changes to how code moves through their systems.

Shrink the unit of review. The most effective lever is PR size. AI makes it tempting to generate large changes - it is nearly effortless to ask for a complete feature implementation. The result is pull requests that represent hundreds of lines of unfamiliar context, which reviewers either slow down on or skip past. Bottom-quartile AI teams in the Faros data take more than 35 hours to merge PRs on average. Top-performing teams complete merges in under 21 hours. The difference is not reviewer skill - it is PR discipline. Enforce PR size limits even when the AI can generate more. Small batches are the highest-leverage constraint you can impose.
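One way to enforce a size limit is a small CI check. This is a sketch of how a team might wire it up, not a feature of any particular CI product. It sums added and deleted lines from `git diff --numstat` against a base branch. It fails when the total exceeds a hypothetical `MAX_CHANGED_LINES` of 400. The default base `origin/main` is also an assumption.

```python
import subprocess

MAX_CHANGED_LINES = 400  # hypothetical team limit; tune for your codebase


def changed_lines(numstat: str) -> int:
    """Sum added + deleted lines from `git diff --numstat` output.

    Each output line is tab-separated: added, deleted, path.
    Binary files report "-" for both counts and are skipped.
    """
    total = 0
    for line in numstat.splitlines():
        if not line.strip():
            continue
        added, deleted, _path = line.split("\t", 2)
        if added == "-" or deleted == "-":  # binary file: no line counts
            continue
        total += int(added) + int(deleted)
    return total


def check_pr_size(base: str = "origin/main") -> int:
    """Return a process exit code: 0 if the branch is within the limit, 1 otherwise."""
    result = subprocess.run(
        ["git", "diff", "--numstat", f"{base}...HEAD"],
        capture_output=True, text=True, check=True,
    )
    n = changed_lines(result.stdout)
    if n > MAX_CHANGED_LINES:
        print(f"This PR changes {n} lines; the limit is {MAX_CHANGED_LINES}. Consider splitting it.")
        return 1
    print(f"PR size OK: {n} lines changed.")
    return 0
```

A CI step would call `raise SystemExit(check_pr_size())` after checking out enough history for the merge base to exist. The check deliberately counts deletions alongside additions, because a large deletion still has to be read.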

Add context as a first-class artifact. AI-generated code arrives without the reasoning that produced it. The reviewer sees the output but not the prompt, the iteration history, or the trade-offs that were considered and rejected. The fix is simple and under-used. Include a brief context block in every PR description that states what the change is meant to do, how it was produced, and which alternatives were rejected. This shifts review from "what does this code do?" to "does this code do the right thing?" - the question that actually catches AI failure modes.
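A context block is easier to keep up when CI flags missing ones. Below is a minimal sketch of such a check. The three heading names in `REQUIRED_SECTIONS` are hypothetical, so substitute whatever your PR template uses. The check only confirms that each heading is present and has some text beneath it. It cannot judge whether that text is any good.

```python
REQUIRED_SECTIONS = (  # hypothetical headings for a PR description template
    "## Intent",
    "## How it was produced",
    "## Trade-offs considered",
)


def missing_sections(description: str) -> list[str]:
    """Return required headings that are absent or have no text beneath them."""
    lines = [line.strip() for line in description.splitlines()]
    missing = []
    for heading in REQUIRED_SECTIONS:
        if heading not in lines:
            missing.append(heading)
            continue
        # Collect the lines between this heading and the next "## " heading.
        body = []
        for line in lines[lines.index(heading) + 1:]:
            if line.startswith("## "):
                break
            body.append(line)
        if not any(body):  # only blank lines, or nothing at all
            missing.append(heading)
    return missing


example = """## Intent
Retry webhook deliveries that fail with 5xx responses.
## How it was produced
First draft generated with an AI assistant; backoff logic rewritten by hand.
## Trade-offs considered
Rejected an unbounded retry loop in favor of five attempts with backoff.
"""
print(missing_sections(example))  # []
```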

Use AI review tools as a triage layer, not a replacement. CodeRabbit, Greptile, and Claude Code Review are not here to replace human reviewers. They are useful for catching the high-volume, low-judgment issues - formatting, obvious security patterns, missing error handling - that should not be consuming senior engineers' attention. Treat them as a full substitute for human review and you end up merging code that no human has read. That is a close cousin of the 31.3% rise in merges with no review at all.
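The "triage, not replacement" policy can be written as a merge rule. This sketch is hypothetical and not tied to any tool's API. Bot reviews can raise findings that must be resolved first, but only human approvals count toward the merge requirement.

```python
from dataclasses import dataclass


@dataclass
class Review:
    reviewer: str
    is_bot: bool   # True for automated reviewers such as an AI review tool
    approved: bool


def can_merge(reviews: list[Review], unresolved_bot_findings: int,
              min_human_approvals: int = 1) -> bool:
    """Bots triage; humans approve. Bot approvals never satisfy the requirement."""
    human_approvals = sum(1 for r in reviews if r.approved and not r.is_bot)
    return unresolved_bot_findings == 0 and human_approvals >= min_human_approvals


bot_only = [Review("review-bot", is_bot=True, approved=True)]
with_human = bot_only + [Review("ana", is_bot=False, approved=True)]
print(can_merge(bot_only, unresolved_bot_findings=0))    # False
print(can_merge(with_human, unresolved_bot_findings=0))  # True
print(can_merge(with_human, unresolved_bot_findings=2))  # False
```

The rule requires bot findings to be cleared before the human pass counts. That keeps senior reviewers focused on judgment calls rather than on issues a tool already caught.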

Measure the right thing. Most engineering teams adopted AI coding tools and kept measuring what they measured before: tickets closed, PRs merged, story points delivered. What needs measuring now is review health. That means time in review, reviewer load distribution, no-review merge rate, and post-merge defect rate by PR origin (AI-assisted versus human-only). If you are not measuring these, you cannot tell whether your AI adoption is creating acceleration whiplash or avoiding it.
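Here is a starting point for those metrics, sketched in Python over a hypothetical `PullRequest` record you would populate from your code host's API. Two simplifications are assumptions. First, time in review is approximated as open-to-merge time. Second, a PR counts as unreviewed when no reviewer, human or bot, is recorded. The `caused_defect` flag assumes you already link bugs or incidents back to the PR that introduced them.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class PullRequest:
    opened: datetime
    first_review: Optional[datetime]  # None if nobody ever reviewed it
    merged: Optional[datetime]        # None if still open or closed unmerged
    reviewers: list[str]
    ai_assisted: bool
    caused_defect: bool = False       # linked to a post-merge bug or incident


def _hours(delta: timedelta) -> float:
    return delta.total_seconds() / 3600


def review_health(prs: list[PullRequest]) -> dict:
    merged = [p for p in prs if p.merged is not None]
    to_merge = [_hours(p.merged - p.opened) for p in merged]
    to_first = [_hours(p.first_review - p.opened) for p in prs if p.first_review is not None]

    # Post-merge defect rate, split by whether the PR was AI-assisted.
    defect_rate = {}
    for label, flag in (("ai_assisted", True), ("human_only", False)):
        group = [p for p in merged if p.ai_assisted == flag]
        if group:
            defect_rate[label] = sum(p.caused_defect for p in group) / len(group)

    return {
        "median_hours_open_to_merge": median(to_merge) if to_merge else None,
        "median_hours_to_first_review": median(to_first) if to_first else None,
        "no_review_merge_rate": (
            sum(1 for p in merged if not p.reviewers) / len(merged) if merged else None
        ),
        "reviewer_load": dict(Counter(r for p in prs for r in p.reviewers)),
        "defect_rate_by_origin": defect_rate,
    }


# Hypothetical sample data.
t0 = datetime(2026, 1, 5, 9, 0)
prs = [
    PullRequest(t0, t0 + timedelta(hours=2), t0 + timedelta(hours=10), ["ana"], True),
    PullRequest(t0, t0 + timedelta(hours=4), t0 + timedelta(hours=20), ["ana", "ben"], False),
    PullRequest(t0, None, t0 + timedelta(hours=1), [], True, caused_defect=True),
]
print(review_health(prs))
```

A skewed `reviewer_load` deserves attention. If a few names absorb most of the reviews, the whole queue depends on them.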

The structural lesson

The AI writing assistant story has been told as a pure productivity win. More code, faster. It is not wrong - the generation speed gains are real. But software delivery is a system, and systems do not improve when you optimize one part and ignore the constraint it shifts to.

The 441.5% increase in median review time is not a user error or a maturity problem that will resolve itself as developers get better at prompting. It is a predictable consequence of accelerating production without redesigning consumption. Every team that has adopted AI coding tools and has not deliberately changed how code gets reviewed is running this experiment. Most of them do not know their results yet.

The data is now available to learn from. Whether you use it is a process decision, not a technology one.