The Productivity Paradox
The industry is celebrating a productivity revolution. The data tells a different story.
In a rigorous study published by METR — a research organisation focused on evaluating AI capabilities — experienced open-source developers were 19% slower when using AI coding tools. This was not a marginal finding buried in a footnote. The developers themselves predicted they would be 24% faster. The gap between perception and reality was 43 percentage points.
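The 43-point figure follows directly from the two numbers quoted above. A minimal sketch of the arithmetic, with variable names chosen here purely for illustration:

```python
# Figures as quoted in the paragraph above; the variable names are illustrative.
predicted_change_pct = 24    # developers predicted they would be 24% faster
measured_change_pct = -19    # measured result: 19% slower

# The perception gap is the distance between prediction and measurement,
# expressed in percentage points.
perception_gap_points = predicted_change_pct - measured_change_pct
print(perception_gap_points)
```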
That single data point should stop every technology leader who has staked their roadmap on AI-assisted development velocity. It does not mean AI tools are useless. It means the dominant pattern of AI-assisted development — what the industry has come to call “vibe coding” — is producing measurably worse outcomes than the practice it claims to replace.
The Evidence Is Not Ambiguous
The METR study is not an outlier. Over the past twelve months, independent research teams have converged on the same conclusion from different angles.
Stack Overflow, the platform that has observed developer behaviour at scale for nearly two decades, published a post in January 2026 titled “A new worst coder has entered the chat: vibe coding without code knowledge.” Stack Overflow has historically championed innovation in developer tooling, so its editorial position is not easily dismissed.
And a research paper published in January 2026, “Vibe Coding Kills Open Source,” documented that increased vibe coding correlates with reduced engagement between developers and open-source maintainers. The act of generating code without understanding it erodes the community knowledge transfer that open source depends on.
The LOSS Curve
These findings are not independent phenomena. They are symptoms of a single underlying dynamic: the LOSS curve.
LOSS — Lines Of Stochastic Source — describes what happens when statistical inference substitutes for domain knowledge in code generation. I introduced the diagnostic framework in a previous article, anchored on a specific incident where an AI shortened standard GL Account Code Types from CL to C — a change that was structurally sound, passed every test, and would have broken every downstream financial report.
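To make the CL-to-C incident concrete, here is a minimal sketch of how a change can pass a structural check while violating a domain rule. Everything in it is an assumption for illustration: the set of valid codes (only CL comes from the incident), the function names, and the shape of the structural test, which is a stand-in and not the test suite from the original incident.

```python
# Hypothetical domain rule: the set of valid GL Account Code Types.
# Only "CL" comes from the incident described above; "AS" and "EX" are made up.
VALID_CODE_TYPES = {"CL", "AS", "EX"}


def structural_check(code_type: str) -> bool:
    """A stand-in for a shape-only test: non-empty, uppercase letters."""
    return code_type.isalpha() and code_type.isupper()


def domain_check(code_type: str) -> bool:
    """The check that encodes institutional knowledge about valid codes."""
    return code_type in VALID_CODE_TYPES


for code in ("CL", "C"):
    print(code, "structural:", structural_check(code), "domain:", domain_check(code))
```

The shortened code "C" satisfies the structural check and fails only the domain check, which is the kind of knowledge that statistical inference cannot supply on its own.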
The LOSS curve describes the relationship between context precision and output quality in AI code generation. It is not a gentle slope. It is a threshold function: a step, like the output of a comparator or Schmitt trigger, which snaps between two states rather than passing through intermediate values.
Above the precision threshold: the developer provides tight scope, explicit domain constraints, and specific institutional knowledge. The AI operates in a high-yield regime. Output quality can exceed what either human or machine would produce alone.
Below the precision threshold: the developer provides vague intent, broad scope, and relies on the AI’s training data to fill domain gaps. The AI crosses into a catastrophic negative-yield regime. Every line of output compounds inference on inference.
There is almost nothing between these two states. You are either getting extraordinary leverage or you are manufacturing defects at machine speed.
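The difference between a threshold and a gentle slope can be sketched in a few lines. The threshold value and the yield on either side of it are assumed numbers for illustration only; the argument here is about the shape of the curve, not its calibration. The sketch is also a plain step: it does not model the hysteresis (two separate switching thresholds) that a real Schmitt trigger has.

```python
# Assumed values for illustration only: no numbers are given for the
# threshold or for the yield on either side of it.
THRESHOLD = 0.7
HIGH_YIELD = 1.0
NEGATIVE_YIELD = -1.0


def loss_curve(precision: float) -> float:
    """Step-shaped yield: nothing between the two regimes."""
    return HIGH_YIELD if precision >= THRESHOLD else NEGATIVE_YIELD


def gentle_slope(precision: float) -> float:
    """What the curve is not: yield rising linearly with precision."""
    return 2.0 * precision - 1.0


for p in (0.2, 0.6, 0.69, 0.7, 0.9):
    print(f"precision={p:.2f}  step={loss_curve(p):+.1f}  slope={gentle_slope(p):+.2f}")
```

Under the linear model, a small shortfall in precision costs a little yield. Under the step model, the same shortfall flips the sign of the result.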
The LOSS curve offers an explanation for why the METR developers were 19% slower. On this reading, they were not failing at every task. They were fast on the tasks that fell above the threshold and catastrophically slow on the tasks that fell below it, spending their “productivity gains” on debugging AI-generated code that looked right but was semantically wrong. The net effect was negative.
The Market Is Responding — To the Symptom
The investment community has noticed — though it has, characteristically, responded to the symptom rather than the cause.
Investments in verification tooling are rational. But they address the LOSS curve at the wrong point. Verification after generation is inherently more expensive than precision before generation. You are building two systems: one to produce code and one to determine whether that code should exist. The economics of this approach deteriorate as codebases grow: every new AI-generated line creates verification obligations for every line it interacts with.
When you need a $70 million company to verify the output of your $50 billion industry, something structural has gone wrong.
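The claim that verification economics deteriorate with codebase size can be illustrated with a toy cost model. The model is an assumption, not a measurement: it supposes each new generated line interacts with a fixed fraction (`coupling`) of the lines already present, and that each interaction creates one verification obligation in addition to checking the line itself.

```python
def verification_obligations(existing_lines: int, new_lines: int, coupling: float) -> float:
    """Toy model: total obligations created by adding `new_lines` generated lines.

    Assumes each new line must be checked once, plus once for every existing
    line it interacts with, where interactions are a fixed fraction
    (`coupling`) of the current codebase size.
    """
    total = 0.0
    size = existing_lines
    for _ in range(new_lines):
        total += 1 + coupling * size
        size += 1  # the new line becomes part of the codebase
    return total


# Same 100 generated lines, added to codebases of different sizes.
for size in (1_000, 10_000, 100_000):
    print(size, verification_obligations(size, 100, coupling=0.01))
```

In this model the cost of verifying the same number of generated lines rises with the size of the codebase they land in, which is the deterioration described above.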
The Specification Question
There is an older idea in computing — one that predates the current AI acceleration by decades — that the specification is more valuable than the implementation.
The logic is straightforward. Code depreciates. It accumulates technical debt, it drifts from requirements, it becomes the property of whoever last modified it. A specification, by contrast, appreciates. Every refinement of a constraint, every clarification of an invariant, every enrichment of a business rule makes the specification more precise, more valuable, more durable.
The LOSS curve makes this asymmetry urgent. If AI can generate code at zero marginal cost but cannot guarantee that the code preserves domain meaning, then the code is not the valuable artifact. The domain knowledge is the valuable artifact. The specification that captures that knowledge is the durable asset. The code is disposable output — regeneratable at any time from a sufficiently precise specification.
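One way to picture a specification as the durable asset is to hold the invariants as data and treat each implementation as a replaceable candidate checked against them. The sketch below is hypothetical throughout: the rule names, the `normalise` behaviour, and both candidate implementations are invented for illustration, reusing the CL example from earlier.

```python
from typing import Callable, Dict, List

Normaliser = Callable[[str], str]

# Hypothetical specification: named invariants any implementation must satisfy.
SPEC: Dict[str, Callable[[Normaliser], bool]] = {
    "code type is preserved": lambda normalise: normalise("CL") == "CL",
    "surrounding whitespace is stripped": lambda normalise: normalise(" CL ") == "CL",
}


def violations(impl: Normaliser) -> List[str]:
    """Return the names of the specification rules that `impl` breaks."""
    return [name for name, rule in SPEC.items() if not rule(impl)]


def generated_v1(code: str) -> str:
    # A plausible-looking candidate that shortens the code to one letter.
    return code.strip()[:1]


def generated_v2(code: str) -> str:
    # A regenerated candidate that leaves the code type intact.
    return code.strip()


print("v1 violations:", violations(generated_v1))
print("v2 violations:", violations(generated_v2))
```

The candidates are interchangeable and can be discarded or regenerated; the rules in `SPEC` are what accumulate value as each new constraint is added.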
The industry is generating Lines Of Stochastic Source at unprecedented speed. The evidence suggests the returns are negative at scale. The verification market is a $70 million acknowledgment that generation alone is insufficient.
The question the data is asking — the one the verification market is answering indirectly and expensively — is simpler than it appears:
If the code depreciates and the specification appreciates, why are we still treating the code as the artifact?