The General Ledger Incident
Last month, we asked an AI to refactor a financial codebase. Thousands of lines. Clean architecture. Elegant separation of concerns. Every unit test green.
One problem. It shortened the GL Account Code Types.
The standard code CL — Current Liability — became C. Throughout the entire codebase. Every lookup table, every unit test, every mapping. The AI inferred it was optimising: shorter codes, less memory, same meaning. Right reasoning. Wrong conclusion. No accountant would recognise C as a Current Liability code. Our spec was weak, and the AI filled the gap with inference.
Every test passed. The first accountant who opened the books would have caught it in three seconds.
This is what I call LOSS — Lines Of Stochastic Source — and it may be the defining challenge of AI-assisted software engineering.
What LOSS Actually Is
LOSS is not a metaphor. It is a diagnostic label for a specific failure mode.
When an AI generates code, it is performing statistical inference over patterns learned from vast corpora. It does not “know” your business rules, your regulatory obligations, your domain conventions, or your team’s institutional memory. It infers what is probably correct from what it has probably seen.
That inference gap — the distance between what the AI produces and what your domain actually requires — is LOSS.
The GL incident is a textbook case. The AI understood the structure of account code types (short alphabetic identifiers classifying account categories). It understood the purpose (classification, aggregation, reporting). It even had a plausible rationale for the change — single-character codes are more compact, reduce lookup overhead. What it could not do was distinguish between CL (a standard code that every auditor, every reporting tool, and every regulatory submission expects) and C (a code that exists nowhere outside this AI’s inference). The structure was preserved. The meaning was not.
The AI Knows Everything About Everything. It Focuses on Nothing.
Here is the uncomfortable asymmetry at the heart of AI-assisted development.
Your AI partner has consumed the internet’s entire knowledge of accounting, architecture, medicine, law, logistics, compliance, and a thousand other domains. It can generate plausible output in any of them. But it has no intrinsic focus. It cannot distinguish between what matters and what merely exists in its training data.
Focus is a human contribution. The human mind must transfer its domain context — the hard-won understanding of what is actually important in this specific situation — into the AI’s context window. And most teams are terrible at this.
They dump a vague requirement into a prompt. They marvel at the volume of code that comes back. They run the tests. Green. Ship it.
Volume is not value. Volume is LOSS.
The LOSS Accumulation Curve
LOSS does not accumulate gently. It behaves more like a Schmitt trigger — a threshold function with two stable states and almost nothing in between.
Below the threshold: precise context, tight scope, a human operator who knows exactly what the AI needs to know. The AI operates in a high-yield regime. It can produce a hundred lines of genuinely brilliant code. It can refactor with surgical precision. It can spot patterns a human would miss.
Above the threshold: vague context, broad scope, an overloaded context window full of tangentially relevant information. The AI crosses into a catastrophic negative-yield regime where every additional line of output compounds inference on top of inference.
There is no gentle slope between these states. You are either getting extraordinary leverage or you are manufacturing defects at machine speed.
The variables that push you across the threshold:
- Less focus → more LOSS. A prompt that says “refactor the accounting module” invites the AI to make thousands of micro-decisions about domain conventions it does not understand.
- More lines → more LOSS. Each generated line is an inference. A thousand lines is a thousand inferences, each building on the probabilistic foundation of those before it. Errors compound.
- Fuller context window → more LOSS. Counter-intuitively, giving the AI more context often degrades output quality. A context window stuffed with tangential information dilutes the signal the AI needs. It is reading the whole map when you needed it to read one street.
Don’t Blame the AI
This is not an anti-AI argument. The AI is extraordinary — genuinely, measurably extraordinary. The problem is how we deploy it.
The dominant pattern in AI-assisted development today is what I would call broadcast prompting: a human describes a goal in natural language, the AI generates a large volume of code, and the human reviews the output (or, more commonly, runs the tests and calls it done).
This pattern works when the domain is generic — standard CRUD operations, well-documented API integrations, boilerplate that follows established conventions. In these cases, the AI’s statistical inference aligns with what you actually need, because what you need is what everyone needs.
The pattern fails catastrophically when the domain is specific. When a GL account code type is not just “a classification label” but a code that connects to your chart of accounts, your reporting hierarchy, your audit trail, your tax obligations, and the muscle memory of every accountant in your organisation. The AI cannot infer this. No amount of training data contains your specific institutional knowledge.
The AI can only have laser focus if the human points it at precisely the right location on the map of knowledge.
That act of pointing — of transferring context — is the single most important skill in AI-assisted engineering. And it is almost entirely absent from the conversation about how to use AI effectively.
The Artisan Exception
Not everyone is accumulating LOSS.
There is a class of engineer — you know them, you may be one — who were already meticulous before AI arrived. They understood that software is a knowledge transfer problem, not a typing problem. They specify before they build. They model before they code. They know their domain deeply enough to know where the AI will stumble.
These artisans use AI the way a master craftsperson uses a precision tool: pointed at exactly the right spot, with exactly the right context, for exactly the right scope. They give the AI a narrow, well-specified task. They review the output against domain knowledge, not just against tests. They treat the AI as a collaborator that needs to be taught, not an oracle that needs to be prompted.
For these practitioners, AI is a genuine force multiplier. They are producing better work, faster, with fewer defects than they could alone.
For everyone else — the teams throwing traditional development practices at AI-generated code, reviewing volume instead of meaning, trusting test suites to catch domain violations they were never designed to detect — you are accumulating LOSS at a rate your infrastructure will never reveal until a human looks at the output.
The Test Suite Doesn’t Know Your Business
This is perhaps the most dangerous aspect of LOSS: it is invisible to automated quality gates.
The GL Account Code Types passed every test because the tests verified structure, not meaning. The tests confirmed that accounts mapped to a code type, that the type was a valid string, that the classification logic worked correctly. No test asked: “Is C a standard accounting code type that an auditor would recognise?” No test checked whether the AI’s optimisation broke every downstream report, every regulatory submission, every integration partner expecting CL.
Unit tests verify internal consistency. They do not verify domain fidelity. A codebase can be internally consistent — every function does what its name says, every type checks, every assertion passes — and still be semantically wrong in ways that only a domain expert would catch.
LOSS lives in the gap between what your tests verify and what your business requires.
So What Do You Do?
The answer is not to stop using AI. The answer is to stop pretending that prompting is engineering.
- Transfer context, don’t broadcast requirements. The quality of AI output is a direct function of the specificity of the context you provide. “Refactor the accounting module” is a LOSS generator. “Refactor the GL mapping layer; here are the standard account code types —
CLfor Current Liability,CAfor Current Asset — do not modify these codes” is a precision instrument. - Constrain scope ruthlessly. The Schmitt trigger is real. Stay below the threshold. Smaller tasks, tighter boundaries, explicit domain constraints. A hundred focused AI interactions will produce better results than one ambitious generation.
- Review for meaning, not just correctness. Your test suite catches structural defects. You catch domain defects. If you are not reading AI-generated code with domain-expert eyes, you are shipping LOSS.
- Treat the AI as a brilliant new hire. It knows the textbook. It does not know your organisation. You would not let a brilliant new graduate refactor your chart of accounts without supervision. Extend the same discipline to your AI.
The Line Between Leverage and Liability
AI-assisted development is not a spectrum. It is a threshold.
Below the line: an artisan with a precision tool, producing work that exceeds what either human or AI could achieve alone.
Above the line: a team manufacturing Lines Of Stochastic Source at industrial scale, invisible to their test infrastructure, accumulating semantic debt that will surface at the worst possible moment — when an accountant opens the books, when an auditor traces a transaction, when a regulator asks why the numbers do not match.
The question is not whether you are using AI. Everyone is using AI.
The question is whether you are at a LOSS.