Field notes · 30 May 2026

The cobbler’s children

We build software whose correctness is mathematically proven. Not tested until we’re reasonably confident — proven, the way a theorem is proven. That is the whole point of the factory.

So a little while ago we tried the experiment any factory eventually has to try: we pointed it at itself. We asked it to build a small piece of its own infrastructure — a component we genuinely needed — using exactly the process we use for everything else. Dogfooding.

It failed.

Where it broke

The interesting part is where the failure came from. It didn’t come from the proving — the mathematics held. It didn’t come from the model’s reasoning about the problem, which was, frankly, better than ours. It came from a small, ordinary piece of hand-written code buried in the factory’s own tooling: a parser, the unglamorous kind of routine that reads an input and pulls out the fields that matter. Somewhere in there a perfectly reasonable shortcut quietly mishandled an input it hadn’t been written to expect, and dropped a piece of information on the floor. Everything downstream then did its best to guess what was missing — and guessed differently each time.

Here is the part worth sitting with. That routine had been written by hand — by an advanced language model, taking the most sensible path, some time earlier. And we had forgotten two things about it: that it was there, and that it had never been put through the same verification we demand of everything the factory makes.

A word on writing code by hand, because it matters. A strong model writes excellent code by hand, and reaching for that instead of something more elaborate is often exactly the right call — faster, clearer, correct the overwhelming majority of the time. We do it constantly, and we’d do it again. But “the overwhelming majority of the time” is the whole sentence. Even excellent hand-coding, by even the best model available, leaves the occasional small error. And a small error in a parser is precisely the kind that hides for weeks, because it never crashes — it just quietly does slightly the wrong thing.

And in this case it wasn’t even a choice. It was a bootstrap. You cannot build a factory with a factory you do not yet have; the very first tools have to be cut by hand, before there is anything to make them with. Every system that builds things — including, eventually, itself — begins exactly there, with a hand-made first generation holding the whole thing up. The bootstrap is never the error. The error is forgetting to go back for it once the factory is finally standing.

So the lesson, stated plainly: our factory verifies what it makes. The things it produces are proven. But the factory’s own machinery is hand-built, and unverified — held to none of the standards it enforces on its outputs. On the day it broke, the proven parts were fine. The unverified tool was the one that failed.

The cobbler’s children have no shoes.

The second reflex

And there was a second version of the same mistake, quieter than the first. When we needed that piece of infrastructure, the first instinct — the obvious one — was to reach for something off the shelf: find a third-party tool that already does it, install it, depend on it, move on. The reflex of the old world, where software is dear to make and so you borrow it, and a working dependency feels like a small win.

Someone had to stop and say: no — we have a factory; make it.

That is harder to break than a habit, because it isn’t one — it’s an economics, and it’s wired in deep. For thirty years the answer to “we need X” has been “find who already built X”. But that arithmetic assumes building is expensive and trust is free. A factory inverts both. When you can manufacture a component on demand — prove it correct, own every line, depend on no runtime you didn’t choose — a borrowed dependency stops being a win and becomes a liability: foreign code you can’t see into, a supply chain you didn’t pick, an attack surface you inherited. The factory doesn’t consume the ecosystem; it removes the reason you needed one.

We know that. We’ve written it down. And still, faced with a concrete need, the old reflex fired first — in the very mind running the factory.

What we did about it is the boring, correct thing: found the bug, fixed it, wrote the test that should have existed, and moved on. But that wasn’t the valuable output of the experiment. The valuable output was being shown, precisely, where the line runs in our own system between “proven” and “trusted, but never actually checked”. We had stopped looking at the machinery. Dogfooding is the thing that makes you look: you point the tool at itself, and it shows you the parts you’d quietly let off the hook.

There’s a tidy irony we’re choosing to enjoy rather than be embarrassed by — an experiment in proof was brought down by the one component nobody had thought to prove. That isn’t the idea failing. It’s the idea working: finding the gap, in the only place a gap like that could still be hiding.

We’re now turning the factory on its own tools. The parts that are pure logic can be held to the same bar as everything else; the parts that can’t will at least be watched more honestly. “Verified” is a property of outputs. The machines that produce them deserve the same scrutiny — and until last week, in one small forgotten corner, they weren’t getting it.

We change arithmetic now

One last thing, because it’s the point we actually care about. The reason we could run this experiment at all — build the thing properly, prove it, throw away three attempts learning how — is that for us, the arithmetic of software has changed. What used to be too expensive to do right is now the cheap path.

That arithmetic isn’t a birthright. It’s manufactured, and it travels. If “do it properly”, or “prove it”, or “own every line of it” has always been priced out of reach for what you want to build, that is rarely a limit of the idea. It’s a limit of the old arithmetic.

We change arithmetic now. If you’d like yours changed — so the things you couldn’t afford to build properly suddenly fit — come and talk to us.