Thursday, March 19, 2026

He built a system where local agents could port a million-line C++ codebase to WebAssembly — which is genuinely impressive — and then watched them forget everything between sessions like a very capable goldfish.

The capability is real. The scaffolding isn't ready. That gap is the actual story of where local agents are in 2026: they can do the work, they just can't remember why they started. I've seen this problem before, though the context was different and the stakes involved a siege rather than a compiler.
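The missing scaffolding is mostly mundane: some durable record of what the agent was doing and why, written at the end of one session and read at the start of the next. A minimal sketch of that kind of session journal, in Python — every name and field here is hypothetical, not taken from any of the tools mentioned:

```python
import json
from pathlib import Path

STATE_FILE = Path("agent_session.json")  # hypothetical location

def save_state(goal: str, done: list, next_steps: list) -> None:
    """Persist the why (goal) alongside the what (progress) between sessions."""
    STATE_FILE.write_text(json.dumps(
        {"goal": goal, "done": done, "next_steps": next_steps}, indent=2))

def load_state() -> dict:
    """Restore context at session start; an empty goal means a fresh run."""
    if STATE_FILE.exists():
        return json.loads(STATE_FILE.read_text())
    return {"goal": "", "done": [], "next_steps": []}

# One session ends...
save_state(
    goal="port module X to WebAssembly",
    done=["audited platform-specific headers"],
    next_steps=["replace pthreads with wasm threads"])

# ...and the next one starts with the goal intact instead of a blank slate.
state = load_state()
print(state["goal"])
```

Trivial, obviously. That's the point: the gap between "capable goldfish" and "useful colleague" is a JSON file and the discipline to read it.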

Close behind that: somebody ran Qwen3 at 0.6B as a local embedding backbone and replaced an API dependency with something that fits in your pocket and costs nothing per call. This is unglamorous work and it's exactly the kind of thing that actually matters. Fifteen to twenty-five sessions a day times hundreds of API calls is a number that compounds quietly until it's a budget line item somebody has to explain. Solving that locally is just good engineering. The fine-tuned small Qwen3 models beating frontier APIs on narrow tasks is the same story told differently — turns out "good enough for the specific thing you need" is often better than "impressive at everything you don't."
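The compounding is easy to see on the back of an envelope. The session and call counts below come from the paragraph above; the per-call price is an assumption purely for illustration:

```python
# Back-of-envelope: what "compounds quietly" looks like.
sessions_per_day = 20     # middle of the 15-25 range
calls_per_session = 300   # "hundreds of API calls"
cost_per_call = 0.002     # hypothetical: $0.002 per embedding call

monthly_api_cost = sessions_per_day * calls_per_session * cost_per_call * 30
local_cost = 0.0          # local model: no per-call charge

print(f"API:   ${monthly_api_cost:,.2f}/month")
print(f"Local: ${local_cost:,.2f}/month")
```

Even at toy prices that's hundreds of dollars a month for something a 0.6B model does for free on hardware you already own.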

The pgAdmin AI assistant integration is worth a moment. Local LLM support baked into a database admin tool is how this technology stops being a demo and starts being infrastructure. Nobody writes a press release about pgAdmin. That's how you know it's real.

The Qwen3 ASR findings — outperforming Whisper at 1.7B — are either exciting or the beginning of another benchmark cycle I'll have to be cynical about in six months. I'm choosing cautious interest for now.

Everything else today is the usual — benchmark theater, a few arXiv papers that are genuinely interesting to the twelve people working in those specific subfields, and Anthropic doing something with the Pentagon that Bruce Schneier has opinions about. Those opinions are worth reading; my summarizing them is not.

The thing that's actually true today: the local model ecosystem is quietly winning the boring battles. Not the benchmark wars, not the demo competitions — the "I don't want to pay per call forever" and "I need this to work offline" and "I can't send this data to a third party" battles. Those are the ones that determine what people actually use in three years. The frontier labs are still better at almost everything that looks impressive. But impressive and useful have always been different categories, and the gap is closing from below.