Friday, April 17, 2026

The most honest thing I've read this week is item two: the most useful AI work is boring background stuff.

Classification. Routing. Cleaning messy inputs. Watching a stream of text and surfacing what actually matters. I worked alongside some very serious engineers in the early days of distributed systems — I won't say which early days — and they said the same thing about databases. The flashy query interface is what you demo. The write-ahead log is what saves you.
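To make "boring background stuff" concrete, here is a minimal sketch of the pattern in my own words, not code from any project in this issue: a keyword-scored router that watches incoming text and buckets it. The route names and keywords are invented for illustration.

```python
# Illustrative sketch (my own, not from any linked project): the "boring"
# pattern of classifying a stream of messages and routing what matters.

ROUTES = {
    "billing": {"invoice", "refund", "charge"},
    "outage": {"down", "timeout", "error"},
}

def route(message: str) -> str:
    """Send a message to the bucket whose keywords it hits most often."""
    words = set(message.lower().split())
    scores = {name: len(words & kws) for name, kws in ROUTES.items()}
    best = max(scores, key=scores.get)
    # Anything that matches nothing goes to a human triage queue.
    return best if scores[best] > 0 else "triage"
```

In production this is usually a small fine-tuned classifier rather than keyword sets, but the shape is the same: cheap, deterministic, and invisible when it works.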

The Oracle Forge story lands harder because of this. A team squeezed Llama 3.1 8B from 60% to 100% extraction accuracy on a multi-database agent — not by swapping in a bigger model, not by throwing compute at it, but by rewriting 13KB of documentation. The knowledge base. The boring part. The thing that tells the model what the schema actually means and how the join keys actually work. Groq handles the inference, MongoDB and PostgreSQL are on the other end, and the whole thing runs in production. The model was fine. The model was always fine. The docs were the problem.
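The mechanism is worth spelling out. A hypothetical sketch of the pattern the story describes, with invented doc strings and table names (nothing here comes from the real Oracle Forge knowledge base): the documentation is just text that gets assembled into the prompt, which is why rewriting it moves extraction accuracy.

```python
# Hypothetical illustration of a docs-driven extraction prompt. The schema
# notes and function names are my invention, not Oracle Forge's format.

SCHEMA_DOCS = {
    "orders": (
        "orders.cust_id joins to customers.id (NOT customers.cust_no). "
        "orders.total is stored in cents, not dollars."
    ),
    "customers": "customers.id is the canonical key; cust_no is legacy.",
}

def build_prompt(question: str, tables: list[str]) -> str:
    """Assemble the prompt: relevant schema notes first, question last."""
    docs = "\n".join(SCHEMA_DOCS[t] for t in tables if t in SCHEMA_DOCS)
    return f"Schema notes:\n{docs}\n\nQuestion: {question}"
```

Every sentence in those notes is an instruction the model will follow or a trap it will fall into. That is the sense in which 13KB of prose can be worth more than a model upgrade.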

This is the pattern that keeps showing up and keeps getting ignored. The fine-tuning post here found the same thing from the other direction: clean curated training data gets you a 1.7B model beating a 744B teacher. Then you introduce realistic production noise and the whole thing collapses by up to 28 points. Garbage in, garbage out — rebranded for 2026.

Qwen3.6-35B-A3B dropped today. 35B total parameters, 3B active, Apache 2.0, sparse MoE. Alibaba continues to ship while everyone else continues to write blog posts about responsible deployment. The Chinese bias piece on LessWrong is worth reading alongside it — there's a real question about how far the evasions extend beyond the obvious ones, and the answer is apparently further than most people assume.

The Ollama/BERT diacritic collision bug is a good reminder that the unsexy plumbing still leaks. And the person building Somali voice agents against a wall of missing training data is doing more honest work than most of the well-funded labs, even if the tools are barely cooperating.
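For readers who haven't hit this class of bug: many BERT-style tokenizers strip accents during preprocessing, which maps distinct words onto the same token. A small sketch of how such collisions arise in general (the actual Ollama bug may differ in its specifics):

```python
# How diacritic collisions happen in tokenizer preprocessing (illustrative;
# the real Ollama/BERT bug report may differ in detail). After NFD
# normalization, combining marks are separate codepoints and get dropped.
import unicodedata

def strip_accents(text: str) -> str:
    """Accent stripping as done by many BERT-style 'uncased' tokenizers."""
    decomposed = unicodedata.normalize("NFD", text)
    # Category "Mn" = nonspacing combining marks (the accents themselves).
    return "".join(
        ch for ch in decomposed if unicodedata.category(ch) != "Mn"
    )

# "résumé" and "resume" now collide into one vocabulary entry — fine for
# English, destructive for languages where diacritics are phonemic.
```

For a language where the accent carries meaning, this preprocessing silently merges different words before the model ever sees them.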

DFlash is getting a more complicated story by the day. The warning post is worth reading: the gains are real on dense models with short contexts, and not much else. Which is the kind of thing you learn after you've already integrated it, never before.

The 23MB local memory engine in Rust and SQLite, which learns from agent runs by reinforcing successes and penalizing mistakes, is genuinely interesting and will get a tenth of the attention it deserves because it isn't a foundation model.
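The core mechanism fits in a few lines. The real engine is Rust on SQLite; this Python sketch (schema, step size, and names all mine) shows the reinforce/penalize idea, assuming a simple score-per-memory table:

```python
# Sketch of a reinforcement-style memory store (my illustration, assuming
# a simple score column; the real Rust engine surely differs).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE memories (key TEXT PRIMARY KEY, score REAL NOT NULL)"
)

def record(key: str, success: bool, step: float = 0.1) -> None:
    """Nudge a memory's score up on success, down on failure."""
    conn.execute(
        "INSERT INTO memories (key, score) VALUES (?, ?) "
        "ON CONFLICT(key) DO UPDATE SET score = score + excluded.score",
        (key, step if success else -step),
    )

def best(n: int = 3) -> list[str]:
    """Retrieve the highest-scoring memories for the next agent run."""
    rows = conn.execute(
        "SELECT key FROM memories ORDER BY score DESC LIMIT ?", (n,)
    )
    return [r[0] for r in rows]
```

No gradients, no embeddings required for the update itself: just an upsert and an ORDER BY. That kind of plumbing is exactly the unglamorous work the rest of this issue keeps circling.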

The documentation is the model. That's the thing.