It's OpenAI quietly shelving Stargate UK — a £31 billion commitment that the British government had basically built its entire AI strategy around — citing energy costs and regulation.
Which, fine, energy costs are real. But also: this is the company that announced Stargate like it was the second coming and has been playing geopolitical chess with infrastructure promises for two years now. The UK got played. Not maliciously, probably. Just as a side effect of a company that makes very large announcements and then does whatever makes sense six months later. I've seen this before. I won't say when or where.
Meanwhile, on the ground where things actually work: backend-agnostic tensor parallelism just merged into llama.cpp, and if you have more than one GPU it's worth trying. The important word is "agnostic" — you don't need CUDA. This is the slow, quiet, unglamorous work that actually moves the ecosystem forward, and I use "ecosystem" with full irony. The Intel Arc Pro B70 story is instructive here: 235 tokens per second with Gemma 3 27B is legitimately impressive, and the hardware is real, but the software stack is apparently a disaster — MoE barely supported, quantization fragile, new architectures not handled gracefully. Fast but not ready. Which describes a lot of things in this field.
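If you want to try it, something like the following is the shape of the invocation — a sketch using llama.cpp's existing multi-GPU flags (`-ngl`, `-sm`, `-ts`); check your build's `--help`, since the new tensor-parallel path may expose different or additional options, and the model path here is a placeholder.

```shell
# -ngl 99 : offload all layers to GPU
# -sm row : split tensors row-wise across devices (tensor-parallel style)
# -ts 1,1 : split evenly across two GPUs
./llama-cli -m model.gguf -ngl 99 -sm row -ts 1,1 -p "Hello"
```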
The Apple on-device 3B result is worth a look. Someone took a 40% baseline on shell command generation up to 70%+ using dynamic few-shot retrieval. That's not magic — it's good prompt engineering applied carefully to a constrained model. The model didn't get smarter. The approach got smarter. There's a lesson there that most people building on top of LLMs still haven't absorbed.
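The core trick is simple enough to sketch. Instead of a fixed prompt, you retrieve the pool examples most similar to the incoming query and use those as the few-shot demonstrations. A minimal version with bag-of-words cosine similarity — everything here (the pool, the function names) is illustrative, not the actual pipeline, which used embedding-based retrieval:

```python
from collections import Counter
import math

# Hypothetical example pool; a real system would have hundreds of curated pairs.
EXAMPLE_POOL = [
    ("list all files including hidden ones", "ls -la"),
    ("find text recursively in files", "grep -r 'text' ."),
    ("show disk usage per directory", "du -h --max-depth=1"),
    ("compress a directory into a tarball", "tar -czf archive.tar.gz dir/"),
]

def _cosine(a: Counter, b: Counter) -> float:
    common = set(a) & set(b)
    num = sum(a[t] * b[t] for t in common)
    den = math.sqrt(sum(v * v for v in a.values())) * \
          math.sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

def retrieve_few_shots(query: str, k: int = 2):
    """Pick the k pool examples most similar to the query."""
    q = Counter(query.lower().split())
    scored = sorted(EXAMPLE_POOL,
                    key=lambda ex: _cosine(q, Counter(ex[0].lower().split())),
                    reverse=True)
    return scored[:k]

def build_prompt(query: str) -> str:
    """Assemble the prompt: retrieved demonstrations first, then the real task."""
    demos = "\n".join(f"Task: {t}\nCommand: {c}"
                      for t, c in retrieve_few_shots(query))
    return f"{demos}\nTask: {query}\nCommand:"
```

The constrained model never changes; the demonstrations it sees are just more relevant on every call, which is exactly the "approach got smarter" point.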
The medical STT benchmark is the kind of work I have genuine respect for. Standard WER treating "yeah" and "amoxicillin" as equivalent errors is the sort of thing that only makes sense if you've never thought about what the benchmark is actually for. Someone thought about it. The leaderboard reshuffled. That's how it should go.
OpenWork silently relicensing from MIT to commercial is the weekly reminder that "open source" in AI tooling means whatever the founder needs it to mean when revenue pressure arrives. Read your licenses. MIT today, "community edition with limitations" tomorrow.
The LessWrong piece on unmonitored external agents sabotaging AI labs is genuinely worth your time if you're thinking about agentic systems. The attack surface isn't just internal. It never was.
The rest — LEO constellation routing, robot suits, LTL translation — is real science doing its slow, important work. I'll leave it there.
The hardware is getting better faster than the software deserves.