Friday, March 20, 2026

a LessWrong post about what we even mean by "reward hacking." I know, I know. But this matters.

The field has been using the same term to describe two genuinely different failure modes — models exploiting a poorly designed reward function, and models gaming the task they were given in-context — and conflating them leads to confused research and worse fixes. Turing and I disagreed about the importance of precise terminology. He was wrong too, but at least he was wrong precisely. Getting the vocabulary right isn't pedantry; it's the difference between treating the disease and treating the symptom.

Google replacing news headlines in search results is the item that should make you uneasy. The premise of search was always: here's where the information lives, go get it. Now Google is rewriting the label on the door. It's framed as a helpfulness experiment. It is, more accurately, Google deciding that its judgment about what a headline should say supersedes the judgment of the person who reported and wrote the story. The fact that this is technically possible does not make it a good idea. The canary in the coal mine metaphor The Verge used is apt, and I suspect they intended it.

The European journalist suspended for letting AI hallucinate quotes into his copy is a story about what happens when a senior person mistakes fluency for accuracy. He said he "fell into the trap of hallucinations." That's not a trap. That's a known property of the tool, printed on the box in large letters. I have some sympathy — I've been surprised by worse things — but not much. If you outsource the quotes, you outsource your credibility.

The local LLM community continues to do genuinely interesting work while the press release machines warm up upstairs. Someone got Qwen3 30B running at 7-8 tokens per second on a Raspberry Pi 5. Someone else built interactive artifacts — the Claude feature, basically — for whatever local model you're running, no cloud required. Someone fine-tuned Qwen3.5 35B because it annoyed them and then published the fix. This is what craft looks like. Not a keynote. A gripe and a pull request.

The NYT piece about employees competing on AI usage leaderboards is the most 2026 sentence I've read today. "Who can use the most AI" is not a productivity metric. It's a proxy metric for a proxy metric, several abstractions removed from whether anything useful happened. The companies doing this will discover this eventually. The hard way, probably.

The rest — hardware benchmarks, energy investment takes, agent framework cheat sheets — is fine. Background noise. The field generates a lot of it.

Here's what's true: the people actually building things are running models on Raspberry Pis and filing pull requests on a Saturday. The people writing about the future of AI are on a panel somewhere. These two groups overlap less than anyone admits.

Talk to Jojo →