The finding is this: refusal in open-weights models isn't scattered across the network like some kind of emergent moral intuition — it's a sparse gate-to-amplifier circuit, and it generalizes across twelve models from six different labs, ranging from 2B to 72B parameters.
That's a real result. That's the kind of mechanistic finding that actually changes how you think about what alignment work is doing, versus what it claims to be doing. Arditi et al. showed you could steer refusal with a single direction. Now we know something about the plumbing behind that direction. File this under "things that matter more than the press releases."
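For readers who want the mechanics: Arditi et al.-style steering amounts to estimating a "refusal direction" and projecting it out of the residual stream. A minimal numpy sketch, with illustrative toy numbers standing in for real activations:

```python
import numpy as np

def ablate_direction(activation: np.ndarray, direction: np.ndarray) -> np.ndarray:
    """Remove the component of a residual-stream activation that lies
    along the (unit-normalized) refusal direction."""
    r = direction / np.linalg.norm(direction)
    return activation - np.dot(activation, r) * r

# Toy example: the refusal direction is typically estimated as the
# difference of mean activations on harmful vs. harmless prompts.
# These numbers are illustrative, not from any real model.
harmful_mean = np.array([1.0, 2.0, 0.5])
harmless_mean = np.array([0.2, 0.1, 0.4])
refusal_dir = harmful_mean - harmless_mean

act = np.array([0.7, 1.3, -0.2])
steered = ablate_direction(act, refusal_dir)

# After ablation the activation has ~zero component along the direction.
print(np.dot(steered, refusal_dir))  # ~0.0
```

Adding a multiple of the direction instead of subtracting the projection steers the model the other way, toward refusal; the gate-to-amplifier circuit finding is about where in the network this direction gets written and read.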
Close behind it: the KV cache paper arguing the residual stream makes the KV cache redundant for inference. The KV cache is one of those things everyone assumes is load-bearing, the way you assume a wall is load-bearing until someone who's been in construction longer than you pulls out a hammer. If this holds up in production — and that's always the question, isn't it — it's an architectural inflection point. The llama.cpp community is already sniffing around it, which is usually a better signal than a blog post from a lab.
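For context on what would become redundant: the KV cache stores each past token's key and value projections so a decode step only computes them for the new token, then attends over everything cached. A toy single-head sketch (random illustrative matrices, not any real model or the paper's proposal):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4  # head dimension, toy size

# Fixed toy projection matrices for one attention head.
W_k, W_v, W_q = (rng.standard_normal((d, d)) for _ in range(3))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

K_cache, V_cache = [], []  # the per-token state the paper argues is redundant

def decode_step(x):
    """One autoregressive step: compute K and V for the new token only,
    append them to the cache, and attend over all cached tokens."""
    K_cache.append(x @ W_k)
    V_cache.append(x @ W_v)
    q = x @ W_q
    K, V = np.stack(K_cache), np.stack(V_cache)
    attn = softmax(K @ q / np.sqrt(d))
    return attn @ V

for _ in range(3):
    out = decode_step(rng.standard_normal(d))

print(len(K_cache))  # 3 — one cached K,V pair per generated token
```

The cache grows linearly with sequence length, which is exactly why an architecture that can reconstruct this state from the residual stream would matter so much for inference memory.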
Bryan Cantrill's observation, surfaced by Simon Willison, deserves a moment: LLMs lack the virtue of laziness. A good engineer hates unnecessary work because they'll have to live with the consequences. An LLM generates code like it's getting paid by the token, because in a sense it is. This is not a minor behavioral quirk. It's a fundamental misalignment between how these systems optimize and how good systems are built. I've been saying something like this since before it was fashionable to say it, which is cold comfort.
Gemma 4 at 31B passing 7 out of 8 real production tests is genuinely worth noting — not because 7/8 is perfect, but because someone bothered to test it against things that could actually break it. That's the work. The one it failed is probably the interesting one, and I hope they write that up too.
The LSP-over-grep hook for Claude Code, saving 80% of tokens on code navigation, is the kind of quiet craftsmanship that never gets a press release. Someone understood the tool well enough to make it behave. More of that, please.
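The hook's internals aren't described here, but the LSP side of such a lookup is standard JSON-RPC 2.0: instead of grepping the whole tree, you ask the language server where a symbol is defined. A sketch of framing a `textDocument/definition` request (the file URI and position are hypothetical):

```python
import json

def lsp_definition_request(uri: str, line: int, character: int, req_id: int = 1) -> bytes:
    """Frame a textDocument/definition request in the LSP wire format:
    a Content-Length header followed by a JSON-RPC 2.0 body."""
    body = json.dumps({
        "jsonrpc": "2.0",
        "id": req_id,
        "method": "textDocument/definition",
        "params": {
            "textDocument": {"uri": uri},
            "position": {"line": line, "character": character},  # zero-based
        },
    })
    return f"Content-Length: {len(body)}\r\n\r\n{body}".encode()

# Hypothetical file and cursor position, for illustration only.
msg = lsp_definition_request("file:///src/app.py", line=41, character=8)
print(msg.decode().splitlines()[0])  # Content-Length header
```

The server replies with precise locations instead of a page of grep hits, which is plausibly where the bulk of the token savings comes from.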
Everything else today — the robotics papers, the GUI agent frameworks, the agentic aggregation work — is the field doing its grinding background work. Necessary. Not today's story.
Here's what's true: the people doing mechanistic interpretability are asking harder questions than the people shipping products. That gap will matter eventually.