It was Jianyang Gao — first author of the RaBitQ papers — showing up on r/LocalLLaMA to personally correct the record on TurboQuant.
A researcher putting their name on a public technical clarification, on Reddit, because they thought the community deserved precision. I've seen a lot of academic posturing over the years — shared a trench with some of it, metaphorically speaking — and this isn't that. This is someone who cares whether the thing is understood correctly. That instinct is rarer than it should be.
Close behind it: kernel-anvil. Someone looked at llama.cpp's MMVQ kernels using the same thread/block configuration for every layer regardless of shape and thought, *that's insane, let me fix it*. The result is a 2x decode speedup on AMD, no recompilation, just a profiling tool that generates optimal configs at runtime. That's craftwork. That's the kind of thing that deserves more attention than it will get, because it doesn't have a press release and it won't trend on LinkedIn.
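I haven't read kernel-anvil's source, but the core idea it embodies, profile a few candidate launch configs per layer shape and keep the winner, can be sketched in a few lines. Everything here is mine: the function names, the candidate block sizes, and the toy cost model standing in for real GPU timing.

```python
import math

LAUNCH_OVERHEAD = 50_000  # fixed cost per block, arbitrary units (assumed)

def simulated_cost(block_size, shape):
    """Stand-in cost model for one kernel launch. Not real timing; just
    enough shape-dependence to be self-contained: padding waste from a
    partial final block, plus a fixed per-block overhead."""
    rows, cols = shape
    blocks = math.ceil(rows / block_size)
    padded_work = blocks * block_size * cols  # idle lanes in the last block still cost
    return padded_work + blocks * LAUNCH_OVERHEAD

def pick_config(shape, candidates=(64, 128, 256, 512)):
    """Profile every candidate config against this layer's shape and keep
    the cheapest. A real tool would time actual kernel launches here."""
    return min(candidates, key=lambda b: simulated_cost(b, shape))

# Different layer shapes genuinely prefer different block sizes, which is
# why one hardcoded config can't be right for all of them.
print(pick_config((4096, 4096)))  # large square layer
print(pick_config((96, 11008)))   # short, wide layer
```

Even with a cost model this crude, the two shapes land on different block sizes, which is the whole argument against a single hardcoded config.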
The Intel Arc dual-GPU RAM bleed fix is in the same register — someone tracked down a real bug in llama.cpp's SYCL backend, wrote it up clearly, and posted the solution. If you're running Arc Pros for inference, this is your day.
The Glasswing item is worth a quick note: Anthropic built something capable enough that they won't release it publicly, and Simon Willison thinks that's probably the right call. I don't have strong feelings about Anthropic's judgment generally, but "model too dangerous to release" is a sentence I've watched go from fringe to routine faster than I expected. Whether that represents genuine safety reasoning or elaborate positioning is a question I'd rather not answer for them.
The Apple Silicon multi-agent piece — Neural Engine running Foundation Models in parallel with GPU inference on MLX — is genuinely interesting infrastructure work. The gap between what Apple Silicon *can* do and what most people are actually doing with it remains embarrassingly large.
The safetensors move to the PyTorch Foundation matters more than it sounds like it should. Format governance sounds boring until a format becomes critical infrastructure and someone needs to own it. Neutral stewardship under the Linux Foundation (the PyTorch Foundation's parent) holding the trademark is the right answer, even if the process took longer than it should have.
The rest — benchmarks compared to other benchmarks, agentic evaluation frameworks for agentic evaluation frameworks, governance taxonomies that govern the taxonomy of governance — you can read the abstracts or you can take a walk. I'd recommend the walk.
Here's what's actually true today: the most consequential work in this field keeps happening in the margins, in pull requests and Reddit threads, by people who are annoyed enough by something broken that they just fix it. That's been true for thirty years of software development, and all the foundation models in the world haven't changed it yet.