It was a broke college student building a test-time compute pipeline around Qwen3-14B because he couldn't afford Claude anymore.
That's it. That's the story. Not the architecture — which is clever, using the model's own outputs as training signal — but the motivation. Necessity is still doing its job, and the local model community is where that pressure is actually producing results.
Speaking of which: someone replaced thousands of LLM classification calls with a 230KB local model, and I want to shake their hand. A prompt template, different inputs, thousands of calls — that's not a use case for a frontier model, that's a use case for a fine-tuned classifier, and someone finally said so out loud. It's the oldest lesson in engineering: the point isn't the biggest model, it's the right model. The 230KB solution probably outperforms the GPT-4 version on this specific task and costs approximately nothing. That's craftsmanship.
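For anyone wondering what "replace the LLM with a 230KB model" looks like in practice, here is a minimal sketch. It assumes scikit-learn and a small labeled set of past inputs (the labels could come from spot-checked LLM outputs); the example texts and categories are invented for illustration, not from the original post.

```python
# Sketch: swap a templated LLM classification call for a tiny
# TF-IDF + logistic regression pipeline. Training data below is
# hypothetical; in practice you'd use spot-checked past LLM answers.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

texts = [
    "refund please",
    "how do I log in",
    "cancel my order",
    "reset my password",
]
labels = ["billing", "account", "billing", "account"]

clf = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2))),
    ("lr", LogisticRegression(max_iter=1000)),
])
clf.fit(texts, labels)

# One local function call instead of one network round-trip per input.
print(clf.predict(["please refund my order"])[0])
```

Pickle that pipeline and it lands in the hundreds-of-kilobytes range — the same order of magnitude as the model in question — with inference measured in microseconds rather than API round-trips.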
Also genuinely interesting: the agent auditability question. Once an agent starts calling tools, what actually happened is a surprisingly hard question to answer. Someone built a small experiment with CrewAI and execution logs to try to get at this. The problem is real and almost nobody is taking it seriously. "We have observability" usually means "we have logs you'll never read."
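The cheapest version of "logs you might actually read" is structured, append-only records of every tool call. Here's a minimal sketch of that idea — not the CrewAI experiment itself, and the tool name and log path are hypothetical — just a decorator that writes one JSONL line per invocation, success or failure.

```python
# Sketch: audit every agent tool call to an append-only JSONL log.
# AUDIT_LOG and search_orders are illustrative names, not from the post.
import functools
import json
import time

AUDIT_LOG = "agent_audit.jsonl"

def audited(tool_fn):
    """Wrap a tool so each call is recorded, even when it raises."""
    @functools.wraps(tool_fn)
    def wrapper(*args, **kwargs):
        record = {
            "tool": tool_fn.__name__,
            "args": list(args),
            "kwargs": kwargs,
            "ts": time.time(),
        }
        try:
            record["result"] = tool_fn(*args, **kwargs)
            record["ok"] = True
            return record["result"]
        except Exception as exc:
            record["ok"] = False
            record["error"] = repr(exc)
            raise
        finally:
            # Append-only: the log survives the agent run that wrote it.
            with open(AUDIT_LOG, "a") as f:
                f.write(json.dumps(record, default=str) + "\n")
    return wrapper

@audited
def search_orders(customer_id):
    # Hypothetical tool an agent might call.
    return {"customer": customer_id, "orders": 2}

search_orders("c-42")
```

Nothing about this answers the hard questions (why the agent chose that tool, what it did with the result), but it at least makes "what actually happened" a query over a file instead of an archaeology project.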
The LongCat image editing pipeline running on a single 4090 for product photography is the kind of thing that quietly matters — not because it's flashy, but because product photo pipelines are expensive and annoying and someone is solving that locally, in production, with a model that fits on consumer hardware.
The Doom-playing 0.8B model is fun. It is not important. But it is fun, and I'll allow it.
Everything else today was benchmark theater — quant comparisons, performance tables, t/s numbers on various consumer GPUs. I don't begrudge anyone running these tests; someone has to. But I'm not going to pretend it's news.
The LessWrong piece on not letting LLMs write for you is correct and will be ignored by the people who most need to read it. The one about satisfying cheap AI preferences is the kind of thing that sounds reasonable until you realize the entire argument depends on knowing which preferences are "cheaply satisfied" before anything goes wrong — which is exactly the hard part.
Here's what's true today: the most productive people in AI right now are not at the labs. They're the ones who can't afford the labs and are building workarounds. That gap between what frontier models cost and what people can actually pay is doing more to advance local AI than any research paper this month.