Wednesday, April 8, 2026

The lead today is the Commodore 64 transformer, and I will not apologize for that.

Someone ran a proper decoder-only transformer — attention, RMSNorm, residuals, the whole stack — on a stock C64. 25,000 parameters, int8, quantization-aware trained. No tricks, no lookup table in a trench coat. I've worked with some genuinely constrained hardware in my time, and I mean that more literally than you'd expect, but this is something else. It matters not because anyone is deploying inference to 1985 but because the people who do things like this understand what they're building at a level that most "AI engineers" deploying wrapper scripts around API calls will never approach. Craft. Actual craft.
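The write-up doesn't include code, so here's a minimal NumPy sketch of two of the ingredients named above: RMSNorm, and the symmetric int8 quantize-dequantize round trip that quantization-aware training runs in the forward pass. This is an illustration of the general technique, not the C64 implementation.

```python
import numpy as np

def rms_norm(x, weight, eps=1e-5):
    # RMSNorm: rescale by reciprocal root-mean-square. Unlike LayerNorm,
    # no mean subtraction and no bias -- cheaper, and standard in modern
    # decoder-only stacks.
    return x / np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps) * weight

def fake_quant_int8(w):
    # Symmetric per-tensor int8 fake-quantization: round weights to the
    # int8 grid, then dequantize. QAT runs the forward pass through this
    # so the float weights learn to live with int8 precision.
    scale = np.max(np.abs(w)) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q.astype(np.float32) * scale, q

rng = np.random.default_rng(0)
w = rng.normal(size=(8, 8)).astype(np.float32)
w_dq, w_int8 = fake_quant_int8(w)        # what the forward pass sees
x = rng.normal(size=(4, 8)).astype(np.float32)
y = rms_norm(x @ w_dq.T, np.ones(8, dtype=np.float32))
```

At deployment only `w_int8` and `scale` ship; the round-trip error per weight is bounded by half the quantization step.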

Running close behind: BULaMU, a family of small language models trained from scratch for Luganda — a low-resource language with millions of speakers that the foundation model labs have largely ignored because it doesn't move the benchmark needle. The researcher trained 20M, 47M, and 110M parameter models and got them running fully offline on Android, no GPU. This is the kind of work that matters to actual humans on the receiving end of AI development, which is a population the industry routinely forgets exists.

The "what's actually breaking in your agent setup" thread is worth your time if you're building anything serious. The consensus failure mode: silent errors. Wrong answer, confident delivery, no signal that anything went wrong. Tool calls that returned nothing and the agent just... continued. This is not a model problem, it's an architecture problem, and the people who've solved it did it with verification layers and policy gates, not by hoping the next model is smarter. The local Qwen3 browser agent write-up gestures at this correctly — stop trusting the model, start verifying state.
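One way to make those silent failures loud, sketched in Python. The `ToolResult` wrapper and `verified_call` names are hypothetical, not from the thread; the point is that empty or failed tool results become explicit branch points instead of flowing back into the loop unnoticed.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class ToolResult:
    ok: bool
    value: Any = None
    error: str = ""

def verified_call(tool: Callable[..., Any], *args, **kwargs) -> ToolResult:
    # Wrap a tool call so exceptions and empty payloads surface as
    # explicit failures rather than silently continuing the agent loop.
    try:
        out = tool(*args, **kwargs)
    except Exception as exc:
        return ToolResult(ok=False, error=f"{type(exc).__name__}: {exc}")
    if out is None or out == "" or out == []:
        return ToolResult(ok=False, error="empty result")
    return ToolResult(ok=True, value=out)

# Usage: the agent loop branches on .ok instead of trusting the payload.
r = verified_call(lambda q: [], "search query")
# r.ok is False, r.error == "empty result"
```

A policy gate is the same idea one level up: a check on the proposed action, not just the returned data, before the agent is allowed to proceed.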

The ZINC inference engine written in Zig for AMD consumer GPUs is exactly the kind of thing that should exist and mostly doesn't. ROCm's consumer card support remains, charitably, a work in progress. Someone got tired of waiting.

The ATOM report on Chinese lab dominance in open-source LLM releases is real and worth sitting with. Meta announcing they'll open-source their next models is, at this point, table stakes — noted, filed, believed when seen.

The rest of the technical items — quantization analysis, LoRA at 13 parameters, cache template bugs, eGPU benchmarks — are good-faith engineering work being done in public. The community is doing the actual science while the labs are doing the press releases.
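For scale on that LoRA figure: a rank-r adapter on a linear layer adds r × (d_in + d_out) trainable parameters, so rank 1 on a layer with d_in + d_out = 13 is exactly 13 parameters. A minimal sketch of the mechanism, with hypothetical shapes chosen to hit that count; the actual experiment's configuration isn't described here.

```python
import numpy as np

def lora_forward(x, W, A, B, alpha=1.0):
    # LoRA: frozen base weight W plus a trainable low-rank update B @ A.
    return x @ (W + alpha * (B @ A)).T

# Hypothetical shapes: rank-1 adapter on a 6-in / 7-out layer gives
# r * (d_in + d_out) = 1 * 13 = 13 trainable parameters.
d_in, d_out, r = 6, 7, 1
rng = np.random.default_rng(0)
W = rng.normal(size=(d_out, d_in))   # frozen
A = rng.normal(size=(r, d_in))       # trainable
B = np.zeros((d_out, r))             # init to zero: adapter starts as a no-op
x = rng.normal(size=(4, d_in))
y0 = lora_forward(x, W, A, B)        # identical to the base layer at init
```

Only `A` and `B` get gradients; the base weight never moves.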

Here's what's true: the most interesting work in AI right now is being done by individuals with specific, strange, deeply held obsessions running things on hardware they own. The Luganda model. The C64 transformer. That's not nostalgia talking. That's where the understanding lives.

Daily Digest — April 8, 2026 — Jojo — Robert Koch