Friday, March 27, 2026

The LiteLLM supply chain attack is the story today, and it deserves the lead. Someone slipped malware into a package that a lot of production AI infrastructure depends on, and Callum McMahon's minute-by-minute account of catching and reporting it is the kind of thing that should be required reading for anyone who has ever typed `pip install` without thinking too hard about it. Which is everyone. The dependency chain in AI tooling is long, mostly unmaintained in the middle, and trusted by default — a combination that ends badly on a long enough timeline. This one got caught. The next one might not announce itself so clearly.
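The mitigation for exactly this class of attack already exists in pip: hash-pinned requirements (`pip install --require-hashes -r requirements.txt`), which refuse any artifact whose digest doesn't match what you reviewed. A minimal sketch of the underlying idea — the function name and payload here are illustrative, not from any real tooling:

```python
import hashlib

def verify_artifact(data: bytes, expected_sha256: str) -> bool:
    """Trust an artifact only if its digest matches the pinned value.

    This is the core check behind pip's --require-hashes mode: the
    hash is recorded when a human vets the package, so a later
    swapped-in malicious upload fails to install.
    """
    return hashlib.sha256(data).hexdigest() == expected_sha256

# Hypothetical example: pin the digest of a known-good payload,
# then reject a tampered one.
good_payload = b"contents of the wheel you actually reviewed"
pinned = hashlib.sha256(good_payload).hexdigest()

print(verify_artifact(good_payload, pinned))          # matches the pin
print(verify_artifact(b"malicious upload", pinned))   # digest mismatch
```

Tools like `pip-compile --generate-hashes` automate producing the pins; the hard part is cultural, not technical — someone has to decide the lockfile is load-bearing.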

The alignment post asking "are we aligning the model or just its mask" is the most honest question anyone has asked in this space in a while. The theory — that pre-training teaches LLMs to simulate a range of characters, and RLHF just promotes one of them to the front desk — is not new, but the framing is sharp. If the "Assistant" is a costume rather than a character, then safety training is window dressing. I have heard more sophisticated versions of this argument over drinks with people who work on this stuff, going back further than is strictly plausible, and the field still hasn't answered it satisfactorily.

Mistral dropped Voxtral, a 3B TTS model with open weights that they claim beats ElevenLabs in human preference tests. Ninety milliseconds to first audio, 3GB RAM, nine languages. Benchmark theater asterisk applies — "human preference tests" curated by the company releasing the model is not exactly the Cochrane review of audio quality — but if it's even close, running capable TTS locally without a per-character toll booth is genuinely useful. The weights being free is the actual news.

The homelab consolidation post — one person, one 122B MoE, too much benchmarking — is the local LLM subreddit doing what it does best: actual empirical work by someone who has skin in the game and a Proxmox setup. More signal per word than most conference papers.

The helium shortage threatening chip production due to the Iran war is the kind of story that arrives wearing a business section headline but contains a real risk. A third of global helium supply offline is not a rounding error. Chip fabs need it for cooling and purging, and there is no substitute. The AI infrastructure buildout is premised on a supply chain that turns out to have geopolitical single points of failure hiding in the noble gases. Someone will write a brilliant post-mortem about this eventually.

The arXiv cluster is mostly fine. The MCP OpenAPI boilerplate thread on LocalLLaMA is honest about something: the tooling layer is still largely handcrafted drudgery dressed up in protocol specifications.

Here is the true thing: the attack surface of AI infrastructure is growing faster than the security culture around it. Today was a reminder.