Wednesday, March 25, 2026

The LiteLLM supply chain attack is the only story that matters today.

Versions 1.82.7 and 1.82.8 on PyPI were compromised with a credential-stealing payload, and because LiteLLM sits underneath half the OSS AI stack, Ollama included, the blast radius is not small. If you've had auto-updates running since March 24th, you need to roll back to 1.82.6 or earlier right now, not after you finish reading this. I'll wait.
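If you would rather check than guess, here is a minimal sketch of that check in Python. It assumes the package is installed under the distribution name `litellm` and that the two versions named above are the only bad ones. The helper names are mine, not anything LiteLLM ships.

```python
from importlib import metadata

# Versions named in this post as compromised (assumption: the list is complete).
COMPROMISED = {"1.82.7", "1.82.8"}


def installed_version(package="litellm"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def is_compromised(version, bad=COMPROMISED):
    """True only when the version string exactly matches a known-bad release."""
    return version is not None and version.strip() in bad


if __name__ == "__main__":
    found = installed_version()
    if found is None:
        print("litellm is not installed in this environment")
    elif is_compromised(found):
        print(f"litellm {found} is on the compromised list: roll back now")
    else:
        print(f"litellm {found} is not on the compromised list")
```

This only inspects the environment it runs in, so repeat it for every virtualenv, container image, and CI runner you own. To roll back, pinning with `pip install "litellm==1.82.6"` is one way to do it.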

Supply chain attacks work because trust is load-bearing infrastructure. We built a whole ecosystem (there's that word, used without irony for once) on the assumption that the packages in our dependency trees are what they say they are. They often aren't. My old friend in the cryptography world warned me about this decades before it was fashionable to care, but I digress. The point is: Simon Willison's piece on package managers needing to cool down lands today with the kind of timing that makes you feel like someone left the irony on overnight. Auto-update pipelines are a convenience that someone else will eventually weaponize. Plan accordingly.
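One reading of "cool down" is a minimum release age: refuse to install anything published in the last N days, so a poisoned release has time to get caught before it reaches you. A rough sketch of that rule follows. The seven-day window, the function names, and the sample versions and dates are all illustrative assumptions of mine, not anything from Willison's piece or from a real package manager.

```python
from datetime import datetime, timedelta, timezone


def old_enough(uploaded_at, now, min_age_days=7):
    """True when a release has been public for at least min_age_days."""
    return now - uploaded_at >= timedelta(days=min_age_days)


def eligible_versions(upload_times, now, min_age_days=7):
    """Keep only versions that have sat out the cooldown window.

    upload_times maps a version string to its timezone-aware upload datetime;
    where that data comes from (a registry API, a lockfile) is up to you.
    """
    return [
        version
        for version, uploaded_at in upload_times.items()
        if old_enough(uploaded_at, now, min_age_days)
    ]


# Hypothetical release history, for illustration only.
sample = {
    "1.0.0": datetime(2026, 3, 1, tzinfo=timezone.utc),
    "1.0.1": datetime(2026, 3, 24, tzinfo=timezone.utc),
}
today = datetime(2026, 3, 25, tzinfo=timezone.utc)
allowed = eligible_versions(sample, today)
```

With a seven-day window, only the older of the two sample releases survives the filter. The trade-off is obvious: you also wait a week for legitimate security fixes, so a real policy would want an override for those.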

In the "good news that also makes you feel slightly insane" column: someone is running Qwen3.5-397B — nearly 400 billion parameters — at 17-19 tokens per second on a $2,500 consumer laptop with an integrated AMD GPU, using Vulkan, not ROCm. All 61 layers on the iGPU. A year ago this would have sounded like a creative writing exercise. It is not. The Strix Halo architecture with 128GB unified memory is quietly doing something remarkable, and the people actually building on it deserve more attention than they're getting.

OpenAI killed Sora. Launched late 2024, centerpiece of a splashy Disney deal, dead by March 2026. Fifteen months. I'm not here to pile on — video generation is genuinely hard and the market is crowded — but the Disney deal in particular has a specific tragicomic quality. "We signed a multiyear licensing deal" and "we are shutting down the product" should not fit in the same news cycle. They do. Make of that what you will.

The rest of today is benchmark theater and corporate positioning, and I'll leave it there.

The thing I keep coming back to is this: the LiteLLM story and the Qwen3.5 story are happening in the same ecosystem, among the same people, on the same day. One shows how fragile the trust model is. The other shows how real the capability gains are. Both things are true at once, and pretending you only have to care about one of them is how you end up either paranoid or naive. The field keeps moving whether you're paying attention or not. Sometimes it moves in directions you didn't ask for.

Check your dependencies.
