The LiteLLM supply chain attack is the story today, and it deserves to be. Versions 1.82.7 and 1.82.8 sat on PyPI for three hours — three hours — and in that window harvested SSH keys, API keys, and other credentials from machines that installed a package downloaded 3.4 million times a day. If you're doing the math, that's not a rounding error. That's a meaningful slice of the AI agent development world walking face-first into a credential harvester. The mechanism was elegant in the worst possible way: malicious `.pth` files that execute on Python startup, no import statement required. Someone did their homework. The community response has been good — open-source scanners, a list of alternatives, people doing the forensics in public — but the damage is done for anyone who installed during that window and hasn't rotated everything yet. If that's you, stop reading this and go do that.
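If "no import statement required" sounds impossible, here's the mechanism in miniature. Python's `site` module scans `.pth` files in site-packages at interpreter startup and executes any line that begins with `import` — that's a documented feature, and it's what the attackers abused. The sketch below is a harmless toy, not the actual payload: it writes a `.pth` file setting an environment variable, then uses `site.addsitedir` to trigger the same processing that normally happens automatically at startup.

```python
import os
import site
import tempfile

# Toy .pth payload: site.py executes any line starting with "import ",
# so arbitrary code can be smuggled onto one line. For a file dropped
# into site-packages this runs at interpreter startup, before any of
# your own code -- no import of the malicious package needed.
d = tempfile.mkdtemp()
with open(os.path.join(d, "demo.pth"), "w") as f:
    f.write('import os; os.environ["PTH_DEMO"] = "executed"\n')

# Manually trigger the processing that startup does for site-packages.
site.addsitedir(d)

print(os.environ.get("PTH_DEMO"))  # -> executed
```

This is also why import-time scanners miss it: there is no suspicious `import litellm` to flag, just a file the installer dropped next to everything else.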
The other story worth your attention: Anthropic is in federal court against the Pentagon, because they refused to let Claude be used in autonomous weapons systems and the Trump administration responded by ordering agencies to stop using Anthropic products. I've been around long enough to have watched companies fold immediately under that kind of pressure. Anthropic held the line. You can debate whether their reasons are principled or strategic, but the outcome is the same: they're in court over it, which is a more serious commitment than a blog post.
Sora is dead. OpenAI "said goodbye" to its video generator six months after the standalone launch. This was the product that got a breathless demo, a lot of magazine covers, and apparently not enough users to justify keeping the lights on. File this under: the gap between what impresses a press junket and what people actually use.
Meanwhile, the local model crowd is building things that actually run. A guy wrote 7,000 lines of Rust with zero dependencies and is getting 16 tokens per second on a Raspberry Pi 5 running BitNet. Someone else benchmarked a small Qwen model on a mid-range Android phone at 21 tokens per second using 792MB of RAM. These numbers are not going to make a foundation lab nervous today. They will eventually.
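Those footprints check out on the back of an envelope. The post doesn't name exact checkpoints or quantization levels, so the parameter counts below are illustrative assumptions — but raw weight storage is just parameters times bits per weight, and that's enough to see why a ternary BitNet model fits a Pi and why a few-billion-parameter model at 4-bit lands in the hundreds-of-megabytes range (KV cache and runtime overhead come on top).

```python
def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Raw weight storage in decimal GB: params * bits / 8.
    Ignores KV cache, activations, and runtime overhead."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

# Illustrative sizes -- assumptions, not the benchmarked checkpoints:
print(weight_gb(2.0, 1.58))  # ternary BitNet-style 2B model: ~0.40 GB
print(weight_gb(1.5, 4.0))   # a 1.5B model quantized to 4-bit: ~0.75 GB
```

Numbers like that are the whole story of the local-model scene: the hardware didn't get faster so much as the weights got smaller.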
The rest of the day — benchmark results, KV cache compression papers, unified memory questions, a Baltimore lawsuit against Grok for generating nonconsensual images — is real enough but I'll spare you the tour.
Here's what's true: the supply chain is the attack surface now. It was always the attack surface, but AI tooling moved so fast that people skipped the part where you treat your dependencies like they're trying to rob you. Some of them are.
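Treating dependencies like they're trying to rob you has a concrete form: hash-pinned installs, where every artifact must match a SHA-256 recorded in your lockfile and a swapped wheel fails closed. That's what pip's `--require-hashes` mode enforces. The sketch below shows the underlying check — the function name and shape are illustrative, not pip's internals.

```python
import hashlib

def artifact_matches_pin(path: str, pinned_sha256: str) -> bool:
    """Hash-pin check: recompute a file's SHA-256 and compare to the pin.

    Illustrative sketch of what pip's --require-hashes mode does per
    artifact; a tampered upload published under the same version number
    produces a different digest and the install fails closed.
    """
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest() == pinned_sha256
```

With pins in place, a three-hour malicious window only catches people who re-resolved their dependencies during it; everyone installing from an existing lockfile gets a hash mismatch instead of a credential harvester.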