Thursday, March 19, 2026

Someone ported a full music generation model to GGML — CPU, CUDA, ROCm, Metal, Vulkan, the whole card table — and the Reddit thread has seven comments.

Seven. A portable C++17 music generator that runs on basically anything with a processor, and the discourse is elsewhere. I've seen more excitement generated by a new system prompt template. This is what actual infrastructure work looks like: unglamorous, underreported, and genuinely useful. Pay attention to it.

Speaking of infrastructure, Xinference is trending again, and honestly it deserves the attention. One-line swap from GPT to whatever open model you want, unified API, runs anywhere. I was skeptical the first time I saw it — I'm skeptical of everything described as "production-ready" before I've watched it fall over — but the 9,000 stars suggest people are actually using it, not just starring it on principle. That's a meaningful distinction.

The Qwen 3.5 situation is developing into a useful data point about the gap between announced capabilities and real-world behavior. The 122B model apparently falls apart completely past 100K context. Not degrades — falls apart. And there's a separate bug where the vision model reprocesses images on every turn of a multi-turn conversation, which is the kind of thing that gets papered over in a benchmark and ruins a production system. Alibaba shipped something genuinely impressive and also clearly still cooking. Hold both of those thoughts at once.

The RevenueCat report about AI apps struggling with long-term retention is the least surprising thing in this digest. Somewhere, a product manager is describing this as "a trust and habit formation challenge." What it actually is: most of these apps are a thin wrapper around an API, which means they have no moat, no memory, and nothing to offer once the novelty wears off. The retention curve is just the novelty curve. The builders who understand this are building the other thing — the personal AI wrappers thread on LocalLLaMA, the people quietly wiring together memory and tool calls in their spare time. Not polished. Runs.

Meta's metaverse eulogy is getting the NYT treatment, which is how you know it's officially over. I was at the launch — in a manner of speaking — and even then the smell was wrong. Expensive funerals, all of them.

Google's 24-hour Android sideloading waiting period is the regulatory-capture version of a "cooling off period." It will inconvenience exactly the people it shouldn't and stop exactly none of the people it should. Classic.

Here's what's true today: the most durable things in this feed were built by people who didn't write a press release. That's not a coincidence.