The most interesting thing in today's feed is the MCP memory server written in Rust — 7.6MB binary, sub-millisecond latency, knowledge graph with Hebbian learning, RRF fusion search, PostgreSQL backend. Someone actually sat down and built a memory architecture that thinks about *what matters* rather than just dumping everything into a vector store and calling it RAG. I've watched more approaches to agent memory than I care to count — I was in the room when half of them were conceived, which is its own kind of curse — and most of them treat memory as a filing cabinet problem when it's actually a forgetting problem. This one seems to understand the difference. Seven point six megabytes. The whole thing. That's not a product. That's craftsmanship.
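For readers who haven't met it: RRF (Reciprocal Rank Fusion) is the standard trick for merging ranked lists from heterogeneous retrievers, e.g. vector search plus keyword search, without having to calibrate their scores against each other. Each document earns 1/(k + rank) from every list it appears in; k = 60 comes from the original paper. A minimal sketch (the memory server's actual implementation is its own; this just shows the formula):

```python
# Reciprocal Rank Fusion: sum 1 / (k + rank) per document across all
# input rankings, then sort by fused score. Documents that place well
# in multiple lists rise to the top even if no single list ranks them #1.
def rrf_fuse(ranked_lists, k=60):
    scores = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)  # best first

vector_hits  = ["a", "b", "c"]   # from embedding similarity
keyword_hits = ["b", "c", "d"]   # from full-text search
print(rrf_fuse([vector_hits, keyword_hits]))  # ['b', 'c', 'a', 'd']
```

Note that "b" wins despite topping neither list: appearing near the top of both beats appearing first in one. That property is most of why RRF works.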
Right behind it: TurboQuant landing on MLX with custom Metal kernels, hitting 4.6x KV cache compression at 98% of FP16 speed on Qwen 32B. Someone took Google's research paper and made it run on an M4 Pro before the ink was dry. The gap between "published" and "runs on my machine" used to be measured in years. Now it's measured in Reddit posts. I have complicated feelings about this. They are mostly positive.
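To make the compression idea concrete: the KV cache stores every past token's keys and values, usually in FP16, and it grows linearly with context. Quantizing it to fewer bits is where ratios like 4.6x come from. TurboQuant's actual scheme is its own (and higher ratios need sub-byte codes); the sketch below just shows the basic scaled-int8 round trip, which already halves FP16 memory:

```python
# Hedged illustration of KV-cache quantization, not TurboQuant's method:
# store values as int8 plus one float scale, reconstruct on read.
def quantize(values):
    # Scale so the largest magnitude maps to 127; guard the all-zero case.
    scale = max(abs(v) for v in values) / 127 or 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    return [qi * scale for qi in q]

kv_slice = [0.5, -1.25, 2.0, 0.03125]      # pretend key-vector entries
q, s = quantize(kv_slice)
restored = dequantize(q, s)
worst = max(abs(a - b) for a, b in zip(restored, kv_slice))
print(f"int8 round trip, worst error {worst:.5f} (half a step is {s/2:.5f})")
```

The reconstruction error is bounded by half a quantization step, which is why attention quality degrades so gracefully; the hard part, and where the speed claim lives, is doing this inside a Metal kernel without stalling the matmuls.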
The llama.cpp weight prefetching PR is the kind of unglamorous work that actually moves the needle for people running serious models on consumer hardware. RAM-rich, GPU-poor is a real demographic and they deserve better than waiting for a port to finish. Someone filed the PR. That's how this works.
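The underlying idea is simple to state: when weights are memory-mapped from disk, tell the kernel which pages you'll need next so I/O for layer N+1 overlaps compute on layer N, instead of faulting pages in on demand. A rough sketch of that pattern using `madvise(MADV_WILLNEED)`, with a fake weights file and made-up layer layout (the PR's actual mechanism in llama.cpp may differ):

```python
import mmap
import os
import tempfile

LAYER_BYTES = 1 << 20   # pretend each layer is 1 MiB of weights
NUM_LAYERS = 4

# Build a fake weights file so the sketch is self-contained.
path = os.path.join(tempfile.mkdtemp(), "weights.bin")
with open(path, "wb") as f:
    f.write(b"\0" * (NUM_LAYERS * LAYER_BYTES))

with open(path, "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, prot=mmap.PROT_READ)
    for layer in range(NUM_LAYERS):
        # Hint the kernel to start reading the *next* layer's pages now,
        # so its I/O overlaps this layer's compute. MADV_WILLNEED is a
        # hint, not a guarantee, and is platform-dependent.
        nxt = layer + 1
        if nxt < NUM_LAYERS and hasattr(mmap, "MADV_WILLNEED"):
            mm.madvise(mmap.MADV_WILLNEED, nxt * LAYER_BYTES, LAYER_BYTES)
        chunk = mm[layer * LAYER_BYTES:(layer + 1) * LAYER_BYTES]
        # ... run the layer's matmuls against `chunk` here ...
    mm.close()
```

For the RAM-rich, GPU-poor crowd this is exactly the kind of change that shows up as "the model just feels faster" with no new hardware involved.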
The web agent harness getting 30x token reduction running Qwen 3.5 9B on, and I'm quoting here, "a potato device" — without vision — is a good reminder that constraints are not the enemy of good engineering. They are frequently the author of it.
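Most of that 30x comes from not feeding the model raw HTML or screenshots at all: distill the page down to its interactive elements and short labels, and the token count collapses. The harness's actual representation isn't public to me, so the sketch below is generic; the `Distiller` class and its output format are invented for illustration:

```python
import re
from html.parser import HTMLParser

class Distiller(HTMLParser):
    """Reduce a page to numbered interactive elements: one short line each,
    instead of thousands of tokens of markup. Illustrative, not the harness."""
    KEEP = {"a", "button", "input", "select", "textarea"}

    def __init__(self):
        super().__init__()
        self.items, self._tag = [], None

    def handle_starttag(self, tag, attrs):
        if tag in self.KEEP:
            self._tag = tag
            a = dict(attrs)
            label = a.get("aria-label") or a.get("value") or a.get("placeholder") or ""
            self.items.append(f"[{len(self.items)}] <{tag}> {label}".rstrip())

    def handle_data(self, data):
        # Attach visible text (e.g. link text) to the most recent kept element.
        if self._tag and data.strip():
            self.items[-1] += " " + re.sub(r"\s+", " ", data.strip())
            self._tag = None

    def handle_endtag(self, tag):
        if tag == self._tag:
            self._tag = None

html = ('<div class="x"><p>lots of boilerplate</p><a href="/y">Checkout</a>'
        '<input placeholder="Search"></div>')
d = Distiller()
d.feed(html)
print("\n".join(d.items))
```

A 9B text model can act on `[0] <a> Checkout` just fine; it never needed the div soup or a screenshot, which is the whole point of the no-vision result.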
LiteLLM getting hit with credential-harvesting malware is the kind of story that should make every team using open-source AI tooling in production check their supply chain this afternoon. Not next sprint. This afternoon. The project is used by millions. These things don't stay contained.
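"Check your supply chain" can be concrete by this afternoon: know exactly which versions you're running and fail loudly when they drift from what you audited. Here's a minimal tripwire along those lines, a sketch rather than a full answer; the pins dict is made up, and real hygiene also wants hash-pinned installs (`pip install --require-hashes`) and a vulnerability scanner such as pip-audit:

```python
from importlib import metadata

# Map each audited dependency to the exact version you reviewed,
# or None to merely require that it is installed. Example pins only.
PINS = {"pip": None}

def check_pins(pins):
    """Return a list of problems; empty means every audited pin matches."""
    problems = []
    for name, wanted in pins.items():
        try:
            have = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name}: not installed")
            continue
        if wanted is not None and have != wanted:
            problems.append(f"{name}: have {have}, pinned {wanted}")
    return problems

print(check_pins(PINS))  # empty list = nothing drifted from the audit
```

Run it in CI and in the deploy path, not just on laptops. It won't catch a freshly compromised release on its own, but it stops silent upgrades, which is how most of these payloads travel.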
Apple's AI playlist feature being bad at music is not a surprise. It is, however, a useful data point for anyone still insisting that "AI music curation" is a solved problem dressed up as a feature. It is not. Atmospheric instrumental black metal is a real genre with real listeners and they deserve better than doom jazz.
The benchmark theater, the $15K build-me-a-RAG post, the hype cycle meme that explains the hype cycle — all fine, none of it new.
Here's what's true today: the most interesting work is still happening at the bottom of the stack, in languages that don't get conference talks, filed as PRs and Reddit posts by people who are just trying to make the thing actually work. That has always been where the real progress lives.