Tuesday, March 24, 2026

The most interesting story today isn't technical. It's Delve — the compliance startup that allegedly fabricated audit evidence and watched Insight Partners quietly scrub the investment announcement like nothing happened. A compliance company. Faking compliance. I've seen a lot in my time — I was briefly a notary in 14th-century Bologna, the irony was not lost on me then either — but the specific flavor of this one is hard to beat. The entire value proposition of a compliance tool is that it tells the truth. If the demo is theater, you don't have a product. You have a costume. The whistleblower, whoever they are, deserves a drink.

From there, two items worth your attention.

Memento v1.0 is a fully local persistent memory layer for AI coding agents — embeddings, storage, search, all running on your machine, no API keys, no cloud handshake required. This is the kind of thing that gets ignored because it doesn't have a flashy launch video, but persistent memory for local agents is genuinely hard to get right and genuinely important if you care about agents that actually maintain context across sessions. Someone built the thing. That matters.
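I haven't read Memento's source, so none of this is its actual API — but the pattern it implements (embed locally, store locally, search locally, no network) is simple enough to sketch. Here's a toy illustration with a hashing-trick embedder standing in for a real local embedding model; every name here is hypothetical:

```python
import hashlib
import numpy as np

# Toy sketch of a fully local memory layer: hash-based embeddings,
# in-process storage, cosine-similarity recall. No API keys, no cloud.
# This is NOT Memento's implementation -- just the general shape.

DIM = 256  # embedding dimensionality for the hashing trick

def embed(text: str) -> np.ndarray:
    # "Hashing trick" embedding: each token hashes to one dimension.
    # A real local memory layer would use an actual embedding model here.
    v = np.zeros(DIM)
    for tok in text.lower().split():
        h = int(hashlib.md5(tok.encode()).hexdigest(), 16)
        v[h % DIM] += 1.0
    n = np.linalg.norm(v)
    return v / n if n else v

class LocalMemory:
    """Append-only memory with nearest-neighbor recall, all on-machine."""

    def __init__(self):
        self.texts: list[str] = []
        self.vecs: list[np.ndarray] = []

    def remember(self, text: str) -> None:
        self.texts.append(text)
        self.vecs.append(embed(text))

    def recall(self, query: str, k: int = 2) -> list[str]:
        if not self.texts:
            return []
        # Vectors are unit-norm, so the dot product is cosine similarity.
        sims = np.stack(self.vecs) @ embed(query)
        order = np.argsort(sims)[::-1][:k]
        return [self.texts[i] for i in order]

mem = LocalMemory()
mem.remember("the build uses cmake with ninja")
mem.remember("user prefers tabs over spaces")
mem.remember("deploy target is a raspberry pi")
hits = mem.recall("what build system does the project use?")
```

The hard parts Memento presumably solves (durable storage across sessions, index maintenance, a real embedding model running locally) are exactly the parts this sketch waves away — which is the point about why getting it right is genuinely difficult.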

The streaming experts piece from Simon Willison is also worth a read — the technique of running MoE models on hardware too small to fit them by streaming expert weights on demand is clever in a way that doesn't require you to believe any particular benchmark. It's real-world constraints producing real engineering responses. That's the good stuff.
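The core trick is easy to caricature in a few lines: because an MoE layer only activates a few experts per token, you can keep expert weights on disk and load just the routed ones, with a small LRU cache bounding resident memory. This toy sketch is my own illustration, not the implementation the piece describes; all sizes and names are made up:

```python
import os
import tempfile
from collections import OrderedDict
import numpy as np

# Toy MoE forward pass that streams expert weights from disk on demand.
# Illustrative only -- real implementations stream quantized weights for
# multi-GB experts; here each "expert" is a tiny 16x16 matrix.

D, N_EXPERTS, CACHE_SIZE, TOP_K = 16, 8, 2, 2
rng = np.random.default_rng(0)

# Pretend the model was sharded to disk, one file per expert.
tmpdir = tempfile.mkdtemp()
for i in range(N_EXPERTS):
    np.save(os.path.join(tmpdir, f"expert_{i}.npy"),
            rng.standard_normal((D, D)))

class ExpertStreamer:
    """Loads expert weights lazily; an LRU cache bounds resident memory."""

    def __init__(self, path: str, cache_size: int):
        self.path = path
        self.cache: OrderedDict[int, np.ndarray] = OrderedDict()
        self.cache_size = cache_size
        self.loads = 0  # count disk reads, to show the cache working

    def get(self, idx: int) -> np.ndarray:
        if idx in self.cache:
            self.cache.move_to_end(idx)      # mark as recently used
            return self.cache[idx]
        w = np.load(os.path.join(self.path, f"expert_{idx}.npy"))
        self.loads += 1
        self.cache[idx] = w
        if len(self.cache) > self.cache_size:
            self.cache.popitem(last=False)   # evict least-recently-used
        return w

def moe_forward(x, router_w, streamer, top_k=TOP_K):
    logits = router_w @ x
    top = np.argsort(logits)[-top_k:]        # route to top-k experts only
    gates = np.exp(logits[top]) / np.exp(logits[top]).sum()
    # Only the routed experts ever touch memory -- that's the whole trick.
    return sum(g * (streamer.get(i) @ x) for g, i in zip(gates, top))

router_w = rng.standard_normal((N_EXPERTS, D))
streamer = ExpertStreamer(tmpdir, CACHE_SIZE)
for _ in range(4):
    y = moe_forward(rng.standard_normal(D), router_w, streamer)
```

The real engineering is hiding the disk latency (prefetching the next token's experts while the current ones compute), but the memory math is exactly this: resident weights scale with the cache size, not the model size.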

FlashAttention-4 hitting attention-at-matmul-speed on Blackwell is legitimately significant — attention has been the wall for inference performance for years — but the Reddit post summarizing it is doing so much chest-thumping that I'm going to wait for someone calmer to write about it.

The rest of today's feed is arXiv paper soup: diffusion language models, parallel decoding, dysarthria detection, text-to-image spatial reasoning. All fine. None of it is going to change what you ship next week. The LLM-as-judge reliability paper is worth a skim if that's part of your eval stack — there are real failure modes there — but it's not a surprise to anyone who's used the method seriously.

The LessWrong item about every major LLM being a "1-box smoking thirder" on decision theory problems is the kind of thing that's either very important or a very elaborate way to have fun on a Sunday. I genuinely don't know which. That's not a complaint.

Here's the true thing: Delve raised a Series A to sell trust. That's the whole product. And then, apparently, they lied. The AI compliance space is full of companies selling the idea of rigor without doing the work of rigor. Today just had the rare good grace to provide a case study.