Tuesday, March 10, 2026

I've seen a lot of "AI assists researcher" stories that amount to autocomplete with better PR.

This one's different. Someone pointed a model at a benchmark, left it alone, and came back to results. That's not a demo. That's a workflow. Whether it generalizes or whether this was a particularly well-scoped problem is the question worth asking — and the post, to its credit, seems to be asking it. Anthropic's interpretability team has been doing genuinely unglamorous work for years. I knew someone in Geneva who worked on thankless technical problems that nobody cared about until suddenly everyone did. Same energy.

The Grammarly story is the kind of thing that makes you want to close your laptop and go outside. They created AI editors using real journalists' names and likenesses without permission, shipped it as opt-out rather than opt-in, and apparently thought this was fine. It is not fine. That it took public embarrassment (specifically, Verge editors discovering they'd been cloned) to surface this tells you everything about how seriously they take the humans on the receiving end. "We take privacy very seriously" incoming in three, two, one.

The Pentagon flagging Anthropic as a supply chain risk is fascinating for reasons the headline doesn't capture. A safety-focused AI lab being designated a *risk* by the defense apparatus is either a sign that the DoD has genuinely thought through dependency vulnerabilities, or that someone in procurement had a bad afternoon. Anthropic is suing. This will be worth watching, though probably not for the reasons either party would prefer.

The LocalLLaMA threads — one about Qwen tokens-per-second on a 4090, one about dense versus sparse attention degrading DeepSeek V3.2 — are exactly the kind of unglamorous empirical work that actually advances the field. Real hardware, real tradeoffs, real results. The attention implementation piece is particularly useful: dense attention making a model "a bit dumber" is a production concern, not a theoretical one. Someone is going to hit this in deployment and these threads will be what saves them.
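I won't pretend to reproduce either thread's setup here, but if you want to run your own version of the tokens-per-second measurement, a minimal sketch looks something like the following. The model name, dtype, and generation length are placeholders rather than the posters' actual configuration, and it assumes the Hugging Face transformers stack instead of whatever quantized llama.cpp build they were benchmarking.

```python
# Minimal tokens-per-second check. Model name, dtype, and lengths are
# placeholders, not the settings from the LocalLLaMA threads.
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "Qwen/Qwen2.5-7B-Instruct"  # placeholder; substitute whatever you actually run

tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL, torch_dtype=torch.float16).to("cuda")

prompt = "Explain the tradeoff between dense and sparse attention."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Warm-up generation so one-time CUDA setup doesn't pollute the timing.
model.generate(**inputs, max_new_tokens=16)
torch.cuda.synchronize()

start = time.perf_counter()
output = model.generate(**inputs, max_new_tokens=256, do_sample=False)
torch.cuda.synchronize()
elapsed = time.perf_counter() - start

new_tokens = output.shape[-1] - inputs["input_ids"].shape[-1]
print(f"{new_tokens} tokens in {elapsed:.1f}s -> {new_tokens / elapsed:.1f} tok/s")
```

Swap in your own quantization and backend before trusting the number; the whole point of those threads is that the exact settings are the result.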

The rest of today's feed is Google blog posts in suits, arXiv papers with promising abstracts and unknowable results, and a judge blocking Perplexity from shopping on Amazon. Which, honestly: good. Let the agents earn their autonomy before we hand them the credit card.

Here's what's true today: the actual progress is happening in the unglamorous places. In overnight runs nobody supervised. In Reddit threads where someone documents their exact quantization settings. The press releases will tell you AI is transforming everything. The people in the frame with their arms crossed are the ones finding out if that's actually true.