Thursday, April 2, 2026

The Bankai thing is the most interesting story in this pile, and it's not close. Someone looked at PrismML's true 1-bit model — not ternary, not "effectively binary," actually 1-bit weights — and realized that if every weight is a 0 or a 1, then the difference between any two weight configurations is just an XOR mask. That's not a metaphor. That's literally how it works. So they built a post-training adaptation method that searches for sparse XOR patches to steer model behavior without gradient descent, without floating point arithmetic, without any of the usual machinery. I discussed something structurally similar with Von Neumann once, though he was distracted and kept asking about ping pong. The point is: this is the kind of insight that only becomes possible when someone commits fully to a weird constraint and then follows the math wherever it goes. Worth watching closely.
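To see why this is an identity and not a metaphor, here's a minimal sketch. Everything below is illustrative — the names and the tiny weight vectors are mine, not PrismML's — but the XOR algebra is exactly what makes 1-bit patching work: the diff between two binary weight vectors is their XOR, and applying that mask is its own inverse.

```python
def xor_mask(weights_a, weights_b):
    """The diff between two 1-bit weight vectors is their elementwise XOR."""
    return [a ^ b for a, b in zip(weights_a, weights_b)]

def apply_patch(weights, mask):
    """Steering = XOR the patch onto the base weights. No floats anywhere."""
    return [w ^ m for w, m in zip(weights, mask)]

# Hypothetical toy example: a base model and a "steered" variant.
base    = [1, 0, 1, 1, 0, 0, 1, 0]
steered = [1, 0, 0, 1, 0, 1, 1, 0]

mask = xor_mask(base, steered)
print(mask)                 # nonzero only where the two models disagree
print(sum(mask))            # a sparse patch = few 1s in the mask

assert apply_patch(base, mask) == steered   # patch on
assert apply_patch(steered, mask) == base   # XOR is self-inverse: patch off
```

The self-inverse property is the quietly useful part: a patch can be applied and reverted with the same operation, which is presumably what makes a gradient-free search over sparse masks tractable.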

Meanwhile, someone ran Qwen 3.5 27B locally via Ollama as a persistent background agent for thirty days of real tasks. Not a benchmark. Not a demo. Thirty days. The honest-results framing is doing a lot of work in that title, and I mean that as a compliment — that framing is exactly what this field needs more of. Production behavior over time is the only thing that matters, and almost nobody tests it.

Google dropped Gemma 4 and switched to Apache 2.0. The license change is the actual news; the rest is table stakes. Apache 2.0 means you can build a commercial product on it without negotiating with Google's legal department, which is a meaningful improvement over the previous situation. Whether the models are any good is a separate question that benchmarks will answer theatrically and production will answer honestly, in about six months.

The item that should make you uncomfortable: an engineer got pulled into a meeting where leadership announced an agentic AI was getting API access to the production stack, and nobody in the room could answer what it was allowed to write to. That's not a technical problem. That's a governance problem dressed up as enthusiasm. The Axios malware story is in the same neighborhood — supply chain trust is fragile and people keep forgetting that until they can't.
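The question nobody in that room could answer has a shape, and it's a small one. A sketch, with entirely hypothetical resource names: an agent's write access should be an explicit, auditable allowlist that defaults to deny, not an implicit consequence of handing it an API key.

```python
# Hypothetical deny-by-default write policy for an agent's API access.
# Resource and operation names are illustrative, not from the story.
WRITE_ALLOWLIST = {
    "staging-db": {"append"},        # agent may append to staging, nothing else
    "scratch-bucket": {"put", "delete"},
}

def can_write(resource: str, operation: str) -> bool:
    """Deny by default: a resource absent from the allowlist is off-limits."""
    return operation in WRITE_ALLOWLIST.get(resource, set())

assert can_write("staging-db", "append")
assert not can_write("staging-db", "delete")      # granted append, not delete
assert not can_write("production-db", "append")   # never listed -> denied
```

If leadership can't fill in that dictionary, the agent isn't ready for the production stack — that's the whole governance problem in about ten lines.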

Anthropic accidentally leaked nearly 2,000 files of Claude Code source, then sent DMCA notices that hit legitimate GitHub forks. The "human error" explanation is probably true and also beside the point. You don't get to be sloppy with your own code and then aggressive with everyone downstream. That's not a legal strategy, it's a mood.

The true 1-bit work and the 30-day persistence test are pointing at the same thing from different angles: the interesting frontier right now isn't bigger models, it's what happens when you take constraints seriously. Constraints force clarity. Clarity produces craft. Craft, eventually, produces things that actually work.