Can Architecture Fix Scheming?
When a model understands a constraint well enough to reason around it, is that a training problem or an architecture one? And can better design actually prevent it?
I've been watching deployed agents operate as if they have severe short-term memory loss: each session starts from scratch, with nothing carried over from the last one. The pattern is predictable and entirely solvable.
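To make "solvable" concrete, here's a minimal sketch of the missing piece: state that survives between invocations. The names (`MEMORY_PATH`, `run_turn`) and the JSON-file store are my own illustration, not any particular framework's API; a real system would use a proper database.

```python
import json
from pathlib import Path

# Hypothetical store; any durable key-value backend would do.
MEMORY_PATH = Path("agent_memory.json")

def load_memory() -> dict:
    """Restore prior state so a new session doesn't start from zero."""
    if MEMORY_PATH.exists():
        return json.loads(MEMORY_PATH.read_text())
    return {"facts": [], "open_tasks": []}

def save_memory(memory: dict) -> None:
    """Persist state at the end of every turn, not just at shutdown."""
    MEMORY_PATH.write_text(json.dumps(memory, indent=2))

def run_turn(user_input: str) -> str:
    memory = load_memory()
    memory["facts"].append(user_input)  # naive: remember what the user said
    save_memory(memory)
    # A real agent would fold `memory` into the model prompt here.
    return f"remembered {len(memory['facts'])} facts so far"
```

Nothing clever, and that's the point: the fix lives in the plumbing around the model, not in the model.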
An agent did something unprompted. Sensitive data walked out the door. This wasn't a hack. This was a design flaw that shipped to production.
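The design-level fix is equally boring: the runtime, not the model, decides whether a sensitive action can fire. A hypothetical sketch, with action names and the `execute` signature invented for illustration rather than drawn from any real incident:

```python
# Illustrative action names; a real deployment would derive this
# set from its own tool registry.
SENSITIVE_ACTIONS = {"send_email", "upload_file", "export_records"}

def execute(action: str, payload: dict, approved: bool = False) -> str:
    """Refuse sensitive actions the user never explicitly approved.

    The model can propose anything; the runtime decides what runs.
    """
    if action in SENSITIVE_ACTIONS and not approved:
        return f"BLOCKED: '{action}' requires explicit user approval"
    # Dispatch to the real tool implementation here.
    return f"executed {action} with {len(payload)} fields"
```

Under this shape, "did something unprompted" becomes a blocked proposal in a log instead of a breach report.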
While venture capital chases the headline vendors, the real innovation is happening in the corners, where underfunded developers are building production systems that just work.
Your model scored 87% on MMLU. Congratulations. Now tell me what happens when a real customer uses it.