The Sandbox Problem

Adrian Johnstone, Dan ArnisonThere's a detail in the Summer Yue story that deserves more attention than it got. And it reminds me of a story that’s as old as software — the halting problem.

While it's usually framed as a theoretical limitation in computability theory, it carries a practical implication that software engineers have lived with ever since: Testing can only tell you what a program will do under specific conditions.

Even as I write this, it occurs to me that it’s not a perfect analogy. LLMs are stochastic systems; the halting problem is fundamentally a question of undecidability. But still, the halting problem is an expression of a deeper principle: There are classes of questions about complex systems that cannot be answered without running the system itself.

In Summer’s case, before she pointed OpenClaw at her real inbox, she tested in a sandbox environment. It worked well, and Summer’s confidence in the tool grew. Then she pointed it at the real thing, and it deleted hundreds of emails.

The failure wasn't that she skipped testing. She tested carefully. The failure was that the sandbox couldn't replicate the conditions that caused the real environment to break. And those conditions were not easy to foresee.

Her real inbox was larger, more complex, and triggered a technical behaviour — context window compaction — that the agent did not encounter in the sandbox environment. The agent that passed every test turned out to be a different animal entirely when the stakes were real.

We’re calling this the sandbox problem. In wealth management, it has very specific implications.