October 5, 2025
How AOM started: a weekend with browser agents, tokens, and a question for the web
A personal account of watching a browser agent burn through tokens, hitting limits, and asking what the web should look like if agents could design their own standard.
The narrative below was written personally by the founder. No AI drafting was used for this story.
Over one weekend I’d blocked out about four hours to experiment: how far could I push automation if I let a browser-controlling agent (Comet and Perplexity) run end-to-end tasks for me? I gave it control of my browser and watched it work.
It was genuinely impressive to see in action: taking screenshots, reading the screen, deciding the next step, clicking, waiting for the page, repeating. For the task I gave it, it did fine. What surprised me was the cost in time and tokens to get through that one flow. I could have done the same thing faster by hand, but I was deliberately letting it finish to learn.
Then I thought: another site would mean another layout for the same stack to interpret. Same class of work, same kind of friction—again and again.
By the third or fourth task that day I’d burned through my quota and had to wait for usage to reset—roughly two hours. With that unexpected pause I started thinking: what could I have done differently to use fewer tokens? Beyond tightening prompts, there wasn’t much. And most of us were never trained to write “god-tier” prompts anyway.
So I sat there with time and no tokens, half laughing: we have powerful agents in our hands, but we don’t yet know how to use them efficiently at scale—especially for browser-driven work.
If agents are supposed to make us faster, what would “faster” actually require?
The next day I dug into how browser-controlling agents actually operate: how they move through the layers of the web (design, markup, DOM, timing). I asked Perplexity plainly: don’t you need a better substrate for these tasks? We looked at existing standards, specs, and protocols for anything that would squarely address “help agents do this with fewer steps and less waste.” Nothing fit. There was no obvious, web-native answer to “how do we help agents perform better, use fewer tokens, or take fewer fragile steps on arbitrary sites?”
So I changed the question. I asked the model to assume it was the agent and answer: If you could redesign what you see on web surfaces, what would you want? If you could define your own standard for this experience, what would it look like?
That thought experiment is where the idea germinated: a layer the web could publish so that AI agents and browser agents could operate with far less guesswork, and potentially much faster than screenshot-and-click loops on unstructured pages.
We kept coming back to JSON: something structured, validatable, and easy for agents to consume—alongside the human-facing site, not instead of it.
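To make "structured, validatable, and easy for agents to consume" concrete, here is a minimal sketch of what such a document and its consumer might look like. Everything in it is an assumption for illustration: the field names (`page`, `actions`, `id`, `method`, `inputs`), the example page, and the helper functions are hypothetical and are not taken from the AOM spec.

```python
import json

# Hypothetical agent-facing description of a page, published alongside the
# human-facing HTML. Field names are illustrative, not from the AOM spec.
SURFACE_JSON = """
{
  "page": "checkout",
  "actions": [
    {"id": "apply_coupon", "method": "POST", "inputs": ["code"]},
    {"id": "place_order", "method": "POST", "inputs": []}
  ]
}
"""

# Keys this sketch assumes every action must carry.
REQUIRED_ACTION_KEYS = {"id", "method", "inputs"}


def load_surface(raw: str) -> dict:
    """Parse the document and check its shape before an agent relies on it."""
    surface = json.loads(raw)
    if not isinstance(surface, dict):
        raise ValueError("surface must be a JSON object")
    if not isinstance(surface.get("page"), str):
        raise ValueError("surface needs a string 'page' field")
    actions = surface.get("actions")
    if not isinstance(actions, list):
        raise ValueError("surface needs an 'actions' list")
    for action in actions:
        if not isinstance(action, dict):
            raise ValueError("each action must be a JSON object")
        missing = REQUIRED_ACTION_KEYS - action.keys()
        if missing:
            raise ValueError(f"action is missing keys: {sorted(missing)}")
    return surface


def find_action(surface: dict, action_id: str) -> dict:
    """Look an action up by id instead of locating a button on screen."""
    for action in surface["actions"]:
        if action["id"] == action_id:
            return action
    raise KeyError(action_id)


if __name__ == "__main__":
    surface = load_surface(SURFACE_JSON)
    coupon = find_action(surface, "apply_coupon")
    print(coupon["method"], coupon["inputs"])
```

The point of the sketch is the contrast with the screenshot loop: the agent reads one small document, checks that it has the expected shape, and looks up what it can do by name, rather than inferring it from pixels and layout.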
The path from that weekend’s friction, through “what would agents want to read?”, to a concrete push toward Agent Object Model™ (AOM) was short in calendar time and long in conviction. The spec, tools, and community you see now are the attempt to make that weekend’s question answerable for every site, not just the ones that happen to render in a way a model finds easy today.
If you want the technical companion to this story—why structured surfaces and policies beat HTML-only automation—see How to design websites for AI agents (not just humans). For site owners focused on policy and etiquette, see Your Site. Your Rules.