Thoughts on Welcome to Gas Town
Steve Yegge has been writing long blog posts about the software industry for twenty years. They're reliably entertaining, occasionally prophetic, and almost always a little too long, which he knows, and which is part of the charm. His latest, ‘Welcome to Gas Town’, is about building an AI agent runtime. It's also, I think, about something he hasn't quite named yet.
The pitch is straightforward: agents that can use tools, orchestrated by other agents, wired into real services, with no logic in code: every decision requires an agent to burn tokens. If you've been paying attention to the space, you've seen variations on this from a dozen directions: LangChain, CrewAI, AutoGen, the function-calling updates from OpenAI and Anthropic, and so on. Steve’s version has a game metaphor layered on top, because, well, it’s a Stevey post and everything eventually becomes a game design problem. Fair. The architecture is super interesting, but so is the bet underneath it.
Steve’s betting that the right unit of software isn't the application anymore; it's a capability that lets agents access lots of tools and just run. Each tool does one thing, described well enough that an agent can figure out when to use it. The application dissolves into a bag of tools, and the agent becomes a shopkeeper: browsing the shelves, picking what's relevant, combining things on the fly. The shopper never sees the tools directly. They just tell the shopkeeper what they want.
The core idea, a structured environment where agents operate within defined roles, mediated by arbiters who are also agents, feels like a plausible first approximation of how AI-backed systems built on fleets of agents might actually work together at scale [1]. Instead of assuming emergence, Gas Town proposes what we’re now starting to call a harness. That seems credible: durable, always-on systems need some kind of intentional design in their early phases. Admittedly I’m biased, in that I like the harness concept in general, primarily because it isn’t LLM-specific: a harness can include LLM components while still integrating with existing tools, and potentially swap in better approaches tomorrow.
I've been building something like this myself with tars, and I'll be honest: I'm not sure the shopkeeper model works yet. The idea makes sense, but the failure modes are new. When a traditional application breaks, you get an error in a known location, because you’re literally telling it what to do; in that model a program can only fail on an input you didn't anticipate. When an agent picks the wrong tool, or the right tool with the wrong parameters, or the right tool at the wrong time, you get a plausible-looking result that happens to be incorrect. The system doesn't crash. It confidently does the wrong thing. Gas Town doesn’t cater for this at all; quite the opposite. Gas Town is about making progress regardless, and arguably its main contribution as a system is making sure agents can always make progress. The agents, they just want to run.
Steve has always been among the best at seeing where things are going. His early posts about platform thinking at Amazon, the ones about Bezos's API mandate, were directionally correct years before anyone built what he was describing. The service-oriented architecture rant and ‘Execution in the Kingdom of Nouns’ are legendary for a reason. He saw the shape before the details filled in.
The shape here is probably right. The monolithic application is a bad fit for a world where the interface is natural language. You can't prompt your way through a fixed UI hierarchy; the LLM needs capabilities it can compose, not screens it can navigate. That much I buy. What I don't buy, not yet, is that we know how to make the agents reliable. Skill descriptions are a form of documentation, and documentation is a form of promise, and software has a long history of promises that don't survive contact with edge cases. When your skill description says "sends an email" and the agent calls it in a loop because it misunderstood the user's intent, you have a problem that no amount of prompt engineering or skills fixes cleanly. You need something more like a type system for capabilities, one that applies constraints on composition, not just descriptions of parts.
Steve doesn't address this much, which is fine for a first post. He's painting the vision. He’s also out there saying don’t use this while implying ‘e pur si muove’. The engineering comes later, granted, and he's not wrong that the vision matters. But it's the part I keep circling back to. The agents I've worked with are most useful when they're tightly scoped, when the toolkit is small and the task is well-defined. Anyway. What makes Steve’s ideas in this post worth paying attention to, and what made me study the project more than I expected to, isn’t whether this is the specific blueprint we’ll end up building from. It’s that it’s trying to address that bigger question: what do coordination and decision-making, literally agency, look like when the participants aren’t just people anymore? Gas Town feels like a good opening answer.
And so I'll keep watching Gas Town. Steve’s more than earned the benefit of the doubt on directional bets, even if the first version is more manifesto than mechanism. His on-base percentage is ridiculous. And the coordination problem, how you make tool selection reliable at scale, how a legion of agents runs, is going to be one of the defining questions of this whole era. Someone's going to figure it out. Might as well be the guy who predicted how services, cloud and platforms would play out.
The agents, they just want to run.
---
[1]
A human operator (Overseer) talks to a coordinator (Mayor), who dispatches work to a fleet of worker agents (Polecats) running in isolated git worktrees. There are additional roles, such as Witnesses to nudge stuck agents and Dogs to run background work. It’s 19th-century naming, but a very 20th-century org chart.
The system is built around three core abstractions: (i) a Bead, a versioned, queryable data record, written as JSONL, where all work and signals are captured; (ii) Dolt, a database agents write to for transactional backing and query support; and (iii) tmux as the agents’ runtime: each agent lives in a tmux session.
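To make the bead idea concrete, here’s a toy sketch of what an append-only JSONL work record could look like. The field names are my guesses for illustration, not Gas Town’s actual schema.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Bead is a guess at the shape of a Gas Town work record.
// Field names are illustrative, not the project's real schema.
type Bead struct {
	ID      string `json:"id"`
	State   string `json:"state"` // create | live | close | decay | compact | flatten
	Agent   string `json:"agent,omitempty"`
	Summary string `json:"summary"`
}

// toJSONL renders one bead update as a single JSONL line.
// Appending a new line per state change keeps the full history queryable.
func toJSONL(b Bead) string {
	line, err := json.Marshal(b)
	if err != nil {
		panic(err) // a struct this simple cannot fail to marshal
	}
	return string(line)
}

func main() {
	fmt.Println(toJSONL(Bead{ID: "bd-42", State: "create", Summary: "fix flaky test"}))
	fmt.Println(toJSONL(Bead{ID: "bd-42", State: "live", Agent: "polecat-7", Summary: "fix flaky test"}))
}
```

The append-only shape matters: each state change is a new line, so the record doubles as an audit trail.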
There’s a layered memory model: sessions for ephemeral work, sandboxes for persistence across runs (backed by git), and a permanent identity for each agent backed by a bead. Agents also have mailboxes, which immediately reminded me of Erlang/Pekko-style actor systems. Mail can be delivered directly to an individual agent, queued as a competing-consumer / first-come-first-served model, or broadcast to members of a channel.
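The three delivery modes map neatly onto Go channels, which is a nice way to see the actor-system resemblance. This is a toy model of the semantics described in the post, not Gas Town’s code; all the names are mine.

```go
package main

import "fmt"

// direct: one named agent gets the message in its own mailbox.
func direct(boxes map[string]chan string, agent, msg string) {
	boxes[agent] <- msg
}

// competing: the first consumer to receive from the shared queue wins.
func competing(queue chan string, msg string) {
	queue <- msg
}

// broadcast: every subscriber of the channel gets its own copy.
func broadcast(subscribers []chan string, msg string) {
	for _, sub := range subscribers {
		sub <- msg
	}
}

func main() {
	boxes := map[string]chan string{
		"polecat-1": make(chan string, 1),
		"polecat-2": make(chan string, 1),
	}
	direct(boxes, "polecat-1", "merge your branch")
	fmt.Println(<-boxes["polecat-1"])

	queue := make(chan string, 1)
	competing(queue, "anyone: triage bd-42")
	fmt.Println(<-queue) // whichever worker reads first handles it

	subs := []chan string{make(chan string, 1), make(chan string, 1)}
	broadcast(subs, "town meeting at the refinery")
	fmt.Println(<-subs[0], <-subs[1])
}
```

In Gas Town the mailbox is presumably backed by beads rather than in-memory channels, but the semantics, point-to-point, competing consumers, and fan-out, are the same three you get in any actor runtime.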
Beads appear to have six states: Create, Live, Close, Decay, Compact, Flatten. Creation interacts with Dolt, and the Live state means an agent is actively working on the bead. Close/Decay/Compact/Flatten are essentially controls to stop bead records from growing without bound, and those lifecycle steps are handled by Dog agents rather than the worker agent. I found it interesting that the bead docs themselves carry the state, rather than the docs being purely something agents use. It’s reminiscent of ‘data is code’, in a LISP-ish way.
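My reading of that lifecycle, sketched as a transition table. The map is an assumption from the post, not documented behaviour; in particular I’m guessing the tail states run strictly in sequence.

```go
package main

import "fmt"

// nextStates is my reading of the bead lifecycle: Create -> Live while an
// agent works it, then Close, after which Dog agents run
// Decay -> Compact -> Flatten to bound record growth.
// This transition map is an assumption, not documented behaviour.
var nextStates = map[string][]string{
	"create":  {"live"},
	"live":    {"close"},
	"close":   {"decay"},
	"decay":   {"compact"},
	"compact": {"flatten"},
	"flatten": {}, // terminal
}

// canTransition reports whether moving from one state to another is allowed.
func canTransition(from, to string) bool {
	for _, s := range nextStates[from] {
		if s == to {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(canTransition("create", "live"))  // true
	fmt.Println(canTransition("live", "flatten")) // false: must close first
}
```

Encoding the lifecycle as data rather than scattered if-statements is exactly the kind of thing that keeps logic out of the harness, which fits the ZFC goal discussed below.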
As a result, Dolt looks like a single point of failure: if you can’t create beads, the system stops, and it reads like both control-plane and work-plane agents depend on Dolt being up. But I’m not 100% sure on this one, to be fair.
When an agent session starts (or restarts after a crash), it runs a prime command to load context: identity, current work task (a hook), work stage, pending mail, and so on. Layered memory architectures are becoming the norm as a way to manage context, and Gas Town seems to push that pattern hard. The overall design feels aligned with the “build a reliable system from unreliable parts” school of engineering.
To handle context bloat, agents use a handoff command to write a mail message (a bead) before ending the session. That message is picked up and a new session is spun up. This is interesting because it looks like agents are killed based on context bloat rather than running a compaction step inside a long-lived session. Insert your Memento memes here. But it also means the system is self-documenting and operationally transparent. Even if we don’t fully understand what’s going on inside an agent, we can understand what’s going on in Gas Town and, to a degree, what went on. That’s a great property for building confidence and interpretability around what the agents are actually doing and why.
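A handoff, in this reading, is just one more bead written on the way out. A minimal sketch, with invented field names; Gas Town’s actual handoff format may differ.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// HandoffMail sketches the note a bloated session writes before it dies,
// for the next session to pick up via prime. Fields are illustrative.
type HandoffMail struct {
	From    string `json:"from"`
	Hook    string `json:"hook"`
	Stage   string `json:"stage"`
	Summary string `json:"summary"`
}

// handoff returns the JSONL line to append before the session is killed.
func handoff(agent, hook, stage, summary string) string {
	line, err := json.Marshal(HandoffMail{From: agent, Hook: hook, Stage: stage, Summary: summary})
	if err != nil {
		panic(err)
	}
	return string(line)
}

func main() {
	// Session is near its context limit: write the note, then exit.
	fmt.Println(handoff("polecat-7", "bd-42", "tests-failing",
		"two tests red in pkg/store; suspect timezone handling"))
}
```

The nice property is that the handoff note is durable and queryable like everything else, so “what was this agent thinking when it died” has an answer.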
Instead of RAG or vector search, agents use structured queries, and a graph (DAG) provides steering. It reminded me of approaches like VexP for reducing token overhead, or modelling work as dependency graphs rather than doing endless context stuffing. It’s closer to the Recursive Language Model (RLM) reasoning pattern than to a giant prompt, although the goal seems to be effective context management rather than enabling multi-hop reasoning. The design seems to want to offload the toil of context management from people to the system. That alone makes Gas Town worth studying, given how important a topic this is becoming.
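A toy of what “steering via a dependency DAG” buys you: rather than stuffing everything into context, the system hands an agent only the beads whose prerequisites are done. The bead IDs and graph here are invented for illustration.

```go
package main

import "fmt"

// ready returns the beads whose dependencies are all in the done set.
// deps maps a bead to the beads it is blocked on.
func ready(deps map[string][]string, done map[string]bool) []string {
	var out []string
	for bead, reqs := range deps {
		if done[bead] {
			continue // already finished
		}
		unblocked := true
		for _, r := range reqs {
			if !done[r] {
				unblocked = false
				break
			}
		}
		if unblocked {
			out = append(out, bead)
		}
	}
	return out
}

func main() {
	deps := map[string][]string{
		"bd-1": {},               // design note
		"bd-2": {"bd-1"},         // implementation depends on design
		"bd-3": {"bd-1", "bd-2"}, // tests depend on both
	}
	done := map[string]bool{"bd-1": true}
	fmt.Println(ready(deps, done)) // only bd-2 is unblocked
}
```

The token-saving argument is that an agent only ever needs its unblocked slice of the graph, not the whole history, in its prompt.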
GUPP, Gas Town Universal Propulsion Principle: ‘If you find work on your hook, YOU RUN IT.’ This is the prime directive: always make progress. When an agent session starts and sees a task hook in its loaded context, it starts to work on it immediately.
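GUPP fits in one function. This is a toy reading of the directive, not Gas Town’s implementation; the point is that there is no deliberation step between “work found” and “work started”.

```go
package main

import "fmt"

// propel is GUPP in miniature: on session start, if there is work on the
// hook, run it; otherwise go idle and wait for mail.
func propel(hook string, run func(string) string) string {
	if hook == "" {
		return "idle: waiting for mail"
	}
	return run(hook) // no deliberation: work found means work started
}

func main() {
	run := func(task string) string { return "running " + task }
	fmt.Println(propel("bd-42", run)) // running bd-42
	fmt.Println(propel("", run))      // idle: waiting for mail
}
```

Everything else in the architecture, the witnesses, the handoffs, the crash recovery, exists to make sure this loop keeps getting re-entered.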
Monitoring has three layers: a dumb heartbeat daemon; a boot agent that decides whether a Deacon is needed to drill down; and Deacon-level diagnosis, using Witnesses and refinery information to assess whether the Polecat doing the actual work needs fixing.
Git worktrees are doing a lot of work here (not unlike Claude-based workflows). The merge strategy reads as Bors-style: rebase commits into a batch, fast-forward, and bisect on failure. It’s efficient and should scale in principle with more agents. Me? I still pine for Mercurial ;) But it’s a very reasonable approach.
Gas Town’s software, written in Go, exists solely to shuttle state between agents and provide scaffolding, what the project calls Zero Framework Cognition (ZFC). It might seem an obvious design choice to keep logic out, but in practice logic bleeds into harness frameworks extremely easily (routing, escalation, arbitration). It’s one reason, as far as I can tell, the design ends up with so many roles and so much overwatch (the monitoring described above isn’t the half of it).
Token usage is the flip side of ZFC. If you keep logic within agents, you spend more on inference. Every meaningful activity in Gas Town involves tokens, and the overwatch agents seem to scale in line with the agents doing the work, vaguely reminding me of a sidecar model. Steve, to his credit, is very clear about cost ramping with more agents. Ideally, Gas Town would eventually allow at least control-plane agents to use local/open-weight models (e.g., via Ollama) for the simpler tasks (like nudging). It’s a bit Claude-specific right now, but not fatally so, and the document-as-record and crash-handling approaches suggest Gas Town should be robust to less capable models.
What I can’t tell is whether balancing the token budget with local models becomes impractical because local inference chews too much compute to be operationally viable on-box, making vendors/cloud a necessity, or whether it’s just an implementation/optimisation gap for now. I do think it needs to be squared away at some point for an architecture where the control and data planes don’t quite scale independently of the workers. Relatedly, the emphasis on ‘you’re not ready for this’, backed by an eight-level maturity model, felt overdone. I can read it charitably as an attempt to jolt people out of incremental thinking, but it wore on me in the end.