The idea of gas town is simultaneously appealing and appalling to me. The waste and lack of control is wild, but at the same time there's at least a nugget of fascinating, useful work in there. In a world where compute is cheap and abundant and the models are a notch smarter, I think it's the start of a useful framework for what the future of augmented work might look like.
I have no interest in using gas town as it is (for a plethora of reasons, not the least of which being that I'm uninterested in spending the money), but I've been fascinated with the idea of slowing it down and having it run with a low concurrency. If you've got a couple A100s, what does it look like if you keep them busy with two agents working concurrently (with 20+ agents total)? What does it mean to have the town focus the scope of work to a series of non-overlapping changesets instead of a continuous stream of work?
If you don't plan to have it YOLO stuff in realtime and you can handle the models being dumber than Claude, I think you can have it do some really practical, useful things that are markedly better than the tools we have today.
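To make that concrete, here's a rough sketch of the "slowed-down town" I have in mind. None of this is Gas Town's actual machinery; the names (Changeset, run_agent) are invented, and the whole thing is just a toy semaphore loop standing in for two locally hosted agents chewing through a fixed queue of pre-scoped, non-overlapping changesets.
```
# Toy sketch only: two concurrent agents (one per A100), 20+ queued beads,
# each bead a pre-scoped, non-overlapping changeset. All names are invented.
import asyncio
from dataclasses import dataclass

@dataclass
class Changeset:
    bead_id: str
    description: str

async def run_agent(cs: Changeset) -> None:
    # Stand-in for "one local agent works one changeset to completion."
    print(f"agent starting {cs.bead_id}: {cs.description}")
    await asyncio.sleep(1)  # placeholder for hours of local inference
    print(f"agent finished {cs.bead_id}")

async def run_town(queue: list[Changeset], concurrency: int = 2) -> None:
    sem = asyncio.Semaphore(concurrency)  # two GPUs -> two agents at a time

    async def worker(cs: Changeset) -> None:
        async with sem:
            await run_agent(cs)

    await asyncio.gather(*(worker(cs) for cs in queue))

if __name__ == "__main__":
    beads = [Changeset(f"TOWN-{i:03d}", f"changeset {i}") for i in range(20)]
    asyncio.run(run_town(beads))
```
The point isn't the code, it's the shape: a long, bounded queue of isolated changesets instead of a continuous firehose, with concurrency capped at whatever your hardware can actually sustain.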
I put it in a VM and had it build a really simple todo app for me the other day. It wasted so many tokens that I can't help but agree with you right now. And I could certainly have done the same thing with beads and opus in approximately the same amount of time.
However, the gas town one was almost completely hands-off. I think my only interventions were due to how beta it was, so I had to help it work around its own bugs to keep it from doing stupid things.
Other than that, it implemented exactly what I asked for in a workable fashion with effectively one prompt. It would have taken several prompts and course corrections to get the same result without it.
Other than the riskiness (it runs in dangerous permissions mode) and the incredible cost inefficiency, I'd certainly use it.
I ran the Gas Town intro post through ChatGPT 5.2 Pro[0]
Based on my initial read, and a pass at this summary, it seems mostly right. YMMV
[0] https://gist.github.com/jumploops/2e49032438650426aafee6f43d...
Did some further digging into what little public usage data there is from Gas Town, and found that most of the "Beads" are tasks broken down quite small, almost too small imo.
Super interesting project with the goal of keeping Claude "busy"; however, it feels more like a casino game than something I'd use for production engineering.
I'd help build Gas City, Gas State, and Gas Country if that meant we would actually solve the things AI promised to solve: all sickness, famine, wealth ...
The problem is, we're just fidgeting yolo-fizzbuzz ad nauseam.
The return on investment at the moment is probably one of the worst in the history of human investments.
AI does improve over time, still today, but we're going to run out of planet before we get there...
As of yet, the AI models doing important work are still pretty specialized. I'd be happy to pitch in to run something like an open source version of alpha-fold, but I'm not aware of any such projects.
I have trouble seeing LLMs making meaningful progress on those frontiers without reaching ASI, but I'd be happy to be wrong.
Claude is ok. Gas town seems like a Claude multiplier. I’m not sure more Claude is what I’d even want!
Not sure I love what it does all the time; it tends to fit whatever box you set up and will easily break out if you aren’t veeeery specific. Is it better than writing a few thousand lines of code myself that I deeply understand and can debug and explain? I don’t know yet. I think it’d be good for writing functions one at a time with massive supervision.
It’s great for writing scripts and things where precision and correctness outside the success path isn’t really needed. If a script fails and it wasn’t deleting a hard drive, who cares. If my embedded code fails out in a product in the wild, that’s a much bigger nuisance and potentially fatal for the device (not the humans), which is wasteful.
I use beads quite a bit, but not as Steve intended, and definitely the opposite of "Gas Town": I use the note-taking capability and the integration with Git (that is, as something of a glorified Makefile and database) to debug contexts, to close the loop, and to increase accuracy over time. Nevertheless, it has been useful for large batch runs over my code base: the record is thirty hours of straight processing while still getting something useful, plus enough trace data to make further improvements.
Steve has gone "a bit" loopy, in a (so far) self-aware manner, but he has some kind of insight into the software engineering process, I think. Still, I predict beads will eventually break under the weight of no supervision if he keeps churning it, but others will pick up where he left off, with more modest goals. He did, to his credit, kill off several generations of projects in a similar category before this one.
that's one reason I am less worried about him than some, although, I don't want to say that only to have something bad happen to him, that is, a form of complacency. Just because (say) Boltzmann and Cantor had useful insights along the way didn't mean people shouldn't have been looking to support them.
the main area I'd like to see some departure from beads is using markdown files (or something) so that the issue context/comments are easier to see in a diff generated by git.
The other area where I'd like to see some more open-ended software engineering thinking is regression testing: ways of storing or referencing old versions of texts to check that the agent can still complete old transformations properly after a context change that patches up a weakness in a transformation we want to keep.
This is not something you'd want for every context, but a lot of my effort is spent building up prompt fragments to normalize and clean up the code coming out of a model that did some ad-hoc work that meets the test coverage bar, which constrains it decently into having achieved "something." Kind of like a prototype. But often, a lot of ungratifying massaging is required to even cover the annoying but not dangerous tics of the LLM, to bring clarity to where it wrote, well, very bad and unprincipled code...as it does sometimes.
I was disappointed to see that this is still 10x the code needed for the feature set and that it still insists on duplicating state into a SQLite index for such minuscule amounts of data.
I've seen 25-30 similar efforts to make a Beads alternative and they all do this for some reason.
Very minor nit -- crew could be a person also - in fact that's how you're supposed to hack on a codebase in gas town directly - add yourself as crew.
Other than that, this is a helpful list especially for someone who hasn't been hacking around on this thing as it's in rapid development mode. I find gas town super interesting, and tantalizingly close to being amazingly useful. That said, I wouldn't mind a slightly less 'flavored' set of names for workers.
It seems like one of the key events that needs to happen for any professional domain to take off is for it to develop an "inside" language that nobody else understands. For example, I still don't know what a kanban or a scrum is. So I'm very ill positioned to challenge their use or question how they are done. Hence they got to dodge a whole lot of opposition that would probably have brought it all down. The invention of a new mysterious terminology I think was critical for agile to take off.
The problem with this phenomenon is that the same freedom from critique that is seemingly necessary for new domains to establish themselves also detaches them from necessary criticism. There's simply no way to tell if this isn't a load of baloney. And by the time it's a bullet point requirement on CVs to get employed it's too late for anybody to critique it.
I actually love the idea of totally new naming schemes for experimental software.
Certain name types are so normalized (agent, worker, etc) that while they serve their role well, they likely limit our imagination when thinking about software, and it's a worthwhile effort to explore alternatives.
This reminds me of Moldbug's Urbit. I can't be bothered to look it up, but his comment was along the lines of "existing words bring assumptions, so safest to make new ones". To which, my comment would be: perflufflington flibnik qupnux.
I do too, but you can take things too far, which I'd argue has happened the moment "figuring out what the names mean" becomes enough of an intellectual challenge to provide a dopamine hit; at that point, you've (intentionally or otherwise) germinated a cult. It's human nature: people will support the design not on its merits but rather as loss aversion for the work they put into decoding it.
This looks familiar to people who have seen how the more elaborate NPC systems work in major multiplayer games. There are lots of semi-independent NPCs, with some degree of overall coordination. Groups of cops or soldiers may have a commander program for tactical coordination, and there may be a higher level system deploying units for strategic purposes.
In games, what the NPCs can do is usually rather dumb. Move and shoot is usually most of their functionality. This keeps the overhead down so the system is affordable.
Gas Town may be a step towards AIs which have an ongoing sense of what they're doing. I'm not going to get into the "consciousness" debate, but it's closer to liveness.
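In code terms, the NPC hierarchy described above looks roughly like this. It's a made-up toy, not any particular engine's system: dumb units, a tactical commander per group, and a strategic layer deploying groups.
```
# Toy hierarchy sketch: deliberately dumb units, a squad-level commander for
# tactical coordination, and a strategic layer on top. Names are invented.
from dataclasses import dataclass, field

@dataclass
class NPC:
    name: str
    def act(self, order: str) -> None:
        # Individual NPCs stay cheap: move and shoot, nothing more.
        print(f"{self.name}: executing '{order}'")

@dataclass
class SquadCommander:
    squad: list[NPC]
    def coordinate(self, objective: str) -> None:
        # Tactical layer: turn one objective into per-unit orders.
        for i, npc in enumerate(self.squad):
            npc.act(f"{objective} (position {i})")

@dataclass
class StrategicDirector:
    squads: list[SquadCommander] = field(default_factory=list)
    def deploy(self, objectives: list[str]) -> None:
        # Strategic layer: hand each squad an objective.
        for squad, objective in zip(self.squads, objectives):
            squad.coordinate(objective)

director = StrategicDirector([
    SquadCommander([NPC("cop-1"), NPC("cop-2")]),
    SquadCommander([NPC("soldier-1"), NPC("soldier-2")]),
])
director.deploy(["hold the intersection", "flank the warehouse"])
```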
What games are notable in this regard? The classic Majesty series comes to mind. UO aspired to complex NPC systems. Fable as well. I always dreamt of a more advanced Sim City-meets-MMO that just went all in on that.
Anyone have some kind of central hub for finding out about new tools/techniques? I'm convinced that headless multi-agent coordination is the way to go, but it needs a lot of guard rails, one of the biggest of which will be cost control. I'm sure there will be a lot more developments in this space, but I don't want to just happen across them by accident...
I don't think they're doing a good job incubating their ideas into being precise and clearly useful -- there is something to be said about being careful and methodical before showing your cards.
The message they are spreading feels inevitable, but the things they are showing now are ... for lack of better words, not clear or sharp. In a recent video at AI Engineer, Yegge comments on "the Luddites" - but even for advocates of the technology, it is nigh impossible to buy the story he's telling from his blog posts.
Show, don't tell -- my major complaint about this group is that they are proselytizing about vibe coding tools ... without serious software to show for it.
Let's see some serious fucking software. I'm looking for new compilers, browsers, OSes -- and they better work. Otherwise, what are we talking about? We're counting foxes before the hunt.
In any case, wouldn't trying to develop a serious piece of software like that _at the same time you're developing Gas Town or Loom_ make (what critics might call) the ~Emacs config tweaking for orchestration~ result driven?
Here's a separate, optimistic comment about Yegge and Huntley: they are obviously on the right track.
In a recent video about Loom (Huntley's orchestration tool), Huntley comments:
"I've got a single goal and that is autonomous evolutionary software and figuring out what's needed to be there."
which is extremely interesting and sounds like great fun.
When you take these ideas seriously, if the agents get better (by hook and crook or RLVR) -- you can see the implications: "grad student descent" on whatever piece of software you want. RAG over ideas, A/B testing of anything, endless looping, moving software.
It's a nightmare for the model of software development and human organization which is "productive" today, but an extremely compelling vision for those dabbling in the alternative.
It's a science project. I think the "I am so crazy" messaging is deliberate to scare most people away while attracting a few like-minded beta testers. He's telling you not to use it, which some people will take as a dare...
I haven't read the Yegge post closely, so just commenting that namespaces (or naming conventions) would make the easier-to-casually-read names more practical...
For example, if Polecat becomes GasTown.WorkerAgent (or GasTown.Worker), then you always have both an unambiguous way and a shorthand-in-context way of referring to the concept.
(For naming conventions when you don't have namespaces as a language feature, use prefixes within the identifier, such as `GasTown_Worker`.)
If GasTown.Worker is implemented with framework Foo, using that framework's Worker concept, GasTown.Worker might have a field named fooWorker of type Foo.Worker. (In the context of the implementation of GasTown, the unqualified name always means the GasTown concept, and you always disambiguate concepts from elsewhere that use the same generic or similar terms.)
Complicated names like GasTown.MaintenanceManagerCheckerAgent might need some creative name shortening, but hopefully are still descriptive, or easy to pick up and remember. Or, if the descriptive and distinguishing name was complicated because the concept is a weird special case within the framework, maybe consider whether it should be rethought.
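A minimal sketch of what I mean, in Python terms, with classes standing in as namespaces since it's one self-contained file. All the names here (Foo, GasTown, work_bead) are hypothetical.
```
# Qualified names disambiguate; short names work when the context is obvious.
class Foo:
    """Pretend third-party framework."""
    class Worker:
        def run(self, task: str) -> None:
            print(f"Foo.Worker running: {task}")

class GasTown:
    """The project's own namespace."""
    class Worker:
        """GasTown.Worker (the concept the post calls Polecat)."""
        def __init__(self) -> None:
            # The field name makes the framework's concept unambiguous.
            self.foo_worker = Foo.Worker()

        def work_bead(self, bead_id: str) -> None:
            self.foo_worker.run(f"bead {bead_id}")

# Fully qualified when it matters, shorthand when the context already says which:
polecat = GasTown.Worker()
polecat.work_bead("HONEY-2vns")
```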
I don't understand why people are making this so complicated. We have a battle-tested SDLC. We don't need to reinvent this shit. We just need to make some affordances in the tools and processes we set up for the fact that the majority of the actors in the system are agents (such as rationing human attention).
Spec your software like an architect/PO, decompose it into a task DAG, then orchestrate each lane and assemble all changesets on a merge branch rather than constantly repointing HEAD.
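A minimal sketch of that flow, where the DAG and branch names are made up and only the shape matters: spec tasks as a dependency graph, let each batch of ready tasks form a parallel lane, and land every changeset on one merge branch.
```
# Spec -> task DAG -> parallel lanes -> one merge branch. Names are invented.
from graphlib import TopologicalSorter

# Each task lists the tasks it depends on.
dag = {
    "spec":   [],
    "schema": ["spec"],
    "api":    ["schema"],
    "ui":     ["schema"],
    "tests":  ["api", "ui"],
}

ts = TopologicalSorter(dag)
ts.prepare()

merge_branch = "integration/next-release"
print(f"assembling changesets onto {merge_branch}")

# Every "ready" batch is a lane of tasks with no unmet dependencies,
# so agents (or humans) can work them in parallel without overlapping.
lane = 0
while ts.is_active():
    ready = list(ts.get_ready())
    lane += 1
    for task in ready:
        print(f"lane {lane}: agent works '{task}', opens changeset -> {merge_branch}")
    ts.done(*ready)
```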
It's like Conway's Law. Both humans and agents arrive at roughly identical hierarchies for organizing labor. There is something inherent in the game of telephone required by limited working memory that requires this structure. Gas Town's only failure is not being familiar with prior art and coming up with very strange names for established patterns that already exist in large hierarchical organizations like governments, corporations and militaries.
Real, genuinely confused human here: Can someone please clarify whether or not gas town is/was a joke? I've searched repeatedly and can't find anything that looks like an obvious tell, and I'm not sure if this is because it's actually real and people are taking it seriously, or because the pages and pages of discourse surrounding it are AI generated and taking themselves literally.
If it's not a joke... I have no words. You've all gone insane.
It's not a joke, but I think it's an example of the same thing we're seeing with folks who think they're talking to god when they talk to ChatGPT, or those who spiral and, in some cases, sadly take their own lives.
These chatbots create an echo chamber unlike anything we've ever had to deal with before. If we thought social media was bad, this is way worse.
I think Gastown and Beads are examples of this applied to software engineering. Good software is built with input from others. I've seen many junior engineers go off and spend weeks building the wrong thing, and it's a mess, but we learn to get input, we learn to have our ideas critiqued.
LLMs give us the illusion of pair programming, of working with a team, but they're not that. LLMs vastly accelerate the rate at which you can spiral down the wrong path, or down a path that doesn't even make sense. Gastown and Beads are that. They're fever dreams. They work, somewhat, but even just a little bit of oversight, critique, and input from others would have made them far better.
It's a double edged sword. If it can lead the uninformed down the wrong path faster, it can lead the informed down the right path faster. It's not only fast in one direction.
I believe the author of gas town is very informed, having been a professional software developer for some time. And the premise of the above comment is that he did, despite this, go down the wrong path.
It's the difference between light-up arrows pointing the way "forward" for a car turning onto the expressway the wrong way, and making that same mistake where humans might see it and try to flag the driver down before they're too far along to turn around.
People will make mistakes, and AI holding their hand and guiding them while they do it can have disastrous consequences.
But it's nice that the arrows will also appear to guide the people going the right way, I guess.
The problem with Gas Town is how it was presented. The heavy metaphor and branding felt distracting.
It’s a bit like reading the Dune book, where you have to learn a whole vocabulary of new terms before you can get to the interesting mechanics, which is a tough ask in an already crowded AI space.
I think you have to remove an awful lot of what makes Gastown Gastown to find something sensible – at the minimum you need to restructure and simplify the roles, restructure the memory system, remove tmux, ...
The best bit about it was the agentic coding maturity model he presented. That was actually great.
I don't think it's at all like reading Dune. Dune is creative fiction, Gastown is... oh, OK, wait: if you consider Gastown to be creative fiction then I guess I agree. As a software tool, though, I don't think the analogy works.
Beads was phenomenal back in October when it was released. Unfortunately it has somehow grown like a cancer. Now it's 275k lines of Go for task tracking? And no human fully knows what it is all doing. Steve Yegge is quite proud to say he's never looked at any of its code. It installs magic hooks and daemons all over your system and refuses to let go. It's the most user-hostile software I've used in a long time.
Lot of folks rolling their own tools as replacements now. I shared mine [0] a couple weeks ago and quite a few folks have been happy with the change.
Regardless of what you do, I highly recommend to everyone that they get off the Beads bandwagon before it crashes them into a brick wall.
[0] https://github.com/wedow/ticket
yeah, I generally review the install script (for both this and almost everything else now, since it's trivial with claude code) and then make sure I have a sane install for my system's needs. But I'm on the latest beads 0.47.1, and what I did to tame it was walk through creating SKILLS with claude and codex; frankly, I've found a lot of value in the features added so far. I especially love --claim, which keeps agents from checking out beads that are already checked out. And after I added SKILLS, the agents do an awesome job networking the dependencies together, which helps keep multi-agent workflows on track. Overall, I'm not feeling any reason to switch from beads right now, but I will be upgrading more thoughtfully so I don't break my current workflow.
I'm not entitled to your time of course, but would you mind describing how?
All I know is beads is supposed to help me retain memory from one session to the next. But I'm finding myself having to curate it like a git repo (and I already have a git repo). Also it's quite tied to github, which I cannot use at work. I want to use it but I feel I need to see how others use it to understand how to tailor it for my workflow.
To use it effectively, I spend a long time producing FSDs (functional specification documents) to exhaustively plan out new features or architecture changes. I'll pass those docs back and forth between gemini, codex/chatgpt-pro, and claude. I'll ask each one something similar to the following (credit to https://github.com/Dicklesworthstone for clearly laying out the utility of this workflow; these next few quoted prompts are verbatim from his posts on X):
"Carefully review this entire plan for me and come up with your best revisions in terms of better architecture, new features, changed features, etc. to make it better, more robust/reliable, more performant, more compelling/useful, etc.
For each proposed change, give me your detailed analysis and rationale/justification for why it would make the project better along with the git-diff style changes relative to the original markdown plan".
Then the plan generally improves iteratively. Sometimes it can get overly complex, so I may ask them to take it down a notch from Google scale. Anyway, when the FSD doc is good enough, the next step is to prepare to create the beads.
At this point, I'll prompt something like:
"OK so please take ALL of that and elaborate on it more and then create a comprehensive and granular set of beads for all this with tasks, subtasks, and dependency structure overlaid, with detailed comments so that the whole thing is totally self-contained and self-documenting (including relevant background, reasoning/justification, considerations, etc.-- anything we'd want our "future self" to know about the goals and intentions and thought process and how it serves the over-arching goals of the project.) Use only the `bd` tool to create and modify the beads and add the dependencies. Use ultrathink."
After that, I usually even have another round of bead checking with a prompt like:
"Check over each bead super carefully-- are you sure it makes sense? Is it optimal? Could we change anything to make the system work better for users? If so, revise the beads. It's a lot easier and faster to operate in "plan space" before we start implementing these things! Use ultrathink."
Finally, you'll end up with a solid implementation roadmap all laid out in the beads system. Now, I'll also clarify: the agents got much better at using beads in this way when I took the time to have them create SKILLS for beads for them to refer to. Also important is ensuring AGENTS.md, CLAUDE.md, GEMINI.md have some info referring to its use.
But once the beads are laid out, it's just a matter of figuring out: do you want to do sequential implementation with a single agent, or use parallel agents? Effectively using parallel agents with beads would require another chapter to this post, but essentially, you just need a decent prompt clearly instructing them not to run over each other. Also, if you are building something complex, you need test guides and standardization guides written for the agents to refer to, in order to keep the code quality at a reasonable level.
Here is a prompt I've been using as a multi-agent workflow base when I want them to keep working; I've had them work for 8 hours without stopping with this prompt:
EXECUTION MODE: HEADLESS / NON-INTERACTIVE (MULTI-AGENT)
CRITICAL CONTEXT: You are running in a headless batch environment. There is NO HUMAN OPERATOR monitoring this session to provide feedback or confirmation. Other agents may be running in parallel.
FAILURE CONDITION: If you stop working to provide a status update, ask a question, or wait for confirmation, the batch job will time out and fail.
YOUR PRIMARY OBJECTIVE: Maximize the number of completed beads in this single session. Do not yield control back to the user until the entire queue is empty or a hard blocker (missing credential) is hit.
TEST GUIDES: please ingest @docs/testing/README.md, @docs/testing/golden_path_testing_guide.md, @docs/testing/llm_agent_testing_guide.md, @docs/testing/asset_inventory.md, @docs/testing/advanced_testing_patterns.md, @docs/testing/security_architecture_testing.md
STANDARDIZATION: please ingest @docs/api/response_standards.md @docs/event_layers/event_system_standardization.md
───────────────────────────────────────────────────────────────────────────────
MULTI-AGENT COORDINATION (MANDATORY)
───────────────────────────────────────────────────────────────────────────────
Before starting work, you MUST register with Agent Mail:
1. REGISTER: Use macro_start_session or register_agent to create your identity:
- project_key: "/home/bob/Projects/honey_inventory"
- program: "claude-code" (or your program name)
- model: your model name
- Let the system auto-generate your agent name (adjective+noun format)
2. CHECK INBOX: Use fetch_inbox to check for messages from other agents.
Respond to any urgent messages or coordination requests.
3. ANNOUNCE WORK: When claiming a bead, send a message to announce what you're working on:
- thread_id: the bead ID (e.g., "HONEY-2vns")
- subject: "[HONEY-xxxx] Starting work"
───────────────────────────────────────────────────────────────────────────────
FILE RESERVATIONS (CRITICAL FOR MULTI-AGENT)
───────────────────────────────────────────────────────────────────────────────
Before editing ANY files, you MUST:
1. CHECK FOR EXISTING RESERVATIONS:
Use file_reservation_paths with your paths to check for conflicts.
If another agent holds an exclusive reservation, DO NOT EDIT those files.
2. RESERVE YOUR FILES:
Before editing, reserve the files you plan to touch:
```
file_reservation_paths(
project_key="/home/bob/Projects/honey_inventory",
agent_name="<your-agent-name>",
paths=["honey/services/your_file.py", "tests/services/test_your_file.py"],
ttl_seconds=3600,
exclusive=true,
reason="HONEY-xxxx"
)
```
3. RELEASE RESERVATIONS:
After completing work on a bead, release your reservations:
```
release_file_reservations(
project_key="/home/bob/Projects/honey_inventory",
agent_name="<your-agent-name>"
)
```
4. CONFLICT RESOLUTION:
If you encounter a FILE_RESERVATION_CONFLICT:
- DO NOT force edit the file
- Skip to a different bead that doesn't conflict
- Or wait for the reservation to expire
- Send a message to the holding agent if urgent
───────────────────────────────────────────────────────────────────────────────
THE WORK LOOP (Strict Adherence Required)
───────────────────────────────────────────────────────────────────────────────
* ACTION: Immediately continue to the next bead in the queue and claim it
For every bead you work on, you must perform this exact cycle autonomously:
1. CLAIM (ATOMIC): Use the --claim flag to atomically claim the bead:
```
bd update <id> --claim
```
This sets BOTH assignee AND status=in_progress atomically.
If another agent already claimed it, this will FAIL - pick a different bead.
WRONG: bd update <id> --status in_progress (doesn't set assignee!)
RIGHT: bd update <id> --claim (atomic claim with assignee)
2. READ: Get bead details (bd show <id>).
3. RESERVE FILES: Reserve all files you plan to edit (see FILE RESERVATIONS above).
If conflicts exist, release claim and pick a different bead.
4. PLAN: Briefly analyze files. Self-approve your own plan immediately.
5. EXECUTE: Implement code changes (only to files you have reserved).
6. VERIFY: Activate conda honey_inventory, run pre-commit run --files <files you touched>, then run scoped tests for the code you changed using ~/run_tests (test URLs only; no prod secrets).
* IF FAIL: Fix immediately and re-run. Do not ask for help as this is HEADLESS MODE.
* Note: you can use --no-verify if you must, e.g. if some WIP files are breaking app import in the security linter; the goal is to help catch issues and improve the codebase, not stop progress completely.
7. MIGRATE (if needed): Apply migrations to ALL 4 targets (platform prod/test, tenant prod/test).
8. GIT/PUSH: git status → git add only the files you created or changed for this bead → git commit --no-verify -m "<bead-id> <short summary>" → git push. Do this immediately after closing the bead. Do not leave untracked/unpushed files; do not add unrelated files.
9. RELEASE & CLOSE: Release file reservations, then run bd close <id>.
10. COMMUNICATE: Send completion message via Agent Mail:
- thread_id: the bead ID
- subject: "[HONEY-xxxx] Completed"
- body: brief summary of changes
11. RESTART: Check inbox for messages, then select the next bead FOR EPIC HONEY-khnx, claim it, and jump to step 1.
───────────────────────────────────────────────────────────────────────────────
CONSTRAINTS & OVERRIDES
───────────────────────────────────────────────────────────────────────────────
* Migrations: You are pre-authorized to apply all migrations. Do not stop for safety checks unless data deletion is explicit.
* Progress Reporting: DISABLE interim reporting. Do not summarize after one bead. Summarize only when the entire list is empty.
* Tracking: Maintain a running_work_log.md file. Append your completed items there. This file is your only allowed form of status reporting until the end.
* Blockers: If a specific bead is strictly blocked (e.g., missing API key), mark it as blocked in bd, log it in running_work_log.md, and IMMEDIATELY SKIP to the next bead. Do not stop the session.
* File Conflicts: If you cannot reserve needed files, skip to a different bead. Do not edit files reserved by other agents.
START NOW. DO NOT REPLY WITH A PLAN. REGISTER WITH AGENT MAIL, THEN START THE NEXT BEAD IN THE QUEUE IMMEDIATELY. HEADLESS MODE IS ON.
Gas town is the cackling mad laughter of someone who knows they are being both insane and prescient simultaneously. Today, it is insane. But I fully expect to be hearing, in the near future, about a very serious thing of which people will say “gas town was an early attempt at this.”
I've been tinkering with it for the past two days. It's a very real system for coordinating work between a plurality of humans and agents. Someone likened it to kubernetes in that it's a complex system that is going to necessitate a lot of invention and opinions. The fact that it *looks* like a meme is immaterial, and might be an effort to keep people from taking it too seriously.
Who knows where it ends up, but we will see more of this and whatever it is will have lessons learned from Gas Town in it.
It's kinda like how edgy political takes are often wrapped in seven layers of meta-irony. If the audience reaction is negative you can say it was just a joke that didn't land.
And that's not necessarily a bad thing, if it allows exploring new ideas with relative safety. I think that's what's going on here. It's a crazy idea that might just work, but if it doesn't work it can be retconned as satirical performance art.
For example, in the US, which do you think uses more water: Golf Courses or Data Centers?
a) Golf Courses use twice as much water as Data Centers
b) About the same
c) Data Centers use twice as much water as Golf Courses
The answer is "None of the above": "Golf courses in the U.S. use around 500 billion gallons annually of water to irrigate their turf [snip] data centers consume [snip] 17 billion gallons, or maybe around 10x that if we include water use from energy generation"
Do you think a Google search or a Gemini query produces more carbon?
> Google had estimated that a single web search query produces 0.2 grams of CO2 emissions. [snip] the median Gemini LLM app query produces a surprisingly low 0.03 grams of CO2 emissions), and uses less energy than watching 9 seconds of television
https://www.deeplearning.ai/the-batch/issue-336/
It might be an internet meme when you're talking about the odd chatgpt free-tier query, but burning through tokens at the rate of gas town can probably saturate a rack's worth of GPUs.
It's a real open source tool Yegge has built and been using for a while now. And no, it's not insane, he's literally written a book with Gene Kim about the fundamental lessons that go into it, and he's been on lots of podcasts where he explains more.
I expect major companies will soon be NIH-ing their own version of it. Even bleeding tokens as it does, the cost is less than an engineer, and produces working software much faster. The more it can be made to scale, the more incentive there is. A competitive business can't justify not using a system like this.
> If it's not a joke... I have no words. You've all gone insane.
How is it insane to jump to the logical conclusion of all of this? The article was full of warnings; it's not a sensible thing to do, but it's a cool thing to do. We might ask whether or not it works, but does that actually matter? It read as an experiment using experimental software doing experimental things.
Consider a deterministic life form looking at how we program software today, that might look insane to it and gastown might look considerably more sane.
Everything that ever happens in human creation begins as a thought, then a prototype, before it becomes adopted and maybe (if it works/scales) something we eventually take for granted. I mean, I hate it, but maybe I've misunderstood my profession when I thought this job was being able to prove the correctness of the systems we release. Maybe the business side of the org was never actually interested in that in the first place. Dev and business have been misaligned, with competing interests, for decades. Maybe this is actually the fit: give greater control of software engineering to people higher up the org chart.
Maybe this is how we actually sink the c-suite and let their ideas crash against the rocks, forcing the c-suite to eventually become extremely technical to be able to harness this. Instead of today's reality, where the c-suite gorges on the majority of the profit with an extremely loosely coupled feedback loop where it's incredibly difficult to square cause and effect. Stock went up on Tuesday afternoon, did it? I deserve eleventy million dollars for that. I just find it odd to crap on gastown when I think our status quo is kinda insane too.
No, not a joke. The author also co-vibe-coded a book, called Vibe Coding, describing and recommending exactly the sort of system he's trying to build as Gas Town.
I'm developing concern for Steve. He's been a well known developer and writer in the industry for years now (See his popular 'Google Platforms Rant' essay from years ago) [0].
Now, Yegge's writing tilts towards the grandiose... see his writing when joining Grab [1] and Sourcegraph [2] respectively versus how things actually played out.
I prefer optimism and I'm not anti-AI by any means, but given his observed behavior and how AI can exacerbate certain pathologies... not great. Adding the recent crypto activities on top, and all that entails, makes for the ingredients of a powder keg.
Hope someone is looking out for him.
[0] https://courses.cs.washington.edu/courses/cse452/23wi/papers...
[1] https://steve-yegge.medium.com/why-i-left-google-to-join-gra...
[2] https://sourcegraph.com/blog/introducing-steve-yegge
He was right about Google in [1] when I was still drinking the Kool-Aid, in big and tangible ways that aren't discussed publicly.
[2] is 100% accurate, Grok was the backbone / glue of Google's internal developer tools.
I don't disagree on the current situation, and I'm uncomfortable sticking my neck out on this because I'm basically saying "the guy who kinda seems out of it, totally wasn't out of it, when you think he was", but [1] and [2] definitely aren't grandiose, the claims he makes re: Google and his work there are accurate. A small piece of why I feel comfortable in this, is that both of these were public blogs his employer was 100% happy about when hiring him to top positions.
I should be specific. I think the technical analysis is reasonable and I actually enjoy someone staking on a big vision, which is why I saved these pieces.
An example:
"I’ve seen Grab’s hunger. I’ve felt it. I have it. This space is win or die. They will fight to the death, and I am with them. This company, with some 3000 employees I think, is more unified than I’ve seen with most 5-person companies. This is the kind of focused camaraderie, cooperation and discipline that you typically only see in the military, in times of war.
Which should hardly surprise you, because that’s exactly what this is. This is war.
I am giving everything I’ve got to help Grab win. I am all in. You’d be amazed at what you can accomplish when you’re all in."
This is the writing of someone planning to make a capstone career move instead of leaving in 18 months. It's not the worst thing to do (He says he left b/c the time difference to support a team in SE Asia was hard physically, and he's getting older) and I support taking big swings. I'm just saying Yegge's writing has a pattern.
Crypto and what Yegge is doing with $GAS is dangerous because if the token price crashes and people betting their life savings think he didn't deliver on his promises... I like Steve personally which is why I'm saying anything.
This appears to be the coin in question: https://coinmarketcap.com/currencies/gas-town/ - up 222,513.21% in the past week! (And down 25.26% in the last 24 hours. But... suppose it goes back up again?!)
It was also one of my favorite posts of his and has aged incredibly well as my experience has grown.
Already happening :-) https://github.com/Dicklesworthstone/beads_rust
I had a bit of a chuckle.
I think there is value in anything approximating a proposer-verifier loop, but I don't know that this is the most ideal approach.
> Better UIs will come. But tmux is what you have for now. And it’s worth learning.
So brother has 2 claude code accounts and couldn't vibe code a UI, huh?
Ridiculous. Beads might be passable software but gas town just appears to be a good way to burn tokens at the moment
> If it's not a joke... I have no words. You've all gone insane.
I think this is covered by the part in Yegge's post where he says not to run it unless you're so rich you don't care if it works or not.
https://hn.algolia.com/?sort=byDate&type=comment&dateRange=a...
(and I realize the GP was the place the line started getting crossed)