
Discussion (77 Comments)
The important audit at my company is conducted by the FDA.
I have a feeling when they ask what processes we followed to mitigate any user harm that could be caused by software changes that "I told an AI-mayor in the form of a cartoon fox what to do and he spit out a bunch of vibecode software written by AI-driven virtual cartoon characters" is not among the answers they want to hear.
And those cartoon foxes didn't even do anything! I guess these ones do?
Don't put it past the masses. These are crazy times.
> the thing must be in the place where it should be
With no further information e.g. what place, where, how, when, who facilitates that?
> the person who facilitates it, is the person who facilitates it.
Yea thanks. So their ISO accredited process was basically no process. Would have been way better with a talking fox.
So I feel like humans are capable of just as bad. I'd be interested in what answer the fox could spit out, and I kinda wonder where it might fit on the bell curve of all non-Gas-Town "auditable" processes. I'm all for skepticism, but it would be more tangible if we criticised the actual response rather than writing it off as "definitely awful" just because it happens to sit on top of a generated stack.
I mean: I don't want it to work, but maybe we're not as good as we think we are, or the stuff we rate as super important is actually way less important with a generated context. As much as I love good code, the thought that gnaws at the back of my head is the truism that some of the most profitable code in history has been some of the "worst" code (e.g. MySpace's janky code base on top of ColdFusion, or Twitter's "Fail Whale" era).
So I'm happy that someone is exploring this space in an open way. I'm just glad I'm not the one finding that out with my face first.
Could work
The sanatorium from American Horror Story Asylum comes to mind.
Dominique, nique, nique…
Let's assume that managing context well is a problem and that this kind of orchestration solves it. But I see another problem with agents:
When designing a system or a component we have ideas that form invariants. Sometimes the invariant is big, like a certain grand architecture, and sometimes it's small, like the selection of a data structure. Eventually, though, you want to add a feature that clashes with that invariant. At that point there are usually three choices:
* Don't add the feature. The invariant is a useful simplifying principle and it's more important than the feature.
* Add the feature inelegantly or inefficiently on top of the invariant. Hey, not every feature has to be elegant or efficient.
* Go back and change the invariant. You've just learnt something new that you hadn't considered that put things in a new light, and there's a better approach.
Often, only one of these is right. Usually, one of these is very, very wrong, and with bad consequences.
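A concrete miniature of such a clash, as a hypothetical example of my own (not from any real codebase): a component whose invariant is "entries are append-only, so offsets are stable" meets a feature request for deletion.

```python
class AuditLog:
    """Invariant: entries are append-only, so offsets are stable forever."""

    def __init__(self):
        self._entries = []

    def append(self, entry):
        self._entries.append(entry)
        return len(self._entries) - 1   # stable offset, relied on by callers

    # Feature request: "delete an entry". The three choices above become:
    # 1. Refuse: deletion breaks offset stability.
    # 2. Bolt on: tombstone the entry; offsets stay valid, inelegantly.
    # 3. Rework: drop the offset invariant and re-architect every caller.
    def tombstone(self, offset):
        # Choice 2: the entry is blanked, not removed, so later offsets hold.
        self._entries[offset] = None
```

Which of the three is right depends on judgment about how load-bearing the invariant is, which is exactly the call the comment argues models get wrong.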
But picking among them isn't a matter of context. It's a matter of judgment, and the models - not the harnesses - get this judgment wrong far too often (they go with what they know - the "average" of their training - or they just don't get it). So often, in fact, that mistakes quickly accumulate and compound, and after a few bad decisions like this the codebase is unsalvageable. Today's models are just not good enough (yet) to create a complete sustainable product on their own. You just can't trust them to make wise decisions. Study after study and experiment after experiment show this.
Now, perhaps we make better judgment calls because we have context that the agent doesn't. But we can't really dump everything we know, from facts to lessons, and that pertains to every abstraction layer of the software, into documents (and even if we could, today's models couldn't handle them). So even if it is a matter of context, it is not something that can be solved with better context management. Having an audit trail is nice, but not if it's a trail of one bad decision after another.
It is pretty magical to go from brainstorming an idea in the evening, having ChatGPT Pro spit out a long list of beads to implement it, leaving it running overnight in a totally empty repo, and waking up to a mostly-implemented project.
So I slapped together my own Beads implementation (https://codeberg.org/mutablecc/dingles) over a day or two. Probably has bugs, and I'm sure race conditions if you tried to use with Gas Town, and likely does not scale. But it has the minimum functionality needed to create and track issues and sync them (locally and remotely, either via normal merge, or a dedicated ticket branch). No SQL, no extra features, just JSONL and Git. Threw a whole large software project at it, and the AI took to it like a duck to water, used it to make epics for the whole project, methodically worked through them all, dependencies first, across multiple context sessions. The paradigm of making tools the AI wants to use is clearly a winner.
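The "just JSONL and Git" design can be surprisingly small. A hypothetical sketch of mine (not the actual dingles code) of the core idea: append-only JSONL issue storage, where updates are new lines and later records win, so Git merges rarely conflict.

```python
import json
import uuid
from pathlib import Path

ISSUES = Path("issues.jsonl")  # one JSON object per line, append-only

def add_issue(title, deps=()):
    """Create an issue; dependencies are ids of issues that must close first."""
    issue = {"id": uuid.uuid4().hex[:8], "title": title,
             "deps": list(deps), "status": "open"}
    with ISSUES.open("a") as f:
        f.write(json.dumps(issue) + "\n")
    return issue["id"]

def load_issues():
    # Later lines win, so "closing" an issue is just appending an updated record.
    issues = {}
    if ISSUES.exists():
        for line in ISSUES.read_text().splitlines():
            rec = json.loads(line)
            issues[rec["id"]] = rec
    return issues

def ready(issues):
    # An issue is workable once all of its dependencies are closed -
    # this is what lets an agent work "dependencies first".
    return [i for i in issues.values()
            if i["status"] == "open"
            and all(issues.get(d, {}).get("status") == "closed"
                    for d in i["deps"])]
```

Because the file is append-only, two branches adding issues concurrently usually merge with a trivial union, which is the property that makes plain Git sync viable.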
At this point Gas Town should have produced something whose value we can actually evaluate.
There should be no shortage of examples the creator could provide, unless of course...
This all being said, I do find the idea interesting, but I heeded its advice when it said it was hideously expensive and risky to use. So yes, I do want someone braver, richer, and stupider than me to take the first leap.
Just because he's operating in the realm of smart nerds doesn't mean he is immune to the value-inverting effects of social media.
I imagine it doesn't run very cheaply.
But LLMs are trying to mimic people. So if confusion is the human response, what's to stop the LLM from acting confused?
Interesting:
> Kubernetes asks “Is it running?” Gas Town asks “Is it done?” Kubernetes optimizes for uptime. Gas Town optimizes for completion.
https://embracingenigmas.substack.com/p/exploring-gas-town
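The quoted contrast can be sketched as two control loops asking different questions. A hypothetical illustration of my own (not from either project's code), with `alive` standing in for a liveness probe and `done` for a completion record:

```python
from dataclasses import dataclass

@dataclass
class Job:
    name: str
    done: bool = False   # has the work been finished?
    alive: bool = True   # is the worker process running?

def kubernetes_style(jobs):
    # Liveness reconciliation: anything not running gets restarted,
    # regardless of whether its work is already finished.
    return [j.name for j in jobs if not j.alive]

def gas_town_style(jobs):
    # Completion orientation: a dead worker is fine if its work is done;
    # only unfinished work needs a worker (re)assigned.
    return [j.name for j in jobs if not j.done]
```

The two loops diverge exactly on a worker that finished its task and exited: the uptime loop wants to restart it, the completion loop considers it a success.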
The real distinction is of scale - whether you want a REST endpoint or a fully functional word processor.
But real, complex software is at least half spec (either explicit, or implicitly captured by its code). The question is whether LLMs with Gas Town can specify software to the degree needed to get something functioning.
You provided a quote from someone who seems to be an AI-boosting influencer who claimed to use it, but where's the output in the form of code we can look at, or in the form of an app someone can use today?
I'm not an AI-denier. I use LLMs and agentic coding. They increase my productivity.
...but there is still a very real problem with people claiming that some new way of using AI is earth shattering, and changes everything based on vague anecdotes that don't involve a tangible released output that they can point to.
I wound up building my own with Claude. I made it SQLite-first, syncing to and pulling from GitHub, and I added "Gates" to stop Claude (or whatever agent) from marking things complete when they haven't been: compiled, had unit tests run, or had simple human testing/confirmation. The Gates concept improved my experience with Claude; all too often it says it finished something when in fact it did not. Every task must have a gate, and gates must pass before you can close a task. Gates can be reused across tasks, so if "Run unit tests" is one gate, you can reuse it for every task; when it passes, it passes for that one task <-> gate combination.
Anyway, I'm happy for Beads, Gas Town not so much my wheelhouse on the other hand.
How did you implement gates? Are they simply tasks Claude itself has to confirm it ran, or are they scripts that run to check that the thing in question actually happened, or do they spawn a separate AI agent to check that the thing happened, or what?
In a nutshell, a gate is an entry in the DB with arbitrary text, and Claude is good about following whatever it says. Trying to close a task forces Claude to read it.
Life's gotten slightly busy, but you can see more on the repo. I've been debating giving it a better name, I feel like GuardRails implies security, when the goal is just to validate work slightly.
https://github.com/Giancarlos/GuardRails
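The gating pattern described above can be sketched in a few lines. A hypothetical sketch, assuming a SQLite schema of my own invention (not the actual GuardRails code): closing a task is refused until every attached gate has been marked passed for that task <-> gate pair.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE tasks (id INTEGER PRIMARY KEY, title TEXT,
                    status TEXT DEFAULT 'open');
CREATE TABLE gates (id INTEGER PRIMARY KEY, description TEXT);
-- Gates are reusable; passing is recorded per task <-> gate pair.
CREATE TABLE task_gates (task_id INTEGER, gate_id INTEGER,
                         passed INTEGER DEFAULT 0,
                         PRIMARY KEY (task_id, gate_id));
""")

def close_task(task_id):
    """Refuse to close a task while any of its gates are unpassed."""
    unpassed = db.execute(
        "SELECT COUNT(*) FROM task_gates WHERE task_id=? AND passed=0",
        (task_id,)).fetchone()[0]
    if unpassed:
        raise RuntimeError(f"{unpassed} gate(s) not passed; task stays open")
    db.execute("UPDATE tasks SET status='closed' WHERE id=?", (task_id,))
```

The point is that the agent never decides "done" by itself: the close operation fails mechanically until the gate rows say otherwise.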
It seems like a lot of coding agent features work that way?
I think Yegge's instinct is right: a programmable/editable coordination layer (he calls this Gas City) is a great idea. Gas Town's early days were definitely a wild experience in terms of needing to watch carefully lest your system be destroyed, and then I put that energy into OpenClaw. I'll probably spin up Gas City and see what it can do soon, though. Very cool.
So now you have agents of type mayor, polecats, witnesses, deacons, dogs, etc., plus a slew of unneeded constructs with incomprehensible names.
In one of the blog posts for Gas Town, I remember reading something by the author along the lines of « it's super inefficient, but because you burn so many tokens, you still get what you want at the end! » Clearly this is also the design philosophy behind this project: just (get your AI to) throw more random abstractions and more agent types at it until you feel like it kinda works, and don't bother asking whether they actually contribute anything.
This gave me the very clear feeling that most of the complexity of gas town is absolutely not needed and probably detrimental.
Ended up building my own thing that is 10x simpler, just a simple main agent you talk to, that can dispatch subagents, they all communicate, wake each other up and keep track of work through a simple CLI. No « refinery » or « wasteland » or « molecule » or « convoys » or « deacons » or …
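The flat architecture described here (and this is a hypothetical sketch of my own, not the commenter's actual tool) can be pictured as one dispatcher and a shared work queue, where any subagent can enqueue follow-up work to "wake" another:

```python
from collections import deque

queue = deque()  # shared work queue: (task, handler) pairs

def dispatch(task, handler):
    """Hand a task to the queue; the main loop will route it."""
    queue.append((task, handler))

def subagent(task):
    # A subagent reports back and can enqueue follow-up work directly,
    # with no intermediate hierarchy of mayors, deacons, or polecats.
    if task == "plan":
        dispatch("implement", subagent)
    return f"done: {task}"

def run():
    # The main agent: drain the queue until no work remains.
    results = []
    while queue:
        task, handler = queue.popleft()
        results.append(handler(task))
    return results
```

Everything the elaborate role taxonomy does here collapses into "who is next in the queue", which is the commenter's 10x-simpler claim in miniature.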
You won't get 10k stars and a blog post out of that. Obviously you need some Stoats who have Conferences with the Stump Lord to determine whether they are needed at the Silo or the Bilge. They'll then regroup at the appropriate Decision Epicenter and delegate to the Weasels and Chipmunks who actually do the coding (antiquated term) in the Salt Mine.
The Stump Lord is an owl.
Seems like I'm back to obscurity.
:)
I tried tracking down where those numbers came from and the sources were a bit sketchy. Can anybody who has used Gas Town confirm those numbers, or report their personal numbers?
Lines of code per hour is a bad metric.
Can it solve a problem using production quality code that doesn’t take four times as long to review? That sounds like something I would pay $100 for.
Cost per line of code is not an amazing metric, but at least it's an attempt to come up with a figure.
(I would also be interested to find out how much it costs to run...)
Unfortunately I think things are moving so fast that by the time such a study was done, we would already be on to newer models and newer versions of gas town.
I mean, under the same logic couldn't we argue that TV has ruined the planet? A lot of energy for something of debatable physical value. Or motor racing, football, the Olympic Games? All that energy and waste just to find out who can throw a stick the furthest every four years.
At least, that’s what I would do, if I had any interest in testing out gastown with my own money. If my employer wants to pay for the testing, that’s another question entirely.
This is a desirable end state for highly social but perhaps slightly sociopathic extroverts who want to spend all day talking even though they aren't talking to a person.
For anyone else, it's hard to imagine considering that a desirable way to spend eight hours a day.
Thoughtful critique is of course fine but there's no need to be personal, and it should be something we can learn from.
https://news.ycombinator.com/newsguidelines.html