Discussion (81 Comments)
https://www.trigosec.com/insights/mob-programming-for-one/
The short version is that I don’t let AI agents work unsupervised on my code. I treat them like participants in a mob programming session instead of autonomous developers. Different agents get different roles (implementer, reviewer, architect, security reviewer, etc.), and I stay involved throughout the process.
I also agree with your point about architecture. Generating isolated components is relatively easy; preserving and evolving the architectural boundaries across a larger codebase is much harder.
We’re still missing a good way to express and measure architectural quality. Until then, architecture-heavy work requires much closer supervision than implementation-heavy work.
Architectural complexity[1]! There are several really good papers on this.
Unfortunately it never caught on and we don’t have great automated tools to spit out a number. Also, the majority of people just don’t care enough. Research in this field kinda died out when we invented microservices and started treating those as a silver bullet to The Architecture Problem (it’s not [2]).
[1] https://swizec.com/blog/why-taming-architectural-complexity-...
[2] https://youtu.be/y8OnoxKotPQ
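To make "spit out a number" concrete, here is a minimal sketch of one candidate metric, a simplified form of the propagation cost used in design structure matrix work: the fraction of (module, module) pairs where a change to one can reach the other through direct or indirect dependencies. It is not taken from the links above. The module names and the `deps` map are hypothetical, and this variant counts each module as reaching itself.

```python
from typing import Dict, Set


def propagation_cost(deps: Dict[str, Set[str]]) -> float:
    """Share of module pairs connected by a direct or indirect dependency.

    `deps` maps a module to the modules it depends on (hypothetical input).
    Each module is counted as reaching itself, which is one of several
    conventions for this metric.
    """
    modules = sorted(set(deps) | {d for targets in deps.values() for d in targets})
    total_reachable = 0
    for start in modules:
        seen = {start}
        stack = list(deps.get(start, ()))
        # Depth-first walk over the dependency edges.
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.add(current)
                stack.extend(deps.get(current, ()))
        total_reachable += len(seen)
    return total_reachable / (len(modules) ** 2)


# Hypothetical three-layer app: ui -> api -> db.
layered = {"ui": {"api"}, "api": {"db"}, "db": set()}
# Same modules, but db now calls back into ui, forming a cycle.
tangled = {"ui": {"api"}, "api": {"db"}, "db": {"ui"}}

layered_cost = propagation_cost(layered)
tangled_cost = propagation_cost(tangled)
```

The cyclic layout scores higher than the layered one, which is the kind of signal you would want an agent (or a reviewer) to watch between commits. It says nothing about whether the boundaries are the right ones.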
Yet! It is the next frontier, and we will need it for agents as described in the post to really work.
I wonder if OS maintainers would have a leg up in defining workflows to better leverage this. Of course, OS contributors are autonomous developers, but maybe a trick or two might transfer across
i would not want to go down the "take myself out of the loop" path because yes, i do have to micromanage the claude session, often course-correcting every commit and then doing large scale refactoring every so often. but i'm perfectly happy doing that - i see claude as more of a tool than a coder i can hand work off to.
then i looked at the code and asked it to benchmark, hinting that it looked like it was doing a lot in the inner loop. and sure enough, adding a few simple graphics to every page more than doubled the time it took to generate the largest size of document (~1s -> ~2.2s for ~400 pages). without any more prompting claude figured out that it had an accidentally-quadratic loop, and fixed that.
i then had to tell it "look, we are using a template to avoid regenerating boilerplate with every page. you can add a placeholder to the template and replace it with graphics using xml patching code you already wrote for another part of the doc generation". the final code was a lot cleaner and ran in ~1.2s, which claude (again unprompted, to its credit) did fine-grained benchmarking to prove was the unavoidable overhead of simply inserting all those large chunks of xml into the document.
i wouldn't even say it was a coincidence that i ran into this right after writing my comment about having to micromanage the LLM, because this sort of thing happens all the time. i can say that i had a much easier time doing this because i looked at the code generated in a single commit and could easily see that it smelt off. i would not have wanted to do this at the end of 20 commits all building on each other.
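For anyone who hasn't hit an accidentally-quadratic loop before, here is a rough sketch of the general shape, not the actual code from that session. It uses plain strings instead of XML, and the `[[GRAPHIC_n]]` placeholder format and function names are made up for illustration. The slow version rescans and copies the whole document once per page; the fast version only touches the page it is filling in.

```python
from typing import List


def placeholder(index: int) -> str:
    """Hypothetical placeholder marker for the graphic on page `index`."""
    return "[[GRAPHIC_%d]]" % index


def render_rescanning(pages: List[str], graphics: List[str]) -> str:
    """Quadratic: each replace() scans and copies the full document."""
    doc = "".join(pages)
    for index, graphic in enumerate(graphics):
        doc = doc.replace(placeholder(index), graphic)
    return doc


def render_per_page(pages: List[str], graphics: List[str]) -> str:
    """Linear: each replace() only scans the one page it belongs to."""
    filled = [
        page.replace(placeholder(index), graphic)
        for index, (page, graphic) in enumerate(zip(pages, graphics))
    ]
    return "".join(filled)


pages = ["<page>%s</page>" % placeholder(i) for i in range(3)]
graphics = ["<svg/>"] * 3
slow_result = render_rescanning(pages, graphics)
fast_result = render_per_page(pages, graphics)
```

Both produce the same document, which is exactly why this kind of thing slips through unless someone reads the diff or runs a benchmark at a realistic page count.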
The complete log of all prompts and commits is here: https://demo.buildermark.dev/projects/u020uhEFtuWwPei6z6nbN
https://demo.buildermark.dev/projects/u020uhEFtuWwPei6z6nbN/...
still show content of page 1
I clicked that link first even though it’s listed second bc I wanted to see the prompts. I didn’t expect the level of detail or mapping to each commit. It is rad!
That being said the landing page is soooo obviously “vibe coded” (read: AI generated).
It has that design style that Claude likes to ~ab~use. & if I’m being honest, had I clicked on the website link first, I would never have gotten to the demo bc I would’ve just dismissed it as AI slop.
1. We've done it by hand for another route already, which the LLM uses as reference
2. There's a strong validation setup/harness I've set up for it with storybooks and component tests
3. It's a _mostly_ mechanical transform. Not entirely, as the two environments/APIs are not 1:1, but it's close enough
But! My team and I are still reviewing everything (shrug). It is "faster" because I get to have this running while I'm in meetings planning other, more interesting projects
And this isn't really that many agents in parallel. Yeah, plenty of fan-out subagents, but that IMO doesn't count/isn't really the same as what others are talking about
Your team could have done it pre-AI, but you just thought it was hard so you didn't try.
I remember migrating a code base from MySQL to SQL Server in the 2010s. I thought it would take me weeks, if not months. It took me a couple of days.
Immediately made me sour on the "hot" idea in the 2010s that your data layer should be provider agnostic so you could switch if you needed to. That was never a real thing, it was a made up justification for unnecessary over-engineering, by people who had clearly never tried to port an app from one data source to another. There are other reasons for a clear separation, but switching a few hundred SQL statements is not it.
In reality, mechanical ports are not that hard, you can sit down, put some music on and blitz it in a few days. Programmers just over-estimate how hard they will be.
It's genuinely weird to have you say that so confidently lol
Using LLMs in a larger scope can sometimes work, but it has the real risk of turning a project into a mess after which you will have to undo the work and lose a lot of time.
Also, using LLMs this way with less clear boundaries will make reading and maintaining the code more cumbersome.
Me when meeting management expectations, agent orchestration tools like Boomi and Workato calling into tools, doing with AI what a few years ago would be done with BPEL.
It's taken _a lot_ of time and effort, but this is an example of what can be developed using LLMs alone.
You have to have dedication and a goal to reach, but you can absolutely build anything if you're building with the right foundations in mind.
What do you think the productivity gain was from using an LLM? This question assumes you’re already an experienced developer.
In fact, it's far beyond what I would even attempt, because I've just spent two decades building up a data bank of how hard things are supposed to be.
He doesn't know it's supposed to be hard, so he just does it.
In terms of velocity, let me offer some numbers. In 6 months I generated >150k lines of code and merged 10k PRs to ship and iterate on https://plotalong.app
I follow best practices and isolate agents to continuously deployed dev environments, semi-manually review PRs and gate the release process between multiple protected envs. The project is getting close to 500 end-to-end tests in Playwright.
That’s just working nights and weekends. Before AI, it took my team at the office 4 years to produce this much work. There are some qualitative differences but the speed and results are real
I'm from a hardware / networking / infrastructure background. I've had extensive exposure to (web) application development as I'm working closely with development teams and I do have the bash/powershell scripting knowledge.
But honestly, if I tried this "the old fashioned way" it probably would have taken me about 6 to 7 years to develop that application, that's an optimistic estimate. You really do have to have a passion for what you're building, I didn't know that voice transcription and local LLMs would be such a driving force for me, but it's all I think about, so much that I find it hard to go to sleep sometimes.
I also think that writing large codebases into a sort of functional transformer tree, as an information compression stage, would let them reason more easily about large codebases by giving them a large lossless overview with minimal token usage.
The pipelines and data serving design was all human since it did have to deal with some data scale but the javascript/api layer was all slop, and it seems fine and good.
If you have a really high quality piece of code that needs to meet a high bar of quality/reliability, then I think the risk of letting the AI loose on it is very high and I wouldn't do it. If you have a pile of code you already know is a pile of garbage despite being human written, well, it can't get much worse :)
I also built an agent orchestration meta harness that runs on k8s and uses the k8s agents sandbox for running codex/claude code in the cloud. This was almost entirely just handed over to Fable and I have not asked a single architectural detail. The quality of this product is mediocre, but the fact that it largely works after I went through a few iterations of clicking around is impressive. I would have preferred to buy something off the shelf, but nothing even really came close (though maybe now I would have forked Omnigent)
https://youtu.be/-QFHIoCo-Ko?is=FYYdukWluYX3vdQL
Worth a watch.
I sign off the code I merged, part of company policy but also just to be sure it is actually decent. But reviewing has become the real draining bottleneck: even with stacked PRs, a total of 5-6k lines is not a 5-minute job. Even if I brainstormed and set the plan, that's really the part that doesn't scale right now for me in this. But the author is very shy about that: either the changes aren't that big in the end, or they trust the process enough to review in a more casual manner. Being equally untrusting, I can't do that ...
They can still be applied now using coding agents, if you're willing to push back against the default setup and change your mode of thinking a little bit. Of course it doesn't help that an entire industry is dedicated to persuading us that maximizing token spend is the only way to get shit done.
I appreciate this probably seems like an extremist take, but I wrote some more about it here in case there's anybody out there who identifies with it:
https://philbooth.me/blog/agentic-coding-and-mental-models
Yeah the problem is the executives and managers around us are demanding we ship massive features as quickly as possible, and I like having a job and dread having to find a new one in this market...
I want less code to maintain not more that I don't even fully understand.
I think research and very supervised coding with lots of guardrails is the way to actually gain productivity from these tools.
There rarely is a single correct way of implementing some requirement or feature. It’s a trade-off between compromises, not binary correct or incorrect like a Sudoku puzzle. The insights that the exploration give you may even lead you to implement something significantly different from what you originally set out to.
This is not about LLMs, by the way. It’s about reviewing any code, including by a fellow human. It’s just that many people mistakenly feel like with LLMs they can lower their guard and accept even if they have not gone through the steps of themselves coming up with their solution and comparing it to the one suggested by the LLM.
The reason is that many correctly see proper review as duplicate work, and while it is justified with another human (because it is (A) instructive and (B) reducing bus factor) with LLMs most people simply can’t be bothered. If you personally can, you are a minority.
My current process is also using Github projects in a normal scrum style way, with many tickets written or fleshed out and state managed by the LLM, and it doubling as the memory system
Completely leapfrogging all these other open and closed source concoctions and being more effective
But it's effective enough that I don’t need OP’s final form state of still approving everything
Auto-mode is fine. Worktrees are built into Claude Code now. I just tell it to classify tickets as sequential or parallel possible and spawn subagents to tackle all of the tickets in the todo list
They all get their own context window; it's pretty perfect now
In the meantime I work in a couple of tabs of Claude Design for different flows of any client-side app. My philosophy has been that devs could pick up graphic and UI/UX design easily; it's just still a full-time job to make variations of layouts and portray their states.
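The "sequential or parallel" split can be sketched as grouping tickets into waves by their dependencies: everything in one wave can go to subagents at the same time, and the next wave waits for it. This is only an illustration using Python's standard `graphlib`, not how Claude Code does it internally. The ticket IDs and the `blocked_by` map are hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set


def plan_waves(blocked_by: Dict[str, Set[str]]) -> List[List[str]]:
    """Group tickets into waves; tickets in one wave have no unmet blockers.

    `blocked_by` maps a ticket to the tickets that must finish first.
    Raises graphlib.CycleError if the tickets block each other in a cycle.
    """
    sorter = TopologicalSorter(blocked_by)
    sorter.prepare()
    waves: List[List[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # safe to run in parallel
        waves.append(ready)
        sorter.done(*ready)  # pretend the whole wave finished
    return waves


# Hypothetical todo list: T3 needs T1, and T4 needs both T2 and T3.
tickets = {
    "T1": set(),
    "T2": set(),
    "T3": {"T1"},
    "T4": {"T2", "T3"},
}
waves = plan_waves(tickets)
```

Here T1 and T2 land in the first wave, T3 in the second, and T4 in the third. The hard part in practice is getting the blockers right in the first place, since two tickets that touch the same files are effectively sequential even if nobody wrote that down.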
UI/UX is not a full time job anymore.
And I use Claude chat to flesh out aspects of the overall idea
I think you may be overcomplicating your workflow in the concluding state.
Overall I agree that planning and intention is now most of the time, before a 10 subagent precision strike is initiated
I shudder when I hear about some people's (wildly overcomplicated) setups. I get the allure but there's something nice about pair programming with an LLM in a singular chat.
Both are giving me skillsets to excel in the other domain
I watch the subagents, push back on some choices, look at commits and glance at pull requests
It is not control freak behavior to want to be in control when you are the one accountable for it if it breaks.
Sorry, access to an LLM (even if it could center a div reliably and make responsive designs, which it can't) does not give you taste or intuition, or make you good at building user interfaces. You people/sloppers have no idea the amount of sweat that gets poured into great UX.
It's insulting when you people say these things, and I'm not even a designer or frontend dev.
I actually think UI/UX designers and devs will be the last to fall. I will want beautiful products that were built by beautiful minds; that's how you will set yourself apart from the slop. And fortunately it will be even easier when 80% of everything is half-assed UI cranked out by LLM design tools. The contrast is already glaring.
It will be shrinking. Less grunt work.
Internal projects can get done with less of either.
Nobody really cares about great UX or about how great someone can implement a CRUD app.
So there will be less need/fighting over such resources.
If I can just generate a usable UI for a hobby project I don't need to find some company to build it out. Sure, it will miss out on a lot of stuff but it's a trade off.
If someone else can build a product and needed a basic web shop / crud app, they don't need to find someone to implement that at a massive overcharge.
Claude Design has barely been out for a month
And it’s fulfilled my needs better than v0, lovable, playwright via LLM or just iterating in the coding LLM. I’ve worked with graphic designers my whole career and have also contracted design agencies to do style guides and collaborate on branding and layouts. I’ve gotten the output that I’m looking for with Claude Design
eventually you’ll see examples but it's not in my purview to publicly link any of my projects as being vibe coded
>I want to start by saying that I’m neither an AI-fanatic
Kind of like saying you are a fanatic before saying you aren't.
I don't think there's too much here (e.g. "spec driven development") I haven't seen elsewhere.
Isn't that the rhyme here. I can't think of any article or discussion on AI here that contains anything new or noteworthy. And yet all those articles we've read before and all those "discussions" we've had before keep coming and coming. I have gotten bored and I'm just waiting for anything decisive to happen.
First, nobody sane wants to give their domain IP to OpenAI/Anthropic. That's why local AI will eventually prevail and flourish: people who actually have some IP will have no problem buying a 10k+ EUR machine to run some pretty good models on it. However, if your main job is just doing CRUD stuff, then you are screwed.
Secondly, hallucination is really the Achilles' heel of every LLM. Sure, you can recreate an application that exists in thousands of variations on the internet, but the moment you go deeper into domain knowledge you will struggle more and more.
Try to make a CAN driver for the ESP32: easy, it is probably going to work. Try to make a CAN driver for the STM32F7xx: now the AI will start having problems, but it will probably be able to produce something that works after a lot of debugging. Now let's make a CAN driver for the MPC5555: the AI will start writing fairy tales about registers which do not exist. All of the processors above have reference manuals, and sometimes example git repositories, available on the open internet.
Pretty much the whole industry has zero problem giving OpenAI/Anthropic full access to their systems and codebases.
You're putting way more thoughts into it than the vast majority, most companies seem to go with the momentum
Replace OpenAI/Anthropic with AWS and this is not too dissimilar to the arguments in 2009 about cloud providers.
It’s not that there's nobody for whom this is true, it’s just that there’s enough of everyone else to build an empire with.
But those everyone else are racing to the bottom because all their ideas are being soaked up by AI and then being given to their competitors on a silver platter as AI output.
There are problems when you rely too much on AI generated code, but these shallow dismissals are quite annoying.
1. There can be massive differences between chips that sound plausibly the same, and because of the way LLMs work, models mangle these variations together.
2. Registers are often named in very similar ways across different manufacturers, so models make up MPC5555 registers which are coincidentally registers in Renesas processors doing the same thing.
3. There is no standard for reference manuals. Sometimes chunks of knowledge are literally missing thanks to translation into English, or there are pieces you can only get from Application Notes that have code as a screenshot.
And then you will find out that all those descriptions are wrong, and through trial and error you will get it working in 2 weeks' time.
Bonus point: random people have public Git repositories for obscure processors, but with bad or completely non-working driver implementations. The LLM will just output a variation of this garbage at you, because there are 3 public repositories on the whole internet. Sometimes I have a feeling that this must be on purpose, to poison the well.
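One cheap guardrail against invented registers is to check every register-looking identifier in the generated driver against a list you typed in from the reference manual yourself. A minimal sketch follows. The `DEMO_CAN_` prefix and all register names are placeholders, not real MPC5555 names, and this only catches names that don't exist, not wrong bit layouts or wrong descriptions.

```python
import re
from typing import Iterable, List


def unknown_registers(source: str, known: Iterable[str], prefix: str) -> List[str]:
    """Return identifiers starting with `prefix` that are not in `known`.

    `known` is assumed to be a hand-checked list from the reference manual.
    """
    pattern = re.compile(r"\b" + re.escape(prefix) + r"[A-Z0-9_]+\b")
    found = set(pattern.findall(source))
    return sorted(found - set(known))


# Placeholder register names, standing in for a list from the manual.
manual_registers = {"DEMO_CAN_MCR", "DEMO_CAN_CTRL"}

# Hypothetical generated C snippet with one made-up register.
generated_driver = """
DEMO_CAN_MCR |= 0x1;
DEMO_CAN_CTRL = 0x3;
DEMO_CAN_BTR0 = 0x7;  /* looks plausible, not in the manual */
"""

suspects = unknown_registers(generated_driver, manual_registers, "DEMO_CAN_")
```

Anything in `suspects` is worth looking up by hand before flashing anything. It doesn't help with points 1 and 3 above, where the name is right but the behavior described for it is wrong or missing.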
okay? then give it those reference manuals and git repositories? I haven't heard of something LLMs can't get around and figure out.