FR version is available. Content is displayed in original English for accuracy.
Advertisement
Advertisement
⚡ Community Insights
Discussion Sentiment
58% Positive
Analyzed from 9036 words in the discussion.
Trending Topics
#code#more#software#llm#coding#don#going#engineering#llms#same

Discussion (197 Comments)Read Original on HackerNews
Makes me want to just give up programming forever and never use a computer again.
If LLMs stop improving at the pace of the last few years (I believe they already are slowing down) then they will still manage to crank out billions lines of code which they themselves won’t be able to grep and reason through, leading to drop in quality and lost revenue for the companies that choose to go all-in with LLMs.
But let’s be realistic - modern LLMs are still a great and useful tool when used properly so they will stay. Our goal will be to keep them on track and reduce the negative impact of hallucinations.
As a result software industry will move away from large complex interconnected systems that have millions of features but only a few of them actively used, to small high quality targeted tools. Because their work will be easier to verify and to control the side effects.
Depending on how you measure "improvement" they already have or they never will :-/
Measuring capability of the model as a ratio of context length, you reach the limits at around 300k-400k tokens of context; after that you have diminishing returns. We passed this point.
Measuring capability purely by output, smarter harnesses in the future may unlock even more improvements in outputs; basically a twist on the "Sufficiently Smart Compiler" (https://wiki.c2.com/?SufficientlySmartCompiler=)
That's the two extremes but there's more on the spectrum in between.
[0] https://en.wikipedia.org/wiki/Wirth%27s_law
The system that makes it have an opinion about good vs bad architecture or engineering sensibilities will be something on top of the transformer and probably something more deterministic than a prompt.
Second, LLM code can be less of a hot mess than human written code if you put in the time to train/prompt/verify/review.
Generating perfect well patterned SOLID and unit tested code with no warnings or anti-patterns has never been easier.
It's not immediate, it still takes weeks if you want to actually do QA and roll out to prod, but it's definitely better than the pre-LLM alternatives.
With the right investment, we could certainly have tooling that creates and maintains very good designs out of the box. My bet is that we'll continue chasing quick and hacky code, mostly because that's the majority of the code that it was trained on, and because the majority of people seem to be interested in a quick result vs a long-term maintainable one.
a) The stuff output by the existing LLMs is too unwieldy even for them to handle , even if the product itself is a glorified chatbot.
b) If all software is throwaway, then the value of all software drops to, effectively, the price of an AI subscription. We'll all be drowning in a market of lemons (https://en.wikipedia.org/wiki/The_Market_for_Lemons), whilst also being producers in said market.
https://en.wikipedia.org/wiki/Dune:_The_Butlerian_Jihad
We are used to thinking about software like in the article, a program that runs deterministically in an OS. Where we are headed might be more like where the LLM or AI system is the OS, and accomplishes things we want through a combination of pre-written legacy software, and perhaps able to accomplish new things on the fly.
Whether that happens or not is a different question, but I believe that's what they're suggesting.
How many of us remember that VSCode is actually a browser wrapped inside a native frame?
The new standard, Web Apps. Why update 3 seperate binaries for Win/Lin/Mac when you can do 1 for a web framework and call it a day?
As a piece of meat, I look forward to charge rates of $10,000 an hour, to fix code out the vibe code generation.
With such a low baseline, there is an optimistic perspective that LLMs could improve the situation. LLMs can produce excellent code when prompted or reviewed well. Unlike human employees, the model does not worry about getting a 'partially meets expectations' rating or avoid the drudgery of cleaning up other people's code.
--
It's just as likely that people will be surprised that we used to have billions of lines of human generated code, that no LLM ever approved.
To make my comment more on-topic: why do you think this is going to be the case? What newer LLMs will be trained on?
LLMs aren’t the first thing to come along and change how people develop applications.
You had the rise of frameworks like Django, Rails, etc. Also the rise of SPAs. And also the rise of JS as a frontend+backend language.
In a 3-5 yeats we’ll have adapted to the new norm like we have in the past
Also, companies are pressuring employees towards adoption in novel ways. There was no such industry-wide pressure by employers in the 90s, 2000s or 2010s for engineers to use a specific tech.
I use AI tools daily (because they feel like they're helping me) but it's not exactly hard to imagine scenarios where an explosion of slop piling up plus harm to learning by outsourcing all thinking results in systemic damage that actually slows the pace of technological progress given enough time.
History of new technologies tend to average into a positive trend over a long enough time scale but that doesn't mean there aren't individual ups and downs. Including WTF moments looking back at what now seems like baffling decision-making with benefit of hindsight.
If it is, the fall out will be way worse than if AI ends up living up to (reasonable) expectations.
If it doesn’t, we are going to see over a trillion dollars of capital leave the tech sector, which I think will have worse impacts on the livelihood of tech workers than if AI ends up panning out.
This is something the naysayers need to grapple with. We’ve crossed a line where this tech needs to work simply because of the amount of money depending on that fact.
Plenty of engineers have loose (or no!) standards and practices over how they write coee. Similarly, plenty of engineering teams have weak and loose standards over how code gets pushed to production. This concept isn't new, it's just a lot easier for individuals and teams who have never really adhered to any sort of standards in their SDLC to produce a lot more code and flesh out ideas.
I personally don’t know any colleagues who were good engineers just because they wrote code faster. The best engineers I know were ones who drew on experience and careful consideration and shared critical insights with their team that steered the direction of the system positively.
> Claude, engineer a system for me, but do it good. Thanks!
Same, if anything, the opposite seems to be true, the ones that I'd call "good engineers" were slower, less panicked when production was down and could reason their way (slowly) through pretty much anything thrown at them.
Opposite experience, I've sit next to developers who are trying their fastest to restore production and then making more mistakes to make it even worse, or developers who rush through the first implementation idea they had for a feature, missing to consider so many things and so on.
Unfortunately, a lot of workplaces are ignoring this, believing their engineers are assembly line workers, and the ones who complete 10 widgets per minute are simply better than the ones who complete 5 widgets per minute.
However, the best engineers I know are usually among the quickest to open an editor or debugger and use it fluently to try something out. It's precisely that speed that enables a process like "let's try X, hmm, how about Y, no... ok, Z is nice; ok team, here are the tradeoffs...". Then they remember their experience with X, Y, and Z, and use it to shape their thinking going forward.
Meanwhile, other engineers have gotten X to finally mostly work and are invested in shipping it because they just want to be done. In my experience, this is how a lot of coding agents seem to act.
It's not obvious to me how to apply the expert loop to agentic coding. Of course you can ask your agent to try several different things and pick the best, or ask it to recommend architectural improvements that would make a given change easier...
> Of course you can ask your agent to try several different things and pick the best, or ask it to recommend architectural improvements that would make a given change easier
The ideal solution increasingly seems to be encoding everything that differentiates a good engineer from a bad engineer into your prompt.
But at that point the LLM isn’t really the model as much as the medium. And I have some doubts that LLMs are the ideal medium for encoding expertise.
The Pragmatic Programmer book has whole chapters about this. Ultimately, you either solve the problem analogously (whiteboard, deep thinking on a sofa). Or you got fast as trying out stuff AND keeping the good bits.
Vibe coding: one shot or few shot, smoke test the output, use it until it breaks (or doesn't). Ideal for lightweight PoC and low stakes individual, family or small team apps.
Agentic engineering: - You care about a larger subset of concerns such as functional correctness, performance, infrastructure, resilience/availability, scalability and maintainability. - You have a multi-step pipeline for managing the flow of work - Stages might be project intake, project selection, project specification, epic decomposition, d=story decomposition, coding, documentation and deployment. - Each stage will have some combination of deterministic quality gates (tests must pass, performance must hit a benchmark) and adversarial reviews (business value of proposed project, comprehensiveness of spec, elegance of code, rigor and simplicity of ubiquitous language, etc)
And it's a slider. Sometimes I throw a ticket into my system because I don't want to have to do an interview and burn tokens on three rounds of adversarial reviews, estimating potential value and then detailed specification and adversarial reviews just to ship a feature.
I've been using Opus, GPT-5.5, and some lesser models on a daily basis, but not having them handle entire tasks for me. Even when I go to significant effort to define and refine specs, they still do a lot of dumb things that I wouldn't allow through human PR review.
It would be really easy to just let it all slide into the codebase if I trusted their output or had built some big agentic pipeline that gave me a false sense of security.
Maybe 10 years from now the situation will be improved, but at the current point in time I think vibe coding and these agentic engineering pipelines are just variations of a same theme of abdicating entirely to the LLM.
This morning I was working on a single file where I thought I could have Opus on Max handle some changes. It was making mistakes or missing things on almost every turn that I had to correct. The code it was proposing would have mostly worked, but was too complicated and regressed some obvious simplifications that I had already coded by hand. Multiply this across thousands of agentic commits and codebases get really bad.
If the code doesn't compile, that's easy to spot. If the code compiles but doesn't work, that's still somewhat easy to spot.
If the code compiles and works, but it does the wrong thing in some edge case, or has a security vulnerability, or introduces tech debt or dubious architectural decisions, that's harder to spot but doesn't reduce the review burden whatsoever.
If anything, "truthy" code is more mentally taxing to review than just obviously bad code.
The current fever pitch mandates from above seem to want it applied liberally, and pushing back against that is so discouraging and often career-limiting as to wear the fabric of one's psyche threadbare. With all the obvious problems being pointed out to people, there are just as many workarounds; and these workarounds, as is often revealed shortly thereafter, have their own problems, which beget new solutions, ad infinitum.
At some point it genuinely seems like all this work is for the sake of the machine itself. I suppose that is true: The real goal has become obscured at so many firms today, that all that remains is the LLM. Are the people betting the farm and helping implement the visions of those who have done so guaranteed a soft exit to cushion them from the consequences, or is rationality really being discarded altogether?
Sure, sound engineering principles can help work around these problems, but what efficiency is truly gained, in terms of cognitive load, developer time, money, or finite resources? Or were those ever an earnest concern?
The degenerate side is clueless upper management and fad-driven engineering. We have talked extensively about this.
There is a more rational side to it that I've seen in my org: some engineers absolutely refuse to use AI and as a consequence they are now, clearly and objectively, much less productive than other engineers. The thing is, you still need to learn how to use the tool, so a nontrivial percentage of obstinate engineers need to be driven to use this in the same way that some developers have refused to use Docker or k8s or whatever.
If I get pwned because my AI agent wrote code that had a security vulnerability, none of my users are going to accept the excuse that I used AI and it's a brave new world. I will get the blame, not Anthropic or OpenAI or Google but me.
The same goes for if my AI generated code leads to data loss, or downtime, or if uses too many resources, or it doesn't scale, or it gives out error messages like candy.
The buck stops with me and therefore I have to read the code, line-by-line, carefully.
It's not even a formality. I constantly find issues with AI generated code. These things are lazy and often just stub out code instead of making a sober determination of whether the functionality can be stubbed out or not.
You could say "just AI harder and get the AI to do the review", and I do this a lot, but reviewing is not a neutral activity. A review itself can be harmful if it flags spurious issues where the fix creates new problems. So I still have to go through the AI generated review issue-by-issue and weed out any harmful criticism.
It is so embarrassing that LOC is being used as a metric for engineering output.
It's still useful, however, because that is the only metric that is instantly intuitively understandable and comparable across a wide variety of contexts, i.e. across companies and teams and languages and applications.
As we know, within the same team working on the same product, a 1000 LoC diff could take less time than a 1 line bug fix that took days to debug. Hence we really cannot compare PRs or product features or story points across contexts. If the industry could come up with a standard measure of developer productivity, you'd bet everyone would use it, but it's unfeasible basically for this very reason.
So, when such comparisons are made (and in this case it was clearly a colloquial usage), it helps to assume the context remains the same. Like, a team A working on product P at company C using tech stack T with specific software quality processes Q produced N1 lines of code yesterday, but today with AI they're producing N2 lines of code. Over time the delta between N1 and N2 approximates the actual impact.
(As an aside, this is also what most of the rigorous studies in AI-assisted developer productivity have done: measure PRs across the same cohorts over time with and without AI, like an A/B test.)
I have worked with code where 1000s of lines are very straightforward and linear.
I’ve worked on code where 100 lines is crucial and very domain specific. It can be exceptionally clean and well-commented and it still takes days to unpack.
The skills and effort required to review and understand those situations are quite different.
One is like distance driving a boring highway in the Midwest: don’t get drowsy, avoid veering into the indistinguishable corn fields, and you’ll get there. The other is like navigating a narrow mountain road in a thunderstorm: you’re 100% engaged and you might still tumble or get hit by lightning.
So I’m pretty skeptical that reviewing 2000 lines of code won’t take any more time than reviewing 200 lines of code.
Furthermore how do you know the AI generated lines are the open highway lines of code and not the mountain road ones? There might be hallucinations that pattern match as perfectly reasonable with a hard to spot flaw.
I rewrote the same program using my own brain and just using ChatGPT as google and autocomplete (my normal workflow), I produced the same thing in 1500 LOC.
The effort difference was not that significant either tbh although my hand coded approach probably benefited from designing the vibe coded one so I had already though of what I wanted to build.
My experience was the same as you when I started using agents for development about a year ago. Every time I noticed it did something less-than-optimal or just "not up to my standards", I'd hash out exactly what those things meant for me, added it to my reusable AGENTS.md and the code the agent outputs today is fairly close to what I "naturally" write.
We should have gone the other way; generated a lot of code and demanded pay raises; look at the LOC I cranked out! Company is now in my debt!
If they weren't going to care enough as managers to learn and line go up is all that matters to them, make all lines go up = winning
You all think there's more to this than performative barter for coin to spend on food/shelter.
Although this requires you to take pride in your profession and what you do.
Do you reject all stats that treat the number of people involved (eg. 2 million pepole protested X) as "embarrassing" ... because they lump incredibly varied people together and pretend they're equal?
We're also assuming LOC vibe coded by competent engineers who should be able to tell when something is overengineered.
If we shift the paradigm of how we approach a coding problem, the coding agents can close that gap. Ten years ago every 10 or 15 minutes I would stop coding and start refactoring, testing, and analyzing making sure everything is perfect before proceeding because a bug will corrupt any downstream code. The coding agents don't and can't do this. They keep that bug or malformed architecture as they continue.
The instinct is to get the coding agents to stop at these points. However, that is impossible for several reasons. Instead, because it is very cheap, we should find the first place the agent made a mistake and update the prompt. Instead of fixing it, delete all the code (because it is very cheap), and run from the top. Continue this iteration process until the prompt yields the perfect code.
Ah, but you say, that is a lot of work done by a human! That is the whole point. The humans are still needed. The process using the tool like this yields 10x speed at writing code.
You could get to "something that works" rather fast but it took a long time to 1) evaluate other options (maybe before, maybe after), 2) refine it, 3) test it and build confidence around it.
I think your point stands but no one really knows where. The next year or so is going to be everyone trying to figure that out (this is also why we hear a lot of "we need to reinvent github")
I believe the llm providers went with the wrong approach from the off - the focus should’ve been on complementing labour not displacement. And I believe they have learned an expensive lesson along the way.
Because most of the complexity in software comes from interfacing with external components, when you don't need to adapt to this you can write simpler and better code.
Rather than relying on an external library, you just write your own and have full control and can do quality control.
Linux kernel is 30 000 000 LOC. At 100 tokens /s, let's say 1 LOC per second produced for a single 4090 GPU, in one year of continuous running 3600 * 24 * 365 = 31 536 000 everyone can have its own OS.
It's the "Apps" story all over again : there are millions of apps, but the average user only have 100 max and use 10 daily at most.
Standardize data and services and you don't need that much software.
What will most likely happen is one company with a few millions GPUs will rewrite a complete software ecosystem, and people will just use this and stop doing any software because anything can be produced on the fly. Then all compute can be spent on consistent quality.
But using an agentic LLM to complete boilerplate is attractive simply because we've created a mountain of accidental and intentional complexity in building software. It's more of a regression to the mean of going back to the cognitive load we had when we simply built desktop applications.
Opus 4.7 built it about 90% the same way I would, but had way more convenience methods and step-validations included.
It's great, and really frees me up to think about harder problems.
Just having ~13yrs experience heavily weighted in one language with some formal studying of others makes directing llms a lot simpler.
Learning syntax, primitives, package managers, testing, etc isn't that much of a lift compared to how I used to program.
Was helping a non-dev colleague who's using claude cowork/code to automate reporting the other day. They understand the business intelligence side well, but were struggling with basic diction to vibe code a pyautogui wrapper to pull up RDP and fill out a MS Access abstraction on a vendor DB.
Think we'll be fine for another 5-10 years as a profession
It's seriously the thing that worries (and bothers) me the most. I almost never let unedited LLM comments pass. At a minimum.
Most of the time, I use my own vibe-coded tool to run multiple GitHub-PR-review-style reviews, and send them off to the agent to make the code look and work fine.
It also struggles with doing things the idiomatic way for huge codebases, or sometimes it's just plain wrong about why something works, even if it gets it right.
And I say this despite the fact that I don't really write much code by hand anymore, only the important ones (if even!) or the interesting ones.
Also, don't even get me started on AI-generated READMEs... I use Claude to refine my Markdown or automatically handle dark/light-mode, but I try to write everything myself, because I can't stand what it generates.
"Ugh, no! Why would you say it like that? That's not even how it works! Now, I need to write a full paragraph instead of a short snippet to make sure that no future agents get confused in the same way."
Property-based testing in particular has uncovered a number of invariants in every code base I've introduced it to.
tbf depending on the agent/model a lot of the tests end up being thrown out so it's possible I _should_ handwrite more tests, but having better prompts and detailed plans seems to mitigate that somewhat
It's the bad, semi-coherent submissions that eat up your time, because you do want to award some points and tell students where they went wrong. It's the Anna Karenina principle applied to math.
Code review is the same thing. If you're sure Claude wrote your endpoint right, why not review it anyway? It's going to take you two minutes, and you're not going to wonder whether this time it missed a nuance.
Can agentic engineers adhere to a similar code of ethics that a professional engineer is sworn to uphold?
https://www.nspe.org/career-growth/nspe-code-ethics-engineer...
Can software engineers?
> If another team hands over something and says, “hey, this is the image resize service, here’s how to use it to resize your images”... I’m not going to go and read every line of code that they wrote.
The distance of accountability of the output from its producer is an important metric. Who will be held accountable for which output: that's important to maintain and not feel the "guilt".
So, organizations would need to focus on better and more granular building incentives and punishment mechanisms for large-scale software projects.
Claude Code in particular seems really uninterested in this aspect of the problem and I've stopped using entirely because of this.
No one is suggesting that.
There are certain codebases and pieces of code we definitely want every line to be reasoned and understood. But like his API endpoint example, no reason to fuss with the boilerplate.
This has definitely been my shift over the past few months, and the advantage is I can spend much more time and energy on getting the code architecture just right, which automatically prevents most of the subtle bugs that has people wringing their hands. The new bar is architecting code to be defined as well as an API endpoint->service structure so you can rely on LLMs to paint by numbers for new features/logic.
Spend a lot more time on architecting and testing than hand rolling most repos now.
Hats off to people who enjoy the minutia of programming everything by hand, but turns out I enjoy the other aspects of software development more.
So the number of bugs to find remains constant but the amount of code to review scales with the capability of the agent.
[1] https://github.com/mohsen1/tsz
e.g, I change velocity of player to '200' and of bullets to '300', and it only updated the bullet velocity. Then told me the player was already 'at the correct value' even though it was set to 150. Things like that.. :)
How do you manage/orchestrate this? I'm genuinely curious.
I believe this is a common fault of not being able to zoom out and look at what trade offs are being made. There’s always trade-offs, the question is whether you can define them and then do the analysis to determine whether the result leaves you in a net benefit state.
> But I’m not reviewing that code. And now I’ve got that feeling of guilt: if I haven’t reviewed the code, is it really responsible for me to use this in production?
Answer: it wholly depends upon what management has dictated be the goal for GenAI use at the time.
There seems to be a trend of people outside of engineering organizations thinking that the "iron triangle" of software (and really, all) engineering no longer holds. Fast, cheap, good: now we can pick all three, and there's no limit to the first one in particular. They don't see why you can't crank out 10x productivity. They've been financially incentivized to think that way, and really, they can't lose if they look at it from an "engineer headcount" standpoint. The outcomes are:
1) The GenAI-augmented engineer cranks out 10x productivity without any quality consequences down the line, and keeps them from having to pay other people
or
2) The GenAI-augmented engineer cranks out 10x productivity with quality consequences down the line, at which point the engineer has given another exhibit in the case as to why they should no longer be employed at that organization. Let the lawyers and market inertia deal with the big issues that exist beyond the 90-day fiscal reporting period.
Either way, they have a route to the destination of not paying engineers, and that's the end goal.
If you don't like that way of running a software engineering organization, well, you're not alone, but if nothing else, you could use GenAI to make working for yourself less risky.
Like many people I have used AI to generate crap I really don't care about. I need an image. Generate something like, whatever. Great hey a good looking image! No that's done I can do something I find more interesting to do.
But it's slop. The image does not fit the context. Its just off. And you can tell that no one really cared.
This isn't good.
Do this enough times, and I will have forgotten how to think.
I think this highlights a problem that has always existed under the surface, but it's being brought into the light by proliferation of vibeslop and openclaw and their ilk. Even in the beforetimes you could craft a 100.0% pure, correct looking github repo that had never stood the test of production. Even if you had a test suite that covers every branch and every instruction, without putting the code in production you aren't going to uncover all the things your test suite didn't--performance issues, security issues, unexpected user behavior, etc.
As an observer looking at this repo, I have no way to tell. It's got hundreds of tests, hundreds of commits, dozens of stars... how am I to know nobody has ever actually used it for anything?
I don't know how to solve this problem, but it seems like there's a pretty obvious tooling gap here. A very similar problem is something like "contributor reputation", i.e. the plague of drive-by AI generated PRs from people (or openclaws) you've never seen before. Stars and number of commits aren't good enough, we need more.
My side project is 80% vibe code. Every now and then I look and see all the bad stuff, then I scold Codex a bit and it refactors it for me. So I do see the author's point.
I took a rock carving course in school that really enlightened me about software engineering, and it still applies today, especially to AI. You can't just decide what you want to carve, hold the chisel in just the right spot, and whack it with a hammer just perfectly so all the rock you want falls away leaving a perfect statue behind.
"I saw the angel in the marble and carved until I set him free." -Michelangelo
It's a long drawn out iterative process of making millions of tiny little chips, and letting the statue inside find its way out, in its natural form, instead of trying to impose a pre-determined form onto it.
Vibe coding is hoping your first whack of the hammer is going to make a good statue, then not even looking at the statue before shipping it!
But AI assisted conscientious coding (or agentic engineering as Simon calls it) is the opposite of that, where you chip away quickly and relentlessly, but you still have to carefully control where you chisel and what you carve away, and have an idea in your mind what you want before you start.
Just piggy backing on this post since I'm early:
Would love to see your take on how the AI and Django worlds will collide.
I'm not checking the code since the code doesn't really matter anymore anyways - I just have the agent write passing tests for the changes or additions I make, and so even if something breaks I can just point to the tests.
Some days, the tickets are completed much faster than I expect and I don't hit my daily token expenditure goal, so I have my own custom harness that actually hooks up an agent to TikTok, basically it splits up the reel into 1 second increments and then feeds those frames to the LLM for it's own consumption. I can easily burn 10m tokens a day on this, and Claude seems to enjoy it.
Personally I want to thank you Simon for putting me onto this "vibe engineering" concept, I really didn't expect an archaeology major like myself to become a real engineer but thanks to AI now I can be! Truly gatekeeping in tech is now dead.
Rather, I just feel like I have to constantly remind myself of the impermanence of all things. Like snow, from water come to water gone.
Perhaps I put too much of my identity in being a programmer. Sure, LLMs cannot replace most us in their current state, but what about 5 years, 10 years, ..., 50 years from now? I just cannot help be feel a sense of nihilism and existential dread.
Some might argue that we will always be needed, but I am not certain I want to be needed in such a way. Of course, no one is taking hand-coding away from me. I can hand-code all I want on my own time, but occupationally that may be difficult in the future. I have rambled enough, but all and all, I do not think I want to participate in this society anymore, but I do not know how to escape it either.
The job, as you have done it at least, was also not here 50 years before you started doing it.
Did you have any of the same feelings knowing that you were doing a job that has not existed in the world very long? That seems like a strange requirement for a meaningful job, that it should remain the same for 50+ years.
In truth, our world and what we do for our careers is entirely shaped by the time that we live in. Even people that ostensibly do the same thing people have done for centuries (farmer, teacher, etc) are very different today than 100 years ago.
I don't buy this argument at all. I think if we could pay $20/month to a service that would send over a junior plumber/carpenter/electrician with an encyclopedic knowledge of the craft, did the right thing the majority of the time, and we could observe and direct them, we'd all sign up for that in a heartbeat. Worst case, you have to hire an experienced, expensive person to fix the mess. Yes, I can hear everyone now, "worst case is they burn your house down." Sure, but as we're reminded _constantly_ when we read stories about AI agent catastrophes -- a human could wipe your prod database too. wHy ArE yOu HoLdInG iT tO a DiFfErEnT sTaNdArD???
The business side of the house is getting to live that scenario out right now as far as software goes. Sure you've got years of expertise that an LLM doesn't have _yet_. What makes you think it can't replace that part of your job as well?
I don’t think this comparison quite works (or maybe I think it works and is wrong) and I think it has something to do with creativity or the initial ideation.
I would do this, but I’m a jack of all trades. I built my own diner booth in my kitchen recently. But my wife, who loves the diner booth, just doesn’t really want to get over the hump of figuring out what she might want. I think most people want to offload the mental load of figuring out where to start.
Most people aren’t just bored by coding, they’re bored or overwhelmed by the idea of thinking about software in the first place. Same with plumbing or construction, most people aren’t hiring someone to direct, they’re hiring a director.
Even I have this about some things, sometimes I choose to outsource the full stack of something to give me more space to do creativity elsewhere.
But that's not what the author is talking about in that passage you quoted. What he's saying is that, if you can pay $20 for an AI plumber, then it stands to reason that eventually you will be able to pay $30 to a company that manages AI plumbers for you, so that you don't even have to go to the trouble of supervising the plumber. Most people will choose the $30.
The implication here is software engineer jobs are still safe despite basically free labor/material being available to do said jobs because he thinks other people would prefer to pay experienced professionals to do it right at a significantly higher cost. My point is, I think most people will take the low-stakes gamble of having the cheap AI agent do it with self-supervision[0]. He's naive in thinking people are really going to care about artisanal software built by experienced professionals in the future.
0: Even if you subscribe to the "your job will be to supervise the agents" train of thought, you're kinda glossing over the fact that it's probably gonna involve a pretty significant pay cut and the looming problem of "how do new experienced professionals get created if they don't have to/don't need to get their hands dirty"?
And AI generated code should be different than human code. AI has infinite memory for details. AI doesn’t need organizational patterns like classes. Potentially AI can write code that is more performant than any human.
Will it look like garbage? Sure. Will the code be more suited to the task? Yes.
The code produced will only be understandable by AI. You could use locally hosted LLMs, but it won't be as performant as AI run by big guys. And there is nothing stopping greedy companies implementing some ridiculous pattern that only their model can reasonably work with.
So what you'll do in situation when you can't understand "your" codebase and you have to make changes or fix a bug?
Code that is organized well and operates coherently in the first place, by an LLM or not, will be easier to iterate on, by an LLM or not.
No, just no.