

Discussion (130 Comments) · Read Original on HackerNews

btrettel · about 13 hours ago
Similar to bragging about LOC, I have noticed in my own field of computational fluid dynamics that some vibe coders brag about how large or rigorous their test suites are. The problem is that whenever I look more closely, the tests are not outstanding and are less rigorous than my own manually created ones. There are often big gaps in vibe-coded tests. I don't care if you have 1 million tests. 1 million easy tests, or 1 million tests that don't cover the right parts of the code, aren't worth much.
CJefferson · about 11 hours ago
Yes, I've found tests are the one thing I need to write myself. I then also need to be sure to keep 'git diff'ing the tests, to make sure Claude doesn't decide to 'fix' the tests when its code doesn't work.

When I am rigorous about the tests, Claude has done an amazing job implementing some tricky algorithms from some difficult academic papers, saving me time overall, but it does require more babysitting than I would like.

eru · 39 minutes ago
You might want to look into property-based testing, e.g. python-hypothesis if you use that language. It's great, and even finds minimal counter-examples.
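For readers unfamiliar with the idea, a hand-rolled sketch of the core mechanic Hypothesis automates (the `buggy_sort` function under test is invented for illustration; the real library also shrinks any failure to a minimal counter-example):

```python
import random

def buggy_sort(xs):
    # Invented implementation under test: accidentally drops duplicates.
    return sorted(set(xs))

def find_counterexample(fn, trials=200):
    # What Hypothesis automates: generate random inputs, check an
    # invariant, and return the first input that violates it.
    random.seed(0)
    for _ in range(trials):
        xs = [random.randint(-5, 5) for _ in range(random.randint(0, 8))]
        if len(fn(xs)) != len(xs):  # invariant: sorting preserves length
            return xs
    return None

failing_input = find_counterexample(buggy_sort)
# failing_input is a list containing a duplicate; Hypothesis would
# additionally shrink it to a minimal case such as [0, 0].
```

The point of stating invariants ("sorting preserves length") rather than hand-picking inputs is that the generator explores cases a human would not think to write.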
Tuna-Fish · about 11 hours ago
Give Claude a separate user, and make the tests not writable for it. Generally you should limit Claude to write access to only the specific things it needs to edit; this will also save you tokens, because it will fail faster when it goes off the rails.
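One way to sketch this setup on Linux (the user name and paths are illustrative, and the privileged steps are left commented out):

```shell
# The agent owns src/ and can edit it; tests/ is read-only, so any attempt
# by the model to "fix" a test fails immediately instead of burning tokens.
mkdir -p project/src project/tests
chmod -R a-w project/tests        # strip write permission from tests/
ls -ld project/tests              # directory mode should now show no 'w'
# The steps below need root, so they are commented out here:
# sudo useradd --no-create-home agent
# sudo chown -R agent project/src   # agent may edit source...
# sudo -u agent claude              # ...but writes to tests/ fail fast
```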
LelouBil · about 9 hours ago
You don't even need a separate user if you're on Linux (or WSL): just use the sandbox feature, which lets you specify allowed directories for read and/or write.

The sandbox is powered by bubblewrap (used by Flatpaks) so I trust it.

senko · about 5 hours ago
“Red/green TDD” (i.e. actual TDD) and mutation testing (which LLMs can help with) are good ways to keep those tests under control.

Not gonna help with the test code quality, but at least the tests are going to be relevant.
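A minimal, self-contained sketch of the mutation-testing idea senko mentions (real Python tools include mutmut and cosmic-ray; `is_adult` and the toy suite here are invented for illustration):

```python
import ast

# Code under test: a single comparison we will mutate.
SOURCE = """
def is_adult(age):
    return age >= 18
"""

class FlipGte(ast.NodeTransformer):
    # Replace `>=` with `>`: a classic off-by-one mutation.
    def visit_Compare(self, node):
        node.ops = [ast.Gt() if isinstance(op, ast.GtE) else op
                    for op in node.ops]
        return node

def suite_passes(ns):
    # Toy test suite; the age == 18 boundary case is what kills the mutant.
    return ns["is_adult"](30) and ns["is_adult"](18) and not ns["is_adult"](17)

original_ns, mutant_ns = {}, {}
exec(compile(ast.parse(SOURCE), "<orig>", "exec"), original_ns)
mutant_tree = ast.fix_missing_locations(FlipGte().visit(ast.parse(SOURCE)))
exec(compile(mutant_tree, "<mutant>", "exec"), mutant_ns)

killed = suite_passes(original_ns) and not suite_passes(mutant_ns)
# killed is True: the suite fails on the mutant, so the boundary is covered.
```

A mutant that survives (tests still pass after the operator flip) is exactly the kind of "relevant but untested" behaviour this technique surfaces.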

eru · 37 minutes ago
If you start with the failing tests, you can hand them, plus the spec, to another agent (human or silicon) for review.

It's a bit like pre-registering your study in medicine.

WalterBright · about 6 hours ago
The trick is crafting the minimal number of tests.
bodegajed · about 11 hours ago
It's like reward hacking, where the reward function (in this case, the tests) gets exploited to achieve the goal. The model wants to declare victory and be rewarded, so the tests end up not being critical of the code under test. This is probably in the RL training data; I am of course merely speculating.
colechristensen · about 12 hours ago
It's a struggle to get LLMs to generate tests that aren't entirely stupid.

Like grepping source code for a string, or assert(1==1, true).

You have to have a curated list of every kind of test not to write, or you get hundreds of pointless-at-best tests.

btrettel · about 12 hours ago
What I've observed in computational fluid dynamics is that LLMs seem to grab common validation cases used often in the literature, regardless of the relevance to the problem at hand. "Lid-driven cavity" cases were used by the two vibe coded simulators I commented on at r/cfd, for instance. I never liked the lid-driven cavity problem because it rarely ever resembles an actual use case. A way better validation case would be an experiment on the same type of problem the user intends to solve. I think the lid-driven cavity problem is often picked in the literature because the geometry is easy to set up, not because it's relevant or particularly challenging. I don't know if this problem is due to vibe coders not actually having a particular use case in mind or LLMs overemphasizing what's common.

LLMs seem to also avoid checking the math of the simulator. In CFD, this is called verification. The comparisons are almost exclusively against experiments (validation), but it's possible for a model to be implemented incorrectly and for calibration of the model to hide that fact. It's common to check the order-of-accuracy of the numerical scheme to test whether it was implemented correctly, but I haven't seen any vibe coders do that. (LLMs definitely know about that procedure as I've asked multiple LLMs about it before. It's not an obscure procedure.)
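The order-of-accuracy check can be illustrated without any CFD machinery. A sketch on the simplest possible "scheme", a central difference for f'(x), which is formally second-order: halving the step size should cut the error by about 4x, and the observed order log2(e_h / e_h/2) should approach 2.

```python
import math

def central_diff(f, x, h):
    # Second-order accurate approximation of f'(x).
    return (f(x + h) - f(x - h)) / (2 * h)

f, x = math.sin, 1.0
exact = math.cos(1.0)                          # known analytic answer
e1 = abs(central_diff(f, x, 1e-2) - exact)     # error at step h
e2 = abs(central_diff(f, x, 5e-3) - exact)     # error at step h/2
observed_order = math.log(e1 / e2, 2)
# observed_order comes out near 2.0, matching the scheme's formal order;
# a buggy implementation would typically show a lower observed order.
```

A full simulator would do the same thing with successively refined grids and an error norm, but the logic of the verification test is identical.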

colechristensen · about 10 hours ago
Both of these points seem like things an LLM could easily be instructed about to shape its testing strategy.
theshrike79 · about 3 hours ago
> You have to have a curated list of every kind of test not to write

This should be distilled into a tool. Some kind of AST based code analyser/linter that fails if it sees stupid test structures.

Just having it in plain English in a HOW-TO-TEST.md file is hit and miss.
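A toy version of such a linter, using Python's ast module to flag asserts whose condition is a bare constant or compares literals to literals (the rule set here is illustrative, not exhaustive):

```python
import ast

def trivial_asserts(source):
    # Flag asserts that pass (or fail) regardless of the code under test:
    # bare constants, or comparisons between two literals.
    flagged = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Assert):
            t = node.test
            if isinstance(t, ast.Constant):
                flagged.append(node.lineno)            # e.g. assert True
            elif (isinstance(t, ast.Compare)
                  and isinstance(t.left, ast.Constant)
                  and all(isinstance(c, ast.Constant) for c in t.comparators)):
                flagged.append(node.lineno)            # e.g. assert 1 == 1
    return sorted(flagged)

sample = "assert True\nassert 1 == 1\nassert add(2, 2) == 4\n"
# trivial_asserts(sample) flags lines 1 and 2; line 3 passes the lint.
```

Wired into CI, a check like this fails the build deterministically instead of relying on the model honoring a prose instruction.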

gpm · about 12 hours ago
> have a curated list of every kind of test not to write

I've seen a lot of people interact with LLMs like this and I'm skeptical.

It's not how you'd "teach" a human (effectively). Teaching (humans) with positive examples is generally much more effective than with negative examples. You'd show them examples of good tests to write, discuss the properties you want, etc...

I try to interact with LLMs the same way. I certainly wouldn't say I've solved "how to interact with LLMs" but it seems to at least mostly work - though I haven't done any (pseudo-)scientific comparison testing or anything.

I'm curious if anyone else has opinions on what the best approach is here? Especially if backed up by actual data.

jerf · about 11 hours ago
It's going to be difficult for anyone to have any more "data" than you already do. It's early days for all of us. It's not like there's anyone with 20 years of 2026 AI coding assistant experience.

However we can say based on the architecture of the LLMs and how they work that if you want them to not do something, you really don't want to mention the thing you don't want them to do at all. Eventually the negation gets smeared away and the thing you don't want them to do becomes something they consider. You want to stay as positive as possible and flood them with what you do want them to do, so they're too busy doing that to even consider what you didn't want them to do. You just plain don't want the thing you don't want in their vector space at all, not even with adjectives hanging on them.

TeMPOraL · about 11 hours ago
I don't have much data to go on (in accordance with what 'jerf wrote), however I offer a high-level, abstract perspective.

The ideal set of outcomes exists as a tiny subspace of a high-dimensional space of possible solutions. Almost all those solutions are bad. Giving negative examples removes some specific bits of the possibility space from consideration[0] - not very useful, since almost everything that remains is bad too. Giving positive examples narrows the search down to where the good solutions are likely to be - drastically more effective.

A more humane intuition[1], something I've observed as a parent and also through introspection: when I tell my kid to do something and they don't understand WTF it is that I want, they'll do something weird and entirely undesirable. If I tell them, "don't do that - and also don't do [some other thing they haven't even thought of yet]", it's not going to improve the outcome; even repeated attempts at correction don't seem effective. In contrast, if I tell (or better, show) them what to do, they usually get the idea quickly, and whatever random experiments/play they invent is more likely to still be helpful.

--

[0] - While paradoxically also highlighting them - it's the "don't think of a pink elephant" phenomenon.

[1] - Yes, I love anthropomorphizing LLMs, because it works.

colechristensen · about 10 hours ago
It's not a person. Unlike a person, it has a tremendous "memory" of everything its creators could get access to.

If I tell it what to do, I bias it towards doing those things and limit its ability to think of things I didn't think of myself, which is what I want in testing. Sure, in a separate pass, prescribing types and specific tests is effective. But I also want it to think of things I didn't, and a prompt like "write excellent tests that don't break these rules..." is how you get that.

suzzer99 · about 14 hours ago
> Generally, though, most of us need to think about using more abstraction rather than less.

Maybe this was true when Programming Perl was written, but I see the opposite much more often now. I'm a big fan of WET - Write Everything Twice (stolen from comments here), then the third time think about maybe creating a new abstraction.

badlucklottery · about 14 hours ago
>WET - Write Everything Twice

I've always heard this as the "Rule of three": https://en.wikipedia.org/wiki/Rule_of_three_(computer_progra...

hackable_sand · about 11 hours ago
Antonymy with DRY
dasil003 · about 13 hours ago
Totally agree with this. The beauty of software is that the right abstractions have untold impact, spanning many orders of magnitude. I'm talking about the major innovations, things like operating systems, RDBMS, cloud orchestration. But the majority of code in the world is not like that; it's just simple business logic that represents ideas and processes run by humans for human purposes, which resist abstraction.

That doesn't stop people from trying, though. Platform creation is rife within big tech companies as a technical form of empire building and career-driven development. My rule of thumb in tech reviews is that you can't have a platform until you have three proven use cases and have shown that coupling them together is not a net negative due to the autonomy constraint a shared system imposes.

genxy · about 10 hours ago
That is where I put systems programmers: they need to extract an abstract algebra out of the domain. If they can accomplish this, the complexity of the problem largely evaporates.

Use the wrong abstraction and you are constantly fighting the same exact bug(s) in the system. Good design makes entire classes of bugs impossible to represent.

I don't believe the trope that you need to make a bunch of bad designs before you can do good. Those lessons are definitely valuable, but not a requirement.

A great example is the evolution from a layered storage stack to a unified one like ZFS. Or compilers, from multi-pass beasts to interactive, query-based compilers and dynamic JITs.

The design and properties of the system were always the problem I loved solving; sometimes the low-level coding puzzles are fun too. Much of programming is a slog, though, and the flow state has been harder and harder to achieve. Super-deep bug hunting can be satisfying, if you actually find and fix the bug - that's where you learn an incredible amount. Fixing shallow cross-module bugs is hell.

Don't you have to be really seasoned to attempt, in good faith, to couple two systems and say whether that would be productive? You can't prove this negative. I imagine a place like that would need a very strong culture of building toward the stated goals, keeping politics and personalities out of it as much as possible.

marcus_holmes · about 9 hours ago
"Duplication is far cheaper than the wrong abstraction"

Sandi Metz https://sandimetz.com/blog/2016/1/20/the-wrong-abstraction

suzzer99 · about 8 hours ago
And adding an abstraction later is much easier than removing an unneeded one, which can be very hard or even impossible depending on the complexity of the app.
layer8 · about 13 hours ago
More than twice is a rather low bar; I don't think it conflicts with the quote from Programming Perl.
suzzer99 · about 11 hours ago
I don't think it's a hard rule, more of an ethos. If you know there are going to be a bunch of something, write the abstraction out of the gate. If you have three code entities with a lot of similar properties, but the app is new and you feel like there's a good chance they might diverge in the future, then leave them separate.
raincole · about 12 hours ago
I agree. It's crazy how many layers of abstraction have been created since 1991 (when Programming Perl was published.)
nixpulvis · about 13 hours ago
I've been advocating for writing everything twice since college.
jimbokun · about 12 hours ago
That will still result in more abstraction than the average programmer.
HarHarVeryFunny · about 12 hours ago
Writing twice makes sense if time permits, or the opportunity presents itself. The first time may be somewhat exploratory (maybe a throw-away prototype); the second time you better understand the problem and can do a better job.

A third time, with a new abstraction, is where you need to be careful. Fred Brooks ("Mythical Man Month") refers to it as the "second-system effect" where the confidence of having done something once (for real, not just prototype) may lead to an over-engineered and unnecessarily complex "version 2" as you are tempted to "make it better" by adding layers of abstractions and bells and whistles.

wcarss · about 12 hours ago
I agree with what you're saying about writing something twice or even three times to really understand it, but I think you might have misunderstood the WET idea. As I understand it, it's meant in opposition to DRY, in the sense of "allow a second copy of the same code", and then when you need a third copy, start to consider introducing an abstraction, rather than religiously avoiding repeated code.
HarHarVeryFunny · about 11 hours ago
Personally, even for a prototype, I'd be using functions immediately as soon as I saw (or anticipated) I needed to do same thing twice - mainly so that if I want to change it later there is one place to change, not many. It's the same for production code of course, but when prototyping the code structure may be quite fluid and you want to keep making changes easy, not have to remember to update multiple copies of the same code.

I'm really talking about manually writing code, but the same would apply for AI written code. Having a single place to update when something needs changing is always going to be less error prone.

The major concession I make to modularity when developing a prototype is often to put everything into a single source file to make it fast to iteratively refactor etc rather than split it up into modules.

reenorap · about 10 hours ago
As someone who has switched to coding exclusively with AI after 30 years of coding by myself, I find it really weird when people take credit for the lines of code and features that AI generates. Flexing that one "coded" hundreds of thousands of lines per day is a bit cringe, seeing as it's really just the prompt that one typed.
stratts · about 10 hours ago
It's a spectrum, isn't it? From targeted edits that you approve manually - which I think you can reasonably take credit for - all the way to full blown vibe-coded apps where you're hardly involved in the design process at all.

And then there's this awkward bit in the middle where you're not necessarily reviewing all the code the AI generates, but you're the one driving the architecture, coming up with feature ideas, pushing for refactors from reading the code, etc. This is where I'm at currently and it's tricky, because while I'd never say that I "wrote" the code, I feel I can claim credit for the app as a whole because I was so heavily involved in the process. The end result I feel is similar to what I would've produced by hand, it just happened a lot faster.

(granted, the end result is only 2000 LoC after a few weeks working on and off)

Centigonal · about 6 hours ago
I think LOC and "writing code" are largely irrelevant as metrics of productivity in a world with LLMs that love to churn out overly loquacious code.

I think the right way to explain the work done sounds something like, "I worked with Claude to create an app that does ______. I know it works because ______."

HarHarVeryFunny · about 10 hours ago
Meta apparently now has a "leaderboard" for who is using the most AI - consuming the most tokens. Must make Anthropic happy, since Meta is using Claude, and accounts for some significant percentage (10%? 20%?) of their total volume.
WatchDog · about 10 hours ago
Token usage is a different and more sympathetic heuristic than LOC produced.

The metric by itself tells you nothing about what value those tokens produced, but to some extent it represents the amount of thinking you are able to offload to the computer.

Wide-breadth problems seem to scale well with usage, like scanning millions of lines of code for vulnerabilities, such as the recent claude mythos results.

HarHarVeryFunny · about 10 hours ago
The trouble with rewarding token usage is the same as rewarding LOC written/generated - if that's what you are asking for then that is what you will get. Asking the AI to "scan the entire codebase for vulnerabilities" would certainly be a good way to climb the leaderboard!
marcus_holmes · about 9 hours ago
Yes!

I don't mind it so much when it's a newbie or non-techie who has never actually written code before, because bless their hearts, they did it! They got some code working!

But if you've been developing for decades, you know that counting lines of code means nothing - less than nothing - and that you could probably achieve the same result in half the lines if you thought about it a bit longer.

And to claim this as an achievement when it's LLM-generated... that's not a boast. That doesn't mean what you think it means.

But I guess we hit the same old problem that we've always had: how do you measure productivity in software development? If you wanted to boast about how an LLM is making you 100x more productive, what metric could you use? LOC is the most easily measurable - and really, really terrible - measure that PMs have been using since we started doing this, because everything else is hard.

eucyclos · about 8 hours ago
I forget who said it, but I heard the idea floated that if your work can be measured in terms of productivity at all, it can and probably should be done by software. Not sure how that applies here since as you point out, a 10x programmer probably doesn't produce 10x the code.
bdangubic · about 8 hours ago
Here's one thing that somewhat worked for my team. When we first started using LLMs, we decided to run the same process as if they did not exist: same sprint planning meetings, same estimation. We did this for 6 months and saw a roughly 55% increase in output compared to pre-LLM usage. There are biases in what we were trying to achieve - it is not easy to estimate that something will take XX hours when you know you won't have to write some portion of it (for example, documentation or parts of the test coverage) - but we did our best. After we convinced ourselves of the productivity gains, we stopped doing this.
marcus_holmes · about 7 hours ago
wow, great experiment. I'm amazed the whole team went through with duplicating everything for that long. Nice work :)

I resorted to feels. After decades of programming, I know when I'm being productive, and I can reasonably estimate when a colleague is being productive. I extrapolate that to the LLM, too. Absolutely not an objective measure, but I feel that I can get the LLM to do in a day a task that would take me 2-3 weeks (post-Nov 25 and using parallel agents).

ofjcihen · about 10 hours ago
If anything couldn’t huge amounts of code changes or LoC be a sign of a poor outcome?
Turskarama · about 10 hours ago
Yes, there's a reason it was abandoned as a KPI almost as soon as it was introduced. Just because AI is writing the code instead now doesn't magically make it a good metric.
ignoramous · about 10 hours ago
Some argue LoC is irrelevant as a quality/complexity metric, as (in this new software product development lifecycle) implementation + testing + maintenance is wholly overseen by agents.

It has never before been possible to code & deploy software with nothing but specs. Whatever software Garry is building, these are products he couldn't have built otherwise. LoC, in that context, serves as a reminder of the capability of the agents to power/slog through reqs/specs (quite incredibly so).

Besides, critical human review can always be fed back as instructions to agents.

njarboe · about 13 hours ago
German General Kurt von Hammerstein-Equord (a high-ranking army officer in the Reichswehr/Wehrmacht era):

“I divide my officers into four groups. There are clever, diligent, stupid, and lazy officers. Usually two characteristics are combined.

Some are clever and diligent — their place is the General Staff.

The next lot are stupid and lazy — they make up 90% of every army and are suited to routine duties.

Anyone who is both clever and lazy is qualified for the highest leadership posts, because he possesses the intellectual clarity and the composure necessary for difficult decisions.

One must beware of anyone who is both stupid and diligent — he must not be entrusted with any responsibility because he will always cause only mischief.”

greazy · about 8 hours ago
I much prefer the version of the joke attributed to Napoleon, where diligent is replaced with energetic. It ends with Napoleon being asked, "But general, what about the fourth group, stupid and energetic?"

"I have them shot".

quantummagic · about 13 hours ago
Where my fellow ninety-percenters at?
dijit · about 13 hours ago
I think we put too much negative emphasis on people who aren’t as gifted intellectually.

In reality, the world works because of human automatons - honest people doing honest work, hopefully living their lives in a comforting, complete and wholesome way, quietly contributing their piece to society.

There is no shame in this, yet we act as though there is.

xboxnolifes · about 13 hours ago
This is what pains me about how many people respond negatively toward the idea of everyone being able to earn an honest living and raise a family. Too often the idea of "deserving it" comes into it, as if doing your small part to contribute to society is not enough.
theshrike79 · about 3 hours ago
Doing a repetitive(ish) task day in day out requires a specific type of person, I'm not one of them.

But I do know multiple, just in my immediate family. People who graduated from school, went to the local factory, and worked there for half a century before retiring. Pretty much the same job - moving widgets from A to B, etc. - nothing massively complex. I do respect the people who can do it, especially the ones who make it look effortless and efficient - even a bit performative.

Also because my home town is a "factory town", guess where I worked for my summer job(s). I wanted to shove a hot poker in my ear just to get away from the tedium after the first day. On the second day I was thinking how to automate the damn process to not involve me in it at all :D

analog31 · about 12 hours ago
I'm not blaming you here, but I think "automatons" may be inaccurate. A lot of the jobs that seem menial would be utterly bollixed if done by an automaton. The people continually handle the edge cases and tiny discrepancies between formal procedures and how things actually work. Consider the many stories of people experiencing AI bots when they try to get vendor support for products: "Please let me talk to a real person."

Many of those people, probably including most bureaucrats, are working on systems that have already been automated to the fullest extent possible. This is one of the reasons why bureaucracies seem chaotic and inefficient -- the stuff that works is happening automatically and is invisible. You only see the exceptions.

The automation can be improved, but it's a laborious process and fraught with the risks associated with the software crisis. You never know when a project is going to fall into the abyss and never emerge, and the best models of project failure are stochastic.

genxy · about 10 hours ago
I love a dog and a cat and a tree. I can respect someone not as intelligent as other folks. I'd love it if we started holding the crude, mean, and willfully ignorant to a higher standard.
Jtarii · about 13 hours ago
The movie Perfect Days captures this perfectly.
ChosenEnd · about 13 hours ago
Human automatons? Why would you have mercy for automatons? Just call them cattle; we might feel more compassion towards them if we don't think of them as machinelike.
wiseowise · about 12 hours ago
I’m here man. Just want to make money and support my family. Couldn’t care less what some German general thinks about me. Even less care about online clowns trying to put people in buckets.
geodel · about 8 hours ago
> Just want to make money and support my family.

That'd be just fine. But you do seem to care and feel hurt enough to call people online clowns.

eucyclos · about 8 hours ago
I think this heuristic used to be more useful before it became widely known. Laziness is a fine quality when diligence is publicly rewarded, but once people game the metrics to look lazier than they really are, things break down.
bitwize · about 6 hours ago
And that's why Peter Gibbons is clearly management material!
xhrpost · about 13 hours ago
I've had this exact sentiment in the past couple of months, after seeing a few PRs that were definitely the wrong solution to a problem. One implemented its own parsing functions for problems where well-established solutions like JSON parsers already existed. Any non-LLM programmer could have thought this up but would immediately have decided to look elsewhere; their human emotions would have kicked in and said "that's way too much (likely redundant) work, there must be a better way". But the LLM has no emotion; it isn't lazy, and that can be a problem, because it makes it a lot easier to do the wrong thing.
nulltrace · about 12 hours ago
It also doesn't bother checking what's already in your project. Grep around a bit and you'll find three `formatTimestamp` functions all doing almost the same thing.
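A quick, hypothetical repro of that duplicate-helper hunt (the file layout and the `formatTimestamp` helper are invented for illustration):

```shell
mkdir -p demo/src
printf 'def formatTimestamp(t):\n    return str(t)\n' > demo/src/a.py
printf 'def formatTimestamp(ts):\n    return f"{ts}"\n' > demo/src/b.py
# Every definition site of the suspect helper:
grep -rn "def formatTimestamp" demo/src
# Rough census of all function names, sorted by copy count:
grep -rhoE "def [A-Za-z_]+" demo/src | sort | uniq -c | sort -rn
```

Running a census like this before (and after) an agent session makes it obvious when the model has quietly minted a fourth copy instead of reusing an existing helper.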
johnfn · about 14 hours ago
As dumb as it is to loudly proclaim you wrote 200k loc last week with an LLM, I don’t think it’s much better to look at the code someone else wrote with an LLM and go “hah! Look at how stupid it is!” You’re making exactly the same error as the other guy, just in the opposite direction: you’re judging the profession of software engineering based on code output rather than value generation.

Now, did Garry Tan actually produce anything of value that week? I dunno, you’ll have to ask him.

fao_ · about 14 hours ago
Yeah! It's not like code quality matters in terms of negative value or lives lost, right?!

https://en.wikipedia.org/wiki/Horizon_IT_scandal

Furthermore,

> As for the artifact that Tan was building with such frenetic energy, I was broadly ignoring it. Polish software engineer Gregorein, however, took it apart, and the results are at once predictable, hilarious and instructive: A single load of Tan’s "newsletter-blog-thingy" included multiple test harnesses (!), the Hello World Rails app (?!), a stowaway text editor, and then eight different variants of the same logo — one of which with zero bytes.

Do you think any of the... /things/ bundled in this software increased its attack surface?

SvenL · about 13 hours ago
I also struggle with this all the time: the balance between bringing value/joy and the level of craft. Most human-written stuff might look really ugly or be written in a weird way, but as long as it's useful, it's OK.

What I don't like here is the bragging about the LoC. He's not bragging about the value it could provide. Yes, people also write shitty code, but they don't brag about it; most of the time they're even ashamed of it.

flir · about 12 hours ago
> a stowaway text editor

?!

Was it hiding in one of the lifeboats?

lotsofpulp · about 13 hours ago
The Horizon IT scandal was not caused by poor code quality; the scandal was the corrupt employees of the UK government/Post Office. Poor-quality code might have caused the errors, but the decision not to investigate them - and to sweep them under the rug - was made by humans.
fao_ · about 12 hours ago
> Poor quality code might have caused the error, but the failure to investigate the errors and sweep them under the rug was made by humans.

That's not quite correct.

The root set of errors were made by the accounting software. The branch sets of errors were made by humans taking Horizon IT's word for it that there was no fault in the code, and instead blaming the workers for the differences in the balance sheets.

If there were no errors in the accounting software (i.e. it had been properly designed and tested), then none of that would have happened.

Nobody blames THERAC-25 on the human operator.

8note · about 13 hours ago
> included multiple test harnesses (!)

I've seen plenty of real code written by real people with multiple test harnesses and multiple mocking libraries.

It's still kinda irrelevant to whether the code does anything useful; it's only a descriptor of the funding model.

flir · about 12 hours ago
If I'm reading this correctly ("a single homepage load of http://garryslist.org downloads 6.42 MB across 169 requests"), the test harnesses were being downloaded by end users. They weren't being installed as devDependencies.
sdevonoes · about 13 hours ago
> Now, did Garry Tan actually produce anything of value that week? I dunno, you’ll have to ask him.

Let's not be naive. Garry is not a nobody. He absolutely doesn't care about how many lines of code are produced or deleted. He made that post as advertisement: he's advertising AI because he's the CEO of YC, whose profitability depends on AI.

He’s just shipping ads.

Terr_ · about 13 hours ago
"Follow the money" was always relevant, but especially when it comes to any kind of LLM news or investment-du-jour.

The cautionary/pessimist folks at least don't make money by taking the stance.

slyall · about 12 hours ago
A few do.

At the extreme end you'll get invited to conferences, but further down you could have other products you are pushing - even non-AI-related ones that take advantage of your "smart person" public persona.

tmoertel · about 13 hours ago
> You’re making exactly the same error as the other guy, just in the opposite direction: you’re judging the profession of software engineering based on code output rather than value generation.

But the true metric isn't either one, it's value created net of costs. And those costs include the cost to create the software, the cost to understand and maintain it, the cost of securing it and deploying it and running it, and consequential costs, such as the cost of exploited security holes and the cost of unexpected legal liabilities, say from accidental copyright or patent infringement or from accidental violation of laws such as the Digital Markets Act and Digital Services Act. The use of AI dramatically decreases some of these costs and dramatically increases other costs (in expectation). But the AI hypesters only shine the spotlight on the decreased costs.

alemwjsl · about 13 hours ago
It isn't worth the time. I am not going to read the 200k LOC to prove it was a bad idea to generate that much code in a short time and ship it to production; it is on the vibe coder to prove it wasn't. And if it's just tweets being exchanged and I want to judge someone who is boasting about LOC and aiming for more LOC/second? Yep, I'll judge 'em. It is stupid.
ObscureScience · about 13 hours ago
"Value generation" is a term I would be somewhat wary of.

To me, in this context, it's similar to driving economic growth with fossil fuels.

Whether in the end it can result in a net benefit (the value being larger than the cost of interacting with it plus the cost of sorting out the mess later) is likely impossible to say, but I don't think it can simply be judged by short-sighted value.

II2II · about 13 hours ago
Given the framing of the article, I can understand where the "opposite direction" comment is coming from. The author also gives mixed signals by simultaneously suggesting that the "laziness" of the programmer and of the code are virtues. Yet I don't think they are ignoring value generation. Rather, I think they are suggesting that the value is in the quality of the code instead of the problem being solved. This seems to be an attitude held by many developers who are interested in the pursuit of programming rather than the end product.
roncesvallesabout 13 hours ago
The main value he generated from that exercise was the screenshot. It's a kind of credentialism.
arthurjjabout 14 hours ago
LLMs not being lazy enough definitely feels true. But it's unclear to me if it's a permanent issue, one that will be fixed in the next model upgrade, or just one your agent framework/CI/CD pipeline takes care of.

e.g. Right now when using agents, after I'm "done" with the feature and I commit, I usually prompt "Check for any bugs or refactorings we should do". I could see a CI/CD step that says "Look at the last N commits and check if the code in them could be simplified or refactored to have a better abstraction".
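A sketch of what such a step might look like: it just assembles recent-commit context for whatever agent you run. The function name and prompt wording here are my own illustration, not any particular tool's API, and the agent invocation itself is deliberately left out.

```python
import subprocess

def build_review_prompt(n_commits: int = 5) -> str:
    """Assemble a post-merge review prompt from the last N commits.

    The actual agent call (a headless coding agent, a CI bot, etc.) is
    left out on purpose; this only builds the context handed to it.
    """
    try:
        log = subprocess.run(
            ["git", "log", f"-{n_commits}", "--oneline", "--stat"],
            capture_output=True, text=True,
        ).stdout
    except OSError:  # git not installed, or not run inside a repo
        log = ""
    return (
        "Look at the following recent commits and check if the code in "
        "them could be simplified or refactored to have a better "
        "abstraction:\n\n" + log
    )
```

In CI this prompt would be piped to the agent after the merge, so the simplification pass happens on every change rather than only when someone remembers to ask.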

ocrowabout 9 hours ago
I've tried this approach of instructing the LLM to look for opportunities to abstract, but it's not good at finding the commonalities after the fact, when possibly related functions have already diverged unnecessarily. It writes "sloppy" code, that is to say code that is locally correct but which fails to build towards overall generalizations. That sloppy code is a cul-de-sac: easy to write, but adding to the messiness, and really tough to improve.

When a good programmer writes a new feature, they are looking for both existing and new abstractions that can be applied. They are considering their mental model of the whole system and examining whether it can be leveraged or needs to be updated. That's how they avoid compounding complications.

In order to take a big picture view like that, the LLM needs the right context. It would need to focus on what its system model is and decide when to update that system model. For now, just telling it what to write isn't enough to get good code. You have to tell it what to pay attention to.

arthurjjabout 8 hours ago
> When a good programmer writes a new feature, they are looking for both existing and new abstractions that can be applied. They are considering their mental model of the whole system and examining whether it can be leveraged or needs to be updated. That's how they avoid compounding complications.

This is actually a pretty good argument that it's a permanent issue. I haven't tried writing, or having an LLM write, a summary of the coding style of any of my codebases, but my hunch is it wouldn't do a good job either writing it or taking it into account when coding a new feature.

crabmusketabout 7 hours ago
"Programming as theory building" undefeated still.
layer8about 13 hours ago
It’s difficult to define a termination criterion for that. When you ask LLMs to find any X, they usually find something they claim qualifies as X.
arthurjjabout 12 hours ago
Agreed. If I'm looking at what it proposes, then about half the time I don't make the changes. If this were fully automated, you would need an addendum like "Only make the change if it saves over 100 lines of code or removes 3 duplicated pieces of logic".

There are other scenarios you would want to check for but you get the idea.
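That addendum could also be enforced mechanically as a gate on the agent's proposed diff. A toy sketch, using only the two thresholds suggested above (the function name is mine, and measuring "duplicates removed" reliably is its own problem):

```python
def should_apply_refactor(lines_saved: int, duplicates_removed: int) -> bool:
    """Accept an agent-proposed refactor only if it clears the thresholds
    from the comment above: saves over 100 lines of code, or removes at
    least 3 duplicated pieces of logic.
    """
    return lines_saved > 100 or duplicates_removed >= 3
```

Anything below the bar gets dropped automatically, which roughly matches rejecting half the proposals by hand.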

JeremyNTabout 12 hours ago
I agree, it's not a fundamental characteristic but a limitation of how the tool is being used.

If you just tell these things to add, they'll absolutely do that indiscriminately. You end up with these huge piles of slop.

But if I tell an LLM backed harness to reduce LOC and DRY during the review phase, it will do that too.

I think you're more likely to get the huge piles if you delegate a large task and don't review it (either yourself or with an agent).

singronabout 13 hours ago
I have noticed LLMs have a propensity to create full single-page web applications instead of simpler programs that just print results to the terminal.

I've also struggled with getting LLMs to keep spec.md files succinct. They seem incapable of simplifying documents while doing another task (e.g. "update this doc with xyz and simplify the surrounding content") and really need to be specifically tasked with simplifying/summarizing. If you want something human-readable, you probably just need to write it yourself. Editing LLM output is so painful, and it also helps to keep yourself in the loop if you actually write and understand something.

theshrike79about 3 hours ago
My rule of thumb is that documentation for AIs can be written by AIs

Documentation for humans should be written by humans. Or at the very least read through and signed off on by humans.

jimbokunabout 12 hours ago
Time to teach the LLMs and the vibe coders one of the timeless lessons of software development:

https://www.folklore.org/Negative_2000_Lines_Of_Code.html

pityJukeabout 14 hours ago
Man, I cannot imagine how nice it must be to work with leadership like this, who just gets it.
spprashantabout 13 hours ago
At this point, I almost feel bad that people are piling on Garry Tan. Almost.
eucyclosabout 8 hours ago
The reference to 'literature by the pound' made me think of an apocryphal story about a pottery teacher who, at the end of the year, would grade his students on either the quality of a single piece or the weight of all finished pieces. With very few exceptions, the best piece of the year would be one of the ones where a student went for volume.

Which is plausible if you need to touch each piece (more repetitions lead to more improvement if you're already motivated to improve anyway), but if the output is coming from an LLM, I'm not sure...

mplappertabout 13 hours ago
I very much agree; I think laziness / friction is basically a critically important regularizer for what to build and for what to not build. LLMs remove that friction and it requires more discipline now. (Wrote some of this up a while ago here: https://matthiasplappert.com/blog/2026/laziness-in-the-age-o...)
genxyabout 10 hours ago
The scariest things I have ever seen are super hard-working programmers.

So much code, it wouldn't stop. Refactoring was harder than rewriting. Unit tests metastasized into bone. And now another PR.

progbitsabout 13 hours ago
Great article, I've been saying something similar (much less eloquently) at work for months and will reference this one next time it comes up.

Quite often I see inexperienced engineers trying to ship the dumbest stuff. Back before LLMs, these would be projects that would take them days or weeks to research, write, and test, and somewhere along the way they could come to the realization "hold on, this is dumb or not worth doing". Now they just send a 10k-line PR before lunch and pat themselves on the back.

pythontongueabout 11 hours ago
Similar issue as social media making communication more effortless, and thus encouraging higher quantity over quality
jwpapiabout 12 hours ago
I'm so happy about this article. I was forming a thought in my head the last couple of days, which is how to describe what it is that makes AI code practically unusable in good systems.

And one of the reasons is the one described in this article; the other is that you skip training your mental model when you don't grind through these laziness patterns. If you are not in the code, grinding on your codebase, you don't see the fundamental issues that block the next level, nor do you have the itch to name and abstract them properly so you won't have to worry about them in the future, when somebody (or you) has to extend the code.

Knowing your shit is so powerful.

I believe now that my competitive advantage is grinding code, whilst others are accumulating slop.

flumpcakesabout 13 hours ago
The more people boast about AI while delivering absolute garbage like in the example here, the happier I feel toiling around in Nginx configurations and sysadmin busywork. Why worry about AI when it's the same old idiots using it as a crutch, like with any new fad?
dwgabout 11 hours ago
As a counterpoint, perhaps there is a sort of "natural selection" which will drive better abstractions due to being more token-efficient. Albeit perhaps a relatively smaller effect.
warwickmcintoshabout 12 hours ago
laziness makes you understand the problem before writing anything. an LLM will happily generate 500 lines for something that needed 20 because it never has to maintain any of it.
dzongaabout 11 hours ago
software engineering - let's break it down.

engineering is about applying rules of thumb to solve problems.

however, what's not usually stated to most people in the profession is that 'software engineering' is also an art, which means doing the most you can within constraints. at the edges is when you create things that are notable.

do LLMs spitting out code fit the 'engineering or art' part? NO

abcde666777about 11 hours ago
Being a somewhat lazy individual myself, I'm wary of this statement. It feels too... comforting. "It's okay that I wasn't productive today, because laziness has merits".

I consider my laziness a part of who I am, and I don't demonize it, but I also don't consider it my ally - to get the things I care about done I often have to actively push against it.

rsanheimabout 10 hours ago
Bragging about loc was silly before ai. Now it’s not only silly, but also makes you look like a huge tool.
theshrike79about 3 hours ago
I have no idea what the LOC count of any of my projects is. Why would I care?

And why do other people care? It's not as if "lines of code" is a uniform measurement you can apply across languages.

A one-line Perl program can do more than a thousand lines of enterprise-grade Java.

glitchcabout 12 hours ago
Hard disagree with the initial assumption: Abstractions do not make a system simpler.

Note: I would have added "usually", but I really do mean always.

dapabout 5 hours ago
I’m curious what you think an abstraction is. Even running “ls” involves several layers of abstraction: a shell, a process (abstracts memory), a thread (abstracts CPU)… you think it would be simpler if you had to deal with all that to list a directory (another abstraction)? Even bits are an abstraction over analog voltage levels.
love2readabout 12 hours ago
the thing about abstractions is that nothing implies that they aren’t leaky abstractions, which may be worse than no abstraction for future bug hunters
geophileabout 11 hours ago
Oh spare me the cult of Larry Wall. His language was crap, and his pontification was unbearable.

That said, he's not wrong about laziness. I'd state it less cutely: Good software takes time, so be patient. The same can be said of most things created by people. Sure, there are flashes of inspiration, like Paul McCartney sitting down and just coming up with Get Back. But those are quite rare. And even in those cases, it often takes time to refine the idea to its final form.

fragmedeabout 12 hours ago
Since we all, stupidly, are leaning into LoC as a metric, because we can't handle subjectivity, at the very least we could just use orders of magnitude for LoC. Was it a 10/100/1,000/10,000-LoC hour/day/week/month? A 1, 2, 3, 4, or 5. DTrace's 60k LoC would then be a 5, the Linux kernel an 8 (40M), Firefox also an 8, and Notepad++ a 6.
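The scoring implied by those examples (60k scoring a 5, 40M an 8) is just the digit count of the LOC figure. A minimal sketch; the helper name `loc_magnitude` is mine, not from the thread:

```python
def loc_magnitude(loc: int) -> int:
    """Order-of-magnitude score for a codebase: the digit count of its
    line count. 60k LOC scores a 5, ~40M (the Linux kernel) an 8,
    matching the examples above. Returns 0 for an empty project.
    """
    return len(str(loc)) if loc > 0 else 0
```

So instead of bragging about raw LOC, you'd report a single digit, and two projects within the same power of ten come out even.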
gnerd00about 14 hours ago
oh this hits all the right notes for me! I am just the demographic that tried to perl my way into the earliest web server builds, and read those exact words carefully while looking at the very mixed quality, cryptic ascii line noise that is everyday perl. And as someone who had built multi-thousand line C++ systems already, the "virtues" by Larry Wall seemed spot on! and now to combine the hindsight with current LLM snotty Lord Fauntleroy action coming from San Francisco.. perfect!
jauntywundrkindabout 12 hours ago
Abstractions and strong basis as a freedom to think freely at high levels.

The slop drowning and impinging our ability to do good hammock driven development.

Love it. Thanks Bryan.

It's invaluable framing, and well stated. There's a pretty steady background drumbeat of "do we still need frameworks/libraries" that shows up now, and how to talk to that is always hard. https://news.ycombinator.com/item?id=47711760

To me, the separation of concerns & strong conceptual basis to work from seem like such valuable clarity. But these are also anchor points that can limit us too, and I hope we see faster, stronger panning for good reusable architectures & platforms to hang our apps and systems upon. I hope we try a little harder than we have been, that there's more experimentation. Cause it sure felt like the bandwagon effect was keeping us in a couple of local areas. I do think islands of stability to work from make all the sense, and are almost always better than the drift/accumulation of letting the big ball of mud architecture accrue.

Interesting times ahead. Amid so much illegible, miring slop, hopefully some complementary new finding-out too.

simianwordsabout 14 hours ago
This is a person clearly grieving that his hard earned knowledge in his field is now not that valuable.

It is *exactly* the same as a person who spent years perfecting hand-written HTML, just to face the wrath of React.

vsgherziabout 13 hours ago
Disregarding the fact that Bryan runs Oxide, a company with multiple investors and customers (I'd say this proves valuable knowledge), the crazier fact is that people think HTML is useless knowledge.

React USES HTML. Understanding HTML is core to understanding React. React does not devalue HTML, any more than driving an automatic devalues knowing how to drive a manual.

simianwordsabout 13 hours ago
Go to Facebook.com, right-click, view source, and tell me HTML is not being devalued. No person who wants to write aesthetic HTML would write that stuff.
theshrike79about 3 hours ago
Tbh this is like looking at any binary's assembly code and complaining it's "unaesthetic".

Whatever blob of HTML the browser is offered is just the end result of multiple "compilation" steps. Nobody has spent a single iota of time thinking about whether it's pretty.

Just like opening any binary produces inelegant assembler.

vsgherziabout 13 hours ago
Do the same to Google.com

When it matters, it matters. Even in Facebook's case, they made React fit their use case. You think the React devs didn't understand HTML? Do you think quality frontends can be written without any understanding of HTML?

Like the article says we’ve moved an abstraction up. That does not make the html knowledge useless

rakel_rakelabout 13 hours ago
https://xkcd.com/1053/

I recommend you go look at some of his talks on Youtube, his best five talks are probably all in my all time top-ten list!

theshrike79about 3 hours ago
The content of his talks is very good, but his habit of SHOUTING all the time just grates on my brain. I can literally see the veins in his neck bulging as he screams out his presentation.
g-b-rabout 13 hours ago
Your account name is so fitting

Now look up who he actually is.

lapcatabout 14 hours ago
> This is a person clearly grieving that his hard earned knowledge in his field is now not that valuable.

He's co-founder and CTO of his own company, so I think he's doing fine in his field.

simianwordsabout 13 hours ago
It doesn't change the fact that much of what (I think) he prides himself on is getting commoditized.
wiseowiseabout 12 hours ago
LLMs have dissolved your brain if you think they commoditize what a guy like this[0] prides himself on.

https://bcantrill.dtrace.org/about/

0xBA5EDabout 12 hours ago
I would seriously consider if you've developed an imaginary caricature in your mind that you apply to people you don't know. Further, I would consider if any living person actually lives up to it.
bcantrillabout 10 hours ago
On the one hand, I admire (at some level) you sticking to your guns here, willing to take on all comers. On the other, though, I don't entirely understand the inference that you're drawing from the piece; what, exactly, is getting commoditized?
pxcabout 12 hours ago
What he prides himself in (in this context) is craft, which LLM use probably can enable, but definitely isn't commoditized by the kind of vibe coding that Garry Tan is doing.
qwh1287about 5 hours ago
Yes, people can continue to use their commoditized AI girlfriends while others prefer real ones.

Jesus Christ, even AI websites are just garbage, and that is the lowest form of programming.