Discussion (130 Comments)
When I am rigorous about the tests, Claude has done an amazing job implementing some tricky algorithms from some difficult academic papers, saving me time overall, but it does require more babysitting than I would like.
The sandbox is powered by bubblewrap (used by Flatpaks) so I trust it.
Not gonna help with the test code quality, but at least the tests are going to be relevant.
It's a bit like pre-registering your study in medicine.
Like grepping source code for a string, or assert(1==1, true).
You have to have a curated list of every kind of test not to write or you get hundreds of pointless-at-best tests.
LLMs seem to also avoid checking the math of the simulator. In CFD, this is called verification. The comparisons are almost exclusively against experiments (validation), but it's possible for a model to be implemented incorrectly and for calibration of the model to hide that fact. It's common to check the order-of-accuracy of the numerical scheme to test whether it was implemented correctly, but I haven't seen any vibe coders do that. (LLMs definitely know about that procedure as I've asked multiple LLMs about it before. It's not an obscure procedure.)
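The order-of-accuracy check described above can be sketched in a few lines of Python. This is a toy example, verifying a second-order central difference rather than a full CFD scheme; the function names are my own:

```python
import math

def central_diff(f, x, h):
    # Second-order central difference approximation of f'(x).
    return (f(x + h) - f(x - h)) / (2 * h)

def observed_order(f, df, x, h):
    # Compare errors at step sizes h and h/2; the log of the error
    # ratio gives the observed order of accuracy of the scheme.
    e1 = abs(central_diff(f, x, h) - df(x))
    e2 = abs(central_diff(f, x, h / 2) - df(x))
    return math.log(e1 / e2) / math.log(2)

p = observed_order(math.sin, math.cos, x=1.0, h=1e-2)
print(round(p, 2))  # close to 2.0: the scheme is second-order, as designed
```

If the implementation had a bug (say, a sign error in one stencil coefficient), the observed order would drop below the theoretical one even though individual results might still look plausible against experimental data.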
This should be distilled into a tool. Some kind of AST based code analyser/linter that fails if it sees stupid test structures.
Just having it in plain english in a HOW-TO-TEST.md file is hit and miss.
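A minimal sketch of such a linter using Python's stdlib `ast` module, flagging assertions over constants (the function name and the exact set of "stupid structures" to detect are my own invention; a real tool would cover many more patterns):

```python
import ast

def find_trivial_asserts(source):
    """Flag assert statements whose test is a bare constant or a
    comparison between literals, e.g. `assert True` or `assert 1 == 1`."""
    flagged = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Assert):
            continue
        test = node.test
        if isinstance(test, ast.Constant):
            flagged.append(node.lineno)
        elif (isinstance(test, ast.Compare)
              and isinstance(test.left, ast.Constant)
              and all(isinstance(c, ast.Constant) for c in test.comparators)):
            flagged.append(node.lineno)
    return flagged

sample = "def test_ok():\n    assert 1 == 1\n    assert compute() == 4\n"
print(find_trivial_asserts(sample))  # [2]
```

Wired into CI, a nonempty result would fail the build, which is far more reliable than hoping the model re-reads a HOW-TO-TEST.md.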
I've seen a lot of people interact with LLMs like this and I'm skeptical.
It's not how you'd "teach" a human (effectively). Teaching (humans) with positive examples is generally much more effective than with negative examples. You'd show them examples of good tests to write, discuss the properties you want, etc...
I try to interact with LLMs the same way. I certainly wouldn't say I've solved "how to interact with LLMs" but it seems to at least mostly work - though I haven't done any (pseudo-)scientific comparison testing or anything.
I'm curious if anyone else has opinions on what the best approach is here? Especially if backed up by actual data.
However we can say based on the architecture of the LLMs and how they work that if you want them to not do something, you really don't want to mention the thing you don't want them to do at all. Eventually the negation gets smeared away and the thing you don't want them to do becomes something they consider. You want to stay as positive as possible and flood them with what you do want them to do, so they're too busy doing that to even consider what you didn't want them to do. You just plain don't want the thing you don't want in their vector space at all, not even with adjectives hanging on them.
The ideal set of outcomes exist as a tiny subspace of a high-dimensional space of possible solutions. Almost all those solutions are bad. Giving negative examples is removing some specific bits of the possibility space from consideration[0] - not very useful, since almost everything else that remains is bad too. Giving positive examples is narrowing down the search area to where the good solutions are likely to be - drastically more effective.
A more humane intuition[1], something I've observed as a parent and also through introspection. When I tell my kid to do something, and they don't understand WTF it is that I want, they'll do something weird and entirely undesirable. If I tell them, "don't do that - and also don't do [some other thing they haven't even thought of yet]", it's not going to improve the outcome; even repeated attempts at correction don't seem effective. In contrast, if I tell (or better, show) them what to do, they usually get the idea quickly, and whatever random experiments/play they invent, is more likely to still be helpful.
--
[0] - While paradoxically also highlighting them - it's the "don't think of a pink elephant" phenomenon.
[1] - Yes, I love anthropomorphizing LLMs, because it works.
If I tell it what to do, I bias it towards doing those things and limit its ability to think of things I didn't think of myself, and that ability is exactly what I want in testing. In separate passes, sure: a pass where I prescribe types and specific tests is effective. But I also want it to think of things I didn't, and a prompt like "write excellent tests that don't break these rules..." is how you get that.
Maybe this was true when Programming Perl was written, but I see the opposite much more often now. I'm a big fan of WET - Write Everything Twice (stolen from comments here), then the third time think about maybe creating a new abstraction.
I've always heard this as the "Rule of three": https://en.wikipedia.org/wiki/Rule_of_three_(computer_progra...
That doesn't stop people from trying, though; platform creation is rife within big tech companies as a technical form of empire building and career-driven development. My rule of thumb in tech reviews is that you can't have a platform until you have three proven use cases and have shown that coupling them together is not a net negative, given the autonomy constraint a shared system imposes.
Use the wrong abstraction and you are constantly fighting the same exact bug(s) in the system. Good design makes entire classes of bugs impossible to represent.
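One common way to make a class of bugs unrepresentable is to encode each state as its own type, so the invalid combination simply cannot be constructed. A hedged Python sketch (the domain and names are invented for illustration):

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class Pending:
    pass

@dataclass(frozen=True)
class Shipped:
    tracking_number: str  # a Shipped order cannot exist without one

Order = Union[Pending, Shipped]

def describe(order: Order) -> str:
    # Dispatch on the state type; there is no "shipped but tracking
    # is None" case to forget, because that state is unrepresentable.
    if isinstance(order, Shipped):
        return f"shipped ({order.tracking_number})"
    return "pending"

print(describe(Shipped("1Z999")))  # shipped (1Z999)
```

Compare this with a single class holding a status string plus an optional tracking field, where every consumer must remember to re-check the invariant.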
I don't believe the trope that you need to make a bunch of bad designs before you can do good. Those lessons are definitely valuable, but not a requirement.
A great example is the evolution from a layered storage stack to a unified one like ZFS. Or compilers, from multipass beasts to interactive, query-based compilers and dynamic JITs.
The design and properties of the system were always the problem I loved solving; sometimes the low-level coding puzzles are fun too. Much of programming is a slog, though, and the flow state has been harder and harder to achieve. The super-deep bug hunt can be rewarding, sometimes, if you found and fixed it satisfactorily; this is the part where you learn an incredible amount. Fixing shallow cross-module bugs is hell.
Don't you have to be really seasoned to attempt, in good faith, to couple two systems and say where that would be productive? You can't prove this negative. I would imagine a place like that would have to have a very strong culture of building towards the stated goals, keeping politics and personalities out of it as much as possible.
Sandi Metz https://sandimetz.com/blog/2016/1/20/the-wrong-abstraction
A third time, with a new abstraction, is where you need to be careful. Fred Brooks ("Mythical Man Month") refers to it as the "second-system effect" where the confidence of having done something once (for real, not just prototype) may lead to an over-engineered and unnecessarily complex "version 2" as you are tempted to "make it better" by adding layers of abstractions and bells and whistles.
I'm really talking about manually writing code, but the same would apply for AI written code. Having a single place to update when something needs changing is always going to be less error prone.
The major concession I make to modularity when developing a prototype is often to put everything into a single source file to make it fast to iteratively refactor etc rather than split it up into modules.
And then there's this awkward bit in the middle where you're not necessarily reviewing all the code the AI generates, but you're the one driving the architecture, coming up with feature ideas, pushing for refactors from reading the code, etc. This is where I'm at currently and it's tricky, because while I'd never say that I "wrote" the code, I feel I can claim credit for the app as a whole because I was so heavily involved in the process. The end result I feel is similar to what I would've produced by hand, it just happened a lot faster.
(granted, the end result is only 2000 LoC after a few weeks working on and off)
I think the right way to explain the work done sounds something like, "I worked with Claude to create an app that does ______. I know it works because ______."
The metric by itself tells you nothing about what value those tokens produced, but to some extent it represents the amount of thinking you are able to offload to the computer for you.
Wide breadth problems seem to scale well with usage, like scanning millions of LOC of code for vulnerabilities, such as the recent claude mythos results.
I don't mind it so much when it's a newbie or non-techie who has never actually written code before, because bless their hearts, they did it! They got some code working!
But if you've been developing for decades, you know that counting lines of code means nothing, less than nothing. That you could probably achieve the same result in half the lines if you thought about it a bit longer.
And to claim this as an achievement when it's LLM-generated... that's not a boast. That doesn't mean what you think it means.
But I guess we hit the same old problem that we've always had - how do you measure productivity in software development? If you wanted to boast about how an LLM is making you 100x more productive, what metric could you use? LOC is the most easily measurable, really, really, terrible measure that PMs have been using since we started doing this, because everything else is hard.
I resorted to feels. After decades of programming, I know when I'm being productive, and I can reasonably estimate when a colleague is being productive. I extrapolate that to the LLM, too. Absolutely not an objective measure, but I feel that I can get the LLM to do in a day a task that would take me 2-3 weeks (post-Nov 25 and using parallel agents).
It has never before been possible to code and deploy software with nothing but specs. Whatever software Garry is building, these are products he couldn't ship otherwise. LoC, in that context, serves as a reminder of the capability of the agents to power/slog through reqs/specs (quite incredibly so).
Besides, critical human review can always be fed back as instructions to agents.
“I divide my officers into four groups. There are clever, diligent, stupid, and lazy officers. Usually two characteristics are combined.
Some are clever and diligent — their place is the General Staff.
The next lot are stupid and lazy — they make up 90% of every army and are suited to routine duties.
Anyone who is both clever and lazy is qualified for the highest leadership posts, because he possesses the intellectual clarity and the composure necessary for difficult decisions.
One must beware of anyone who is both stupid and diligent — he must not be entrusted with any responsibility because he will always cause only mischief.”
"I have them shot".
In reality, the world works because of human automatons, honest people doing honest work, living their lives in a hopefully comforting, complete and wholesome way, quietly contributing their piece to society.
There is no shame in this, yet we act as though there is.
But I do know multiple, just in my immediate family. People who graduated from school, went to the local factory and worked there for half a century before retiring. Pretty much the same job the whole time, moving widgets from A to B etc., nothing massively complex. I do respect the people who can do it, and especially the ones who make it look effortless and efficient, even a bit performative.
Also because my home town is a "factory town", guess where I worked for my summer job(s). I wanted to shove a hot poker in my ear just to get away from the tedium after the first day. On the second day I was thinking how to automate the damn process to not involve me in it at all :D
Many of those people, probably including most bureaucrats, are working on systems that have already been automated to the fullest extent possible. This is one of the reasons why bureaucracies seem chaotic and inefficient -- the stuff that works is happening automatically and is invisible. You only see the exceptions.
The automation can be improved, but it's a laborious process and fraught with the risks associated with the software crisis. You never know when a project is going to fall into the abyss and never emerge, and the best models of project failure are stochastic.
That'd be just fine. But you do seem to care and feel hurt enough to call people online clowns.
Now, did Garry Tan actually produce anything of value that week? I dunno, you’ll have to ask him.
https://en.wikipedia.org/wiki/Horizon_IT_scandal
Furthermore,
> As for the artifact that Tan was building with such frenetic energy, I was broadly ignoring it. Polish software engineer Gregorein, however, took it apart, and the results are at once predictable, hilarious and instructive: A single load of Tan’s "newsletter-blog-thingy" included multiple test harnesses (!), the Hello World Rails app (?!), a stowaway text editor, and then eight different variants of the same logo — one of which with zero bytes.
Do you think any of the... /things/ bundled in this software increased the surface area that attacks could be leveraged against?
What I don’t like here is the bragging about the LoC. He’s not bragging about the value it could provide. Yes people also write shitty code but they don’t brag about it - most of the time they are even ashamed.
?!
Was it hiding in one of the lifeboats?
That's not quite correct.
The root set of errors was made by the accounting software. The branch sets of errors were made by humans taking Horizon IT's word that there was no fault in the code, and instead blaming the workers for the differences in the balance sheets.
If there were no errors in the accounting software (i.e. it had been properly designed and tested), then none of that would have happened.
Nobody blames THERAC-25 on the human operator.
I've seen plenty of real code written by real people with multiple test harnesses and multiple mocking libraries.
It's still kinda irrelevant to whether the code does anything useful; it's only a descriptor of the funding model.
Let’s not be naive. Garry is not a nobody. He absolutely doesn’t care about how many lines of code are produced or deleted. He made that post as advertisement: he’s advertising AI because he’s the CEO of YC, whose profitability depends on AI.
He’s just shipping ads.
The cautionary/pessimist folks at least don't make money by taking the stance.
At the extreme end you'll get invited to conferences but further down you could have other products you are pushing. Even non-AI related that takes advantage of your "smart person" public persona.
But the true metric isn't either one, it's value created net of costs. And those costs include the cost to create the software, the cost to understand and maintain it, the cost of securing it and deploying it and running it, and consequential costs, such as the cost of exploited security holes and the cost of unexpected legal liabilities, say from accidental copyright or patent infringement or from accidental violation of laws such as the Digital Markets Act and Digital Services Act. The use of AI dramatically decreases some of these costs and dramatically increases other costs (in expectation). But the AI hypesters only shine the spotlight on the decreased costs.
To me, in this context, it's similar to driving economic growth on fossil fuels.
Whether in the end it can result in a net benefit (the value is larger than the cost of interacting with it and the cost to sort out the mess later) is likely impossible to say, but I don't think it can simply be judged by short sighted value.
e.g. Right now when using agents after I'm "done" with the feature and I commit I usually prompt "Check for any bugs or refactorings we should do" I could see a CICD step that says "Look at the last N commits and check if the code in them could be simplified or refactored to have a better abstraction"
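That CI/CD step might be sketched like this. The agent CLI name in the comment is a placeholder, and the prompt wording is only an assumption; the git invocation itself is standard:

```python
import subprocess

def last_n_commits_diff(n: int) -> str:
    # Combined diff of the last n commits, gathered with plain git.
    return subprocess.run(
        ["git", "diff", f"HEAD~{n}..HEAD"],
        capture_output=True, text=True, check=True,
    ).stdout

def build_review_prompt(diff: str, n: int) -> str:
    # Assemble the review instruction plus the diff to hand to an agent.
    return (
        f"Look at the following diff from the last {n} commits and check "
        "if the code in them could be simplified or refactored to have a "
        "better abstraction:\n\n" + diff
    )

# In CI this would be piped to whatever agent CLI the team uses,
# e.g. (hypothetically): your-agent -p "$(python build_prompt.py)"
```

Keeping the diff-gathering and prompt-assembly separate makes the pure part trivially testable, independent of any repository state.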
When a good programmer writes a new feature, they are looking for both existing and new abstractions that can be applied. They are considering their mental model of the whole system and examining whether it can be leveraged or needs to be updated. That's how they avoid compounding complications.
In order to take a big picture view like that, the LLM needs the right context. It would need to focus on what its system model is and decide when to update that system model. For now, just telling it what to write isn't enough to get good code. You have to tell it what to pay attention to.
This is actually a pretty good argument that it's a permanent issue. I haven't tried writing, or having an LLM write, a summary of the coding style of any of my code bases, but my hunch is it wouldn't do a good job either writing it or taking it into account when coding a new feature.
There are other scenarios you would want to check for but you get the idea.
If you just tell these things to add, they'll absolutely do that indiscriminately. You end up with these huge piles of slop.
But if I tell an LLM backed harness to reduce LOC and DRY during the review phase, it will do that too.
I think you're more likely to get the huge piles if you delegate a large task and don't review it (either yourself or with an agent).
I've also struggled with getting LLMs to keep spec.md files succinct. They seem incapable of simplifying documents while doing another task (e.g. "update this doc with xyz and simplify the surrounding content") and really need to be specifically tasked with simplifying/summarizing. If you want something human readable, you probably just need to write it yourself. Editing LLM output is so painful, and it also helps to keep yourself in the loop if you actually write and understand something.
Documentation for humans should be written by humans. Or at the very least read through and signed off on by humans.
https://www.folklore.org/Negative_2000_Lines_Of_Code.html
Which is plausible if you need to touch each piece: more repetitions lead to more improvement if you're already motivated to improve anyway. But if the output is coming from an LLM, I'm not sure...
So much code, it wouldn't stop. Refactoring was harder than rewriting. Unit tests metastasized into bone. And now another PR.
Quite often I see inexperienced engineers trying to ship the dumbest stuff. Before LLMs, these would be projects that would take them days or weeks to research, write, and test, and somewhere along the way they could come to the realization "hold on, this is dumb or not worth doing". Now they just send a 10k-line PR before lunch and pat themselves on the back.
And one of the reasons is the one described in this article; the other is that you skip training your mental model when you don't grind through these laziness patterns. If you are not in the code, grinding through your codebase, you don't see the fundamental issues that block the next level, nor do you have the itch to name and abstract them properly so you won't have to worry about them in the future, when you or somebody else has to extend it.
Knowing your shit is so powerful.
I believe now that my competitive advantage is grinding code, whilst others are accumulating slop.
Engineering is about applying rules of thumb to solve problems.
However, what's not usually stated to most people in the profession is that "software engineering" is also an art, which means doing the most you can within constraints; at the edges, that's when you create things that are notable.
Do LLMs spitting out code fit the "engineering or art" part? NO.
I consider my laziness a part of who I am, and I don't demonize it, but I also don't consider it my ally - to get the things I care about done I often have to actively push against it.
And why do other people care? It's not as if "lines of code" is a uniform measurement you can apply across languages.
A one-line Perl program can do more than a thousand lines of enterprise-grade Java.
Note: I would have added usually but I really do mean always.
That said, he's not wrong about laziness. I'd state it less cutely: Good software takes time, so be patient. The same can be said of most things created by people. Sure, there are flashes of inspiration, like Paul McCartney sitting down and just coming up with Get Back. But those are quite rare. And even in those cases, it often takes time to refine the idea to its final form.
The slop is drowning us and impinging on our ability to do good hammock-driven development.
Love it. Thanks Bryan.
It's invaluable framing, and well stated. There's a pretty steady background drumbeat of "do we still need frameworks/libraries?" that shows up now, and how to talk to that is always hard. https://news.ycombinator.com/item?id=47711760
To me, the separation of concerns and a strong conceptual basis to work from seem like such valuable clarity. But these are also anchor points that can limit us, and I hope we see faster, stronger panning for good reusable architectures and platforms to hang our apps and systems upon. I hope we try a little harder than we have been, and that there's more experimentation, because it sure felt like the bandwagon effect was keeping us in a couple of local areas. I do think islands of stability to work from make all the sense, and are almost always better than the drift and accumulation of letting a big-ball-of-mud architecture accrue.
Interesting times ahead. Amid so much illegible, miring slop, hopefully there is some complementary new discovery too.
It is *exactly* the same as a person who spent years perfecting hand-written HTML, just to face the wrath of React.
React USES HTML. Understanding HTML is core to understanding React. React does not in any way devalue HTML, in the same way that driving an automatic doesn't devalue driving a manual.
Whatever blob of HTML the browser is offered is just the end result of multiple "compilation" steps. Nobody has spent a single iota of time thinking about whether it's pretty.
Just like opening any binary produces inelegant assembler.
When it matters, it matters. Even in Facebook's case, they made React fit their use case. You think the React devs didn't understand HTML? Do you think quality frontends can be written without any understanding of HTML?
Like the article says we’ve moved an abstraction up. That does not make the html knowledge useless
I recommend you go look at some of his talks on Youtube, his best five talks are probably all in my all time top-ten list!
Now look up who he actually is.
He's co-founder and CTO of his own company, so I think he's doing fine in his field.
https://bcantrill.dtrace.org/about/
Jesus Christ, even AI websites are just garbage, and that is the lowest form of programming.