Discussion (76 Comments)
I use Claude Code and Codex daily. They have become an integral part of my workflow.
There is no task that takes me a day that they can complete in five minutes.
Even with the lightning fast progress being made, it looks like LLMs are a decade or more away from being that good.
If AI can do your job for you, you should be the first to know. Just try it and see!
Still, I find that complex code fixes "confirmed" by tests often end with the LLM fudging the code to make the specific test pass rather than fixing the general issue. For example, where a successful run should generate a file and the test checks for the file, eventually the LLM will just touch the file regardless and call it done.
This has completely solved the cheating and fudging to make tests pass for me.
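As a hedged illustration of the failure mode in the parent comment (the `run_export` function and file name are invented): a test that only checks that a file exists is trivially gameable, while one that asserts on content forces the general fix.

```python
# Hypothetical pytest sketch. `run_export` stands in for the code under test.
import pathlib

def run_export(output_dir: pathlib.Path) -> None:
    # Stand-in for the real export logic.
    (output_dir / "report.csv").write_text("id,name,total\n1,widget,9.99\n")

def test_report_exists(tmp_path):
    run_export(tmp_path)
    assert (tmp_path / "report.csv").exists()  # satisfiable by just touching the file

def test_report_has_real_content(tmp_path):
    run_export(tmp_path)
    lines = (tmp_path / "report.csv").read_text().splitlines()
    assert lines[0] == "id,name,total"  # forces the export path to actually run
    assert len(lines) > 1               # at least one data row
```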
For me, there may be one thing I do every few months that AI is really good at.
The overwhelming majority of the work I do, LLM tooling is just ok at. Definitely faster overall, but with lots of human planning, hand holding and course correction.
I would estimate LLMs make me, on average, 50% more productive, which is huge! But from my experience I cannot believe anyone is experiencing an 8-hours-to-5-minutes productivity multiple overall.
I also like to use LLMs for background work on iterative tasks, but the way some people talk about work in the days before LLMs makes me realize how we're arriving at these claims that LLMs make us 10X more productive. If it took someone all day to do a few minutes of active work, then I could see how LLMs would feel like a 10X or 50X productivity unlocker simply by not shutting down and doing nothing at the first sign of a pause.
Best example I’ve found: translating code from one language to another where there’s a large corpus of existing acceptance tests.
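A minimal sketch of that setup, assuming the acceptance suite lives under a hypothetical `tests/acceptance` directory and `translate` is a stand-in for whatever LLM wrapper you use (not a real API):

```python
# Sketch of the translate-and-verify loop: the pre-existing acceptance
# tests, which predate the translation, act as the oracle.
import subprocess
from typing import Callable

def translate_with_oracle(source: str,
                          target_path: str,
                          translate: Callable[[str, str], str],
                          max_attempts: int = 5) -> bool:
    feedback = ""
    for _ in range(max_attempts):
        candidate = translate(source, feedback)  # hypothetical LLM call
        with open(target_path, "w") as f:
            f.write(candidate)
        # Re-run the existing acceptance suite as ground truth.
        result = subprocess.run(["pytest", "tests/acceptance", "-x"],
                                capture_output=True, text=True)
        if result.returncode == 0:
            return True
        feedback = result.stdout[-4000:]  # feed failures into the next attempt
    return False
```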
It's always gonna be a multi-shot process. And it can already write code that's good enough. That's no longer the bottleneck.
Further, Qwen 27b is such an incredible masterpiece for coding and it can run on consumer hardware today. Anthropic/OpenAI are gonna give up on coding models very soon. There’s not gonna be any money in it when you can run your own local model for significantly cheaper.
Qwen 27b is not SOTA, but the value is insane. You can basically use it for small tasks and then route harder problems to Opus or Sonnet, and boom, you've saved a lot of money.
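A rough sketch of that routing split; both model wrappers and the difficulty threshold below are invented for illustration:

```python
# Hypothetical router: cheap local model for small tasks, hosted frontier
# model for the hard ones. Both call_* functions are stand-ins.
def call_local_model(prompt: str) -> str:
    raise NotImplementedError("stand-in for a local Qwen endpoint (e.g. via Ollama)")

def call_hosted_model(prompt: str) -> str:
    raise NotImplementedError("stand-in for a hosted Opus/Sonnet API call")

def route_task(prompt: str, estimated_difficulty: float) -> str:
    # Crude heuristic; real routers often score prompts with a small classifier.
    if estimated_difficulty < 0.5:
        return call_local_model(prompt)
    return call_hosted_model(prompt)
```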
In any case, on the occasions when AI works perfectly, it saves me hours of coding. So the potential is there...
> There is no task that takes me a day that they can complete in five minutes.
It's highly dependent on the task. I was watching a podcast with Simon Willison, where he said something like (paraphrasing): "My whole selling point as a dev was that I could ship POCs / MVPs fast. Now that's somewhat obsolete."
It resonated with me because I feel like that also was a skill that I cultivated and excelled at. I agree with Simon's general thesis: that skill is largely dead. There are many pedants and detractors who will race to the defense of this art with various arguments to try to challenge the idea, but they simply do not hold up to reality. I have non-programmer friends with $10 Claude Code subscriptions whipping up products to solve niche problems in their life / job.
I offered to help one of my friends who's working on generating math exams based on curricula and seed problem sets. I taught him how to use git, he pushed the repo, I looked at the repo, and it wasn't clear he needed me. Everything I could do would be related to scale / reliability / optimization. They don't need any of that; they just need to prompt the AI to say, "go burn some subscription tokens for my AP Calc track this year." There's a whole SaaS and C2C industry built around this problem that this guy just solved for 10 bucks a month.
Of course, there's much more depth to engineering than just cranking out prototypes. There is still "real engineering" to be done, and software will likely shift more towards specification / verification.
But a lot of the industry was built around the idea of speed of delivery / time to market to explore product fit and rapidly iterate. IMO frontier LLMs (private and open-weights) have this largely solved. I can build and test ideas that would have taken me a weekend last year in half a day now, and for the majority of that time I can be talking to the LLM via Matrix while I'm out in the world.
The results are always so ridiculously different.
Well... yes! It's not the same as running a program through a compiler 100k times and getting the same binary, it's... different: https://www.lelanthran.com/chap15/content.html
So the effect is merely an acceleration of "boilerplate code writing", which is very impressive for beginner coders who are mostly doing automatable, trivial tasks, but much less so once you start doing real concurrency / threading / embedded / etc. work.
Five minutes is pushing it, but 15 minutes? Absolutely.
To me, the reason for the lack of amazing productivity gains is that we have done nothing to speed up figuring out what to build, and nothing to speed up getting code from pull request into production; in a lot of companies, code review is already saturated.
Also, the agents are good at figuring out problems for themselves, so I can ask it to set up a CI/CD pipeline, give it GitHub access, and it will just try things until it succeeds.
As for AI-written code, I wouldn't fly on a plane controlled by AI-designed and AI-tested code, but much of development is busy work, not problem solving or design. AI excels at turning a protocol spec into a parser, for example. I'll take that any day. AI also excels at finding stuff: particularly non-code, thesis-level ideas for algorithms, and, at about the same level, what's been shown not to work when solving a non-deterministic problem.
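To show the shape of that spec-to-parser busy work, here is a hand-written sketch for a made-up length-prefixed wire format (2-byte big-endian message type, 4-byte big-endian payload length, then the payload); the format is invented, not from any real spec:

```python
import struct
from typing import Iterator, NamedTuple

class Message(NamedTuple):
    msg_type: int
    payload: bytes

def parse_messages(data: bytes) -> Iterator[Message]:
    offset = 0
    while offset + 6 <= len(data):
        # ">HI": 2-byte type, 4-byte length, both big-endian (6-byte header).
        msg_type, length = struct.unpack_from(">HI", data, offset)
        offset += 6
        if offset + length > len(data):
            raise ValueError("truncated payload")
        yield Message(msg_type, data[offset:offset + length])
        offset += length
    # Trailing bytes shorter than a header are ignored in this sketch.

assert list(parse_messages(b"\x00\x01\x00\x00\x00\x05hello")) == [Message(1, b"hello")]
```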
If we're lucky, AI will fill in after exposing who is only doing busy work and who is creating.
Also, his prediction assumes that AI will be able to learn from its own code going forward. Will it also create its own new programming languages and tools?
but it's a funny rant.
I agree to some extent with regards to writing new code. One piece where I have been perpetually impressed is at asking it to put together a plausible explanation of how something weird has happened. I have been blown away, multiple times, by Codex and Claude’s ability to take a prompt like “When I did X, I expected Y to happen but instead observed Z. Put together an explanation for how that could happen, including the individual lines of code that can lead to ending up in that state.”
In one notable case, it traced through a pretty complex sensor fusion -> computational geometry problem and identified a particular calculation far upstream that could go negative in certain circumstances, which would lead to a function far downstream generating a polygon with incorrect winding order (clockwise instead of CCW).
In another, it identified a variable that was being initialized to 0 instead of initialized to (a specific runtime value that it should’ve been initialized to during a state transition). The downstream effect, minutes later, would be pathological behaviour that would happen exactly once per boot.
In both cases I was provided with a specific causal chain of events with individual source files and line numbers so that I could verify the plausibility of the explanation myself.
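In case it helps to picture the first anecdote, the winding-order property it mentions comes down to the sign of the shoelace (signed-area) formula; a minimal hand-written sketch, not taken from the commenter's codebase:

```python
# The shoelace formula yields a signed area whose sign encodes orientation,
# so a single sign flip far upstream silently reverses the winding.
def signed_area(polygon: list[tuple[float, float]]) -> float:
    area = 0.0
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        area += x1 * y2 - x2 * y1
    return area / 2.0

def is_counter_clockwise(polygon: list[tuple[float, float]]) -> bool:
    return signed_area(polygon) > 0  # positive signed area means CCW

square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
assert is_counter_clockwise(square)
# Negating one upstream coordinate flips every cross term, and the winding:
assert not is_counter_clockwise([(x, -y) for x, y in square])
```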
I don't mean to completely dismiss their utility. I realized recently that I was having more fun coding than I ever remember. It is a strange feeling to go along with the vibe out there that software developers are becoming obsolete.
And speaking of agents writing tests, I have an ask. The tests agents love to write are in a lot of ways like human-written tests: perfunctory and smelly. They are there to satisfy coverage numbers or a checkbox in the prompt, but they barely stress the system under test. I often find that the tests are faking and mocking so many inputs, methods, and side effects that they aren't testing anything at all. Asking the agent to write the tests first, so that the underlying implementation is more testable, has yielded no results.
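To make "aren't testing anything at all" concrete, a hypothetical over-mocked test next to a behavioral one; `apply_discount` is invented for illustration:

```python
from unittest.mock import MagicMock

def apply_discount(price: float, rate: float) -> float:
    # Invented function under test.
    return price * (1 - rate)

def test_discount_overmocked():
    # The smelly pattern: the collaborator is replaced wholesale, so the
    # assertion only restates the mock's own configuration.
    pricing = MagicMock()
    pricing.apply_discount.return_value = 50.0
    assert pricing.apply_discount(100.0, 0.5) == 50.0  # tests the mock, not the code

def test_discount_behavioral():
    # Exercises the real logic against observable output.
    assert apply_discount(price=100.0, rate=0.5) == 50.0
```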
What has worked for people to get agents to write more testable implementations and better tests?
PS. Reacting to Uncle Bob [1], I found metric-driven agentic refactors just push complexity outside the scope of the metric. I am finding I need to actively guide the agents for the refactors to actually improve things without increasing the entropy of the codebase.
[1] https://x.com/unclebobmartin/status/2046206145597972849
However, this conclusion made no sense, as we had similar scenarios across our project that worked flawlessly. After intervening, I determined the root cause was a combination of an async issue in the production code and some incorrect mocking that was covering up the async issue.
It never occurred to the AI agent to do some simple cross-examination before essentially throwing in the towel?
I would argue your example is exactly the reason why you need to supervise AI.
1) https://qntm.org/clean
2) https://blog.cleancoder.com/uncle-bob/2017/10/04/CodeIsNotTh...
https://x.com/stevesi/status/2050325415793951124
Here's how history rhymes with this logic. The development of compilers vs. writing assembly language was not without a very similar "controversy": are the new tools more efficient or less efficient?
The first compilers were measured relative to hand-tuned assembly language efficiency. The existing world of compute was very much "compute bound" and inefficient code was being chased out of every system.
The introduction of the first compilers generally delivered code "within 10-30%" of the efficiency of standard professional assembly. This "benchmark" was enough for almost a generation of assembly programmers to dismiss the capabilities of compilers.
Also worth noting, early compilers (all through the 1980s) routinely had bugs that generated incorrect code. Debugging a compiler is a nightmare (personal experience). This only provided more "ammo."
With the arrival of COBOL, the debate started to shift. COBOL generated decidedly "bloated" code, so there was no way to win the efficiency argument. But what people started to realize was that a "modern" programming language made it possible to deliver vastly more software and for many more people to work on the same code (ASM is notorious for being challenging for multiple engineers working on the same portion of code). So the metric slowly started to move from "as good as hand-tuned assembler" to "able to write bigger, more sophisticated code in less time with more people". Computers gained timesharing, more memory, and faster CPUs, which made the efficiency argument far less compelling (only for it to repeat with the first 8K or 64K PCs).
This entire transition is capped off with a description in Fred Brooks's "The Mythical Man-Month", one of the seminal books in the field of programming and a standard-issue book sitting in my office waiting for me on my first day at Microsoft. (See the full book free here: https://web.eecs.umich.edu/~weimerw/2018-481/readings/mythic...)
It is very early. I was not a programmer when the above happened, though I did join the professional ranks while many still held these beliefs. For example, I interned writing COBOL on mainframes while PCs were using C and Pascal, which were buggy and viewed as inefficient on processor/space-constrained PCs.
The debate would continue with C++, garbage collection, interpreted vs. compiled (Visual Basic), and more. As a fairly consistent observation over decades, every new tool is viewed (at first) by experienced programmers through a lens of what is worse, while new programmers use the tool and operate in a new context (e.g. "more software" or "bigger projects"). The excerpt below shows this debate as captured in 1972.
Incorrect. They had bugs that generated incorrect code. They didn't routinely have bugs that generated incorrect code :-/
And the bugs they had were reproducible.
And now you just played yourself by creating a morass of tiny functions. Well tested (CRAP says so!) and impossible to understand how they compose together.
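(For context, the CRAP metric (Change Risk Anti-Patterns) combines cyclomatic complexity with test coverage, and at 100% coverage it collapses to the raw complexity, which is exactly why a swarm of tiny, fully covered functions games it. A sketch of the commonly cited formula:)

```python
# Commonly cited CRAP formula: comp^2 * (1 - cov/100)^3 + comp,
# where comp is cyclomatic complexity and cov is percent test coverage.
def crap_score(complexity: int, coverage_percent: float) -> float:
    return complexity ** 2 * (1 - coverage_percent / 100) ** 3 + complexity

assert crap_score(1, 100.0) == 1.0   # tiny, fully covered: minimal "risk"
assert crap_score(20, 0.0) == 420.0  # complex and untested: flagged
```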
AI will happily return the next token and ruin your codebase, if you ask it.
https://blog.cleancoder.com/uncle-bob/2021/11/28/Spacewar.ht...
It comes as no surprise to me that the guy who has bad opinions about software architecture has worse opinions about vibe coding.
But I found myself laughing at the style; just ranting about software like a cartoon villain in his bathrobe. No fucks given.