Discussion (31 Comments)
When optimizing code it's not unusual to look at the assembly. It's not unusual to look for opportunities for autovectorization or to verify inlining or loop unrolling.
Compilers are, for the most part, deterministic. This means that once people have reviewed the output, it's unlikely to change. It also means that if the output does change, only a few people need to notice.
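That reproducibility argument can be made concrete: because a compiler is (mostly) a pure function of its input, a single digest of the output is enough to detect any change, so one person's review can stand in for everyone's. A minimal sketch, where `translate` is a hypothetical stand-in for a deterministic compiler pass:

```python
import hashlib

def translate(source: str) -> str:
    """Stand-in for a deterministic compiler pass: a fixed rule-based
    rewrite that always maps the same input to the same output."""
    return source.replace("x * 2", "x << 1")  # toy strength reduction

def output_digest(source: str) -> str:
    """Hash of the generated output; any change in codegen changes this."""
    return hashlib.sha256(translate(source).encode()).hexdigest()

src = "return x * 2;"
# Determinism means the digest never varies between runs, so a
# once-reviewed digest can be pinned in CI and checked forever.
assert output_digest(src) == output_digest(src)
```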
None of this applies to LLMs. They are worse than compilers, in regards to the quality and characteristics of their output, in every possible way.
If no one reviewed compiler output then https://godbolt.org/ wouldn't exist.
The rest of us use it because it's a cool way to share code snippets.
No. In reality, this is almost never done anymore.
We used to do it all the time back when performance mattered, but that was then.
HN readers don't have to like it, and obviously they (we) don't, but shooting the messenger won't help.
This is a complete misunderstanding of what makes compilers trustworthy. Those are all properties of the language, not the compiler. The compiler is trustworthy to the extent that it is well built, internally. It is trustworthy to the extent that the mapping from source code to machine code is well defined, and implemented correctly.
You can have the best type system you want, but if the compiler is badly implemented, it won't be trustworthy. A perfect example is C - a language that barely has a type system, yet has some of the most trustworthy and optimized compilers. And it also has, or at least had, plenty of buggy compilers, typically for small embedded platforms with complicated mappings between C constructs and the limited CPU instruction set.
From my own experience working with agents, there's a "snowball of shit" effect: small mistakes that compound on each other. You can either:
- review the code and try to prune some of the shit occasionally, or
- let the LLM handle everything
As of the current state of the industry, it's very hard for me not to see option 2 as extremely irresponsible. Coding agents' limits are not well defined, and unless you're running an open-weight model locally (most people aren't), you have given up all control over your code to a third party. If running local models were the norm, the argument that LLMs are just another layer of abstraction would hold a little better. Reusing the compiler analogy from the post, it's like depending on a compiler where you pay a monthly premium to compile your code. Those did exist a while ago with closed licenses, but I think the majority of deployed code nowadays is built on open-ish platforms. This walled-garden development paradigm already lost once.
Nobody goes to space.
Nobody is the head of state of a country.
Nobody knows the airspeed velocity of an unladen swallow.
Just because few specialist people do something doesn't mean "nobody" does it.
The thesis of this article is false.
i.e. can useful deterministic compiler-like behavior ever be found with a non-deterministic LLM approach?
In my view the answer is yes (for most people). I don't think the technology has to be formally perfect to create a significant shift in how we write (most) software.
There will still be some who review AI code, probably in the domains where people review compiler output today. But not everything actually needs that level of formal verification.
[0] https://github.com/figsoda/mmtc/
It is probably easier to just write that program.
I guess you can argue that these are two independent processes, so you can combine them to get something more reliable than either alone - this might be a viable path. But from what I've heard, writing formal specifications is just really hard - I haven't seen anything practical in this area.
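The "two independent processes" idea doesn't have to mean full formal specifications; a cheaper approximation is an executable spec - a checker that validates output without trusting whatever generated it. A toy sketch, with hypothetical names (`untrusted_sort` stands in for generated code nobody reads line by line):

```python
import random
from collections import Counter

def spec_sorted(inp, out):
    """Executable specification for sorting: the output must be a
    permutation of the input and be in non-decreasing order."""
    return Counter(inp) == Counter(out) and all(
        a <= b for a, b in zip(out, out[1:])
    )

def untrusted_sort(xs):
    # Hypothetical stand-in for generated code we don't want to review.
    return sorted(xs)

# Checker and generator are independent, so agreement on many random
# inputs gives more confidence than reading either artifact alone.
for _ in range(100):
    data = [random.randint(-50, 50) for _ in range(random.randint(0, 20))]
    assert spec_sorted(data, untrusted_sort(data))
```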
They are using the same AI to generate the proofs.
In my company we have so much English prose committed to MD files that I'm starting to think it's all just snake oil. I can't trust an engineer who writes "no bugs, please" and goes on with their lives.
The formal foundations of compilers are completely different from the formal foundations of LLMs.
The former are deterministic, easy to formally verify, and extremely simple in nature. "Translate a for-loop into x86 instructions using a set of rules."
The latter are intrinsically statistical in nature. "Translate a human language prompt into functional code" has to infer the correct output statistically from similar, observed input->output relationships. There is no guarantee of consistency. Different builds of the model will see different input->output evidence, in different orders, and parameter tuning will further change how it responds to those pieces of evidence. Evidence is incomplete. Local minima are inevitable. LLMs are lossy curve-fitters under the hood. Errors aren't an option, they're an inevitability.
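The statistical point is easy to demonstrate: an LLM's output is a sample from a distribution over tokens, not a function value. A toy next-token sampler, with made-up logits standing in for a real model's scores:

```python
import math
import random

def sample_token(logits, temperature, rng):
    """Toy next-token sampler: softmax over logits, then one draw."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    return rng.choices(range(len(logits)), weights=probs)[0]

logits = [2.0, 1.5, 0.5]  # hypothetical scores for three candidate tokens

# Same "prompt", fifty independent draws: unlike a compiler, the
# answer is not a single value but a spread over alternatives.
draws = {sample_token(logits, 0.8, random.Random(seed)) for seed in range(50)}
assert len(draws) > 1  # more than one distinct token comes back
```

Lowering the temperature concentrates the distribution toward the argmax, which is why "temperature 0" feels compiler-like - but it's still the mode of a fitted distribution, not a rule-derived translation.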
This is so dumb.
Let's indeed treat non-deterministic output exactly like we treat deterministic output.
https://github.com/llvm/llvm-project/tree/main/llvm/test/Cod...
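The linked llvm test tree is exactly that kind of review: thousands of checked-in output patterns that fail CI the moment codegen drifts. The mechanism can be sketched in a few lines - a toy stand-in for LLVM's FileCheck, run against hypothetical assembly:

```python
import re

def filecheck(output: str, patterns: list) -> bool:
    """Minimal FileCheck-style matcher: every pattern must appear
    in the output, in the order given."""
    pos = 0
    for pat in patterns:
        m = re.search(pat, output[pos:])
        if not m:
            return False
        pos += m.end()
    return True

# Hypothetical codegen output pinned by an in-order pattern check,
# the way llvm/test/CodeGen locks down compiler output.
asm = """
foo:
        lea     eax, [rdi + rdi]
        ret
"""
assert filecheck(asm, [r"foo:", r"lea\s+eax", r"ret"])
assert not filecheck(asm, [r"ret", r"foo:"])  # order matters
```

Deterministic output is what makes this cheap: one golden pattern per function, checked forever.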
“AI-checks-AI pipelines as first-class CI infrastructure, not bolt-on curiosity”—what’s the contrast here? Is it serious aspiration, not unserious aspiration?
“Formal specification layers that agents execute against, not just prompts”—Okay.
It just looks like it is stating lots of problems with an x-not-y, as if there is progress being made by way of insistence.
I am open to the idea of something like a small verification kernel that can be comprehended by “humans” which can check GenAI output. But right now we can contrast mature (decade+) compilers with GenAI like this.
- Compilers: You get the abstraction you asked for: it might not be “optimal” code, but it is code that works the way you wrote it
- GenAI: Here is 200KLOC, good luck, could be anything
Now you could reduce the space of those 200KLOC with tests and verification. But so far (based on this submission) it looks like this is at the handwaving stage.
Certainly you would need high-value tests if tests are the thing that is supposed to be the verification. Either something simple and expressive enough for “humans” to write or something that is both short and easy to read for “humans” (and generated by GenAI). Not some copy-paste smelling mockfest that looks like it is a pile of junk that has evolved over five years, each author pushing some junk on top while taking care to not make the whole pile tilt and collapse.
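One shape a "high-value test" could take is a property test: the contract stated in a few readable lines, instead of piles of copied fixtures and mocks. A sketch, where both the function and its properties are illustrative:

```python
import random

def dedupe_keep_order(xs):
    """Function under test: drop duplicates, keep first occurrences."""
    seen, out = set(), []
    for x in xs:
        if x not in seen:
            seen.add(x)
            out.append(x)
    return out

# Three short properties cover what dozens of example-based tests
# would: no duplicates, same elements, original order preserved.
for _ in range(200):
    xs = [random.randint(0, 9) for _ in range(random.randint(0, 30))]
    out = dedupe_keep_order(xs)
    assert len(out) == len(set(out))   # no duplicates
    assert set(out) == set(xs)         # same elements, nothing invented
    it = iter(xs)
    assert all(x in it for x in out)   # out is a subsequence of xs
```

A "human" can audit those three assertions in a minute, which is the bar generated tests would also have to clear.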
Just yesterday I've reported a codegen bug in MSVC. (Luckily they've fixed it very fast.) Can you realise that it's an optimiser bug without inspecting the assembly? Hardly.
All the arguments people claim against LLMs are similarly applicable to compilers, but compilers are old technology and LLMs are new.
If you're an expert, just about every compiled function contains obvious inefficiencies, and a skilled assembly programmer can speed it up by something in the ballpark of 3x. If we're talking about your average webapp, you can usually get 1000x better resource usage in most respects, including CPU, RAM, storage and so on.
And the output isn't deterministic either - bugs notwithstanding, code generation is highly chaotic, optimisations have non-local impacts, and you can't easily predict optimised codegen output from source.
LLMs aren't much worse. They have non-deterministic output, but you can steer it - similarly to a compiler. An expert can use one to gain great speed and efficiency, but in the hands of someone less capable it can produce something awful just as fast. Both tools are force multipliers.