Back to News
Advertisement
Advertisement

⚡ Community Insights

Discussion Sentiment

85% Positive

Analyzed from 606 words in the discussion.

Trending Topics

#verifier#llm#agent#write#system#thing#karpathy#genetic#algorithm#where

Discussion (14 Comments)Read Original on HackerNews

pteetor•about 2 hours ago
In case you are unfamiliar with Karpathy's Loop[1], it is a genetic algorithm[2] where the genetic "mutations" are clever-but-random ideas generated by an LLM agent, aimed at improving a system.

  (1) Let the LLM randomly perturbate the system.
  (2) Measure the system's performance.
  (3a) If the perturbation improved performance, keep the change.
  (3b) Otherwise, don't.
  (4) Repeat
[1] https://github.com/karpathy/autoresearch

[2] https://en.wikipedia.org/wiki/Genetic_algorithm

2001zhaozhao•about 1 hour ago
Wtf, this has a name now? I thought of this exact idea literally months ago but never had the time to do any experiments on it.

At the time I dismissed it as potentially being incredibly expensive for the improvement you do get, and runs into typical pitfalls of evolutionary algorithms (in the same way evolution doesn't let an organism grow a wheel, your LLM evolution algorithm will never come up with something that requires a far bigger leap than what you allow the LLM to perturb on a single step. Also the genetic algorithm will probably result in a vibecoded mess of short-sighted decisions just like evolution creates a spaghetti genome in real life.)

I'll definitely need to look into how people have improved the idea and whether it is practical now.

beepdyboop•41 minutes ago
This is not a new idea at all, many many have had it, no one really can claim it
naveen99•2 minutes ago
You know this doesn’t work most of the time…
sho_hn•about 2 hours ago
Salient on the value of the verifier. Matches my experience in the last two quarters.

Nice detail on the encountered failures. Very similar experiences with my own loops against testsuites.

Great post. A snapshot in time.

fc417fc802•about 2 hours ago
Extremely interesting but I don't understand why it was written by an LLM. Either the frontier models are far better than I realized or else writing this document required a lot of manual work regardless at which point why not keep it in your own voice?

> The agent did not know that would also halve the LUT count. It found out by doing it and watching the synthesizer.

So I guess this is an example of an LLM anthropomorphizing and making wild conjectures about the internal workings of a different LLM.

osti•about 1 hour ago
> propose, implement, measure, keep the wins

Pretty much what I did to let Codex with gpt5.4xhigh improve my fairly complex CUDA kernel which resulted in 20x throughput improvement.

hackyhacky•about 1 hour ago
Concretely, what interesting changes did it make to achieve such a significant improvement?
outside1234•about 2 hours ago
Has anyone actually written a verifier for a business / project?
sho_hn•about 2 hours ago
I'd say "a verifier" here is a loose term. A great testsuite is a verifier. I've done reverse-engineering projects that involved generating trace logs from the object under test, having a reimplementation emit the same logs, and running strict comparisons.

OP's post is basically pointing out what certainly many others have independently discovered: Your agent-based dev operation is as good as the test rituals and guard rails you give the agents.

dataviz1000•about 1 hour ago
Can you explain your question a little more? The recursive agents will find the minimum to satisfy the deterministic termination condition including cheating. In other words, it will be literally correct yet wrong. I would go so far to say malicious compliance.

I have recursive agent that finds trading strategies after recreating academic research and probing the model using its training on everything. It works really well but I have to force it to write out every line and write a proof that data in the future from the time of the wall clock didn't enter the system. Even then some stupid thing like not converting the timezone with daylight savings will allow it to peek into the future 1 hour. These types of bugs are almost impossible to find. Now there needs to be another agent whose only purpose to write out every line explaining that the timezone for that line of code was correct.

DeathArrow•about 1 hour ago
Is this related to autoresearch? https://github.com/karpathy/autoresearch
thin_carapace•about 2 hours ago
> "If you can write the rules down, an agent will satisfy them faster than your team will."

a fantastic opportunity to become the next next big thing and write a verifier verifier.

at the hypothesized inflexion point where AI instantly performs exactly as commanded, what happens to heavily regulated industries like medical? do we get huge leaps and bounds everywhere EXCEPT where it matters, or is regulation going to be handed over to a verifier verifier?

_carbyau_•about 1 hour ago
> performs exactly as commanded

The devil is in the details. There are an amazing number of details in a good [thing]. Someone somewhere has to say exactly what this [thing] being built actually is.

Read almost any story about wishes from a genie. Simple statements don't work.