FR version is available. Content is displayed in original English for accuracy.
Advertisement
Advertisement
⚡ Community Insights
Discussion Sentiment
91% Positive
Analyzed from 481 words in the discussion.
Trending Topics
#verifier#agent#llm#write#system#thing#karpathy#where#keep#don

Discussion (13 Comments)Read Original on HackerNews
[2] https://en.wikipedia.org/wiki/Genetic_algorithm
Nice detail on the encountered failures. Very similar experiences with my own loops against testsuites.
Great post. A snapshot in time.
> The agent did not know that would also halve the LUT count. It found out by doing it and watching the synthesizer.
So I guess this is an example of an LLM anthropomorphizing and making wild conjectures about the internal workings of a different LLM.
Pretty much what I did to let Codex with gpt5.4xhigh improve my fairly complex CUDA kernel which resulted in 20x throughput improvement.
I have recursive agent that finds trading strategies after recreating academic research and probing the model using its training on everything. It works really well but I have to force it to write out every line and write a proof that data in the future from the time of the wall clock didn't enter the system. Even then some stupid thing like not converting the timezone with daylight savings will allow it to peek into the future 1 hour. These types of bugs are almost impossible to find. Now there needs to be another agent whose only purpose to write out every line explaining that the timezone for that line of code was correct.
OP's post is basically pointing out what certainly many others have independently discovered: Your agent-based dev operation is as good as the test rituals and guard rails you give the agents.
a fantastic opportunity to become the next next big thing and write a verifier verifier.
at the hypothesized inflexion point where AI instantly performs exactly as commanded, what happens to heavily regulated industries like medical? do we get huge leaps and bounds everywhere EXCEPT where it matters, or is regulation going to be handed over to a verifier verifier?
The devil is in the details. There are an amazing number of details in a good [thing]. Someone somewhere has to say exactly what this [thing] being built actually is.
Read almost any story about wishes from a genie. Simple statements don't work.