
Discussion (7 Comments), originally on HackerNews

madanparas, about 15 hours ago
The "RL repair loop" is iterative LLM prompting with stderr feedback, not reinforcement learning. There is no training code, no reward function, and no environment in the repo. The loop also freezes the scene spec and only regenerates code, so if the planner specified 12 objects that geometrically do not fit on screen, three repair attempts will not help.
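
For readers unfamiliar with the pattern described above, here is a minimal sketch of a repair loop driven by stderr feedback. It is not the repo's code: `repair_loop`, `run_candidate`, and `fake_generate` are hypothetical names, and `fake_generate` is a stand-in for whatever LLM call the project actually makes.

```python
import subprocess
import sys
from typing import Callable, Optional, Tuple

# A generator takes the (frozen) spec plus the previous stderr, if any,
# and returns new candidate code. Hypothetical stand-in for an LLM call.
Generator = Callable[[str, Optional[str]], str]


def run_candidate(code: str, timeout: float = 30.0) -> subprocess.CompletedProcess:
    """Run candidate code in a fresh interpreter and capture its output."""
    return subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )


def repair_loop(
    spec: str, generate: Generator, max_attempts: int = 3
) -> Tuple[Optional[str], int]:
    """Regenerate code until it exits cleanly or attempts run out.

    The spec is never modified; only the code is regenerated, with the
    previous attempt's stderr passed back as feedback.
    """
    stderr: Optional[str] = None
    for attempt in range(1, max_attempts + 1):
        code = generate(spec, stderr)
        result = run_candidate(code)
        if result.returncode == 0:
            return code, attempt
        stderr = result.stderr  # feedback for the next prompt
    return None, max_attempts


def fake_generate(spec: str, stderr: Optional[str]) -> str:
    """Toy generator: buggy on the first try, fixed once it sees an error."""
    if stderr is None:
        return "print(undefined_name)"  # raises NameError when run
    return f"print({spec!r})"
```

Nothing in this loop updates model weights or computes a reward, which is the distinction being drawn. Because the spec is passed through unchanged on every attempt, a spec that cannot be satisfied simply exhausts `max_attempts`.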
yorwba, about 12 hours ago
There's no training code because the author is using an external service for that: https://docs.primeintellect.ai/hosted-training/getting-start... The reward function is here: https://github.com/HarleyCoops/Math-To-Manim/blob/d1c412d22a... The environment is iterative LLM prompting.

The idea is apparently that a model that is bad at fixing its own mistakes might become better if you train it on this task using reinforcement learning.

tptacek, about 15 hours ago
Thanks, I was wondering what this README could have meant by "RL loop" here.
geuis, about 15 hours ago
The entire article reads as output from a well-structured prompt. Style-wise, it's almost point for point what I get when I ask for a summary of current repo changes before deciding to commit.
xigoi, about 5 hours ago
Let me guess, it’s the 4294967296th LLM wrapper.
iosovi, about 13 hours ago
Did anybody else notice the random "Christian" in the README?
sheept, about 13 hours ago
Probably intended as a signature for the preface, since Christian is the repo owner's name. It turned into a list because of Markdown syntax.