Community Insights
Discussion Sentiment
50% Positive
Analyzed from 171 words in the discussion.
Trending Topics
#loop #llm #training #code #repo #repair #iterative #prompting #reinforcement #learning
Discussion (7 Comments)
The idea is apparently that a model that is bad at fixing its own mistakes might become better if you train it on this task using reinforcement learning.