GPT-5.5 β No ARC-AGI-3 scores
9
AAG25 2 days ago 3 comments
Did the model perform poorly and OpenAI decided to not publish arc agi 3 scores? This is honestly the best benchmark right now to measure true intelligence.
Discussion Sentiment
Analyzed from 191 words in the discussion.
Trending Topics
Discussion (3 Comments)Read Original on HackerNews
Not mentioning it is a massive signal. It just confirms what we've been seeing: brute-forcing parameter counts doesn't solve reasoning. Transformers are great at interpolating training data (which is why MMLU is basically maxed out and useless now due to contamination), but they fail hard at true zero-shot tasks.
You can't hack ARC by just throwing more compute at the pre-training phase. We are hitting the wall of next-token prediction, and until they ship actual test-time compute or System 2 architectures, they will keep failing this benchmark.
[1]https://www.forbes.com/sites/lanceeliot/2023/04/26/openai-ce...