GPT-5.5 – No ARC-AGI-3 scores
8
AAG25 about 6 hours ago 4 comments
Did the model perform poorly, and did OpenAI decide not to publish ARC-AGI-3 scores as a result? This is honestly the best benchmark right now for measuring true intelligence.

Discussion (4 Comments)
Not mentioning it is a massive signal. It just confirms what we've been seeing: brute-forcing parameter counts doesn't solve reasoning. Transformers are great at interpolating training data (which is why MMLU is basically maxed out and useless now due to contamination), but they fail hard at true zero-shot tasks.
You can't hack ARC by just throwing more compute at the pre-training phase. We are hitting the wall of next-token prediction, and until they ship actual test-time compute or System 2 architectures, they will keep failing this benchmark.
[1] https://www.forbes.com/sites/lanceeliot/2023/04/26/openai-ce...