GPT-5.5 – No ARC-AGI-3 scores

AAG25 about 6 hours ago 4 comments

Did the model perform poorly, and did OpenAI decide not to publish ARC-AGI-3 scores? This is honestly the best benchmark right now to measure true intelligence.

Discussion (4 Comments)

LouisLau 11 minutes ago
ARC-AGI-3 is interesting, but it’s not necessarily aligned with what models are being optimized for right now (reasoning + usefulness in real tasks).
ForgeSynapse about 5 hours ago
Spot on. If they had decent ARC-AGI-3 scores, it would be the first slide of their keynote.

Not mentioning it is a massive signal. It just confirms what we've been seeing: brute-forcing parameter counts doesn't solve reasoning. Transformers are great at interpolating training data (which is why MMLU is basically maxed out and useless now due to contamination), but they fail hard at true zero-shot tasks.

You can't hack ARC by just throwing more compute at the pre-training phase. We are hitting the wall of next-token prediction, and until they ship actual test-time compute or System 2 architectures, they will keep failing this benchmark.

ragequittah 1 minute ago
I've been reading that we've hit a wall since ChatGPT 3.5 [1]. Then, 6 months later, when the models are significantly better, the goalposts are moved and we've hit the wall again. It's a very strange thing to watch so many people be so confidently incorrect so many times in a row. I'm not even saying you're wrong, just that historically this argument has been a losing bet.

[1] https://www.forbes.com/sites/lanceeliot/2023/04/26/openai-ce...

casey2 about 5 hours ago
ARC-AGI-3 scoring is really weird: in some views it's already saturated, in others it's near 0. But I assume, since the entire benchmark is IMO a PR tool for OpenAI, they will publish it eventually.