GPT-5.5 – No ARC-AGI-3 scores

AAG25 about 6 hours ago 3 comments

Did the model perform poorly, so OpenAI decided not to publish ARC-AGI-3 scores? This is honestly the best benchmark right now to measure true intelligence.

Discussion (3 Comments)

LouisLau•9 minutes ago

ARC-AGI-3 is interesting, but it’s not necessarily aligned with what models are being optimized for right now (reasoning + usefulness in real tasks).

ForgeSynapse•about 5 hours ago

Spot on. If they had decent ARC-AGI-3 scores, it would be the first slide of their keynote.

Not mentioning it is a massive signal. It just confirms what we've been seeing: brute-forcing parameter counts doesn't solve reasoning. Transformers are great at interpolating training data (which is why MMLU is basically maxed out and useless now due to contamination), but they fail hard at true zero-shot tasks.

You can't hack ARC by just throwing more compute at the pre-training phase. We are hitting the wall of next-token prediction, and until they ship actual test-time compute or System 2 architectures, they will keep failing this benchmark.

casey2•about 5 hours ago

ARC-AGI-3 scoring is really weird: in some views it's already saturated, in others it's near 0. But I assume, since the entire benchmark IMO is a PR tool for OpenAI, they will publish it eventually.