Today's Frontier AI companies will never exceed the AI capability frontier again

BoiledCabbage•6 minutes ago

This entire post is crap.

Their graphs don't even support their claims.

Their graph leaders are fusions of frontier models. Not of local models.

I saw one single fusion of local models.

If their claim is that Opus 4.8 is a smaller, "older" model, then again that's pretty absurd.

What is with all of these claims on HN recently where people want open models to win (which I understand and can support), but have absolute bs claims supposedly showing it and ignore that the claims are entirely made up?

In this case they have only a single "fusion" of non-latest models. They found a metric where Deepseek 4 beats out GPT 5.5 and Opus 4.8 (which is already pretty suspicious to me), and showed that after adding two other models it still beats them out and doesn't pass Fable.

What in the world is the actual claim here? Fusion didn't drag it down??

If you found something where Deepseek 4 is already supposedly better than Opus and GPT 5.5 the rest is just smoke and mirrors.

Is HN now just dead internet theory? Nobody can look at this for 2 mins and conclude anything other than it's nonsense. And this is the third topic I've had this conversation on recently on HN. Highly upvoted blogspam that claims "open models are great" and is heavily stretching the truth or outright intentionally misleading.

I'm not agreeing or disagreeing on open models, I'm saying why do people keep posting clear spam that is clearly misleading on the topic?

blargey•about 1 hour ago

A manic riff on https://xcancel.com/OpenRouter/status/2065856853989270011 , which advertises https://openrouter.ai/fusion/1 , which is a (slow) multi-model multi-prompt workflow that's specific to the "DRACO" benchmark for "deep research", and doesn't say much about coding and long-horizon agentic work, nor does it imply you can somehow parlay this into duct-taping 50 budget-tier models together for even more gains. Not even sure what "solo" even means in the context of the comparison chart - oneshot? Variant workflow since it doesn't make sense to run on one input?

Mixing outputs of different models one way or another is old news; if it were anywhere near as promising as the author dreams, it would have exploded many months ago.

jwpapi•about 3 hours ago

This is very interesting and the biggest glimpse of hope I’ve seen here in the last couple of months. I haven’t really paid attention to Fusion even though I got the email. I didn’t even assume it would be comparable.

JumpCrisscross•about 2 hours ago

I’m so confused. The top fusion is Fable 5 and GPT 5.5. That is not an “ensemble of weaker AI models.”

maniacwhat•about 2 hours ago

What it's saying is that if you look at any single model, it can be beaten by an ensemble of weaker models. E.g., Fable 5 is beaten by an ensemble of previous-gen models.

JumpCrisscross•about 2 hours ago

I guess so. 4.8 + 4.8 > Fable 5 is interesting, though not particularly game changing. (The others all fuse frontier models. Which is an argument for using those frontier models more. Not less.)

pants2•36 minutes ago

Yeah, all that's really saying is a weaker model with a better harness can beat a stronger model with a worse harness, specifically on the DRACO benchmark.

This isn't really a surprising result. Needs more evidence to make a broader claim.

tim-star•about 2 hours ago

I guess the point is that any fusion is better than any single model and a fusion of the top two models is obviously the best? For cost though I guess you could just duct tape together 10 open source models and then that's comparable?

JumpCrisscross•about 2 hours ago

> though I guess you could just duct tape together 10 open source models and then that's comparable?

This is what I was hoping to see data for.
