Discussion (12 Comments) on HackerNews
The models are nondeterministic, and therefore it's pretty normal for different runs to give different results.
I don't see this as evidence that Opus 4.6 has gotten worse.
And how is that an excuse?
I don't care about how good a model could be. I care about how good a model was on my run.
Consequently, my opinion on a model is going to be based around its worst performance, not its best.
As such, this qualifies as strong evidence that Opus 4.6 has gotten worse.
> And how is that an excuse? […] this qualifies as strong evidence…
This qualifies as nothing because of how random processes work; that's what the GP is saying. The numbers are not reliable if it's just one run.
If this is counter-intuitive, a refresher on basic statistics and probability theory may be in order.
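To make the statistics point concrete, here is a minimal simulation sketch (all names and numbers are hypothetical, not from any real eval): a model whose true capability never changes can still produce visibly different scores on single benchmark runs, while the average over many runs converges to the true rate.

```python
import random
import statistics

def run_benchmark(true_pass_rate: float, n_tasks: int, rng: random.Random) -> float:
    """Simulate one benchmark run: each task passes independently with
    probability true_pass_rate; return the observed pass rate."""
    passes = sum(rng.random() < true_pass_rate for _ in range(n_tasks))
    return passes / n_tasks

rng = random.Random(0)
TRUE_RATE = 0.70   # hypothetical "real" capability, held constant
N_TASKS = 50       # a small eval suite

# Two single runs of the *same* unchanged model can differ noticeably...
run_a = run_benchmark(TRUE_RATE, N_TASKS, rng)
run_b = run_benchmark(TRUE_RATE, N_TASKS, rng)

# ...while the mean over many runs settles near the true rate.
many = [run_benchmark(TRUE_RATE, N_TASKS, rng) for _ in range(1000)]
mean = statistics.mean(many)
print(run_a, run_b, mean)
```

With 50 tasks, a single run's pass rate has a standard deviation of roughly 6.5 percentage points, so two honest runs of an identical model can easily land several points apart; that spread is the noise floor any "the model got worse" claim has to clear.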
Come on Anthropic, admit what you're doing already and let us access your best models unhindered, even if it costs us more. At the moment we just all feel short-changed.