14
jjiwidi about 5 hours ago 0 comments
They definitely must be doing some quantization or optimization to meet demand, otherwise why would model performance degrade this much? It's been crazy for me personally.
Combining multiple tests on the same leaderboard like this is nonsense; there should be a separate leaderboard for the new tasks where every model is tested again.
Putting it on the original leaderboard as "Opus 4.6 (April 12)" is so obviously inappropriate that it smells like deception. You could say that the leaderboard is hallucinated.