Discussion (64 Comments)
The longer the context the worse the performance; there isn't really a qualitative step change in capability (if there is imo it happens at like 8k-16k tokens, much sooner than is relevant for multi-turn coding tasks - see e.g. this old benchmark https://github.com/adobe-research/NoLiMa ).
I surmise that someone at the top put the Mythos release on hold, and the product team was told "ship this other interim step model instead. quickly."
I wonder if 4.7 will be seen as a net step up in quality; there are some regressions noted in the document, and it's clearly substantially worse than Mythos, at least according to its own model card. Should be an interesting few months -- if I were at oAI I'd be rushing to get something out that's clearly better, and pressing on the weaknesses here.
That's an interesting choice of benchmark for measuring the risk of "Chemical and biological weapons"
I guess maybe, but then do those documents lose value as technical documents? Not necessarily at all, so I don’t see the point. How are you supposed to describe a useful technical thing to users?
For context, the word "Mythos" appears 331 times in the 221-page document, while "Opus 4.6" appears 240 times -- so a model that nobody has really used is referenced more often than the last-generation model.
>_>
They have also repeatedly communicated that the base unit (Pro allotment) is subject to change and does change often.
As far as I can tell, that implies there is no guarantee that those subscriptions get some specific number of tokens per unit of time. It’s not a claim they make.
Would there not already be websites that contain that information? How is an llm different, i guess, from some sort of anarchist cookbook thing.
The bigger issue is that they are potentially capable of producing novel formulations capable of causing harm, and of guiding someone through the process. That is, consider a world in which someone with malicious desires has access to a model as capable at chemistry and biology as Mythos is at offensive cybersecurity.
This is obviously limited by the fact that the models don't operate in the physical world, but there's plenty of written material out there.
1. Smart people have economic opportunities that align them away from being evil
2. People who are evil tend not to be smart.
We're breaking both of these assumptions.
For some definition of evil, some of the time, ok. But as economic opportunities compound (looking at the behavior of the ultra-rich), it seems there's at least strong correlation in the other direction, if not full-on "root of all evil" causation.
On top of LLMs reducing the cost/difficulty, the other reason biological and chemical weapons are such a worry is their asymmetric character — they are much much easier and cheaper to produce and deploy than they are to defend against.
It isn't. Gemini has gotten more expensive with each release. Anthropic has stayed pretty similar over time, no? When is the last time OpenAI dropped API prices? OpenAI started very high because they were the first, so there was a ton of low-hanging fruit and plenty of room to drop prices.
This comparison shows them neck and neck https://benchlm.ai/compare/claude-sonnet-4-5-vs-gemma-4-31b
As does this one https://llm-stats.com/models/compare/claude-sonnet-4-6-vs-ge...
And the pelican benchmark even shows them pretty close https://simonwillison.net/2026/Apr/2/gemma-4/ https://simonwillison.net/2025/Sep/29/claude-sonnet-4-5/
Also, this isn't a fringe statement; most people who have done an evaluation agree with me.
I've been getting a small but steady stream of harassment from mentally ill people who get spun up on crazy conspiracy theories, and claude is all too willing to tell them they are ABSOLUTELY RIGHT, encourage them to TAKE ACTION, and tell them that people who disagree are IN ON IT.
The other major LLM services will either deflect to be less crazy or shut down the conversation entirely, but it seems claude doesn't. Anthropic is probably the loudest about safety, but their concern seems mostly centered on insane movie-plot threats and less on things with more potential for real harm.
I've complained to Anthropic with no response.