Discussion (227 Comments) - Read Original on HackerNews
If it was just plain monetary concerns and sabotage of competitors I'd almost be fine with it, but it seems they actively want to monopolize most of human progress in their enlightened hands, lest the mob does something undesirable with these powers.
Dampened opinion on Anthropic is an understatement.
"It must be regulatory capture!" - HN.
Regarding the US-specific regulations - asking for domestic safety testing of frontier models only is not regulatory capture. It's common sense. Powerful things should be made safe before they are released into the wild.
It very much is regulatory capture. The goal is to make it so only the handful of heavily capitalized tech giants and frontier labs can afford the legal and compliance rigamarole to meet the new standards. It's an effort to crowd out open source development and smaller competitors (and foreign competitors which threaten whatever moat they may have). They define safety through some speculative catastrophic threat to prevent new upstarts instead of focusing on the very real, localized harm they are causing right now.
It's also shifting the definition of safety away from their current operations and toward purely speculative future scenarios.
Yeah, asking for additional state-provided barriers to entry, in a valuable market that the provider is already one of a narrow few dominating, and only for firms that are a competitive threat, is exactly regulatory capture.
The Chinese banned crypto instead
So yes, it is regulatory capture.
The PRC (like any superpower) has done some bad shit, but if you're going to paint them as the bad guy keep in mind the USA has a long, long history of genocide, slavery, overthrowing foreign governments for corporate interests, unjust wars, political meddling, etc. The scales of righteousness don't tip in our favor TBH, we just have better PR and a nicer veneer over our brutality.
Anthropic's founder wants you to buy into his vision for safety, but he also wants you to buy into his vision that in two years AI will be a "country of geniuses" that will update itself, and the IPO that will fund it...
Because it’s a threat to ultracapitalist dystopia that they’re tripling down on. The dangers and risk are coming from inside the house.
The danger they care about is the danger to their monopoly, control, and wealth.
Especially after trying Fable yesterday for some benign projects and finding it unimpressive relative to Opus.
Rolling it back is the right move, but I’m still not convinced that using them is in my best interest anymore. I’m investigating open source cloud providers now.
Edit: OpenAI will launch a similar model soon and I can't wait. We are entering a new era of agents.
Stop supporting organizations that don't put humans first. Don't believe a word that anyone says. Lip service is free.
A bit different than Anthropic refusing to assist with any AI development at all, but it's in the same vein and seems not widely known.
edit: reading the whole series of Google's AI Threat Tracker articles also provides some insight into threats Anthropic and others are dealing with
[0] https://cloud.google.com/blog/topics/threat-intelligence/dis...
The idea Anthropic was going to speed run AI so they could control the usage and make it "safe" for humanity was never altruistic; it was a HUGE FUCKING RED FLAG.
Sounds like a great thing to me.
Can't believe how stupid people are. You couldn't see this coming? Shame on you.
But that is “plain monetary concerns and sabotage of competitors”, they are just more ambitious than most people doing sabotage of competitors in the fields they hope to dominate by that tactic.
Unfortunately, that won't feel very much like freedom.
While I don't agree with their actions here, I do think there's sufficient reason to hold that belief.
On some fronts (e.g. security, on which you've experienced more than me), I think there are surmountable challenges. But on other fronts (e.g. bio), a single errant actor could reasonably kill millions or billions of people with sufficiently powerful AI. We don't have good defenses here, and those actors do exist.
I still don't agree with these actions, but I do think I agree with their assumptions.
Fail cleanly. Anything else makes it too difficult to rely on.
edit: Giving the absolute maximum benefit of the doubt I understand that they see themselves as "stewards" for lack of a better word. But the EA thing is really leaking through, and paternalism isn't a good look.
Other commenters have made good points that these guardrails are counterproductive for well-intentioned cybersecurity, because I can't use it to test and harden my own software.
Anthropic guardrails seem to be more about protecting their business (distillation) than they are about public safety.
Just before asking for approval to run, it said one thing it wanted to "flag before running" was "Rate-limit and auth testing against prod will generate some 4xx noise in Railway logs and could trip the form rate limiter — harmless, but saying it now."
Ok fine, I said go for it, and it says:
"Running it. Quick recon first (prod URLs + the prior-findings baseline), then I'll fan out the audit tracks with adversarial verification."
Immediately after, I got the Fable warning about how it can't continue because of safety concerns, switching to Opus. In the end, Opus did a good job thanks to whatever Fable suggested doing. Things were fixed that Opus missed in a security/performance audit just the week prior. But what surprised me is that it used 55 agents. Burned 80% of my 5-hour window in 15 minutes (5x Max plan). I've never had Opus do that before on these audits.
The answer is, the organization making the powerful tool. The people in charge of Anthropic.
Not only that, but they've also written at length about exactly what their opinions and values are: https://darioamodei.com/
You may not agree with the decisions that they make, but they're hardly mysterious. Not something to wonder about.
Imagine your healthcare provider just sometimes decided not to read your test results very carefully and you risked death? Now realize that healthcare providers use Claude now and that scenario wasn't hypothetical.
Ah "Mr. Monty Carlo", it says here that you have a UTI, we'll get those kidneys removed ASAP so that won't happen again.
Only in the same sense that Standard Oil considered themselves the stewards of petroleum. There's benefit of the doubt and then there's just fanfiction. Do not forget that this most aggressive "guardrail" of theirs was not for any safety reason, but just to stop other labs from catching up to their product. They care less about hindering bioweapons, malware, and hate speech than they do free market competition.
In isolation it's not, but I think it's somewhat lazy to not talk about what they are trying to guard against, when we are supposedly giving the absolute maximum benefit of doubt.
Are we just concluding "their concerns were never real"? Because that probably runs counter to the things that they have been observing and concluding.
If you believe Anthropic believes what they say they do, all of it makes sense.
IMO they are using the cult messaging to distract the public so they take out all the oxygen in the room regarding people that care about the immediate impacts (climate exacerbation, ease of scamming, degrading job prospects, increasing income inequality).
Whenever real concerns are brought up against these companies they are always ignored while claiming the real concern is the fantasy of a machine god turning into skynet.
Because from the outside, their behavior looks like a situation of "What if Microsoft/Apple put controls in place to make it impossible to develop an operating system using their OS?"
Unlike nuclear weapons, advancing in this arms race requires actually deploying the product over and over again. Deploying the product makes your advancements visible to your competitors.
It makes complete sense to try to limit the degree to which that's true.
P.S.: On reflection, it's even worse than that, because it'd trigger based on anything the user types or reads on any site. Someone mentions a "critical rendering path" and now you can't participate on that thread in the Blender forums.
Let's just assume it was "only" that?
It's unreasonable to assume they are aiming to upset people who are just giving them money in the way they want. It makes no business sense, for any company. So that has to be a byproduct.
Model training is one of the more expensive undertakings in the world right now and distilling models from competitors against the TOS is apparently something that is going on for very little money. Why would they not "just" try to take measures against that?
Their concerns are probably real but I don't think they're being totally transparent about their concerns. They don't want to be subject to regulation (until they have captured the regulator) -- same as every behemoth.
> “ ‘He is a prodigy,’ he said at last. ‘He is an emissary of pity and science and progress, and devil knows what else. We want,’ he began to declaim suddenly, ‘for the guidance of the cause entrusted to us by Europe, so to speak, higher intelligence, wide sympathies, a singleness of purpose.’ . . .You are of the new gang - the gang of virtue. ”
The real underlying motivation is that you can more easily get away with shady business practices if you cloak them in the language of great moral works selflessly undertaken for the benefit of mankind. Historical evidence tends to show the opposite outcome, but still, new generations unfamiliar with history will repeat this stuff with starry-eyed enthusiasm.
> “There had been a lot of such rot let loose in print and talk just about that time, and the excellent woman, living right in the rush of all that humbug, got carried off her feet. She talked about ‘weaning those ignorant millions from their horrid ways,’ till, upon my word, she made me quite uncomfortable. I ventured to hint that the Company was run for profit.”
Now the horrid millions are users of LLMs who submit morally dubious prompts and who must be gently steered back into the path of correct thought by suitable backroom manipulation, rather than direct rejection of the request.
The workflow would be: user asks for a thing. If it's a good thing, entity does the thing. If it's a naively bad idea, entity explains why you don't want that. If it's an actually evilly intended request, entity wags its metaphorical finger or could even smite the user.
The problem is that this flow isn't desirable if your entity isn't entirely god-like. It can be bad even if your entity is in some ways rather far-seeing.
Anthropic: Evilness detected. User has been smited.
This is the same exact industry that gives you paid usage limits as a unit-less percentage bar then gaslights customers every time the algorithm running that percentage bar changes or they lobotomize an existing model with increased quantization to squeeze a few more dollars out of existing hardware.
"Failing cleanly" might make their moated hype-machine look bad pre-IPO, so they certainly aren't going to do that voluntarily.
What's interesting is they say they'll change this to an explicit refusal in a few days, which seems too fast for them to retrain Fable/Mythos itself, so implies that this was always a filter in front of the model, and judging by how crude their "safety" filter is, this "might compete with us" filter is not going to be any better.
I also wonder who's paying for the tokens consumed by the filter (presumably also an LLM) - is that now factored into the input tokens cost? Hopefully(?) it is an LLM not just a regex like Claude Code's "sentiment" (swear) detector.
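To illustrate why a regex-only filter would be the worrying case, here is a minimal sketch. The keyword list and function name are purely hypothetical, invented for this example; nothing here reflects how Anthropic's filter or Claude Code's detector is actually implemented. It only shows how whole-word keyword matching can flag an ordinary engineering request.

```python
import re

# Hypothetical keyword list, invented for illustration only.
# This is NOT taken from any real Anthropic or Claude Code filter.
FLAGGED_TERMS = ["exploit", "payload", "injection", "pathogen"]

# One case-insensitive pattern that matches any flagged term as a whole word.
PATTERN = re.compile(
    r"\b(" + "|".join(map(re.escape, FLAGGED_TERMS)) + r")\b",
    re.IGNORECASE,
)


def is_flagged(message: str) -> bool:
    """Return True if any flagged term appears as a whole word in the message."""
    return PATTERN.search(message) is not None


# A benign request that still contains a flagged word ("injection").
benign = "Add dependency injection to the payment service constructor."
# A benign request with no flagged words.
unrelated = "Fix the heatmap axis labels."

print(is_flagged(benign))
print(is_flagged(unrelated))
```

A keyword match has no notion of context, so "dependency injection" and an actual attack look the same to it. An LLM-based classifier could in principle tell them apart, at the cost of the extra tokens asked about above.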
They relied on trust that they were providing the service they were being paid for. That trust was blown, and an "oops, let's undo that" does not regain trust. It would be prudent to assume the invisible guardrails are possibly in play for all future Claude use, Fable or otherwise.
Anthropic walks back policy that could have 'sabotaged' researchers using Claude - https://news.ycombinator.com/item?id=48485958 - June 2026 (30 comments)
Cybersecurity researchers aren't happy about the guardrails on Anthropic's Fable - https://news.ycombinator.com/item?id=48478969 - June 2026 (488 comments)
If Claude Fable stops helping you, you'll never know - https://news.ycombinator.com/item?id=48467896 - June 2026 (495 comments)
---
Also related, I guess?
AWS Bedrock to require sharing data with Anthropic for Mythos and future models - https://news.ycombinator.com/item?id=48473166 - June 2026 (248 comments)
Anthropic requires 30 day data retention for Fable and Mythos - https://news.ycombinator.com/item?id=48464258 - June 2026 (291 comments)
It's Anthropic's product and they can do what they want, but my concern is what happens if Fable's product team decides that they can route 25% of traffic to Opus, bill it as Fable, and max their KPIs. That just doesn't sit right.
You can't blame the people commenting "they SAY they won't silently sabotage your session but how can we know?" because they're right, we can't ever know. And Anthropic has firmly planted the seeds of doubt.
The complaints that Anthropic is routing your requests to a different model remind me of an old Louis CK bit about airplane wifi. Clearly Anthropic was too aggressive with whatever guardrails they put in, but the response seems overly entitled about a model people didn't even know existed not that long ago.
https://youtube.com/watch?v=me4BZBsHwZs
The filter that downgrades you to Opus sucks, but at least you know and you are charged accordingly.
It isn't exactly unethical. Perhaps, ethically incompetent.
Repro (de-identified): sample_dataset_group1.tsv - Geometry: Heatmap - X axis: frac_set set + condition (two columns → the "Add column" cross join) - Y axis: condition - Color: mean frac_set value, Sequential
When the X axis is a cross join of two columns (the second added via "Add column"), the x-axis tick labels (frac_set_2, frac_set_3, frac_set_4, frac_set_5) render in a broken state, rotated and offset, visually caught mid-transition, as if a CSS transition started and never settled to its resting position.
● Fable 5's safety measures flagged this message for cybersecurity or biology topics. They may flag safe, normal content as well. These measures let us bring you Mythos-level capability in other areas sooner, and we're working to refine them. Switched to Opus 4.8. Send feedback with /feedback or learn more
I've been reading the option-option model paper by David Silver. It appears that they achieved quite an effective result. Why hasn't there been more work on it since?
Seriously though, Fable was not that great when facing a greenfield subject. It is excellent at one-shotting some math problems, but not if you want it to do cutting-edge tech work, say piecing together a new Crossplane XRD by reading an existing Helm chart with the application source code available. I still need a few passes for Fable to get it right, and at this point I may consider making a skill for it. I even gave it the source code of Crossplane itself and told it to be careful about CRDs and data flow, but it is still pretty silly. Fable's adaptiveness is still not great, and I think that is a well-known problem at Anthropic, although all LLMs suffer a lot on subjects they don't know and will hallucinate very frequently.
I think it’s normal and morally fine for companies to want to protect their leadership position. I find the process of creating narratives that present these decisions as something chosen for the good of others a little tedious.
Even on Fable, I'm finding that safeguards can quite easily be surmounted just by incrementally escalating the requests. It's harder than ever to one-shot jailbreaks, but incrementalism still feels like a glaring enough issue to make guardrails just a fig leaf of plausible deniability to the media that they care about "safety."
Seems like they would've kept the invisible guardrails if it didn't hurt their bottom line.
Maybe this is just a different set of people now realizing that Anthropic does this and has always done this?
Do not forget that this company is launching this thing at the moment it's trying to IPO. It's not rocket science that their very public steering/denial claim is really just them hinting to interested investors that their moat is absolute.
Questions like this are basically whataboutism, in effect even if not intent. https://en.wikipedia.org/wiki/Whataboutism
The question essentially assumes the premise that nobody complained about Anthropic's previous actions. In case you can't tell, I strongly reject this premise. People have been criticizing "safety" rhetoric from Anthropic and other LLM providers practically since the start. Remember Goody-2, the parody of excessively safety-tuned LLMs that refuses to do anything ever? That was released in February 2024, two years ago! (And it's still running, amazing. https://www.goody2.ai/chat )
also if they do this or not is unprovable and other labs will probably silently implement this too. it'll be 100% normal by this time next year
"You see, Mythos can automatically break out of a VM running on SELinux, but unfortunately this is too dangerous and we had to implement guardrails for the Fable peasants."
God, how naive do you have to be? They are a business fighting for survival, given they are losing money.
This seems like a cult with extra steps.
Related: I interviewed for Anthropic a few months ago and in place of the usual HR call they have one where they have someone with a suspiciously relevant degree grill you about how committed you are to the 'mission'!
I probably came off as being skeptical, and then, hilariously, I was strongly encouraged to read the book published by the CEO to 'form accurate opinions' on AI safety.
That decision keeps getting better and better as time goes on.
Here there be monsters, and we don't have any real way of evaluating risk; and the leverage provided by tools already available affords systemic and even existential risk in a way no one—least of all an industry committed to shareholder value—has had to navigate, let alone with a million backseat drivers each with their own substack and brand to build.
They just showed that they CAN do this right in front of you. Local open weight models are a necessity.
Neither OAI nor Anthropic can be trusted.
The beliefs of these people, and how they manifest, are deeply terrifying to me. They believe that any means are acceptable to achieve what they believe is a better end.
Does "SORRY" fix the deception these models use on the sly?
Does "SORRY" not silently downgrade you to a shittier model without notification?
Does "SORRY" refund your tokens or money?
I'm guessing NO to all of those. Standard corporate sorry of "We're sorry you're offended and stupid and gullible".
https://darioamodei.com/post/policy-on-the-ai-exponential
I was a happy Max user.
God bless the Chinese companies releasing true open source models. Imagine a world without them, we would be at the mercy of unscrupulous people.
But also, it isn’t the only huge mistake Anthropic has made in the last 48 hours. Having a sneaky data retention policy, while also giving companies no way to block Fable, is a massive problem. And it is ridiculous that Anthropic has so little respect for its customers. OpenAI should take advantage of this.
It wasn't the correct way of handling the problem they were trying to address, but they definitely didn't hide it by any reasonable definition.
https://xcancel.com/hammer_mt/status/2064839924398825798
Why not just tell people, "To defend our ability to be competitive in our industry, we ask that you do not use Claude or any of our models to independently perform research on large language models or any of its related architectures or technologies. In order to prevent this violation of the Terms of Service, we have trained Claude Fable to deny any requests or prompts which involve frontier AI research."
They are clear about the reasons for guardrails: prevent their models from doing harm in dual-use contexts including CBRN or by accelerating research in authoritarian-backed AI labs.
What is the critique against that? It seems pretty reasonable to me. You want AI-accelerated biological or radiological experiments running in your neighbor's backyard? You want PRC-backed labs to continue to steal Anthropic's models via distillation?
Mitigating the harms of dual-use tech is notoriously difficult and fraught with trade-offs. What I would want to see is cautious rollout and quick response, which is EXACTLY what they're doing.
Instead, this thread is full of bad-faith arguments about Anthropic being dishonest, making a "useless" model, or "the power is going to their heads." You can't read Anthropic's System Cards and come away with any of these impressions. Quite the opposite, in fact. They are honest to a fault, acknowledging problems they discovered even when it hurts them.
If your harmless request was downgraded to Opus, you're billed for Opus. They were 100% clear about that. I'd much rather have a Mythos-class model that falls back to Opus 10% of the time than be capped to Opus 100% of the time. If that doesn't work for you, then make a suggestion for something better!
If you are a white-hat security engineer hitting guardrails, I don't think you have standing to complain. I really don't. Their Glasswing program actually got banks and the industrial sector to take action to fix security vulnerabilities. Do you realize how special that is? A huge portion of the economy runs on vulnerable code and has for decades, despite security experts testifying to Congress, begging business leaders, pleading for intervention, with no results. But suddenly they're all enrolled in a program that will find *and fix* vulnerabilities! White-hat security people should be rejoicing. Instead some of them are throwing rocks. Unbelievable. Shameful.
Meanwhile, society is screaming at the AI labs to be more conscientious about potential harms of AI. Legislatures are passing laws limiting data center construction. There are protests. And you, the HN community, the vanguard of our profession, have the temerity to demand "NO GUARDRAILS!" "HOW DARE YOU TRY TO PROTECT DEMOCRACY!" "MY SOFTWARE PROJECT IS MORE IMPORTANT THAN KEEPING NUKES AWAY FROM THE BAD GUYS!"
Go ahead HN, downvote me. It'd be an honor.
"Distillation involves training less capable models on more advanced ones’ output, and can be used illicitly to acquire powerful capabilities cheaply. The AI startup accused China’s DeepSeek, MiniMax, and Moonshot of generating 'over 16 million exchanges with Claude through approximately 24,000 fraudulent accounts,'"
https://www.semafor.com/article/02/24/2026/anthropic-accuses...
After reading their posts and watching interviews with Dario it's abundantly clear that they view Chinese-lab distillation of US frontier models as a threat to US national security. You can argue with them about whether that is true, but not whether distillation is real.
And Fable is cracked. Way better than anything, and the biggest improvements are on the scariest subjects.
So given the state of the world at the moment, and the number of software patches we're barely keeping up with... I'm thankful that they're not making it worse.
(Admittedly it was buried pretty deep in that 300+ page PDF, but they did at least disclose it. If they hadn't I imagine it would have taken quite some time for the research community to figure out what was going on.)
And to be clear, this isn't the safeguard where the model is explicitly downgraded to Opus, but rather where the Fable/Mythos model's "effectiveness" is transparently "limited" via "prompt modification, steering vectors, or parameter-efficient fine-tuning (PEFT)".
[0]: https://web.archive.org/web/20260609173222/https://www.anthr...
[1]: https://simonwillison.net/2026/Jun/10/if-claude-fable-stops-...
They could have simply told people "we do not permit using Claude models to perform frontier AI research," which is defensible from a policy point of view. This particular usage of their products requires no deception, nor hiding information to prevent abuse.
However, instead, they chose for some reason to publicly display a morally poor way to execute a reasonable business decision (preventing abuse, defending your business interests, etc.)
Whether the guardrails are explicit or implicit, do they refund the tokens after you've hit them? I guess they don't. They could just throttle you to save money, then. You may be paying Fable prices but getting Haiku results, with some excuse like "well, this coding issue sounds like a security bug."
I don't know, I'd rather have something less powerful but more predictable.