Anthropic Walks Back Policy That Could Have 'Sabotaged' Researchers Using Claude

eericflo•about 3 hours ago•16 comments•Article on wired.com

Discussion (16 Comments) from Hacker News

nmfisher•about 2 hours ago

Call me a cynic, but I don't believe this is a genuine change of heart at all. It feels much more like a panicked response to something that might undermine their IPO.

Even if you trust Anthropic today (which I don't), they clearly don't want competition and there's no telling what other shady moves they'll pull in future.

The only sustainable way forward is to support open models. I was already on the fence about whether or not to keep my Max subscription (the extra cost over something like DeepSeek V4 didn't really feel justifiable). This is the tipping point for me, I'll be cancelling my sub before it renews at the end of the month.

reasonableklout•about 1 hour ago

I guess I don't understand why it's shady. It seems more like a poorly executed decision to enforce a publicly stated policy (it's been against Anthropic's ToS to use their models on frontier ML research for a while now). After all, people found out about this through their published system card.

It is definitely a bad idea to do this without notifying the user, because users who are incorrectly affected will have no way of providing feedback or getting support. And it is also anticompetitive, but if you truly believe that AI is not a normal technology, it is rational.

nmfisher•3 minutes ago

It's shady because they were going to silently poison your outputs.

It's actually worse than it sounds initially, because Fable isn't actually omniscient when it comes to safety classification. Many people (myself included) had refusals or fallback to Opus 4.8 for seemingly compliant/innocuous requests.

Wouldn't you be pissed off if they decided to sabotage your project despite having done nothing wrong?

LoganDark•about 2 hours ago

I think they are legitimately convinced that this model is so dangerous it could destroy the world and that they genuinely have the responsibility to prevent it from assisting other models to destroy the world.

I don't think I agree that I should be forbidden from e.g. patching a binary to work on the latest macOS since the company behind it died and intentionally installed a time-based kill-switch (FUCK ADOBE for popularizing that practice). But ooOOooOOoo working with machine code is so cybersecurity and therefore suspicious.

chatmasta•about 1 hour ago

The company was founded basically out of the effective altruism movement.

LoganDark•about 1 hour ago

What is the impact of effective altruism? I looked it up, but I don't understand how it differs from simple logical consideration, i.e. how it would be responsible for any of Anthropic's eccentricities.

SilverElfin•42 minutes ago

> I think they are legitimately convinced that this model is so dangerous it could destroy the world and that they genuinely have the responsibility to prevent it from assisting other models to destroy the world.

Do they really believe that? Or do they just want to control this technology exclusively, with moves like this and by pushing for regulatory capture after complaining about safety all the time? Didn’t Dario say that GPT-2 or GPT-3 would present a similar destroy-the-world level of danger?

impulser_•about 3 hours ago

It's kinda weird to think the Chinese AI labs might be more trustworthy than the US labs.

- Anthropic is run by a bunch of nut jobs.

- OpenAI is run by a guy you can't trust.

I don't even know if we should include DeepMind, Meta, or xAI in the conversation of AI labs at this point since they can't produce models better than Chinese labs.

reasonableklout•about 2 hours ago

To be fair, nerfing Claude on frontier research tasks is consistent with Anthropic's stated beliefs. So in that sense you can trust them to always behave consistently, if strangely. But this launch was done very poorly, with no transparency about when the frontier research policy was violated.

impulser_•about 1 hour ago

Yeah, and their beliefs are fucking crazy and dangerous. They are literally sabotaging their users. They built malware into their model that triggers if you prompt it about training a fucking AI model. It doesn't tell you; no, it literally sabotages you by editing your prompt and intentionally going against your request.

You want fucking nut jobs like this building models?

It's one thing to build safeguards on your model and have it respond to the user, "I'm sorry, I can't help you with this request." Chinese models do this for some requests.

It's another thing to actively try to make the model perform worse for your user on purpose because they asked the model to do something you, the model creator, didn't like.

Imagine someone is asking a logical medical question and the model swaps the prompt, purposely becomes less intelligent, and gives bad advice to this person.

How do these people not understand they are stupid?

LoganDark•about 2 hours ago

https://archive.ph/yxYhU

dtj1123•38 minutes ago

Great, now just let me use it for bioinformatics and we're good.

rvz•about 3 hours ago

Too late.

We now all know that Anthropic CAN do that if they want to. The fact that they told you upfront about it shows that their arrogance on this self-sabotage against their customers is at stratospheric levels.

Believe them the first time, and they are not your friends at all.

SilverElfin•about 2 hours ago

The damage is done. They designed Fable to be dishonest and sabotage-y. Look at this report of Claude now randomly changing AI software without being asked to:

https://xcancel.com/hammer_mt/status/2064839924398825798

This is so completely dishonest. But it also shows how deeply anticompetitive Anthropic is. They will talk about safety but it’s not actually about safety: building features like this seems intended to hurt competition in the AI space. They don’t mind if AI helps YOUR competitor but if it means competition for them, they suddenly have a problem with it.

I don’t care that they walked this back. They’ve shown who they are. And what they’re capable of.