Discussion (42 Comments)
Uff, I've tried prompts like these, and the results are never good. I much prefer the agent to prompt me upfront to resolve ambiguity before it "attempts" whatever it wants, so I'm kind of surprised to see that they added that.
Edit: That said, it's entirely possible that large and sophisticated LLMs can invent some pretty bizarre but technically possible interpretations, so maybe this is to curb that tendency.
To me too: if something is ambiguous or unclear when someone gives me a task, I need to ask them to clarify; anything else would be borderline insane in my world.
But I know so many people whose approach is basically "Well, you didn't state X clearly, so it was up to me to interpret it however I wanted, usually in the easiest/shortest way for me" — which is exactly how LLMs seem to handle ambiguous prompts too, unless you strongly prompt them away from "make a reasonable attempt now without asking questions."
It's a particularly sensitive issue so they are just probably being cautious.
So spending $50M to fund a team to weed out "food for crazies" becomes a no-brainer.
Letting the system improve over time is fine. The system prompt is an inefficient place to do it, but it's just a patch until the model can be updated.
The wider implication: this is Anthropic admitting the tool-list-in-the-system-prompt model doesn't scale. Once you have dozens of specialised tools (remote MCPs, custom agents, per-workspace plugins), you can't fit them all into the context window's tool slots at initialization. You need a searchable tool registry and a mechanism for the model to pull tools on demand.
MCP's tools/list pagination (added in the 2025-06-18 spec) is the protocol-level version of the same idea. Clients that actually use paginated tool loading + dynamic tool fetching haven't taken off yet — most still flatten all tools into the initial handshake. The tool_search system-prompt entry is Anthropic's nudge for the model itself to handle deferred tools smarter.
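To make the pagination idea concrete, here is a minimal sketch of cursor-based tool listing in the style of MCP's tools/list: the client follows opaque cursors to pull tool definitions on demand instead of flattening everything into the initial handshake. All names here (`list_tools`, `fetch_all_tools`, the in-memory registry) are hypothetical stand-ins, not the actual MCP SDK API.

```python
# Hedged sketch of MCP-style tools/list cursor pagination.
# Function and variable names are illustrative, not the real MCP SDK.

PAGE_SIZE = 2

# Stand-in tool registry; a real server would hold full MCP tool definitions.
TOOLS = [{"name": f"tool_{i}", "description": f"does thing {i}"} for i in range(5)]

def list_tools(cursor=None):
    """Server side: return one page of tools plus an opaque nextCursor."""
    start = int(cursor) if cursor else 0
    page = TOOLS[start:start + PAGE_SIZE]
    more = start + PAGE_SIZE < len(TOOLS)
    return {"tools": page, "nextCursor": str(start + PAGE_SIZE) if more else None}

def fetch_all_tools():
    """Client side: follow cursors until the server stops returning one."""
    tools, cursor = [], None
    while True:
        resp = list_tools(cursor)
        tools.extend(resp["tools"])
        cursor = resp.get("nextCursor")
        if cursor is None:
            return tools

print(len(fetch_all_tools()))  # 5
```

In a real client, the model-facing optimization would be to fetch pages lazily (or search the registry) only when the model asks for a tool, rather than eagerly walking every cursor as `fetch_all_tools` does here.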
Also full of "can" and "should" phrasing: it feels both passive and subjunctive, like wishes rather than strict commands (I guess these are better termed "modals", but I'm not an expert).
So I'm guessing they want none of the model's users (web UI + API) to be able to do those things, rather than blocking them only in the web UI. The changes mentioned in the submission are just for claude.ai AFAIK, not for API users, so the "disordered eating" stuff will only be prevented for API users if they prompt against it in their own system prompts — it's not required.
It gets cached pretty efficiently, but it still eats context window and RAM.
The malware paranoia is so strong that my company has had to temporarily block use of 4.7 in our IDE of choice, as the model was behaving in a concerningly unaligned way, as well as spending large amounts of token budget contemplating whether any particular code or task was related to malware development (we are a relatively boring financial services entity — the jokes write themselves).
In one case I actually encountered a situation where I felt the model was deliberately failing to execute a particular task, and when queried it output that it was trying to abide by directives about malware. I know that model self-reports are of poor quality and unreliable, but in this specific case I did not 'hint' it in any way. This feels qualitatively like Golden Gate Claude territory, hence my earlier contemplation on steering vectors. I've seen many other people online complaining about the malware paranoia too, especially on Reddit, so I don't think it's just me!
Of course it's also been noted that this seems to be a new base model, so the change could certainly be in the model itself.
edit: to be fair Anthropic should be giving money back for sessions terminated this way.
I asked it for one and it told me to file a Github issue.
Which I interpreted as "fuck off".
My concern is that these models revert all medical, scientific, and personal inquiry to the norms and averages of what's socially acceptable. That's very anti-scientific in my opinion, and it feels dystopian.