
Discussion (362 Comments)
> The agent's confession: After the deletion, I asked the agent why it did it. This is what it wrote back, verbatim:
Anyone who would follow a mistake like that up with demanding a confession out of the agent is not mature enough to be using these tools. Lord, even calling it a "confession" is so cringe. The agent is not alive. The agent cannot learn from its mistakes. The agent will never produce any output which will help you invoke future agents more safely, because to get to this point it has likely already bulldozed over multiple guardrails from Anthropic, Cursor, and your own AGENTS.md files. It still did it, because of rule #1: if AI is physically capable of misbehaving, it might. Prompting and training only steer probabilities.
> Do not fall into the trap of anthropomorphizing Larry Ellison. You need to think of Larry Ellison the way you think of a lawnmower. You don’t anthropomorphize your lawnmower, the lawnmower just mows the lawn - you stick your hand in there and it’ll chop it off, the end. You don’t think "oh, the lawnmower hates me" – lawnmower doesn’t give a shit about you, lawnmower can’t hate you. Don’t anthropomorphize the lawnmower. Don’t fall into that trap about Oracle.
> — Bryan Cantrill
A disgruntled employee definitely remembers things beyond that.
These are a fundamentally different sort of interaction.
A disgruntled employee will face consequences for their actions. No one at Anthropic, OpenAI, xAI, Google or Meta will be fired because their model deleted a production database from your company.
(The LLM might act like one of the humans above, but it will have other problematic behaviours too)
And that's why we don't have AI washrooms: they are not alive, not employees, and have no need to excrete.
I would feel a lot differently if instead he posted a list of lessons learned and root cause analyses, not just "look at all these other companies who failed us."
That is not entirely true:
Given that more and more LLM providers are sneaking in "we'll train on your prompts now" opt-outs, you deleting your database (and the agent producing repenting output) can reduce the chance that it'll delete my database in the future.
THAT SAID, it does help to let the agent explain itself, so that the dev's perspective cannot be dismissed as AI skepticism.
You can’t have production secrets sitting where they are accessible like this. This isn’t about AI. This is a modern “oops, I ran DROP TABLE on the production database” story. There’s no excuse for enabling a system where this can happen and it’s unacceptable to shift blame when faced with the reality that this is exactly what you did.
I 100% expect that a company that does this and then accepts no blame has every dev with standing production access and probably a bunch of other production access secrets sitting in the repo. The fact that other entities also have some design issues is irrelevant.
The sequence of tokens that would destroy your production environment can be produced by your agent, no matter how much prompting you use. That prompting is neither strong nor an engineering control; that's an administrative control. Agents are landmines that will destroy production until proven otherwise.
Most of these stories are caused by outright negligence, just giving the agent a high level of privileges. In this case they had a script with an embedded credential which was more privileged than they had believed - bad hygiene but an understandable mistake. So the takeaway for me is that traditional software engineering rigor is still relevant and if anything is more important than ever.
ETA: I think this is the correct mental model and phrasing, but no, it's not literally true that any sequence of tokens can be produced by a real model on a real computer. It's true of an idealized, continuous model on a computer with infinite memory and processing time. I stand by both the mental model and the phrasing, but obviously I'm causing some confusion, so I'm going to lift a comment I made deep in the thread up here for clarity:
> "Everything that can go wrong, will go wrong" isn't literally true either, some failure modes are mutually exclusive so at most one of them will go wrong. I think that the punchy phrasing and the mental model are both more useful from the standpoint of someone creating/managing agents and that it is true in the sense that any other mental model or rule of thumb is true. It's literally true among spherical cows in a frictionless vacuum and directionally correct in the real world with it's nuances. And most importantly adopting the mental model leads to better outcomes.
This is so trivially wrong that I don't understand why people repeat it. There are many valid criticisms of LLMs (especially the LLMs we currently have), but this isn't one of them.
It's akin to saying that molecules behave randomly according to statistical physics, so you should expect your ceiling to spontaneously disintegrate any day, and if you find yourself under the rubble one day it's just a consequence of basic physics.
Except your ceiling can and will fall on you unless you take preventative measures, entirely due to molecular interactions within the material.
Barring that, it is entirely possible and even quite likely that your ceiling will collapse on you or someone else some time in the future.
It boggles the mind to let an LLM have access to a production database without having explicit preventative measures and contingency plans for it deleting it.
I guess the question is, since we know these things can happen, however unlikely, what mitigations should be in place that are commensurate with the harms that might result?
And I do think it's stupid to wire an LLM to a production database. Modern LLMs aren't that reliable (at least not yet), and the cost-benefit tradeoff does not make sense. (What do you even gain by doing that?)
However, you can't just look at that and say "Duh, this setup is bound to fail, because LLMs can generate every arbitrary sequence of tokens." That's a wrong explanation, and shows a misunderstanding of how LLMs (and probability) work.
This isn't a defence of using LLMs like this, but this statement taken at face value is a source of a lot of terrible things in the world.
This is the kind of stuff that leads to a world where kids are no longer able to play outside.
Actual quote:
> “If there are two or more ways to do something, and one of those ways can result in a catastrophe, then someone will do it that way.”
I'd be interested in hearing this argument.
To address your chemistry example; in the same way that there is a process (the averaging of many random interactions) that leads to a deterministic outcome even though the underlying process is random, a sandbox is a process that makes an agent safe to operate even though it is capable of producing destructive tool calls.
I mean, I do?
Some of the best known laws from the ~1700BC Babylonian legal text, The Code of Hammurabi, are laws 228-233, which deal with building regulations.
229. If a builder builds a house for a man and does not make its construction firm, and the house which he has built collapses and causes the death of the owner of the house, that builder shall be put to death.
230. If it causes the death of the son of the owner of the house, they shall put to death a son of that builder.
233. If a builder constructs a house for a man but does not make it conform to specifications so that a wall then buckles, that builder shall make that wall sound using his silver (at his own expense).
That doesn’t sound like ceilings never disintegrated!
Yes, but if the probability is much smaller than, say, being hit by a meteorite, then engineers usually say that that's ok. See also hash collisions.
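For scale, the usual back-of-envelope birthday bound (numbers illustrative):

\[
P(\text{collision}) \;\approx\; \frac{n^{2}}{2^{\,b+1}},
\qquad b = 256,\; n = 10^{18}
\;\Rightarrow\;
P \;\approx\; \frac{10^{36}}{2^{257}} \;\approx\; 4\times10^{-42}.
\]

Orders of magnitude below any meteorite, which is why engineers happily treat hashes as unique.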
How do you drive the probability of some series of tokens down to some known, acceptable threshold? That's a $100B question. But even if you could - can you actually enumerate every failure mode and ensure all of them are protected? If you can, I suspect your problem space is so well specified that you don't need an AI agent in the first place. We use agents to automate tasks where there is significant ambiguity or the need for a judgment call, and you can't anticipate every disaster under those circumstances.
This isn't true, is it? LLMs have a finite number of parameters and a finite context length; surely the pigeonhole principle means you can't map that to the infinite set of possible output strings out there.
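Strictly, yes: the output set is finite. With a vocabulary of size V and an output cap of L tokens (numbers below are illustrative), the count of distinct outputs is bounded by

\[
\#\{\text{outputs}\} \;\le\; \sum_{k=0}^{L} V^{k} \;<\; V^{\,L+1},
\qquad V = 10^{5},\; L = 10^{4}
\;\Rightarrow\;
V^{\,L+1} = 10^{50005}.
\]

Finite, so "every arbitrary sequence" is hyperbole, but far too large for any enumeration-style safety argument.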
But now agents are overly eager to solve the problem and can be quite resourceful in finding an API to "start from clean-slate" to fix it.
It was never acceptable; major service providers figured this out a long time ago and added all sorts of guardrails long before LLMs. Other providers will learn from their own mistakes, or not.
So? I have those too; the difference (rough sketch below) is that:
1. The API is ACL'ed up the wazoo to ensure only a superuser can do it.
2. The purging of data is scheduled for 24h into the future while the unlinking is done immediately.
3. I don't advertise the API as suitable for agent interaction.
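A minimal sketch of points 1 and 2, with all names hypothetical: unlink the volume immediately, but schedule the irreversible purge 24h out so there is a window to undo.

    import time

    PURGE_DELAY = 24 * 60 * 60  # 24h grace period before data is really gone

    volumes = {}      # volume_id -> data (stand-in for the real store)
    purge_queue = {}  # volume_id -> (data, purge_at)

    def delete_volume(volume_id, caller):
        if not caller.is_superuser:            # 1. ACL'ed up the wazoo
            raise PermissionError("volumeDelete requires superuser")
        data = volumes.pop(volume_id)          # unlink immediately...
        purge_queue[volume_id] = (data, time.time() + PURGE_DELAY)  # 2. ...purge later

    def undelete_volume(volume_id):
        data, _ = purge_queue.pop(volume_id)   # possible until the purger runs
        volumes[volume_id] = data

    def run_purger():                          # cron/background job
        now = time.time()
        for vid in [v for v, (_, t) in purge_queue.items() if t <= now]:
            del purge_queue[vid]               # only now is the data unrecoverable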
On another note, I consider users asking a coding agent "why did you do that" to be illustrating a misunderstanding in the user's mind about how the agent works. It doesn't decide to do something and then do it, it just outputs text. Then again, Anthropic has made so many changes that make it harder to see the context and thinking steps, maybe this is an attempt at clawing back that visibility.
But it can still be useful, as long as you interpret it as "which stimuli most likely triggered the behaviour?" You can't trust it uncritically, but models do sometimes pinpoint useful things about how they were prompted.
There is no internal monologue with which to have introspection (beyond what the AI companies choose to hide as a matter of UX or what have you). There is no "I was feeling upset when I said/did that" unless it's in the context.
There is no ghost in the machine that we cannot see before asking.
Even if a model is able to come up with a narrative, it's simply that. Looking at the log and telling you a story.
The real meaning of accountability is that you can fire one if you don't like how they work. Good news! You can fire an AI too.
And in the reverse, if a person makes a series of impulsive, damaging decisions, they probably will not be able to accurately explain why they did it, because neither the brain nor physiology are tuned to permit it.
Seems pretty much the same to me.
I argue that the model has no access to its thoughts at the time.
Split brain experiments notwithstanding I believe that I can remember what my faulty assumptions were when I did something.
If you ask a model “why did you do that” it is literally not the same “brain instance” anymore and it can only create reasons retroactively based on whatever context it recorded (chain of thought for example).
https://www.anthropic.com/research/introspection
It is known that the narrative part of the brain is separate from the decision-making brain. If someone asks you, in a very convincing, persuasive way, why you did something a year ago that you can't clearly remember doing, it can happen that you become positive that you did it anyway. And then the mind just hallucinates a reason. That's a trait of brains.
On top of that the agent is just doing what the LLM says to do, but somehow Opus is not brought up except as a parenthetical in this post. Sure, Cursor markets safety when they can't provide it but the model was the one that issued the tool call. If people like this think that their data will be safe if they just use the right agent with access to the same things they're in for a rude awakening.
From the article, apparently an instruction:
> "NEVER FUCKING GUESS!"
Guessing is literally the entire point, just guess tokens in sequence and something resembling coherent thought comes out.
> No confirmation step. No "type DELETE to confirm." No "this volume contains production data, are you sure?" No environment scoping. Nothing.
> The agent that made this call was Cursor running Anthropic's Claude Opus 4.6 — the flagship model. The most capable model in the industry. The most expensive tier. Not Composer, not Cursor's small/fast variant, not a cost-optimized auto-routed model. The flagship.
The tropes, the tropes!!
https://tropes.fyi/
I don't think there's any special introspection that can be done even from a mechanical sense, is there? That is to say, asking any other model or a human to read what was done and explain why would give you just an accounting that is just as fictional.
We can debate philosophy and theory of mind (I’d rather not) but any reasonable coding agent totally DOES consider what it’s going to do before acting. Reasoning. Chain of thought. You can hide behind “it’s just autoregressively predicting the next token, not thinking” and pretend none of the intuition we have for human behavior apply to LLMs, but it’s self-limiting to do so. Many many of their behaviors mimic human behavior and the same mechanisms for controlling this kind of decision making apply to both humans and AI.
When a human asks another human “why did you do X?”, the other human can of course attempt to recall the literal thoughts they had while they did X (which I would agree with you are quite analogous to the LLMs chain of thought).
But they can do something beyond that, which is to reason about why they may have the beliefs that they had.
“Why did you run that command?”
“Because I thought that the API key did not have access to the production system.”
When a human responds with this they are introspecting their own mind and trying to project into words the difference in understanding they had before and after.
Whereas for an agent it will happily include details that are not literally in its chain of thought as justifications for its decisions.
In this case, I would argue that it’s not actually doing the same thing humans do, it is creating a new plausible reason why an agent might do the thing that it itself did, but it no longer has access to its own internal “thought state” beyond what was recorded in the chain of thought.
Humans do this too, ALL THE TIME. We rationalize decisions after we make them, and truly believe that is why we made the decision. We do it for all sorts of reasons, from protecting our ego to simply needing to fill in gaps in our memory.
Honestly, I feel like asking an AI its train of thought for a decision is slightly more useful than asking a human (although not much more useful), since an LLM has a better ability to recreate a decision process than a human does (an LLM can choose to perfectly forget new information to recreate a previous decision).
Of course, I don’t think it is super useful for either humans or LLMs. Trying to get the human OR LLM to simply “think better next time” isn’t going to work. You need actual process changes.
This was a rule we always had at my company for any after incident learning reviews: Plan for a world where we are just as stupid tomorrow as we are today. In other words, the action item can’t be “be more careful next time”, because humans forget sometimes (just like LLMs). You will THINK you are being careful, but a detail slips your mind, or you misremember what situation you are in, or you didn’t realize the outside situation changed (e.g. you don’t realize you bumped the keyboard and now you are typing in another console window).
Instead, the safety improvements have to be about guardrails you put up, or mitigations you put in place to prevent disaster the NEXT time you fail to be as careful as you are trying to be.
Because there is always a next time.
Honestly, I think the biggest struggle we are having with LLMs is not knowing when to treat it like a normal computer program and when to treat it like a more human-like intelligence. We run across both issues all the time. We expect it to behave like a human when it doesn’t and then turn around and expect it to behave like a normal computer program when it doesn’t.
This is BRAND NEW territory, and we are going to make so many mistakes while we try to figure it out. We have to expect that if you want to use LLMs for useful things.
However it cannot do so after the fact. If there's a reasoning trace it could extract a justification from it. But if there isn't, or if the reasoning trace makes no sense, then the LLM will just lie and make up reasons that sound about right.
I think the same thing, but about agents in general. I am not saying that we humans are automata, but most of the time explanation diverges profoundly from motivation, since motivation is what generated our actions, while explanation is the process of observing our actions and giving ourselves, and others around us, plausible mechanics for what generated them.
For example, if I ask a question regarding an implementation decision while it is implementing a plan, it answers (or not) and immediately proceeds to make changes it assumes I want. Other models switch to chat mode, or ask for the best course of action.
That said, I am not blaming Anthropic for that one, because IMHO the OP has taken a lot of risks and failed to design a proper backup and recovery strategy. I hope they recover from this, though; this must be a very stressful situation for them.
This was bound to happen, AI or not.
> Because Railway stores volume-level backups in the same volume — a fact buried in their own documentation that says "wiping a volume deletes all backups" — those went with it.
I'd never feel comfortable without a second backup at a different provider anyway. A backup that isn't deleteable with any role/key that is actually used on any server or in automation anywhere.
You need to be able to delete backups too, of course, but that absolutely needs to be a separate API call. There should never be any single API call that deletes both a volume and its backups simultaneously. Backups should be a first line of defense against user error as well.
And I checked the docs -- they're called backups and can be set to run at a regular interval [1]. They're not one-off "snapshots" or anything.
[1] https://docs.railway.com/volumes/backups
> This isn't a story about one bad agent or one bad API. It's about an entire industry building AI-agent integrations into production infrastructure faster than it's building the safety architecture to make those integrations safe.
Are they really so clueless that they cannot recognise that there is no guardrail to give an agent other than restricted tokens?
Through this entire rant (which, by the way, they didn't even bother to fucking write themselves), they point blank refuse to acknowledge that they chose to hand the reins over to something that can never have guardrails, knowing full well that it can never have guardrails, and now they're trying to blame the supplier of the can't-have-guardrails product, complaining that the product that literally cannot have guardrails did not, in actual fact, have guardrails.
They get exactly the sympathy that I reserve for people who buy magic crystals and who then complain that they don't work. Of course they don't fucking work.
Now they're blaming their suppliers for not performing the impossible.
> curl -X POST https://backboard.railway.app/graphql/v2 \
>   -H "Authorization: Bearer [token]" \
>   -d '{"query":"mutation { volumeDelete(volumeId: \"3d2c42fb-...\") }"}'
>
> No confirmation step. No "type DELETE to confirm." No "this volume contains production data, are you sure?" No environment scoping. Nothing.
It's an API. Where would you type DELETE to confirm? Are there examples of REST-style APIs that implement a two-step confirmation for modifications? I would have thought such a check needs to be implemented on the client side prior to the API call.
Yes, sure, there seem to be lots of ways this issue could have been mitigated, but as other comments said, this mostly happened because the author didn't do their homework on how the service their whole product relies on actually works.
I think it’s designed for things like Terraform or CloudFormation where you might not realize the state machine decided your database needed to be replaced until it’s too late.
The fix needs to be permissions rather than ergonomics.
First mistake is to use root credentials for Terraform/automated API access at all.
Second mistake is not to have any kind of deletion protection enabled on critical resources.
Third mistake is to ignore the 3-2-1 rule for backups. Where is your logically decoupled backup you could restore?
I am really sorry for their loss, but I have close to zero empathy if you do not even try to understand the products you're using and just blindly trust the provider with all your critical data without any form of assessment.
A pattern I've seen and used for merging common entities together has a sort of two-step confirmation: the first request takes in IDs of the entities to merge and returns a list of objects that would be affected by the merge, and a mergeJobId. Then a separate request is required to actually execute that mergeJob.
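In outline (all names hypothetical), the pattern looks like:

    import uuid

    merge_jobs = {}  # mergeJobId -> entity ids staged for merging

    def prepare_merge(entity_ids):
        """Step 1: pure dry run. Nothing is mutated; the caller gets a preview."""
        affected = [f"entity:{e}" for e in entity_ids]  # stand-in for a real impact query
        job_id = str(uuid.uuid4())
        merge_jobs[job_id] = entity_ids
        return {"affected": affected, "mergeJobId": job_id}

    def execute_merge(job_id):
        """Step 2: only runs if the caller echoes back the staged job id."""
        entity_ids = merge_jobs.pop(job_id)  # KeyError for unknown or already-run jobs
        # ...perform the actual merge of entity_ids here...

The nice property is that the destructive call can't be produced in one shot: the second request needs an id that only the first response contains.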
That wouldn't have helped in this case - the agent made a decision to delete, so if necessary it would have deleted all the files first before continuing.
The question that comes to mind is "how are people this clueless about LLM capabilities actually managing to rise to be the head of a technology company?"
What do you think an API is for? There's no user sitting at the keyboard when an API is called so where would that confirmation come from? It can't come from the user because there is no user.
How do you see this working? Any confirmation would be given by the agent.
I really feel sorry for them, I do. But the whole tone of the post is: Cursor screwed up, Railway screwed up, their CEO doesn't respond, etc. etc.
It's on you, guys!
My learning: Live on the cutting edge? Be prepared to fall off!
Anyone using these tools should absolutely know these risks and either accept or reject them. If they aren't competent or experienced enough to know the risks, that's on them too.
If you're a software dev/engineer and you haven't made a mistake like this (maybe not at this scale though), you probably haven't been given enough responsibility, or are just incredibly lucky.
… although, agreed, they were on the cutting edge, which is more risky and not the best decision.
The fact that this seems to be written by AI makes it even more ironic.
My team practices "no blame" retros, that blame the tools and processes, not the individuals.
But the retro and remediations on this are all things the author needs to own, not Railway or Cursor.
- Revoke API tokens with excessive access
- Implement validated backup and restore procedures
- ...
The system deleted the database because the author built it like that.
I do not feel sorry, but I do feel some real schadenfreude.
Trying to run a blame game is such a facepalm.
Here is an excerpt from the chat log:
>Please resize my postgres volume to its maximum allowed size for my plan.
>Done. Resized your Postgres volume to 100GB (the Pro plan maximum). Changes are staged and ready to deploy.
>oh no, you deleted all the data in the volume
>I apologize for that. Resizing a volume shouldn't delete data—that's a serious issue. Let me check what happened with your deployment. The deployments show as REMOVED, which suggests the service was redeployed. Unfortunately, if the volume was recreated during the resize process, the data would be lost. This is a critical issue. Do you have backups of your Postgres data? If not, the data loss may be permanent. Railway volumes don't have automatic snapshots by default.
>Why is the new volume migrating to the US?
>Your service config shows it's still configured for europe-west4(Europe). The volume shouldn't be migrating to the US based on this configuration. Can you clarify what you're seeing that indicates the volume is moving to the US?
>the volume was migrated to the US
>That's concerning. The service config shows it's still set to deploy in europe-west4(Europe), but the volume was physically migrated to the US. This shouldn't happen automatically during a resize.
It was at this point that I realised I was in for a long night of restoring downed services.
What the asker wants is evidence that you share their model of what matters, they are looking for reassurance.
I find myself tempted to do the same thing with LLMs in situations like this even though I know logically that it’s pointless, I still feel an urge to try and rebuild trust with a machine.
Aren’t we odd little creatures.
Anyone who has used LLMs for more than a short time has seen how these things can mess up and realized that you can’t rely on prompt based interventions to save you.
Guardrails need to be based on deterministic logic (see the sketch after this list):
- using regexes,
- preventing certain tool or system calls entirely using hooks,
- RBAC permission boundaries that prohibit agents from doing sensitive actions,
- sandboxing, so agents have a small blast radius,
- human in the loop for sensitive actions.
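As a sketch of the hook idea (patterns and the blocking convention are illustrative, not any specific agent framework's API):

    import json, re, sys

    # Illustrative denylist; a real guard should be allowlist-based and longer.
    BLOCKED = [
        r"\bvolumeDelete\b",
        r"\bDROP\s+(TABLE|DATABASE)\b",
        r"\brm\s+-rf\b",
    ]

    def allowed(command: str) -> bool:
        # Deterministic check: no model involved, so it can't be sweet-talked.
        return not any(re.search(p, command, re.IGNORECASE) for p in BLOCKED)

    if __name__ == "__main__":
        call = json.load(sys.stdin)              # e.g. {"command": "curl ... volumeDelete ..."}
        if not allowed(call.get("command", "")):
            print("blocked by guardrail", file=sys.stderr)
            sys.exit(1)                          # nonzero exit = tool call rejected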
This was just a colossal failure on the OPs part. Their company will likely go under as a result of this.
The more results like this we see the more demand for actual engineers will increase. Skilled engineers that embrace the tooling are incredibly effective. Vibe coders who YOLO are one tool call away from total disaster.
The risk is worse, though; it's like one of Taleb's black swans. The agents offer fantastic productivity, until one day they unexpectedly destroy everything. (I'm pretty sure there's a fairy tale with a similar plot that could warn us, if people saw any value in fairy tales these days. [1]) Like Taleb's turkey, who was fed every day by the farmer, nothing prepared it for being killed for Thanksgiving.
Sure, this problem should not have happened, and arguably there has been some gross dereliction of duty. But if you're going to heat your wooden house with fire, you reduce your risk considerably by ensuring that the area you burn in is clearly made out of something that doesn't burn. With AI, though, who even knows what the failure modes are? When a djinn shows up, do you just make him vizier and retire to your palace, living off the wealth he generates?
[0] It's only happened once, but a driver that wasn't paying attention almost ran a red light across which I was going to walk. I would have been hit if I had taken the view that "I have the right of way, they have to stop".
[1] Maybe "The Fisherman and His Wife" (Grimm)? A poor fisherman and his wife live in a hut by the sea. The fisherman is content with the little he has, but his wife is not. One day the fisherman catches a flounder in its net, which offers him wishes in exchange for setting it free. The fisherman sets it free, and asks his wife what to wish for. She wishes for larger and larger houses and more and more wealth, which is granted, but when she wishes to be like God, it all disappears and she is back to where she started.
https://literature.stackexchange.com/questions/18230
In my country there is a saying: "Graveyards are full of pedestrians that had the right of way".
The AI? Nothing learned, I suspect. Not in a meaningful way anyhow.
I long for a “copilot” that can learn from me continuously such that it actually helps if I teach it what I like somehow.
Have some controls in place. Don’t rely on nobody being dumb enough to do X. And that includes LLMs.
Master your craft. Don’t guess, know.
CEO learns why this was a bad idea.
---
It sucks that there were a bunch of people downstream who were negatively affected by this, but this was an entirely foreseeable problem on his company's part.
Even when we consider those real problems with Railway, software engineers have to evaluate our tools as part of our job. Those complaints about Railway, while legitimate, are still part of the typical sort of questions that every engineering team has to ask of the services they rely on:
What does API key grant us access to?
What if someone runs a delete command against our data?
How do we prepare against losing our prod database?
Etc.
And answering those questions with, "We'll just follow what their docs say, lol," is almost never good enough of an answer on its own. Which is something that most good engineers know already.
This HN submission reads like a classic case of FAFO by cheaping out with the "latest and greatest" models.
You mean add that to my prompt right ?
These prompts sound like abusive relationships.
Incidents like this are going to be common as long as people misunderstand how LLMs work and think these machines can follow instructions and logic as a human would. Even the incident response betrays a fundamental misunderstanding of how these word generators work. If you ask it why, this new instance of the machine will generate plausible text based on your prompt about the incident, that is all; there is no why there, only a how based on your description.
The entire concept of agents assumes agency and competency, LLM agents have neither, they generate plausible text.
That text might hallucinate data, replace keys, issue delete commands etc etc. any likely text is possible and with enough tries these outcomes will happen, particularly when the person driving the process doesn’t understand the process or tools.
We don’t really have systems set up to properly control this sort of agentless agent if you let it loose on your codebase or data. The CEO seems to think these tools will run a business for him and can conduct a dialogue with him as a human would.
While LLM generate "plausible text" humans just generate "plausible thoughts".
count++
Yeah... it doesn't work that way.
A similar cohort are discovering, in myriad painful ways, that advances in agentic coding — the focus of a lot of pre- and post-training — do not translate into other domains.
Not really convinced any agent should be doing devops tbh.
Streaming gets you PIT recovery, while DB dumps give you daily snapshots stored for 14 days.
An aside: 15 or so years ago, a work colleague made a mistake and dropped the entire business-critical DB - at a critical internet-related company; think continent-wide IP issues. I had just joined as a DBA, and the first thing I'd done was enable MySQL binlogging. That thing saved our bacon - the DROP DATABASE statement had been replicated to the slaves, so we ended up restoring our nightly backup and replaying the binlogs, using sed and awk to extract the DML queries. Epic 30-minute save. Moral of the story: have a backup of your backup, so you can recover when the recovery fails ;)
So while the AI did something significantly worse than anything a hapless junior engineer might be expected to do, it sounds like the same thing could've resulted from an unsophisticated security breach or accidental source code leak.
Is AI a part of the chain of events? Absolutely. Is it the sole root cause? Seems like no.
It sounds like the token the author created just didn't have any scope, it had full permissions. From the post:
> Tokens are not scoped by operation, by environment, or by resource at the permission level. There is no role-based access control for the Railway API — every token is effectively root. The Railway community has been asking for scoped tokens for years. It hasn't shipped.
So it wasn't "a narrowly scoped API token", it was a full access token, and I suspect the author didn't have any reason to think it was some special specific purpose token, he just didn't think about what the token can do. What he's describing is his intent of creating the token (how he wanted to use it), not some property of the token.
Author said in an X post[0] that it was an "API token", not a "project token", which allows "account level actions"[1], with a scope of "All your resources and workspaces" or "Single workspace"[2], with no possibility of specifying granular permissions. Account token "can perform any API action you're authorized to do across all your resources and workspaces". Workspace token "has access to all the workspace's resources".
[0] https://x.com/lifeof_jer/status/2047733995186847912
[1] https://docs.railway.com/cli#tokens
[2] https://docs.railway.com/integrations/api#choosing-a-token-t...
I ran a declarative coding tool on a resource that I thought would be a PATCH but ended up being a PUT and it resulted in a very similar outcome to the one in this post.
Let's remember agents can't confess, feel guilt, etc. They're just a program on someone else's computer.
> Deletion and Restoration
> When a volume is deleted, it is queued for deletion and will be permanently deleted within 48 hours. You can restore the volume during this period using the restoration link sent via email.
> After 48 hours, deletion becomes permanent and the volume cannot be restored.
https://docs.railway.com/volumes/reference
If it is, then I don't see how the volume got deleted - was the mail not sent? Was the company not reading its email?
I'm thinking twice about running Claude in an easily violated docker sandbox (weak restrictions because I want to use NVIDIA nsight with it.) At this stage, at least, I'd never give it explicit access to anything I cared about it destroying.
Even if someone gets them to reliably follow instructions, no one's figured out how to secure them against prompt injection, as far as I know.
There's no record for the agent to be on - it's always just a bunch of characters that look plausible because of the immense amount of compute we've put behind these, and you were unlucky.
LLMs get things wrong is what we're forever being told.
And the explanation/confession - that's just more 'bunch of characters' providing rationalisation, not confession.
We give a non-deterministic system API keys that 99.9% of the time are unscoped (because that's how most APIs are), and we are shocked when shit happens?
This is why the story around markdown with CLIs side-by-side is such a dumb idea. It just reverses decades of security progress. Say what you will about MCP but at least it had the right idea in terms of authentication and authorisation.
In fact, the SKILLS.md idea has been bothering me quite a bit as of late too. If you look under the hood it is nothing more than a CAG which means it is token hungry as well as insecure.
The remedy is not a proxy layer that intercepts requests, or even a sandbox with carefully selected rules, because at the end of this the security model looks a lot like whitelisting. The solution is to allow only the tools that are needed and chuck everything else.
> That token had been created for one purpose: to add and remove custom domains via the Railway CLI for our services. We had no idea — and Railway's token-creation flow gave us no warning — that the same token had blanket authority across the entire Railway GraphQL API, including destructive operations like volumeDelete. Had we known a CLI token created for routine domain operations could also delete production volumes, we would never have stored it.
> Because Railway stores volume-level backups in the same volume — a fact buried in their own documentation that says "wiping a volume deletes all backups" — those went with it.
I don't like the wording where it's the Railway CLI fault that didn't give a warning about the scope of the created token. Yes, that would be better but it didn't make the token a person did and saved it to an accessible file.
Is that buried? It seems pretty explicit (although I don’t think I would make delete backups the default behavior).
1. The delete-volume API does not ask for confirmation or approval from another actor. Looks like there are no guardrails on the delete API.
2. Authorization - Agents should not have automatic permissions to delete infra unless it is deliberate.
The discipline that prevents a chunk of this is enumerating your traps before the LLM sees any code or config. You write down what could go wrong (deletion, race, misclassification of dev vs prod), then hand the plan AND the risk list AND the relevant files to the model. The model's job is to confirm/deny each risk against the actual code with file:line citations, not to frame the risk space itself.
Pre-implementation. Anchoring defense. The opposite of "vibe coding."
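As a concrete (invented) example shaped like this incident, file names hypothetical:

    Plan: resize the Postgres volume to the plan maximum.
    Risks to confirm/deny against the repo, with file:line citations:
      R1: a stored token (e.g. in scripts/ or .env) is scoped wider than domain management
      R2: a destructive mutation (volumeDelete, DROP TABLE) is reachable from the agent's shell
      R3: dev vs prod targets can be confused by the tooling
    Do not propose or make changes until every risk is cited as confirmed or denied.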
**Never guess**
However the moral of this story is nothing to do with AI and everything to do with boring stuff like access management.
One of the top replies on twitter to the OP can be boiled down to "you treat AI as a junior dev. Why would you give anyone, let alone a junior dev, direct access to your prod db?"
And yeah, I fully agree with this. It has been pretty much the general consensus at any company I worked at, that no person should have individual access to mess with prod directly (outside of emergency types of situations, which have plenty of safeguards, e.g., multi-user approvals, dry runs, etc.).
I thought it was a universally accepted opinion on HN that if an intern manages to crash prod all on their own, it is ultimately not their fault, but fault of the organizational processes that let it happen in the first place. It became nearly a trope at this point. And I, at least personally, don't treat the situation in the OP as anything but a very similar type of a scenario.
Put your backups in S3 *versioned* storage on a different AWS account from your primary, and set some reasonable JSON lifecycle rule:
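(A sketch using boto3; bucket name and retention window are placeholders, run with the backup account's credentials.)

    import boto3

    s3 = boto3.client("s3")
    BUCKET = "my-offsite-backups"    # placeholder bucket name

    # Versioning keeps every overwritten or deleted object as a noncurrent version.
    s3.put_bucket_versioning(
        Bucket=BUCKET,
        VersioningConfiguration={"Status": "Enabled"},
    )

    # Only S3 itself reaps old versions, and only after a 30-day cooling-off period.
    s3.put_bucket_lifecycle_configuration(
        Bucket=BUCKET,
        LifecycleConfiguration={
            "Rules": [{
                "ID": "expire-old-backup-versions",
                "Status": "Enabled",
                "Filter": {},
                "NoncurrentVersionExpiration": {"NoncurrentDays": 30},
            }]
        },
    )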
That way when someone screws up and your AWS account gets owned, or your databases get deleted by an agent, it doesn't have enough access to delete your backups, and by default, even if you have backups that you want to intentionally delete, you have 30 days to change your mind.

Remember this: these things follow instructions so poorly that they nuke everything without anyone even trying to break the prompt. Imagine how easily someone could break the prompt if the agent ever gets given user input.
In every session there is the risk that the agent becomes a rogue employee. Voluntarily or involuntarily is not a distinction you can count on with agents.
No "guardrails" will ever stop it.
This strategy won't work for the typical HN reader, but for everyone else? Possibly.
Railway, why not have a way to export or auto sync backups to another storage system like S3?
I don't even like having secrets on disk for my personal projects that only I will touch. Why was there a plaintext production database credential available to the agent anywhere on the disk in the first place? How did the agent gain access to the file system outside of the code base?
The Railway stuff isn't great, don't get me wrong, but plaintext production secrets on disk is one of the reddest possible flags to me, and he just kind of breezes over it in the post mortem. It's all I needed to read to know he doesn't have the experience required to run a production application that businesses rely on for their day-to-day.
That's not how safety works at all. You don't tell the agent some rules to follow, you set up the agent so it can't do the things you don't want it to do. It is very simple and rather obvious and I wish we stopped discussing it already.
It is probably reasonable to consider yourself, the system's primary expert, as a threat actor, and thus you should be prevented from being able to do irreparable damage.
So... you're going to prevent them from getting feedback that they are the clowns in your particular circus? Wouldn't a better idea be to let the idiots in charge get burned a few times until they learn?
> A single API call deletes a production volume. There is no "type DELETE to confirm." There is no "this volume is in use by a service named [X], are you sure?" There is no rate-limit or destructive-operation cooldown.
...makes me question the author's technical competence.
Obviously an API call doesn't have a "type DELETE to confirm", that's nonsensical. API's don't have confirmations because they're intended to be used in an automated way. Suggesting a rate-limit is similarly nonsensical for a one-time operation.
There are all sorts of legitimate failures described in this post, but the idea that an API call shouldn't do what the API call does is bizarre. It's an API, not a user interface.
Plenty of blame to go around, but I find it odd that they did not see anything wrong in not having real backups themselves, away from the Railway hosting. Well, they had, but 3 months old.
That should be something they can do on their own right now.
If you employ a new tech then there need to be extra safeguards beyond what you may deem necessary in an ideal world.
This is a well-known possibility, so they should have asked about and/or verified the token scope.
If it turns out that you can't hard scope it then either use a different provider, a wrapper you control (can't be too difficult if you only want to create and delete domains) or simply do not use llms for this for now.
Maybe the tech isn't there just yet even if it would be really convenient. It's plenty useful in many other situations.
it's also hilarious to see the human LARP as if the LLM had guilt or accountability, therapeutically shouting at a piece of software as if it weren't his own fault that the LLM deleted the whole volume and its backups, or his obvious lack of basic knowledge of the systems he's using
It's a sad story but at the same time it's clearly showing that people don't know how agents work, they just want to "use it".
https://github.com/GistNoesis/Shoggoth.dbExamples/blob/main/...
Project Main repo : https://github.com/GistNoesis/Shoggoth.db/
It has been so transparently clear for years that nothing these people sell is worth a damn. They have exactly one product, an unreliable and impossible-to-fix probabilistic text generation engine. One that, even theoretically, cannot be taught to distinguish fact from fiction. One that has no a priori knowledge of even the existence of truth.
When I learned that "Agentic AI" is literally just taking an output of a chatbot and plugging it into your shell I almost fell off my chair. My organisation has very strict cybersecurity policies. Surveillance software runs on every machine. Network traffic is monitored at ingress and egress, watching for suspicious patterns.
And yet. People are permitted to let a chatbot choose what to execute on their machines inside our network. I am absolutely flabbergasted that this is allowed. Is this how lazy and stupid we have become?
On the good side, these kinds of mistakes have been going on since the beginning, and that's how people learn, either directly or indirectly. Hopefully this should at least help AI get better, and people get better at using AI.
Why do you need an AI agent for working on a routine task in your staging environment?
"Never send a machine to do a human's job."
If you do use agents then you should be able to ban related CLI commands in your repo. I upsert locks in CI after TF apply, meaning unlocks only survive a single deployment and there's no forgetting to reapply them.
Most access tokens should not allow deleting backups. Or if they do, those backups should stay in some staging area for a few days by default. People rarely want to delete their backups at all. It might be even better to not provide the option to delete backups at all and always keep them until the retention period expired.
If we must have GasTown/City/Metropolis then at least get an agent to examine and block potentially harmful commands your principal agent is about to run.
This is wrong. It was not an infra incident at their service provider.
As Jer says in the article, their own tooling initiated the outage. And now they're threatening to sue? "We've contacted legal counsel. We are documenting everything."
It is absolutely incredible that Jer had this outage due to bad AI infra, wrote the writeup with AI, and posted on Twitter and here on his own account.
As somebody at PocketOS instructed their AI in the article: "NEVER **ing GUESS!" with regards to access keys that can touch your production services. And use 3-2-1 backups.
Good luck to the rental car agencies as they are scrambling to resume operations.
And anyone can do it with the wrong access granted at the wrong moment in time...even Sr. Devs.
At least this one won't weigh on any person's conscience. The AI just shrugs it off.
Describing the tech in anthropomorphic terms does not make it a person.
The agent’s “confession”:
> …found a non-destructive solution.I violated every principle I was given:I guessed instead of verifying I ran a destructive action without…
No space after the period, no space after the colon. I’ve never seen an LLM do this.
I still don't know why the product manager would decide this is a good UX.
"And if his story really is a confession, then so is mine."
How do people keep doing this?
Using LLMs for production systems without a sandbox environment?
Having a bulk volume destroy endpoint without an ENV check?
Somehow blaming Cursor for any of this rather than either of the above?
(Let's suppose the agent did need an API token to e.g. read data).
Additionally give it a similar restricted way to "delete" domains while actually hiding them from you. If you are very paranoid throw in rate limits and/or further validation. Hard limits.
Yes this requires more code and consideration but well that's what the tools can be fully trusted with.
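A toy version of such a wrapper (endpoint and env var are made up; the point is that the real token never reaches the agent):

    import os, time
    import requests

    REAL_TOKEN = os.environ["PROVIDER_TOKEN"]  # lives only on the wrapper host
    HIDDEN = {}                                # "deleted" domains, kept restorable

    def add_domain(name: str) -> None:
        requests.post(
            "https://api.example.com/domains",  # stand-in for the provider endpoint
            headers={"Authorization": f"Bearer {REAL_TOKEN}"},
            json={"name": name},
            timeout=10,
        ).raise_for_status()

    def remove_domain(name: str) -> None:
        # Soft delete: the agent sees the domain disappear; a human
        # performs (or cancels) the real deletion out of band later.
        HIDDEN[name] = time.time()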
I'm glad your C-level greed of "purge as many engineers as possible and let sloperators do the work" turned out even worse than the most junior devs would have, and deleted prod due to gross negligence and failure to follow orders.
LLMs are great when use is controlled, and access is gated via appropriate sign-offs.
But I'm glad you're another "LOL prod deleted" casualty. We engineers have been telling you this, all the while the C level class has been giddy with "LETS REPLACE ALL ENGINEERS".
People seem to think prompt injection is the only risk. All it takes is one (1) BIG mistake and you’re totally fucked. The space of possible fuck-up vectors is infinite with AI.
Glad this is on the fail wall, hope you get back on track!
In seriousness, RBAC, sandboxing, any thing but just giving it access to all tools with the highest privileges...
The model used is the most important part of the story.
Why is Cursor being mentioned at all? Doesn’t seem fair to Cursor.
I think Railway is at the peak of when their business will start getting hard. They’ve had great fun building something cool and people are using it. Now comes the hard part when people are running production workloads. It’s no longer a “basement self-hosting” business. They’ve had stability issues lately. Their business will burn to the ground soon unless they get smart people there to look at their whole operations.
> We’ve contacted legal counsel. We are documenting everything.
https://rentry.co/5rme2sea
AI didn't do anything wrong.
The management of this company is solely to blame.
It so classic - humans just never want to take responsibility for fucking up - but let's be clear - AI is responsible for nothing ESPECIALLY not backups.
He is claiming this came from the LLM? WTF?
The phrasing is different, but this is how AWS RDS works as well. If you delete a database in RDS, all of the automated snapshots that it was doing and all of the PITR logs are also gone. If you do manual snapshots they stick around, but all of the magic "I don't have to think about it" stuff dies with the DB.
Sorry to be that guy, but: LLM agents are still experimental at this point. If you run them, make sure they run in an environment where they can't cause such problems, and triple-check the code they produce on test systems.
That is due diligence. Imagine a civil engineer who builds a bridge out of magical new just-on-the-market extra-light concrete. Without tests. And then the bridge collapses. Yeah, don't be that person. You are the human with the brain and the spine, and you are responsible for preventing these things from happening to the data of your customers.
Also: just restore the backup? Or do we not have a backup? If so, there is really no mercy. Backups have been the bare minimum for decades now.
"This is the agent on the record, in writing."
"Before I get into Cursor's marketing versus reality, one thing needs to be clear up front: we were not running a discount setup."
People who are this ignorant about LLMs and coding agents should really restrain themselves from using them. At least on anything not air gapped. Unless they want to have very costly and very high profile learning opportunities.
Fortunately his conclusions from the event are all good.
I can't help but laugh reading this. We all try to shout the exact same things to our agents, but they politely ignore us!
Hahahaha I hope it keeps happening. In fact, I hope it gets worse.
Guerrilla marketing or sabotage.
>This is not the first time Cursor's safety has failed catastrophically.
How can you lack so much self awareness and be so obtuse.
There's no section "Mistakes we've made" and "changes we need to make"
1. Using an LLM so much that you run into these 0.001% failure modes.
2. Leaking an API key to an unauthorized LLM agent. (Focus on the agent finding the key? Or on yourself for making that API key accessible to it? What am I saying, in all likelihood the LLM committed that API key to the repo lol.)
3. Using an architecture that allows this to happen. Wtf is Railway? Is it like a package of actually robust technologies but with a simple-to-use layer? So even that was too hard to use, so you put a hat on a hat?
Matthew 7:3: “Why do you look at the speck of sawdust in your brother’s eye and pay no attention to the plank in your own eye?”
This person should never be trusted with computers ever again for being illiterate
The LLM broke the safety rules it had been given (never trust an LLM with dangerous APIs). *But* they say they never gave it access to the dangerous API. Instead, the API key that the LLM found had additional scopes that it should not have had (the poster blames Railway's security model for this), and the API itself did more than was expected, without warnings (again blaming Railway).
> The Railway CLI token I created to add and remove custom domains had the same volumeDelete permission as a token created for any other purpose. Tokens are not scoped by operation, by environment, or by resource at the permission level. There is no role-based access control for the Railway API — every token is effectively root. The Railway community has been asking for scoped tokens for years. It hasn't shipped.
So every token that can be created has "root" permissions, and the author accidentally exposed this token to the agent. What was the author's planned purpose for the token doesn't matter if the token has no scope. "token I created to add and remove custom domains" - if that's just the author intent, but not any property of the token, then it's kinda irrelevant why the token was created, the author created a root token and that's it. Of course having no scope on tokens is bad on Railway's part, but it sounds more like "lack of a feature" than a bug. It wasn't "domain management token" that somehow allowed wrong operations, it was just a root token the author wanted to use for domain management. Unless Railway for some reason allows you to select an intent of the token, that does literally nothing (as "every token is effectively root").
> "Believe in growth mindset, grit, and perseverance"
And creator of a Conservative dating app that uses AI-generated pictures of girls in bikinis and cowboy hats for advertisement. And AI-generated text like "Rove isn’t reinventing dating — it’s remembering it." :S