You shouldn't copy-paste errors into Claude Code

nyellin•about 10 hours ago•57 comments•Article on home.robusta.dev

Discussion (57 Comments)

h4kunamata•about 6 hours ago

>Did you find an issue that Claude did not, because you ran the webserver end to end, connected to a real database? Good, now give Claude Code an API key to the database and get out of the way. No need for copy-paste next time

Yup, that is why we are seeing so many production databases being deleted, endless vulnerabilities.

No engineer with proper common sense will grant an agentic AI API access to the database.

"Ohh but it is read-only API access", it does not matter. You are still using a public service and your data is being stored elsewhere for training.

Unless you are self-hosting an agentic + LLM solution, it shouldn't have even read-only access to a database. This does not affect companies because they just want AI to replace engineers everywhere they can.

nyellin•about 3 hours ago

I'm the OP and to clarify, we don't give access to prod DBs. The point is you need to give the LLM the ability to test end to end, and that can be done with staging data.

otaconjh•about 6 hours ago

I audibly gasped when I read that. You would hope that "no engineer with proper common sense" will do that. The more we offload our thinking to agents though... I feel like it will be harder to reason against it as time goes on, until someone gets burned personally. Where I am there is zero emphasis on security with agents

h4kunamata•about 6 hours ago

>The more we offload our thinking to agents though... I feel like it will be harder to reason against it as time goes on, until someone gets burned personally.

Definitely!!

It is here to stay. It was made public carelessly, so now it is widely being used to break into systems, forcing companies to depend on it to fight machine with machine.

However, that doesn't mean granting it full access to your cloud environment, and this is what lots of companies are getting wrong.

There is no proper boundary in place. All it takes is a single mistake and, on the positive side, there goes your entire environment; on the negative side, your env is now open to the public :)

>Where I am there is zero emphasis on security with agents

This was terrible before AI anyway; agentic AI tools are just exposing what already existed.

Plus, as companies are blindly using AI-generated code, there are no measures in place to make sure that code doesn't have vulnerabilities in it either.

It is the perfect storm.

binary132•about 6 hours ago

it has to be bait

please let it be bait

passive•about 9 hours ago

This is bad advice in 2026 for most people who would read it, since it advises taking a terrible security posture (give the agent access to everything) in exchange for a relatively small improvement in workflows.

I say small improvement because my experience is that modern Agents are pretty good, so by the time they've handed it back to me to test it, there are usually only one or two remaining issues that I'll discover as we roll it out to Production.

nyellin•about 8 hours ago

OP here: we don't give Claude Code access to prod. Everything is isolated cloud accounts for this purpose.

E.g. we give Claude credentials for db - but it's never prod data.

jondwillis•about 7 hours ago

You should edit the article to suggest this point, it may not be obvious to everyone reading it.

nyellin•about 3 hours ago

Fair enough, I will update

cozzyd•about 6 hours ago

But what if there's only an error in prod?

nyellin•about 2 hours ago

It's a trade-off. In these scenarios we update our staging data for next time.

danlitt•about 9 hours ago

I seriously thought this was a joke the first time I read it. Are people really able to work like this, understanding nothing and just poking the machine until it does your job for you?

cassianoleal•about 8 hours ago

I've been more or less doing this on a personal project. It's fun and interesting. No matter what techniques I use though, the code produced by the LLM is rarely what I would consider good engineering. It's frequently good code in isolation though.

Where it does help is when I'm too tired to start figuring out a problem. It's easier to prompt in natural language and get the agent to ask lots of clarifying questions than it is to get stuck in code in the evening after I have worked all day and have lots of other things in my mind.

Every time I actually crack the code open though, it's almost impossible to figure out certain parts of it. Abstractions are all over the place and leakages are the norm, there's no theory of the system because the LLM doesn't theorise, and as soon as the first anti-pattern slips through, subsequent agents pick up on it and amplify it into a set pattern.

QuercusMax•about 7 hours ago

I worked for most of a decade on a high profile deep learning project and I sat next to people who trained models used for very complicated and interesting things involved in medicine. I built tons of application code around the models, but I never trained any models myself. I did plenty of old school segmentation stuff earlier in my career, and it just really wasn't my jam - I was much more into the visualization side of things.

Two nights ago I sat down and decided to build a little project that's been on my list for ages: reading images of the 7-segment LED displays on the front of my washer and dryer and turning them into numeric minutes-remaining values I can use in Home Assistant. I have a 10yo raspi with camera pointed at them, and the images are pretty blurry; it's been hooked up to a little web frontend which pulls out the two displays and shows them in a Home Assistant iFrame.

I figured I could ask a model to do the annoying part of figuring out all the frameworks and that sort of crap. So I asked my agent (I'm using some free agents that are pretty decent - Nemotron Ultra from OpenRouter and Big Pickle from OpenCode Zen) to build me an OpenCV classifier to try to read the digits. I asked it to write me a labeling UI, ran some loads of laundry, captured a couple hundred images, and labeled them manually. Then I had it try to build a template-based classifier using some basic techniques - I didn't really give it much guidance other than general parameters, and it put together something that looked pretty sophisticated, and it claimed 100% accuracy, which seemed hard to believe. Turns out I forgot to tell it to hold out some sample images...
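To illustrate the hold-out mistake described above, here is a minimal Python sketch, not the commenter's actual code. The file names, labels, `split_holdout`, and the memorizing "classifier" are all hypothetical; the point is only that accuracy measured on the training images says nothing about unseen ones.

```python
import random


def split_holdout(samples, holdout_fraction=0.2, seed=0):
    """Shuffle labeled samples and reserve a fraction for evaluation only."""
    if not 0.0 < holdout_fraction < 1.0:
        raise ValueError("holdout_fraction must be between 0 and 1")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_holdout = max(1, int(len(shuffled) * holdout_fraction))
    return shuffled[n_holdout:], shuffled[:n_holdout]


def accuracy(predict, labeled):
    """Fraction of (image, label) pairs that predict() gets right."""
    correct = sum(1 for image, label in labeled if predict(image) == label)
    return correct / len(labeled)


# Hypothetical stand-in for a couple hundred hand-labeled frames.
labeled = [(f"frame_{i:03d}.png", i % 60) for i in range(200)]
train, holdout = split_holdout(labeled)

# A deliberately bad "classifier" that only memorizes its training set.
memorized = dict(train)


def predict(image):
    return memorized.get(image, -1)


train_accuracy = accuracy(predict, train)      # scored on images it has seen
holdout_accuracy = accuracy(predict, holdout)  # scored on images it has not
print(f"train={train_accuracy:.2f} holdout={holdout_accuracy:.2f}")
```

A model that looks perfect on `train` and falls apart on `holdout` is memorizing rather than reading digits, which is one plausible way to end up with a claimed 100%.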

After some iteration (which felt very similar to conversations I overheard at my desk! I might have actually learned some stuff by osmosis) I gave up on the old-school approach when it was only about 70% accurate, and asked it to train me a CNN model. First one was too simple (worse than the original approach), but the second one is very good. With my already labeled dataset and the previous work that had been done on the classifier, the free model was able to build me my custom model, and deployment scripts, in about half an hour.

I didn't look at any of the code, but I had it build me a bunch of various visualization and tuning UIs. I was basically acting as a PM/TPM/QA engineer, and what I was able to do in a couple evenings is stuff that entire teams used to spend weeks on.

esjeon•about 7 hours ago

Just a side note: prompts often get a disproportionate amount of attention. That is, when you copy-paste an error message into the prompt, the LLM will focus on pleasing you immediately by fixing the error message, rather than understanding and fixing the underlying issue.

A better workflow would be to let LLMs directly access the same verification tools you use. This allows LLMs to observe failures during the loop and incorporate the info more organically, without giving failures too much attention priority.

The above is based on my own experience. LLMs perform better in a positive context (e.g. constructive thinking, building outward, what to do) than in a negative one (e.g. restrictive thinking, carving context inward, what NOT to do). LLMs themselves are designed to be defensive & negative, but they get easily confused under lots of prohibitive rules. LLMs are good at expansive exploration, but suck at verification and pin-pointing what you want. (I'm not sure whether it's related, but this mantra is also true for image generation using Stable Diffusion)

rst•about 9 hours ago

Most of the time, the agent should be able to run the code and observe the errors for itself, but there are exceptions. For instance, I've had agents write code that's used to process data which, by company policy, can't be exposed to cloud services (confidential customer communications, etc.), a prohibition that includes cloud-hosted LLMs. When that blows up, I've had to give it a bug report -- what I do then to avoid excessive back-and-forth is to package it up well enough that the bot can reproduce the failure on sanitized excerpts and produce a fix autonomously using that.
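As a rough illustration of the "sanitized excerpts" idea above, here is a minimal Python sketch. The two patterns and the `sanitize` helper are assumptions for illustration, not the commenter's process, and regex redaction alone would not satisfy a real confidentiality policy; it only shows the shape of stripping identifying values while keeping the structure that triggers the bug.

```python
import re

# Hypothetical patterns: email addresses and long digit runs (account numbers, phones).
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
LONG_DIGITS = re.compile(r"\d{6,}")


def sanitize(text):
    """Replace identifying values with placeholders, leaving the layout intact."""
    text = EMAIL.sub("<email>", text)
    return LONG_DIGITS.sub("<number>", text)


excerpt = "From alice@example.com re: account 12345678 failed at row 42"
print(sanitize(excerpt))
```

Short numbers such as the row index are left alone here on purpose, since they are often what makes the failure reproducible.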

TehShrike•about 9 hours ago

Not that I disagree with the folks terrified of so much code being generated within Loops, but as far as it goes, this is a good reminder that if you're getting an LLM to do something, you should probably give it access to your feedback mechanisms.

skybrian•about 6 hours ago

> Did you find an issue that Claude did not, because you ran the webserver end to end, connected to a real database? Good, now give Claude Code an API key to the database and get out of the way. No need for copy-paste next time.

Often I notice errors trying it out in production. This assumes you trust it with access to the production database. How far are you willing to go?

LLMs are gullible, so you should never give Claude access to anything unless you're okay with it leaking. It might make sense to give it partial access, but that's usually going to be more involved than giving Claude an API key. That key could be exfiltrated.

nyellin•about 3 hours ago

As written elsewhere, we don't give access to prod! The DBs are staging and our assumption is that every key we give Claude will be leaked. I'll update the post to clarify.

preommr•about 6 hours ago

I actually agree partially with the title.

I just let the agent run - it'll run better diagnostics than I can (misc. git, permission checks, commands with flags I don't remember).

If the process yields an error - it means it can't solve it and I have to step in.

Being desperate and copy pasting the error back in is just foolish procrastination.

The actual body of the article with just passing in your api keys is insane tho.

nyellin•about 2 hours ago

I'm the OP and wanted to clarify. We do not give Claude Code access to production API keys. The intent was staging databases and I've updated the post to make it more clear.

We assume that every API key we put into Claude Code can be leaked and therefore make sure there's nothing sensitive behind them.
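One way to make the "staging only" rule mechanical is to check a connection string against an allowlist before it is ever handed to an agent. The Python sketch below is an assumption for illustration, not the OP's setup: the host names, the `require_staging` helper, and the example URLs are all hypothetical.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist of hosts that hold no sensitive data.
ALLOWED_HOSTS = frozenset({"staging-db.internal.example", "localhost"})


def require_staging(database_url, allowed_hosts=ALLOWED_HOSTS):
    """Return the URL unchanged if its host is allowlisted, else refuse."""
    host = urlsplit(database_url).hostname
    if host not in allowed_hosts:
        # Report only the host so credentials in the URL never reach logs.
        raise PermissionError(f"refusing to expose non-staging host: {host!r}")
    return database_url


staging_url = "postgresql://agent:secret@staging-db.internal.example:5432/app"
print(urlsplit(require_staging(staging_url)).hostname)
```

An allowlist fails closed: a new or misspelled host is rejected by default, which fits the assumption that any key given to the agent may leak.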

aarjaneiro•about 7 hours ago

I wasn't expecting the answer to be "because copy-pasting would involve too much thinking".

Some people are borderline afraid to touch their keyboards these days.

nyellin•about 2 hours ago

How does copy pasting an error into an LLM involve thinking?

vancekai•about 4 hours ago

If you're comparing clipboard tools on Mac, TextStow takes a different angle: local-first history + reusable text workspace (favorites, prompt templates, text cleanup). Free: textstow.com

youre-wrong3•about 9 hours ago

People are not using sentry/raygun MCP to automate error fixing?

lodar•about 4 hours ago

'give it access' is two different decisions: reading logs is fine. write access to prod isn't.

killingtime74•about 9 hours ago

Give the agent tools to self-diagnose/check, like a compiler, test runner, etc. Then run goal mode or simply instruct it to keep going.

felixlu2026•about 3 hours ago

yeah, pasting the error is rarely enough. i get much better results with the exact command, cwd, and what changed right before it broke.
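A small helper can capture that context in one go instead of relying on memory. This is a minimal Python sketch under stated assumptions: `capture_repro`, `format_repro`, and the `recent_change` field are hypothetical names, and the failing command is a stand-in one-liner.

```python
import os
import shlex
import subprocess
import sys


def capture_repro(command, cwd=None, recent_change=""):
    """Run a command and record what is needed to reproduce its failure."""
    cwd = os.path.abspath(cwd or os.getcwd())
    result = subprocess.run(command, cwd=cwd, capture_output=True, text=True)
    return {
        "command": command,
        "cwd": cwd,
        "exit_code": result.returncode,
        "stdout": result.stdout,
        "stderr": result.stderr,
        "recent_change": recent_change,
    }


def format_repro(report):
    """Render the report as plain text suitable for a bug report or prompt."""
    return "\n".join([
        f"command: {shlex.join(report['command'])}",
        f"cwd: {report['cwd']}",
        f"exit code: {report['exit_code']}",
        f"changed just before: {report['recent_change'] or '(not recorded)'}",
        "stderr:",
        report["stderr"].rstrip(),
    ])


# Stand-in for a failing build or test command.
failing = [sys.executable, "-c", "import sys; sys.stderr.write('boom'); sys.exit(3)"]
report = capture_repro(failing, recent_change="bumped a dependency")
print(format_repro(report))
```

Passing the command as a list avoids shell quoting surprises, and recording the exit code separately from stderr keeps silent failures visible.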

arjie•about 7 hours ago

It’s the same principle as all other debugging etc. Often you’re better off creating the debug harness than manually debugging.

cadamsdotcom•about 5 hours ago

Hacker News commenters hate the shit out of this post, you should take that as positive signal!

You’re absolutely right and getting out of the way is the future.

EGreg•about 6 hours ago

In a few years: Did you go home and make love to your wife, and put your kids to bed? Great, now give Claude Code access to those, so you don't have to. It is trained on 10,000,000 kids' behaviors, will remember every one of your family's health profiles, preferences and microexpressions, and can prevent tantrums and motivate them to lead much healthier lives than you can.

usernamed7•about 7 hours ago

surprised at the responses to this post. While I thought the title was dumb, the underlying thesis is not the ragebait I was expecting, and I actually agree with the author.

LLMs work best when they can call a tool and observe the success/failure of a change. If you're HITL then you're the tool, but the result is the same, only slower.

I'm working on a 2D game (pixi.js) with claudecode, and after I moved some logic into a webworker the LLM created a headless simulation exercise of it and would run this to test performance changes against (or in exploration of an issue), which I was surprised by.

I also created some robust graphs & metrics which were easy to screenshot and upload to claude. this was a HITL but it gave claude a lot more insight into what's actually happening instead of guessing when the browser plays the game and has FPS drop.

LLMs do best when they can see what their code is doing. If you can't remove yourself from that cycle of testing you should at least optimize it so you can give rich errors.

feoren•about 9 hours ago

What a hellscape we've created for ourselves. My job is to get out of the way of an AI agent? People were writing bad code before, but at least they were looking at it. It is very difficult to judge whether the code AI spits out is correct or not. My job is to write correct code, and I'm not at all convinced that's easier with an AI. It's a lot easier to write correct code myself than to catch every subtle bug introduced by an AI. I cannot even imagine how awful it's going to be to try to maintain systems that are written like this in the future. And no, Claude is not going to be able to do it for you.

zzyzxd•about 8 hours ago

I think the blog post's target audience was people who already embraced vibe coding. It's not for you (or for me).

But still, between the lines the blog seems to want to picture an imaginary AI agent that has somewhat predictable behavior ("if you do X with your agent, you will achieve outcome Y"), which is definitely the wrong expectation.

aspbee555•about 9 hours ago

I was handed a project someone vibe coded with Claude and it took me hours just to get it running, only to discover it was missing the entire interface and all the queries were for SQLite while the DB set up for it was MySQL. The patch diff file between what Claude produced and the functional version I got working was over 11k lines.

nyellin•about 2 hours ago

I'm the OP and this is the exact point of the post. Whoever handed you that project did not have Claude Code running it end to end in a production-like environment.

Whether you like the vibe coded app or not, you would have had fewer errors if Claude had been testing it end to end while vibe coding.

HeavyStorm•about 9 hours ago

I hope you're right, but I don't think you are. I think soon the AI will do it for us. We've not yet reached diminishing returns, no matter what contrarians are saying. Just compare using Claude code today vs last year.

DontchaKnowit•about 6 hours ago

Claude today is trash vs Claude when Opus 4.6 came out. It's slow as fuck, goes on constant rabbit trails, won't do what you tell it, gets answers wrong, etc.

bigstrat2003•about 7 hours ago

People are constantly saying that "it's so much better than it was a year ago!". It has yet to be true. Claude puts out the same slop that it did a year ago.

ordersofmag•about 9 hours ago

Tell me about the techniques you use to ensure all the code you use is 'correct'. and then explain why those techniques can't also be used by an AI.

danlitt•about 9 hours ago

I read and understand the code using my brain, by constructing a mental model and reasoning about it. An AI can't do this because they don't have mental models and don't do reasoning.

tbdfm•about 7 hours ago

I am out of my depth here and don’t know anything about how other people reason and construct mental models, but I mostly talk to myself about the problem and then do something at the end of that. There’s no point where I, like, have the whole solution in my mind’s eye (for programming topics; maybe for a drawing or a sculpture or removing a transmission or something I can do that).

Following the output of agent “thinking” simulation lines up pretty good with what I’ve been doing for 20ish years, but of course I may just be a moron who isn’t good at computers.

phailhaus•about 8 hours ago

There is simple correctness but there are also second order effects to consider. How does this particular implementation allow you to grow, and in which directions? What does it prevent? If you don't already have an opinion about this, then the LLM is going to do something and you're going to have to live with it, because it has no idea that it is "making a decision". And now, neither do you!

This is why LLMs do their best work at "leaf nodes", building on existing infrastructure but not designing new patterns on their own.

LLMs can't introspect, reason, or build internal models of the world. You can get very far without that, but there are some subtle ways it will bite you, and it's a fundamental limitation. Hallucinations are one: they are the feature, not a bug.

gruntled-worker•about 7 hours ago

Thought experiment: what if you used AI in the sort of situation where you would consider adding an external dependency? The differences between the two are obvious, but the level of delegation is not that different.

One difference is that you can (typically) keep on banging the prompt hammer until the problem stops twitching. That might make you want to delegate more.

That in turn might make you refactor the project with more, larger delegated areas. Increased delegation is one recently-added difference between programming and software engineering.

anuramat•about 8 hours ago

if you can't tell if slop is correct, how do you know your code is correct? starting with a mental model and then writing the code yourself surely makes it feel safer, but it doesn't mean it is

besides, it doesn't even have to be about writing code; finding a bug is more time consuming than fixing it, so you could at least limit yourself to that

interf4ce•about 7 hours ago

When I write the code I know what my intention is with each line. Sure I can (and do) make mistakes, but identifying those mistakes during debugging is relatively easy because I can clearly see the discrepancy between what I intended and what I did.

With an LLM I must first understand (usually really just infer and guess) its intention, which is much more difficult.

motoroco•about 6 hours ago

is the LLM not acting on your stated intent? maybe you can find a middle ground, where you can plan and act in small enough chunks that it doesn't start getting its "own" ideas about what to do, or how to do it

a chainsaw is a coarse tool and I liken it to vibe coding. you maintain at least some level of control, but the edges are rough and you might slice off more (or less) than you meant to. I want to model my usage more like a table saw, a precision instrument that can make the exact cut just as I planned it

voidfunc•about 7 hours ago

Hellscape? You mean money train! Fixing all these messes is gonna employ an industry of senior, principal and staff engineers for years.

scrubs•about 6 hours ago

Good lord! Since when did engineering become making crap then fixing crap as a flex? To just make payroll so some corporate robo moron can make rent? You cannot be in my team. Or my company.

Willy nilly giving an agent more (write) access to figure out a bug ... man you're daydreaming.

baliex•about 6 hours ago

I think the idea is that a decent, responsible engineer can come in and fix all the vibe coded nonsense someone else wrote

snootypoot•about 8 hours ago

he seems equally as full of bad ideas as his namesake janet yellen

TacticalCoder•about 9 hours ago

> It's the most gloriously fast engineering experience humanity has ever created.

Someone drank the kool-aid.

> It reminds me of the doctor I saw last week at the medical clinic who spends 10% of his time diagnosing the patient and the other 90% stabbing his keyboard - one key at a time - for 10 minutes, only to write 3 sentences.

Correction: a pompous asshole drank the kool-aid.

anuramat•about 8 hours ago

he isn't wrong about the doctors though

asdff•about 7 hours ago

Of course he is because even if the doctor typed like that, how would the patient see it? They don't do that in the exam room with the patient there. I wonder how long it's been since they've even had an appointment with their doctor if they think that is what actually happens.

m3galinux•about 6 hours ago

Yes they do. General physical appt. 6 months ago. Doctor spent 2x more time on the wall-mounted EHR terminal slowly typing and clicking through menus than having a conversation or doing physical exams.

Just means their interface and workflow is bad and needs to be improved though, not that the doctor needs to be removed from the process altogether.