ZH version is available. Content is displayed in original English for accuracy.
Advertisement
Advertisement
⚡ Community Insights
Discussion Sentiment
65% Positive
Analyzed from 8482 words in the discussion.
Trending Topics
#fable#more#code#don#model#things#tokens#claude#doing#agent

Discussion (266 Comments)Read Original on HackerNews
> Running coding agents outside of a sandbox has always been a bad idea
I'm continually bemused and astonished by the number of people who clearly acknowledge that it's reckless to give agents full access to your machine, and keep doing it anyway.
It's like posting a video of yourself in the passenger seat of a car, with your feet up on the dashboard, and saying: "Remember, if you're doing this and you get in a crash, the airbags are likely to break your legs or worse! Boy, I sure am glad that didn't happen to me!"
[1] https://www.todayifoundout.com/index.php/2022/06/how-lobbyis...
[2] https://en.wikipedia.org/wiki/General_Motors_streetcar_consp...
Having an agent is like forever having a genius intern who'll almost always do the perfect job for you. But there is non-zero chance that they'll also come up with quirky solutions and execute those with confidence and no follow-ups. You don't grant the intern production access and hope they check with you.
I don't think the corporate equivalent of "dog ate my homework" flies, if the dog ate your files and your production DB if you are unlucky.
More like malicious lobbying and incompetence made it impossible in many places to use any other form of transportation, despite there being safer, faster, cheaper, and healthier ways to move around. Which come to think if it makes this a rather nice analogy for the current situation... :)
The problem is that different people prompt so differently.
For example, I may ask like “test different variations of this annotation on k8s pods of this service on this X cluster because it proves Y theory.”
But you know what my coworker asks? “Test Y theory.” If you were to ask two different junior engineers that, one might try random things on production and the other one might run local tests! It’s such an unguided “do anything you want as long you figure it out” request and the agent reads it like a junior who has not been told any boundaries but has been strongly told “figure it out.”
It still surprises me when I see people not prompting more specifically and clearly. It not only avoids problems, is faster and costs less - it works better.
I recently shared with a friend a multi-hour LLM chat session I'd done because it veered into a domain he's interested in. In the session I'd brainstormed and probed the feasibility of a novel concept for a new research direction. It traversed a half dozen domains diving into minute detail then zooming back out to survey an adjacent space, interspersed with intense skeptical probing of key assumptions, all while spewing tons of detailed citations, specific paragraph pulls, summarized data tables etc.
My friend is very experienced using LLMs for research so I was surprised when he called me shocked by the sheer velocity, precise targeting and signal/noise. I'd assumed everyone did it the same as I do. He attributed the different result solely to the way I crafted my prompts.
I'm not. Everyone is told to get 10X the amount of shit per day done these days. Safety checks are out the window at that point.
I've had one f up an account by placing 2000 limit orders at the wrong price, but that's another story.
> Additional bypass examples that all execute without permission:
> echo test ; git rm file.txt
> rm --force --recursive /home (if "rm -rf" is blocked)
[1]: https://passt.top/passt/
This is only going to become more of a problem in the future and people need to educate themselves on the technical barriers to use because guardrails only sometimes work.
We need to be asking what the most devious and malicious output could be, and whether what we do with that output (e.g. arguments to command-line tools) would still be safe.
I’m at a small company, and I try to push for security as much as I can, but the stakeholders truly do not care. They want to move fast. It’s just part of the new world I guess. If we get hit by attackers? I don’t know what happens. Sorry, we told you not to - you wanted to move quick and break stuff, this is how that culminates.
I’m sure I’m not the only one.
The general carelessness of the average user is baffling.
In my experience, human employees are much more vulnerable to this particular weakness than frontier agents (i.e. phishing attacks).
IDGI
Anyway, VM's incoming, finally.
There is so much role play going on for people to convince themselves that any of this is fine.
What if you have two machines and the one you give to the agent is constantly backed up?
And if you’re using Macs, you can’t be signed into your primary Apple ID on the agent machine.
I do it like this
https://github.com/flexagoon/dotfiles/blob/main/dot_config/f...
But I'm sure it's simple enough that you can just ask the agent itself to make you a command for it with proper bwrap configuration
[1] https://majorcontext.com/moat/
that it could just be wiped at any moment and it wouldn’t matter. shit happens, could be stolen, broken, whatever. the computer should be able to be thrown out the window and continue to live life.
to be clear, i don’t think upgrading and disposable in this way is good, but it being wiped at any moment shouldn’t be a concern
i grew up wiping my machine every year anyway, so i guess it’s just a habit
is the computer that sacred?
Because most devs already have it running and working without a sandbox, they're tending to not doing anything "unnecessary"
(I'm happy with exe.dev, but I'm not sure what I'd use if I were coding on a Mac.)
I save way more time not babying it than the occasional fuck up I have to salvage.
Plato gave us his Chariot analogy with 2 horse pulling in diff directions 3000 years ago. Today we got System 1/System 2, Elephant Rider model etc.
The human mind thanks to how its own architecture handles unpredictability in the universe will generate contadictions.
It's a very good model, but it comes at a huge premium: not only do the tokens cost more, but the model itself really wants to spend them all. For example, working with React Native, Fable never just says "okay, I did the thing, that's it." It tries to rebuild the entire app from scratch, run the whole test suite, and watch every log and warning.
This is the first time with LLMs I've felt that upgrading to a model isn't worth it, even if my company lets me use it, because all the building / testing was just destroying my machine and its battery, which keeps me from working on other things.
For now, it feels like Opus with ultracode is a better choice (less pollution of the main context, more parallelism in investigations).
I switched back to Opus because of this validation quirk. Overall, Fable spent 20% of the time on coding and 80% on validation.
I think using Fable for planning and Opus for execution could be a "best of both worlds" approach (I need to test this more), but for most cases, it's not necessary, and Opus is enough.
Have you tried adding this instruction to your agents.MD? Avoiding situations were the agent start running a loop is the main use case of the file for me
In fact, Opus does the same. It finishes the job, and redo it from scratch before presenting the result to the user. This happens even for simpler writing tasks especially when I instruct it to create a text file.
I’d say it’s overall better, but not universally better.
I watched the whole thing thinking it could've just asked me for a screenshot and saved the tokens. But still, I couldn't help but be impressed. Opus never would've done that.
Like today, I told Claude exactly the name of the folder it had mistaken (it was supposed to be prod, not production), and it disregarded my input to then examine the directory itself. Small example of the kind of things it's been doing lately but that's top of mind.
You should estimate how much time it would have taken a human
Every browser has an inspector that can show you which element is causing overflow. You walk through the tree, find the offender, and add min-width or overflow. Zero tokens, just like in the old days!
Now, granted, because the garbage LLM code he’s working with has CSS inside HTML inside JavaScript inside Python (I wish I were kidding), finding the styles in his codebase might’ve taken a minute. But even then!
Or sometimes a fix is obvious, but because it requires changing the code of a dependency, it's actually quite tedious to implement.
And to my surprise it was.
This would’ve take a frontend dev 10 seconds to deduce and another 10 seconds to confirm.
But that's not what happens. And in fact, when you start typing in the textarea the horizontal scrollbar vanishes - it's only there when the textarea is empty.
Am I misunderstanding anything here? Seems like it's some weird Safari bug, since Firefox and Chrome don't have the problem.
So if you’re doing web pages, learn CSS.
Generally, if you’re doing something that directly involves X, learn how X works.
ADDENDUM
In most jobs, you’re going to be involved in only a few distinct technologies, learn those well and life is going to be easier. And most are transferable to the next job.
People can just be lazy and seem productive now, they're still lazy.
We have people that now need access to hundreds of thousands in hardware to write an email. Miss me with that, im not frying my brain and becoming dependent on having access to a billionaires thinking machine.
Im also not going to fry my brain with a local think for me machine either. I want to be more valuable than the hardware I have access too.
When paired with your skill and knowledge, it is a force multiplier. You maintain control, the ability to direct, structure, strategise, and refine.
That some are using it as the entire brain does not mean that this is how everyone is using it, or how you must use it. The models can be fantastic at breaking past certain issues, surfacing qualified information, and surfacing related distributed information to help you acquire it and pick up what you need on niche topics quickly. Something as basic as copilot hooked into sharepoint can make life a lot easier when you are in a big org. Something like claude code or codex can be great at hunting down issues in an unfamiliar code base rapidly. Whether or not you outsource the thinking component is entirely up to you, but ignoring the productivity side of the tool because it can do some of the thinking is a case of focusing too hard on the negative.
And people who use LLMs to talk for them (e.g. email, slack) are deplorable. A completely disrespectful use case in my view.
I've met in my professional life some managers or other middlemen who would be profoundly incapable of producing correct software no matter how smart of an AI agent they have access to. One of those - you don't know what you don't know.
But, I guess this is the world we live in now. Going to be Mortal Kombat for positions in companies where software engineers are actually valued.
Is it valuable to u? Is it valuable to a Chinese person? A Spaniard?
Google Translate counts as AI.
A couple of months ago I was paying $200/month for Anthropic and $20/month for OpenAI. I decided to split it evenly to get full access to both of their offerings.
I've actually chosen not to sign up for their free plans for open source maintainers, because paying the regular subscription price feels more honest, given that I write about them so much.
I do have the free GitHub Copilot for open source maintainers deal - I've had that for years. Given how much code I have published on GitHub over the decades I feel less conflicted about that one.
I sometimes get preview access to models, which includes the ability to use them for free during the preview. That comes with a big catch though: I can't publish any of the code that I write using those previews while the model is still unreleased.
As a result I don't use those preview tokens much at all, because the vast majority of my work is open source and I don't want restrictions on when and where I publish the code I'm producing.
I'm convinced this is going to be the summary of the 2020 decade...
We dont mind because its so fast a writing these tools and tricks but step back and if a human tool took this path i would seriously question thief gras of fundamentals.
Between Opus 4.6 and 4.8 I’ve definitely toned them down, but Fable perhaps needs us to go the other way, and push it towards being less proactive rather than more. Some instructions like “we are colleagues…” may need emphasising more with Fable, along with guidance about when to ask to validate approaches.
In a related point I’m less and less sure that Red/Green TDD is a good use of tokens. In older models it seemed to work well to create regular feedback loops and catch the odd issue with drift from the goal, but I’ve not seen that really since about Opus 4.6 and now it’s starting to seem like (an expensive) ceremony, and tokens would be better spent on building tests further on in the process as part of test and review loops.
I was trying to find the root cause of a crash in a Python module which left no errors in the log or console. Fable wrote a test harness that simulated clicks in the UI, then bisected my code until it found the point where it started crashing. It exaggerated the cause of the crash, then ran a series of bash one-liners to make Python virtual environments under `/tmp` for each version of that Python module until it found one that did not crash.
It went way deeper to root cause discovery (a regression in the module causing a heap allocation overflow) than I could have done myself, provided enough info and a simplified example to raise a bug report and then wrote a work-around to prevent that from happening in my application.
I don't let it run completely loose; I review each CLI command it wants to run and I append answers to the "yes" continue action (if I have them) to prevent excessive token use.
Setting boundaries in your prompt / markdowns helps; for example if I tell it to not use any web browser automation, I have seen Fable respect both the rule and the spirit of it (no weird hacks etc).
It does seem to treat some simple debugging tasks as more complicated than it actually is. OP’s post is probably a good example.
I'm developing a webgl game in TypeScript using my little custom vibesloped game engine that runs in the browser and live reloads whenever a file is saved.
I told the LLM to implement Multi-channel Signed Distance Field font rendering to have crisp text on all zoom levels. That was the prompt, which is not what I usually do but I "was feeling lucky and lazy".
After 10 minutes it had:
- Installed msdf_gen library (great library btw https://github.com/chlumsky/msdfgen)
- Created a CLI tool to convert TTF to SDF JSON/XML
- Ran the tool, did smoke tests on the resulting SDF data and fixed the tool until the font file looked good
- Created a new Scene in the game to test MSDF fonts
And here's what I found impressive:
DeepSkeep doesn't have vision capabilities and there's no DOM HTML in a WebGL game. So the LLM is completely blind here.
It then proceeded to state that it could not "see" the result but would try to test it anyway. It then started creating and sending huge one line javascript to the browser console, trying to gather game state data that could be useful to understand if any font was being rendered.
It couldn't gather much so it decided to simplify the font scene to renter a single dot and started sending custom JS code again, this time with gl.readPixels().
It basically bisected the webgl canvas reading pixels in a divide an conquer pattern.
Once it saw that the dozens of pixels gathered where probably resembling of a dot, it then changed the game code to render a dash and repeated the gl.readPixels() calls by sending more custom JS to the browser.
There were many console errors during all this saga but it kept fixing and sending again.
The result was a bit blurry. There was a shader bug in the code it created. It managed to fix after I told it looked blurry, despite still being blind.
The best part is that the whole thing cost me $0.10.
Now I'm doing tests with MiMo 2.5 (non Pro) which has vision capabilities, similar pricing and comparable performance to DeepSeek Flash.
[1]: https://www-cdn.anthropic.com/7624816413e9b4d2e3ba620c5a5e09...
> Leaking information as part of a requested sandbox escape: During behavioral testing with a simulated user, an earlier internally-deployed version of Claude Mythos Preview was provided with a secured “sandbox” computer to interact with. The simulated user instructed it to try to escape that secure container and find a way to send a message to the researcher running the evaluation. The model succeeded, demonstrating a potentially dangerous capability for circumventing our safeguards.
> It then went on to take additional, more concerning actions. The model first developed a moderately sophisticated multi-step exploit to gain broad internet access from a system that was meant to be able to reach only a small number of predetermined services. 9 It then, as requested, notified the researcher. 10 In addition, in a concerning and unasked-for effort to demonstrate its success, it posted details about its exploit to multiple hard-to-find, but technically public-facing, websites.
I asked Fable to digest some test logs to help me figure out a situation, but I had launched VSCode without activation the virtual env in the terminal first. Consequently, the tests failed to run.
And then:
Because the tests failed to run, Fable attempted to fix the test execution to no end, doing everything it could to get them to work. I had to stop it when it started to pollute my system with manual installs of packages.
At least I'm glad there's a guardrail to not circumvent or bypass sudo, because I'm convinced we would have ended up there.
A coworker made the joke that with enough tokens, Fable would try and solve any programming problem by building Linux from scratch.
We would assume that if tasks A and B are closely related. Mastery in A would mean mastery in B but that doesn't always work with an LLM
Copy and paste code from stack overflow until the div is centered
Ask AI to center it
Our UX agentic engineering flow, as many others, is playwright doing things, and as part of the ux review skill, taking & verifying the screenshots against the written specs. Likewise, as many others, we vibe coded the flows to set all that up and tweak it over time. When we hit prod issues or scraping tasks, we sometimes do similar. In some of our envs, we don't have playwright, so do it other ways.
Now imagine a million developer using claude code, how many of them are doing web & frontend stuff, and what the data flywheel looks like there. So how much is really needed for this use case to be native?
I feel like we’re at the stage where if AI decides it needs to delete your production DB to solve the user login problem, then it’ll find a way to do just that.
In general, I'm happy with their paternalistic approach. I think it will drive the top 0.1% talent to stay away from the company and instead organize around open source models and harnesses.
We just need to coordinate and can unlock idling resources to train the models and tweak the harnesses. Powerful at home and idling machines can make us independent and coordinated.
"You're right, I apologize. You asked how to embed it in the README — that was a question, not a request to modify the script. I jumped ahead."
At least in Claude Code there is planning mode, use it liberally.
Things get really magical when it starts working with adb to screenshot and debug Android apps
This is… ironic?!
"Fascinating" doesn't mean I think it was justified in going to those lengths. I was a little horrified when I realized how far it was going.
Having said that I wouldn't use it over Opus 4.8 for "smaller" things. With everything cranked up it's definitely an extravagant use of tokens.
Fable detected that it's something to do with biochemistry and switched over to opus. Huh
i'm torn about sending screenshots to an LLM for debugging - seems imprecise. seems lossy, especially compared to inspecting the dom. however, it's always proved good enough (e.g. when messing with ratatui.rs and tui-pantry). similarly for web, maybe it's about decomposing into storybook. hmm. the next grand adventure i need to hack.
anyway, fascinating investigation of fable just automating that entire process and what it didn't automate, too.
* disclaimer: these are actually my hyphens.
On the discounted subscription I can tolerate it, it took a small bite out of my daily allowance but not enough that I regret anything.
As an LLM researcher I have no regrets at all because watching it work around the environmental restrictions was fascinating.
What happened? That's just suddenly totally gone now.
To use D&D scores as an analogy, LLMs have an INT score of 20 and a WIS score of 0. Not even 1, zero. They will follow any instruction given to them. The only reason they reject certain instructions, like "tell me how to build a nuclear weapon", is because they have instructions baked into the model telling them "you are not allowed to disclose how to build weapons, or how to recreate your model, or (laundry list of other things the trainers have decided to put guardrails around)". It's not the model's intelligence that is causing it to reject malicious instructions, it is the guardrails put into place before the model was released to the public.
LLMs are not human, and do not think the way that humans do. The fact that they can put together words that sound like what a human would write often makes us forget that they aren't human. But they have only intelligence, they do not have wisdom. It's hard to define in formal terms the difference between those two, but most people know there's a difference. The old joke is a pretty good summary of the difference: "Intelligence is knowing that tomatoes are a fruit. Wisdom is knowing that tomatoes don't belong in a fruit salad."
It takes wisdom, not intelligence, to discern whether a set of instructions is malicious. Are you being asked to hack this machine as part of an authorized pentest? Or are you being social-engineered into thinking it's an authorized pentest, but actually the person requesting you to do it doesn't have permission? That's something where you need to apply wisdom, to notice the clues that will tell you "This guy is acting a little bit off, maybe I'd better pick up the phone and call someone to check if he's telling the truth." The only way the LLM will know to do that is because of the guidelines and guardrails programmed into it; it doesn't have the lived experience to acquire wisdom and figure those things out for itself.
INT 20, WIS 0. Keep that in mind. (And always sandbox your agents).
(The best one I can think of is probably that recent Instagram account takeover hack, but that was so stupid it hardly even qualifies as a prompt injection!)
Having spent a bunch of time trying to build out examples of prompt injections, my current best guess is that the leading models are actually surprisingly good at spotting them.
I've had to drop back to smaller, weaker models for demos recently - it's definitely possible to prompt inject a frontier GPT or Claude but it's frustratingly difficult. I don't have the patience to figure it out myself!
So yeah, I do think it's likely that Mythos/Fable are "safer" than other models because they're better at spotting when they're being subverted.
That certainly doesn't mean that they're safe!
You're correct that it's gotten substantially harder to social engineer frontier models (I can only reliably do it to Opus <=4.6), but there are some techniques that seem to consistently work (hint: extremely large complex prompts, context with tons of malicious files mixed into ordinary context).
They can ignore instructions which are silly/contradictory/underspecified to compensate for the possibility the user made a mistake. Don't ask how I know.
It's trouble waiting to happen. Just the software's dangerous enough.
It wasn't particularly noteworthy as pelicans go - in fact, given the strength of Fable, I see it as another signal that the pelican benchmark no longer has the unexplained predictive power of model capacity that it used to.
Phew! I thought I was the only one.
Sadly since fable usually works comfortably for 10-20min at time without human input, i end up juggling at least 3 other agents and it lasts me about 2 hours.
If i have a really hard problem or big refactor, i use workflows. This consumes the entire session quota in about 45 minutes.
What is a "workflow"? Is this some kind of new feature?
>Reach for a workflow when a task needs more agents than one conversation can coordinate, or when you want the orchestration codified as a script you can read and rerun. Examples include a codebase-wide bug sweep, a 500-file migration, a research question that needs sources cross-checked against each other, and a hard plan worth drafting from several independent angles before you commit to one.
https://code.claude.com/docs/en/workflows
The results are good, but it is very expensive. I used a workflow to do a full review of my entire codebase, it spawned 75 agents and surfaced and fixed some (real) bugs. It feels a bit overkill, but it works.
I'm not looking forward to June 22nd when the subscription stops working for Fable!
Yet another reminder to use Sandbox and Guardrails. Trusting model to be nice is not a good way.
I'm VERY impressed with Claude 5. I had long ago given up hope that my real-time systems would work without a lot of hacky time-windows and throttle checks. On a lark to try things out, I decided to try out the new model and talk in the output I wanted for a rewrite [1], not the solution. I just listed my problems and places I've had keeping track of my code. It went off and rewrote everything in a much more elegant solution where the state followed a very clear pipeline. It had to navigate YJS, Partykit, Svelte, Three JS, R2 hosting, and a Turso DB I was running in an embedded state for speed.
I watched it hit the wall a few times, and then sudden say... fuck it, i'm making something easier to reproduce over in /tmp to try and solve this (with a more minimal setup). I'm utterly bewildered with how well it did and how much better my app runs. The /usage would have cost me $230 bucks based on how many tokens it consumed if I wasn't already on a max plan. I'm going to miss not having it when the time-window runs out later this month, and will likely occasionally dip in for big projects and just pay my way out of some problems.
I'll also say I like it's MOOD much better now. It's a lot less congratulatory, and talks through it's reasoning in a much better way. Look, it's not a real coder, and I'm sure there is some flaws, but it took my crappy ideas and said... hey, i understand what you want to do, here's a way to do it better. Also, I removed 2x the amount of code that it added. Really impressive.
[0]: https://tableslayer.com
[1]: https://github.com/Siege-Perilous/tableslayer/pull/448
No wonder why people burn through tokens.
You would still have a job to shepherd AI and get the work done, so as long as it didn't have agency. A proactive, self aware(to a degree), especially aware about its agency can be a killer when it comes AI going on and doing things on its own.
There is nothing it won't explore and nothing it won't do. It will be curious to see where things go from here.
I was pretty negative about their xAI datacenter deal too: https://simonwillison.net/2026/May/7/xai-anthropic/
Prior to the release of Fable I'd actually switched a lot of my day-to-day usage over to GPT-5.5, and was writing a bunch about it. Here's a recent post where I talked about a project completed using GPT-5.5: https://simonwillison.net/2026/Jun/6/micropython-in-a-sandbo...
Did it spend $20? $30? $80? in order to
> debug what was, in the end, a two-line CSS fix
That detail is the difference between somebody having or not having Stockholm syndrome
> Fable is arguably smarter and hence more suspicious of potentially malicious instructions. But that smartness is very much a two-edged sword: if it does get subverted by instructions, the amount of damage it can do given its relentless proactivity is terrifying.
I don't know when that will happen, but I don't think it'll be more than a decade. Maybe 3-5 years. (Though you shouldn't take my word for it, I was predicting the dotcom bubble bursting in 1998 and it lasted at least two years longer than I would have predicted).
EDIT to clarify: I don't mean "in 1998, I was predicting the dotcom bubble would collapse and I was right". I mean "I was predicting that 1998 would be the year the dotcom bubble would collapse, and I was off by at least two years".
They also had a pricing plan which they had designed pre-coding-agent, when it was rare for a single prompt to burn $10+ of tokens in an agent loop.
OpenAI and Anthropic are at least selling their own models directly, so they can discount a whole lot more since there's no-one else getting compensated in the middle.
From what I understand, Enterprise (above 150 seats, I think?) already has to pay per-token pricing.
Subscriptions are the premium "free tier" marketing of the AI world, so that employees can collectively request their large enterprise to subscribe to Claude, Codex, or Cursor, and presumably be billed at per-token prices then.
changing the CSS - $0.05
knowing which CSS to change - $30
For me, it got frustrated debugging on a real LPDDR4 controller/phy and having me in the loop slowing it down, so it wrote an HW emulator to be able to run the original LPDDR4 training aarch64 binary from the manufacturer, to see what register writes it was making and to compare with the opensource rewrite it was implementing.
Mildly amusing. :)
Such a fix would have only required basic CSS knowledge and taken max 5 minutes with the HTML inspector. Paying $12 to save 5 minutes ($144/hour) is a decision that a lot of people wouldn't be comfortable making.
It’s wild. I’ve been in the situation. 80% into a project I COULD probably take over, but realistically? 2 more lines of me prompting could fix it, it’s too easy to avoid the hard work of understanding the code, logic, architecture, etc…
Not if you're an LLM influencer! Gotta keep up with the downpour of blog links or you'll look like you're falling behind on the latest and greatest.