Here’s a demo: https://www.youtube.com/watch?v=0cDpIntmHAM. Docs start at https://libretto.sh/docs/get-started/introduction.
We spent a year building and maintaining browser automations for EHR and payer portal integrations at our healthcare startup. Building these automations and debugging failed ones was incredibly time-consuming.
There are lots of tools that use runtime AI, like Browser Use and Stagehand, which we tried, but: (1) They rely on custom DOM parsing that's unreliable on older and complicated websites (including all of healthcare); using a website's internal network calls is faster and more reliable when possible. (2) They can be expensive, since they make lots of AI calls, and for workflows with complicated logic you can't always rely on cached actions to guarantee the workflow will work. (3) They act at runtime, so what the agent is going to do isn't interpretable. You're hoping you prompted it correctly, but legacy workflows are often unintuitive and inconsistent across sites, so you can't trust an agent to just figure it out on the fly. (4) They don't really help you generate new automations or debug automation failures.
We wanted a way to reliably generate and maintain browser automations in messy, high-stakes environments, without relying on fragile runtime agents.
Libretto is different because instead of runtime agents it uses “development-time AI”: scripts are generated ahead of time as actual code you can read and control, not opaque agent behavior at runtime. Instead of a black box, you own the code and can inspect, modify, version, and debug everything.
Rather than relying on runtime DOM parsing, Libretto takes a hybrid approach combining Playwright UI automation with direct network/API requests within the browser session for better reliability and bot detection evasion.
It records manual user actions to help agents generate and update scripts, supports step-through debugging, has an optional read-only mode to prevent agents from accidentally submitting or modifying data, and generates code that follows all the abstractions and conventions you have already in your coding repo.
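A read-only mode like the one described could, in principle, be enforced at the network layer. A minimal sketch, assuming nothing about Libretto's actual implementation (the function name is hypothetical):

```javascript
// Hypothetical sketch: block any HTTP method that could mutate server
// state, letting read-only GET/HEAD traffic through.
const MUTATING_METHODS = new Set(["POST", "PUT", "PATCH", "DELETE"]);

function isAllowedInReadOnlyMode(method) {
  return !MUTATING_METHODS.has(method.toUpperCase());
}

// In a Playwright script this could be wired into a route handler:
// await page.route("**/*", (route) =>
//   isAllowedInReadOnlyMode(route.request().method())
//     ? route.continue()
//     : route.abort()
// );
```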
Would love to hear how others are building and maintaining browser automations in practice, and any feedback on the approach we’ve taken here.

Discussion (45 Comments)
2. playwright code generation based on 1, which captures a repeatable workflow
3. agent skills - these can be playwright based, but in some cases if I can just rely on built-in tools like Web Search and Web Fetch, I will.
playwright is one of the unsung heroes of agentic workflows. I heavily rely on it. In addition to the obvious DOM inspection capabilities, the fact that the console and network can be inspected is a game changer for debugging. watching an agent get rapid feedback or do live TDD is one of the most satisfying things ever.
Browser automation and being able to record the graphics buffer as video, during a run, open up many possibilities.
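The console/network feedback loop this comment praises can be wired up in a few lines. A sketch, assuming only that `page` exposes Playwright's `.on(event, handler)` API; the log format is made up:

```javascript
// Pipe console messages and failed network requests into a log so an
// agent (or human) debugging a run sees failures immediately.
function attachDebugLogging(page, log = console.log) {
  page.on("console", (msg) => log(`[console.${msg.type()}] ${msg.text()}`));
  page.on("requestfailed", (req) => log(`[network failed] ${req.url()}`));
}
```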
"Claude, reverse engineer the APIs of this website and build a client. Use Dev Tools."
I have succeeded with 8/8 websites using this approach.
Sites like Booking.com and Hotels.com try to identify real humans with their AWS solution and Cloudflare, but you can just solve the captcha yourself, log in, and the session is indistinguishable from a human's. Playwright is detected and often blocked.
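One way to reuse a manually established session, as this comment suggests, is to capture the cookies after logging in yourself and then replay the site's internal API with plain HTTP. A sketch with placeholder names and URLs:

```javascript
// Build a Cookie header from cookie pairs captured after a manual login.
function cookieHeader(cookies) {
  return cookies.map(({ name, value }) => `${name}=${value}`).join("; ");
}

// The captured session can then drive the site's internal API directly
// (URL and query are illustrative, not a real endpoint):
// const res = await fetch("https://example.com/internal/api/search?q=hotel", {
//   headers: { Cookie: cookieHeader(captured), Accept: "application/json" },
// });
```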
We used to deal with RPA stuff at work. Always fragile. Good to see evolution in the space.
EDIT: To clarify, I realize there are skill files that can be used with Claude directly, but the snapshot analysis model seems to require a key. Any way to route that effort through Claude Code itself, such as exporting the raw snapshot to a file and instructing Claude Code to use a built-in subagent instead?
I'm also using Playwright, to automate a platform that has a maze of iframes, referer links, etc. Hopefully I can replace the internals with a script I get from this project.
- Libretto prefers network requests over DOM interaction when possible, so this will circumvent a lot of complex JS rendering issues
- When you do need the DOM, playwright can handle a lot of the complexity out of the box: playwright will re-query the live DOM at action time and automatically wait for elements to populate. Libretto is also set up to pick selectors like data-testid, aria-label, role, id over class names or positional stuff that's likely to be dynamic.
- At the end of the day the files still live as code so you could always just throw a browser agent at it to handle a part of a workflow if nothing else works
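The selector preference described above can be sketched as a simple ranking over an element's attributes (illustrative only; not Libretto's actual heuristic):

```javascript
// Prefer stable, semantic attributes over brittle class names.
const SELECTOR_PRIORITY = ["data-testid", "aria-label", "role", "id"];

function pickSelector(attrs) {
  for (const name of SELECTOR_PRIORITY) {
    if (attrs[name]) {
      return name === "id" ? `#${attrs[name]}` : `[${name}="${attrs[name]}"]`;
    }
  }
  // Fall back to a class-based selector only as a last resort.
  return attrs.class ? `.${attrs.class.split(" ")[0]}` : null;
}
```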
    // Let AI click
    await stagehand.act("click on the comments link for the top story");
The issue with this is that there's now runtime non-determinism. We move the AI work to dev time: the AI explores and crawls the website first, then generates a deterministic, legible script.
Tangentially, Stagehand's model may have worked two years ago when humans still wrote the code, but that's no longer the case. We want to empower agents to do the heavy lifting of building a browser automation for us, while reaping the benefits of running deterministic, fast, cheap, straightforward code.
The implementation is also pretty different:
- libretto gives your agent a single exec tool (instead of different tools for each action) so it can write arbitrary playwright/javascript and is more context efficient
- Also we gave libretto instructions on bot detection avoidance so that it will prefer using network requests for automation (something that other tools don’t support), but will fall back to playwright if it identifies network requests as too risky
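A single exec tool of this shape can be approximated with a dynamically compiled async function that receives `page` in scope. This is a hypothetical sketch, not Libretto's implementation:

```javascript
// Compile agent-submitted JavaScript into an async function with the
// Playwright page in scope, instead of exposing one tool per action.
const AsyncFunction = Object.getPrototypeOf(async function () {}).constructor;

async function execTool(page, code) {
  const fn = new AsyncFunction("page", code);
  return fn(page);
}

// Usage (illustrative):
// await execTool(page, 'await page.goto("https://example.com"); return page.title();');
```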
libretto gives a similar ability for agents for building scripts but:
- agents automatically run, debug, and test the integrations they write
- they have a much better understanding of the semantics of the actions you take (vs. playwright auto-assuming based on where you clicked)
- they can parse network requests and use those to make direct API calls instead
There's fundamentally a mismatch: playwright-cli is for building e2e test scripts for your own app, while libretto is for building robust web automations.
Edit: nevermind. I see from the website it is MIT. Probably should add a COPYING.md or LICENSE.md to the repository itself.
For more complex cases where libretto can't validate that the network approach would produce the right data (like sites that rely on WebSockets or heavy client-side logic), it falls back to using the DOM with playwright.
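That fallback behavior can be sketched as a network-first step runner; the `viaNetwork`/`viaDom` step interface is assumed for illustration:

```javascript
// Try the direct API replay first; if it fails (blocked, WebSocket-only,
// heavy client-side logic), fall back to DOM automation via Playwright.
async function runStep(step) {
  try {
    return await step.viaNetwork();
  } catch {
    return await step.viaDom();
  }
}
```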