fchishtie · 2 days ago · 62 comments · Article: minicor.com


Hey, we're Faiz and Saheed, and we built Minicor so AI companies that need to integrate with desktop systems that have no API can quickly build scalable desktop RPAs. Demo: https://www.youtube.com/watch?v=MD0GHZIJ1cw

Edit: RPA is an acronym for Robotic Process Automation: things like triggering mouse clicks and keystrokes to perform tasks programmatically. Sorry if this wasn't clear!

We were working on non-RPA integrations when a customer promised to sign a deal in 2 days if we could unblock a sale of theirs that involved integrating with a clinic's Windows-based medical record system. We didn't know it at the time, but it turns out that building desktop RPAs at scale is extremely difficult: scripting is hard (learning the system, defining the automation, UIs changing constantly), orchestration is hard (is the VM up? queuing, parallelizing) and debugging is hard (zero observability, false positives, cascading failures). Failure rates of 30%+ are not uncommon. At scale we've seen failed RPAs lead to thousands of support tickets a month.
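The orchestration concerns above (queuing, parallelizing across VMs) can be illustrated with a minimal sketch, assuming each VM runs one workflow at a time. `run_on_vm` is a hypothetical stand-in for whatever actually executes a workflow on a VM; this is not Minicor's API.

```python
import queue
import threading
from typing import Callable


def run_jobs(jobs: list[dict], vm_ids: list[str],
             run_on_vm: Callable[[str, dict], str]) -> dict[int, str]:
    """Run queued jobs across a pool of VMs, one job per VM at a time.

    run_on_vm(vm_id, job) is a hypothetical callable that executes one
    workflow on one VM and returns a status string.
    Returns a mapping of job index -> status.
    """
    pending: "queue.Queue[tuple[int, dict]]" = queue.Queue()
    for index, job in enumerate(jobs):
        pending.put((index, job))

    results: dict[int, str] = {}
    lock = threading.Lock()

    def worker(vm_id: str) -> None:
        # Each thread owns one VM and keeps pulling jobs until the queue is empty.
        while True:
            try:
                index, job = pending.get_nowait()
            except queue.Empty:
                return
            try:
                status = run_on_vm(vm_id, job)
            except Exception as exc:
                # A failed job is recorded; the VM keeps serving the queue.
                status = f"failed: {exc}"
            with lock:
                results[index] = status

    threads = [threading.Thread(target=worker, args=(vm,)) for vm in vm_ids]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


# Example: three jobs on two (fake) VMs; one job hits a missing window.
def fake_run(vm_id: str, job: dict) -> str:
    if job["patient_id"] == "bad":
        raise RuntimeError("window 'Patient Search' not found")
    return f"ok on {vm_id}"


print(run_jobs([{"patient_id": "a"}, {"patient_id": "bad"}, {"patient_id": "c"}],
               ["vm-1", "vm-2"], fake_run))
```

One thread per VM keeps concurrency equal to the number of VMs, so cloning more VMs is what raises throughput; a real orchestrator would also need to check whether each VM is actually up before handing it work.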

To solve the problems we were facing, we built an MCP that Claude Code/Codex can use to navigate a virtual machine running desktop software and create RPA workflows in Python. The RPA workflows run as Python scripts for speed, cost, and determinism. These workflows can be triggered by API with any input/output schema you specify, and video replays and logs are stored with each run. The MCP can debug RPAs and make changes to the underlying code, all of which is version controlled. We also built tools for cloning VMs to parallelize RPAs and for handling 2FA/OTP challenges. Plus, since workflows are code-based, we were also able to add triggers for Slack notifications, human-in-the-loop steps, or calling an LLM to verify the state of a VM by passing it a screenshot.
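As a rough illustration of triggering a workflow against a specified input/output schema, the sketch below validates the request, runs the workflow, then validates its result. The schema format (field name mapped to a Python type) and the `insert_note` workflow are hypothetical; a production API would more likely use JSON Schema or a validation library.

```python
from typing import Any, Callable


def validate(payload: dict[str, Any], schema: dict[str, type], label: str) -> None:
    """Raise ValueError if a required field is missing or has the wrong type."""
    for name, expected in schema.items():
        if name not in payload:
            raise ValueError(f"{label}: missing field '{name}'")
        if not isinstance(payload[name], expected):
            raise ValueError(
                f"{label}: field '{name}' should be {expected.__name__}, "
                f"got {type(payload[name]).__name__}"
            )


def trigger_workflow(workflow: Callable[[dict], dict], payload: dict,
                     input_schema: dict[str, type],
                     output_schema: dict[str, type]) -> dict:
    """Validate the request, run the scripted workflow, then validate its result."""
    validate(payload, input_schema, "input")
    result = workflow(payload)
    validate(result, output_schema, "output")
    return result


# Hypothetical workflow: a real one would drive the EMR on a VM.
def insert_note(payload: dict) -> dict:
    return {"note_id": f"{payload['patient_id']}-001", "saved": True}


result = trigger_workflow(
    insert_note,
    {"patient_id": "12345", "note": "Follow-up in 2 weeks"},
    input_schema={"patient_id": str, "note": str},
    output_schema={"note_id": str, "saved": bool},
)
print(result)
```

Checking the output as well as the input means a workflow that "succeeds" but returns malformed data fails loudly at the API boundary instead of downstream.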

Would love to hear your feedback, and any RPA horror stories you have! (:

Discussion (62 comments, via Hacker News)

willwade · about 6 hours ago
Interesting. It sounds like something I was doing: https://github.com/Smartbox-Assistive-Technology/app-automat... My use case, though, is people with disabilities, not robots.
oveja · 1 day ago
For those confused like I was, RPA stands for Robotic Process Automation.
jorisw · 1 day ago
And on goes the needless use of abbreviations.
polonbike · 2 days ago
Congrats on the launch. One complaint: RPA this, non-RPA that, but you never explain what it means. I would spell out the acronym once, at its first mention on the landing page.
sheepscreek · 1 day ago
I think their target audience is medium to large enterprises. The biggest telltale sign of that is the missing Pricing page.

Most of these customers would already know the meaning of RPA if they are researching companies for it. In a way, it self-qualifies their leads into higher-quality ones that are more likely to convert.

vasco · 1 day ago
Their target audience here is random HN readers though.
numpad0 · 1 day ago
RPA was a big thing towards the tail end of the 2010s, though. MS had free official tools for Windows 10 to do it. I think at some point a Japanese bank had a robotic arm set up to flip through a contract binder, scanning and stamping it page by page using RPA.
fchishtie · 2 days ago
Thank you!! Yeah, that's a good point. It's been so ingrained in our brains; appreciate the feedback
ai_slop_hater · 1 day ago
It sounds like they are automating clicking through Windows GUI applications. Pretty pathetic.
throw03172019 · 1 day ago
My biggest question: how much of this can be stored/processed on our own infra, with our own lifecycle rules? For example, this can touch a lot of PHI: screenshots, videos, JSON inputs/outputs etc.
fchishtie · 1 day ago
logs get written on our customers' premises to a bucket of their choosing (: so PHI doesn't leave their VPC
throw03172019 · 1 day ago
That’s perfect. Is this documented somewhere? Would love a deep dive on security / setup tweaks for data.
fchishtie · 1 day ago
happy to share more over email! you can reach me at faiz@minicor.com
throw03172019 · 1 day ago
Does this only fall back to LLM vision when it catches an error? I.e., once the RPA/workflow is built, is it efficient to run many times (until it hits an error state)?
fchishtie · 1 day ago
yes, effectively, but we use LLM vision in multiple places. For context, there are multiple ways an RPA can fail:

1. RPA code breaks (ex: throws an exception if a window does not exist)
2. RPA reports success but was clicking/typing in the wrong place
3. Underlying system breaks (virtual machine / legacy software)

the skill in our MCP builds the RPA code to throw exceptions wherever possible, so an LLM can understand the context and recover
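A minimal sketch of "throw exceptions with context": wait for an expected window and, if it never appears, raise an error that says what was expected and what was actually on screen, which is the kind of message an LLM (or a human) can act on. `list_windows` is a hypothetical callable (for example backed by an accessibility API); the names here are illustrative only.

```python
import time
from typing import Callable


class WindowNotFoundError(RuntimeError):
    """Raised with enough context for a human or an LLM to diagnose the failure."""


def wait_for_window(title: str,
                    list_windows: Callable[[], list[str]],
                    timeout: float = 10.0,
                    poll_interval: float = 0.5) -> str:
    """Poll until a window whose title contains `title` appears, else raise.

    list_windows is a hypothetical callable returning the titles of the
    currently open top-level windows.
    """
    deadline = time.monotonic() + timeout
    while True:
        titles = list_windows()
        for candidate in titles:
            if title in candidate:
                return candidate
        if time.monotonic() >= deadline:
            # Include what *is* open: e.g. an unexpected error dialog.
            raise WindowNotFoundError(
                f"Expected a window titled '{title}' within {timeout}s. "
                f"Open windows: {titles}"
            )
        time.sleep(poll_interval)


print(wait_for_window("Patient Search", lambda: ["Patient Search - EMR", "Outlook"]))
# Patient Search - EMR
```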

to avoid false success states, we add LLM vision steps in the workflow itself that error out if they see the system in the wrong state
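A vision verification step of this kind might look like the sketch below: ask a yes/no question about a screenshot and stop the run unless the answer is an explicit yes. `take_screenshot` and `ask_llm` are hypothetical callables standing in for the screen capture and the model call; treating anything other than "yes" as a failure is a design choice here, not a documented Minicor behaviour.

```python
from typing import Callable


class StateCheckError(RuntimeError):
    """Raised when the vision check says the system is in the wrong state."""


def verify_state(take_screenshot: Callable[[], bytes],
                 ask_llm: Callable[[bytes, str], str],
                 question: str) -> None:
    """Stop the workflow unless a vision model answers an explicit 'yes'.

    take_screenshot and ask_llm are hypothetical callables: the first captures
    the screen (or a region of it) as image bytes, the second sends the image
    and a yes/no question to a vision-capable model and returns its reply.
    Any answer other than 'yes' counts as failure (fail closed), so an
    ambiguous reply cannot turn into a false success.
    """
    answer = ask_llm(take_screenshot(), question)
    normalized = answer.strip().lower().rstrip(".!")
    if normalized != "yes":
        raise StateCheckError(f"state check {question!r} failed; model replied {answer!r}")


# Example with stand-ins for the screenshot and the model call.
verify_state(lambda: b"<png bytes>",
             lambda image, q: "Yes.",
             "Is the nav bar green and does it say Insert Note?")
```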

and when the underlying system breaks, the fix can be as simple as a cron job that checks the status of the process / the health of the VM and runs a script to reboot the system
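A cron-driven health check of that shape can be sketched as follows. The check names and the `reboot` action are hypothetical callables you would supply (e.g. a process lookup and a VM restart); the point is that any failing or crashing check triggers exactly one recovery action per run.

```python
import logging
from typing import Callable

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("vm-health")


def health_check(checks: dict[str, Callable[[], bool]],
                 reboot: Callable[[], None]) -> list[str]:
    """Run each named check; if any fails (or raises), trigger one reboot.

    Intended to run periodically, e.g. from a crontab entry such as
    '*/5 * * * * python health_check.py'. The checks and the reboot
    action are hypothetical callables supplied by the deployment.
    Returns the names of the failed checks.
    """
    failed = []
    for name, check in checks.items():
        try:
            ok = check()
        except Exception:
            log.exception("check %s raised", name)
            ok = False
        if not ok:
            failed.append(name)
    if failed:
        log.warning("unhealthy (%s); rebooting", ", ".join(failed))
        reboot()
    return failed


reboots = []
failed = health_check(
    {"emr_process_running": lambda: False, "vm_reachable": lambda: True},
    reboot=lambda: reboots.append("reboot"),
)
print(failed, reboots)  # ['emr_process_running'] ['reboot']
```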

it depends on the system, but the pattern we've seen with RPAs is that you can catch maybe 80% of the edge cases in the first week after rollout

dragonsenseiguy · 2 days ago
Small website nitpick: I feel like the company logos in the "In production with" section should be a bit darker; I could barely tell there was something there.
fchishtie · 2 days ago
yes, good callout; that customer wheel is so overdue for an update
absk82 · 1 day ago
Is the underlying mechanism different from something like computer use? Where can I find more details about how it works?
fchishtie · 1 day ago
the underlying automations that run on the Windows computers are just Python scripts actually!
nthdesign · 1 day ago
Can you compare Minicor to Convey? They seem very similar. We had a product demo of Convey wherein they showed us how you could train the agent to use legacy software using a simple shared screen capture and verbal instructions.
fchishtie · 1 day ago
great question! Minicor doesn't offer any agents that perform the automation itself. We focus exclusively on connecting coding agents to the legacy system and using our MCP to build code-based automations at scale, to solve the 30% error rate in robotic process automations
bob778 · 1 day ago
How does this compare to Automation Anywhere? They’ve been in the RPA market a long time but did a hard pivot to AI.
fchishtie · about 21 hours ago
we've found that existing RPA platforms were built more for business users/citizen developers at really massive enterprises; we're more focused on the developer experience for building RPAs at scale
ilundin · 1 day ago
Is the cloud LLM judging based on screenshots that include patient/customer data? That seems like a no-go in many countries given privacy concerns.
fchishtie · 1 day ago
No patient/customer data is included in the screenshots. In production we'd basically find some "region" of the screen to screenshot that lets an LLM answer a yes/no question, ex: "is the nav bar green and does it say Insert Note?"
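Restricting the check to a screen region can be sketched as a crop applied before anything is sent to a model. This minimal pure-Python illustration treats a frame as rows of RGB tuples; with Pillow, `ImageGrab.grab(bbox=(left, top, right, bottom))` captures just a bounding box directly. The `Region` type and the `NAV_BAR` coordinates are made up for illustration.

```python
from dataclasses import dataclass

Pixel = tuple[int, int, int]
Frame = list[list[Pixel]]  # rows of RGB pixels


@dataclass(frozen=True)
class Region:
    """Screen rectangle; right/bottom are exclusive, like a Pillow bbox."""
    left: int
    top: int
    right: int
    bottom: int


def crop(frame: Frame, region: Region) -> Frame:
    """Return only the pixels inside `region`, so nothing outside it reaches the model."""
    height, width = len(frame), len(frame[0])
    if not (0 <= region.left < region.right <= width
            and 0 <= region.top < region.bottom <= height):
        raise ValueError(f"{region} is outside the {width}x{height} frame")
    return [row[region.left:region.right] for row in frame[region.top:region.bottom]]


# Hypothetical coordinates for an EMR nav bar at the top of a 1920x1080 screen.
NAV_BAR = Region(left=0, top=0, right=1920, bottom=60)

screen = [[(0, 128, 0)] * 1920 for _ in range(1080)]  # fake all-green frame
nav = crop(screen, NAV_BAR)
print(len(nav), len(nav[0]))  # 60 1920
```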
ilundin · 1 day ago
OK, that seems doable if it works perfectly, but what about the tool output "i can see the patient list with search functionality and 30 patients" in the demo? Is that not vision-detected? Or are you digging into the Windows API (which would break for non-standard Windows components/widgets)?
fchishtie · 1 day ago
ah, for the fake EMR demo I was a bit less strict with Claude; it was likely vision-detected. Our MCP has tools for taking screenshots and inspecting the Windows accessibility tree, etc.
a-dub · 1 day ago
i'm curious: how does the steady-state error rate of a stochastic automated system like this compare with the downtime and errors that come from a (brittle) deterministic bridge that can fail with upgrades? what does the observability look like? (i'm guessing one feature is that the execution log, including images/screenshots for each transaction, gets saved, which is probably a huge improvement.)
fchishtie · 1 day ago
it's a good q. We experimented a lot with computer use / agentic automation and found that at scale the best approach is a hybrid: the automations run as deterministic code, with agents for recovery. Running automations as code is faster and cheaper, and when you're doing critical tasks (like updating patient records) you don't want an agent to potentially mess something up.

previously, writing RPA code took a long time; using AI (and its infinite patience) we can write more durable code that covers more edge cases

And since they're code-based, it's pretty straightforward to have agents monitor them and update their code when the underlying system is upgraded, etc…

for observability, we have workflow execution logs that store text, videos and screenshots so an agent or a human can debug them, plus lots and lots of webhooks when things break! (:
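The hybrid pattern described in this reply (deterministic steps, an agent only for recovery, and a per-run log) can be sketched as below. `recover` is a hypothetical hook standing in for an agent; the log keeps text entries only, whereas the system described here also stores video and screenshots.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Optional

Step = tuple[str, Callable[[], None]]


@dataclass
class RunLog:
    """Per-run execution log (text only in this sketch)."""
    entries: list[dict] = field(default_factory=list)
    failed: bool = False

    def record(self, step: str, status: str, detail: str = "") -> None:
        self.entries.append({"step": step, "status": status,
                             "detail": detail, "ts": time.time()})


def run_workflow(steps: list[Step],
                 recover: Optional[Callable[[str, Exception], bool]] = None) -> RunLog:
    """Run deterministic steps in order; call a recovery hook only on failure.

    `recover` is a hypothetical agent hook: given the failing step name and the
    exception, it tries to put the system back into a known state and returns
    True if the step should be retried. Each step is retried at most once.
    """
    log = RunLog()
    for name, action in steps:
        try:
            action()
            log.record(name, "ok")
            continue
        except Exception as exc:
            log.record(name, "error", repr(exc))
            if recover is None or not recover(name, exc):
                log.failed = True
                return log
        try:
            action()  # single retry after the hook reports a successful recovery
            log.record(name, "ok-after-recovery")
        except Exception as exc:
            log.record(name, "error", repr(exc))
            log.failed = True
            return log
    return log


# Example: a modal dialog blocks "save" once; the hook dismisses it.
state = {"dialog_open": True}


def click_save() -> None:
    if state["dialog_open"]:
        raise RuntimeError("modal dialog is blocking the Save button")


def dismiss_dialog(step: str, exc: Exception) -> bool:
    state["dialog_open"] = False
    return True


run = run_workflow([("open_patient", lambda: None), ("save", click_save)], dismiss_dialog)
print(run.failed, [e["status"] for e in run.entries])
# False ['ok', 'error', 'ok-after-recovery']
```

Keeping the agent out of the happy path means normal runs stay as fast and deterministic as plain scripts, while every failure still leaves a log entry to debug from.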

_crowecawcaw · 1 day ago
I also experimented with vision/screenshot-based computer use tools for similar use cases but had inconsistent results. LLMs had trouble getting precise pixel coordinates from a screenshot to move a mouse, and the screenshots took extra tokens. I had a lot more success using accessibility APIs to replace screenshots + input simulation, since accessibility data is easier for LLMs to process. The accessibility functionality is now released as a separate library for building automation tooling: https://xa11y.dev/
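To illustrate why accessibility data is easier to work with than raw pixel coordinates, here is a sketch that finds an element by role and name and derives a click target from its bounds. The `Node` shape is hypothetical and simplified; it is not the API of xa11y or of any platform accessibility library.

```python
from dataclasses import dataclass, field
from typing import Iterator, Optional


@dataclass
class Node:
    """Simplified accessibility-tree node (hypothetical shape)."""
    role: str
    name: str
    bounds: tuple[int, int, int, int]  # left, top, width, height in screen pixels
    children: list["Node"] = field(default_factory=list)


def walk(node: Node) -> Iterator[Node]:
    """Depth-first traversal of the tree."""
    yield node
    for child in node.children:
        yield from walk(child)


def find(root: Node, role: str, name: str) -> Optional[Node]:
    """Return the first node with the given role and name, or None."""
    return next((n for n in walk(root) if n.role == role and n.name == name), None)


def click_point(node: Node) -> tuple[int, int]:
    """Center of the element's bounding box: a precise target for a simulated click."""
    left, top, width, height = node.bounds
    return left + width // 2, top + height // 2


tree = Node("window", "EMR", (0, 0, 1920, 1080), [
    Node("toolbar", "Main", (0, 30, 1920, 50), [
        Node("button", "Insert Note", (100, 40, 120, 30)),
        Node("button", "Save", (230, 40, 80, 30)),
    ]),
])
button = find(tree, "button", "Insert Note")
print(click_point(button))  # (160, 55)
```

The model only has to name the element ("the Insert Note button"); the coordinates come from the tree rather than from estimating pixels in a screenshot.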
fchishtie · about 20 hours ago
cool! thank you for sharing - will check it out
theaniketmaurya · 2 days ago
Congrats on the launch! Legacy system users are also among the slowest to adopt AI. How do you navigate that?
mediaman · 1 day ago
I've found that legacy system users (or at least the execs) are pretty excited about AI because they hate their legacy systems but can't really do anything about it (ERP changes are an extreme nightmare, and often no better system exists with all the capability they need). They want to wrap it in AI to automate stuff without changing out the core system.

This seems like a good approach to me. I work with a lot of legacy-ERP-using companies in the manufacturing sector and can immediately see how we could put this to use for our customers.

I especially like that it's not doing computer use for everything, which so far doesn't really seem to work, especially outside the browser.

fchishtie · 1 day ago
yeah, those core systems of record are so locked in place; I can't even imagine the change management
debarshri · 2 days ago
Legacy system users are also the ones who pay the most for tools and services. We sell to enterprises, and I can attest to that. If the use case is relevant and the positioning fits the market, it should be fine.
fchishtie · 2 days ago
yeah, it's been interesting to watch. We were initially surprised at how much legacy users actually wanted to adopt AI; I think it's because of how awful the old software can be to interact with
fchishtie · 2 days ago
100% right. We support the AI companies that sell to legacy end users. For example, we don't sell directly to hospitals, but if an AI scribe for doctors already has a hospital as a customer, we help them integrate with the hospital's EMR
mingabunga · 2 days ago
Could you use this to test new releases of software for bugs? A bit like TDD but for GUI interactions
fchishtie · 2 days ago
Yes! we have customers doing that
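Using scripted GUI workflows as regression tests might look like the sketch below: the expected behaviour is written once as tests and replayed against each new release. `FakeEMR` is a hypothetical stand-in; in a real setup its methods would drive the application on a VM (clicks, keystrokes, accessibility queries).

```python
import unittest


class FakeEMR:
    """Hypothetical stand-in for the application under test."""

    def __init__(self) -> None:
        self.notes: dict[str, list[str]] = {}
        self.status_bar = "Ready"

    def open_patient(self, patient_id: str) -> None:
        self.notes.setdefault(patient_id, [])
        self.status_bar = f"Patient {patient_id}"

    def insert_note(self, patient_id: str, text: str) -> None:
        if not text:
            raise ValueError("empty note")
        self.notes[patient_id].append(text)
        self.status_bar = "Note saved"


class InsertNoteRegressionTest(unittest.TestCase):
    """Expected behaviour written down once, replayed against every release."""

    def setUp(self) -> None:
        self.app = FakeEMR()
        self.app.open_patient("12345")

    def test_note_is_saved(self) -> None:
        self.app.insert_note("12345", "Follow-up in 2 weeks")
        self.assertEqual(self.app.notes["12345"], ["Follow-up in 2 weeks"])
        self.assertEqual(self.app.status_bar, "Note saved")

    def test_empty_note_is_rejected(self) -> None:
        with self.assertRaises(ValueError):
            self.app.insert_note("12345", "")
```

Saved as a module, this runs with `python -m unittest`; swapping `FakeEMR` for a driver pointed at a VM running the new release turns it into a GUI regression suite.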
throw03172019 · 2 days ago
So AI companies would install this on their customers' (practices') computers?
fchishtie · 2 days ago
Yes, more likely on a virtual machine running the legacy software
throw03172019 · 2 days ago
Thanks. Most practices are not tech-savvy. So how would the VM setup work on their own network/machines?
fchishtie · 2 days ago
Yeah, that's true. In those cases we've either worked with their outsourced IT provider to spin up VMs for us, or spun up our own VM and connected through a VPN. IT can be very fun…
shrianshag · about 21 hours ago
Congrats on the launch !
fchishtie · about 21 hours ago
thank you!
throw03172019 · 1 day ago
How does this compare with CyberDesk (also YC)?
fchishtie · 1 day ago
We think CyberDesk is great. The main difference, I believe, is that their automations are primarily driven by computer use / agents, whereas our automations use agents to create, maintain and support Python code
mayanksingh09 · 1 day ago
Congrats on the launch!
fchishtie · about 21 hours ago
thank you!
furyman · 1 day ago
Just 1 day ago someone building exactly the same product reached out to me. Exactly the same. They were based in Europe. But it's good to see you got into YC. Congratulations.
fchishtie · 1 day ago
thank you! it’s a cool problem space (:
throw03172019 · 2 days ago
Please make your trust center public if you are focusing on healthcare AI companies; the footer link is dead.
fchishtie · 2 days ago
Thanks for flagging this!
snozolli · 1 day ago
Computer use agents that run on Windows VMs or in the browser. On-premise, cloud

I think you meant premises.

https://brians.wsu.edu/2016/05/30/premise-premises/

throw03172019 · 1 day ago
I've never heard a customer say "on-premises" when talking about servers they run. "On-premise" is usually the term, regardless of whether it is "correct".
snozolli · 1 day ago
54% of Americans read below a 6th-grade level. People word-mash "alot", "atleast", and now "eachother" all the time. Sports commentators say "verse" when they mean "versus".

I'm not suggesting that you correct your customers, but there's no reason to sink to the lowest common denominator when writing.

fchishtie · 1 day ago
thank you - good catch (:
kul · 1 day ago
congrats on launch!
fchishtie · 1 day ago
thank you!
Boxxed · 2 days ago
What the deuce is an "RPA"?
sakin · 2 days ago
It's an acronym for Robotic Process Automation. It usually means triggering mouse clicks and keystrokes to perform tasks.
fchishtie · 2 days ago
It's a script that simulates clicks/keystrokes on desktop/web.