Discussion (18 Comments), originally posted on HackerNews
For stateful systems, tests named after setup details often get weakened over time. Tests named after the claim they are trying to falsify are harder to water down.
The part I’d be most interested in is how well this works for business invariants like idempotent posting, no lost acknowledgements and recovery after partial failure.
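To make the naming point concrete, here is a minimal sketch of a test named after the claim it tries to falsify, using idempotent posting as the invariant. The `Ledger` class, its `post` method, and the idempotency key are all hypothetical stand-ins for a real system under test, not anything from the tool being discussed.

```python
from dataclasses import dataclass, field


@dataclass
class Ledger:
    """Toy in-memory ledger; a hypothetical stand-in for the real system under test."""
    balance: int = 0
    seen_keys: set = field(default_factory=set)

    def post(self, idempotency_key: str, amount: int) -> None:
        # Apply each idempotency key at most once, so client retries are harmless.
        if idempotency_key in self.seen_keys:
            return
        self.seen_keys.add(idempotency_key)
        self.balance += amount


def test_retried_posting_is_applied_exactly_once() -> None:
    # The name states the claim to falsify, not the setup that happens to be used.
    ledger = Ledger()
    for _ in range(3):  # simulate a client retrying after timeouts
        ledger.post("payment-42", 100)
    assert ledger.balance == 100


test_retried_posting_is_applied_exactly_once()
```

A name like `test_three_retries_with_mock_ledger` would survive someone loosening the assertion; the claim-based name makes a weakened assertion look obviously wrong.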
No-lost-ack is conceptually the same shape with a simpler property (every acked write shows up at the end), but it breaks the same way most checkers break: if the recorder treats timeouts as success or failure instead of "unknown," real lost writes silently disappear.
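A rough sketch of that property, assuming a history of writes that each end as acked, definitely failed, or unknown (for example, a timeout). The `Op` type, the outcome labels, and `check_no_lost_acks` are made up for illustration and are not the API of Jepsen or of the skill under discussion.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    value: int
    outcome: str  # "ok" (acked), "fail" (definitely not applied), or "unknown" (e.g. timeout)


def check_no_lost_acks(history: list[Op], final_read: set[int]) -> dict[str, set[int]]:
    """Classify writes against the final read, keeping unknown outcomes separate."""
    acked = {op.value for op in history if op.outcome == "ok"}
    unknown = {op.value for op in history if op.outcome == "unknown"}
    return {
        # Acked but absent at the end: the violation we are looking for.
        "lost": acked - final_read,
        # Outcome was unknown, but the write did take effect.
        "recovered": unknown & final_read,
        # Present at the end, yet never acked and never in doubt.
        "unexpected": final_read - acked - unknown,
    }


history = [Op(1, "ok"), Op(2, "unknown"), Op(3, "fail"), Op(4, "ok")]
result = check_no_lost_acks(history, final_read={1, 2})
print(result["lost"])  # {4}
```

Keeping `unknown` as its own bucket is the design choice that matters: a timed-out write is allowed to appear or not appear, so it is never counted as lost and never counted as unexpected.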
Recovery after partial failure is where the AI-agent angle gets shaky honestly. Quiescence is the hard part. Agents will declare a system "recovered" while compaction is still running in the background. The skill forces a three-part check (no in-flight ops, no pending background work, replicas converged) before the invariant runs. How reliably that holds up against a specific SUT, I'm still figuring out.
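The three-part check could be sketched roughly as below. The `Snapshot` fields and the idea of comparing per-replica digests are assumptions for illustration; how a real SUT exposes in-flight operations, background work, or replica state will differ.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Snapshot:
    """Hypothetical view of the system under test at one point in time."""
    in_flight_ops: int
    pending_background_tasks: int  # e.g. a compaction still running
    replica_digests: list[str]     # one state digest per replica


def is_quiescent(s: Snapshot) -> bool:
    no_in_flight = s.in_flight_ops == 0
    no_background = s.pending_background_tasks == 0
    # Converged means every replica reports the same digest.
    converged = len(set(s.replica_digests)) == 1
    return no_in_flight and no_background and converged


def check_invariant_after_recovery(s: Snapshot, invariant: Callable[[], bool]) -> bool:
    # Refuse to evaluate the invariant until all three conditions hold.
    if not is_quiescent(s):
        raise RuntimeError("system not quiescent; refusing to evaluate invariant")
    return invariant()
```

Raising instead of returning `False` keeps "not yet quiescent" distinct from "invariant violated", which is the confusion an agent declaring early recovery would otherwise hide.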
The failure mode of these tools is self-destructive in many cases.
-- edit --
I've seen clients and some colleagues working on things like this, and I can't seem to put into words how disheartening it is. With the exception of some private analysis work, I've shared everything I've built, with everyone, for free. Papers like Elle took years to think through, implement, test, and write. That's free. High-quality checkers, Knossos, Jepsen itself, and the analyses I've put my life into: all public, all free. I put a lot of time into docs and support; essentially all unpaid. I teach classes and give conference talks to make these techniques broadly accessible because I want other engineers to be able to make high-quality systems.
At the same time, I've got a giant pile of debt from an old house that just won't quit throwing curveballs at me, and it's gonna be a few more decades before I can retire. The fact that my clients are willing to pay for this work is why I can invest so much time in R&D and give it all away. When I see someone roll in and just tell an LLM "Go use Jepsen and Elle and figure this out", it's like... well fuck. Is this even possible any more?
Thankfully, LLMs are still really bad at my job, but I don't know if, or how long, that will last. They also don't need to be good to be useful.
And if these LLM tools work, it's good, right? They find bugs, systems get safer. I want systems to be safer. On the other hand, I'm motivated to share what I do because I really want to help people. If it's just LLMs... it feels hollow. I think about this every time I've tried to work on open-source in the last few months. When I spend hours trying to figure out how to keep naming consistent, how to preserve compatibility over a decade, how to make complex code approachable through quality documentation... I have a person in mind. Someone I'll never meet, but they'll see that work, and their life will be a little easier, and maybe they'll smile. I've been talking with my therapist about it: how the work I used to do thinking about other human beings now feels purposeless. How the effort I put into making these tools and ideas accessible will inevitably cannibalize my own employment, because someone, somewhere, is going to tell an LLM "Hey, go do that", and I work in a very, very small niche. It feels like incipient depression.
Recently I've been thinking about taking Jepsen and its supporting libraries closed-source, and changing the way I write reports--instead of teaching people how to test and what to look for, just telling people the results. I don't want to do this. It's bad for everyone, but maybe it buys me a few years of runway. Enough to pay down some of the debt and figure out what I can do next with this body.
Fuck.
I don't think this replaces you. The hard part of reliability is understanding the failure modes in the context of the business. No one has unlimited time or money; we always have to make tradeoffs. Only experienced humans have both the ability to interrogate the stakeholders and a vision broad enough to understand what to pursue versus what to give up.
Tools like this make the grind part of the job easier. They do not replace the holistic view you need to be able to confidently tell someone "worry about X, do not worry about Y".
I discuss a lot of stuff, but that is because I am a nerd at heart and would rather play with technology, read papers, and listen to podcasts than watch depressing TV content.
However, in the world of enterprise consulting, a similar trend has been playing out over the last 20 years.
First offshoring, then the rise of cloud-based infra, serverless, SaaS, and iPaaS, and now AI-based orchestration on top of iPaaS and serverless.
That means that, for the same kind of requirements, a team piecing those products together like a puzzle can be one third the size that was required about a decade ago.
So what happens to the other two thirds, who now have nothing to do and whose salaries are spent on those licenses instead?
I’m not sure I wanna stay on the ride much longer, at least in a corp setting. I guess I don’t have much of a choice.
Thanks for Jepsen, though. It’s made a couple of my applications much better in ways I wouldn’t have managed without it, even if I have to relearn Clojure every time I pick it up, and those applications resulted in real jobs and careers for a bunch of people. It’s not going to pay for your house, but it’s all I’ve got.
hugs.
I honestly think the rise of LLMs will be the death of open source in the long run. Apparently, the quality of OSS has already dropped significantly since 2025 (so most models stop training on GitHub after that point).
I don't think a lot of OSS authors quite understand the extent to which models like Claude/Codex rely on their work. I'd bet money there are extensive curated tasks using your tooling for post-training. With zero attribution, these models are using your work wholesale to build sophisticated agents that can do your job.
Yeah it's depressing as hell. I guess it's the same thing for artists and musicians and writers.
P.S. I can sympathise with the old house issues! I bought a 1901 terraced property, and it's an absolute money pit.
I get that you have a financial issue, but perhaps you don't need to be conflicted about open-sourcing your work as far as helping people goes? LLMs are tools for people. Code, research, standards, etc. are all means to an end. Maybe the agent operator doesn't read or understand your work, but the guy who built the agent skills likely did. Progress moves upward, while standing on the work of those who came before us.
LLMs have lowered the barrier to creating software and can hide a lot of source material, but your work is clearly having an impact here. If your goal is to help people make better software, that's still what's happening. The industry shift is happening regardless, so we might as well embrace the positives instead of focusing on the negatives IMHO.
Moving to a closed-source model for financial reasons is a totally separate issue IMO, and I wish you good luck and prosperity regardless of your decision.
In other words: fascism is coming and you either lick the boot or you get stomped.
1. You chose to give your work away for free.
2. You are complaining that you haven't made money from your work.
Is that a fair interpretation of your argument?
I’ve built a similar workflow (but for system design/execution) and it works surprisingly well with the frontier models.
The skill includes scripts to ensure the work was actually done/followed, but I’ve been testing it without the scripts and it does a decent job.
Yesterday, however, in GPT-5.5 xhigh[0], I noticed some hallucinations: the model stated it had created files when in fact it hadn’t.
A small hiccup like this is usually fine, as the model realizes the files don’t exist sometime later, but in this particular instance, it claimed the files were created and then just continued on.
tl;dr - I fell into the trap of trusting markdown-only workflows, just to be bitten by the models hallucinating steps.
[0] xhigh is on, but in this particular turn there was no reasoning presented, so it may have been a degradation of the LLM/harness.
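For the "did the work actually happen" scripts mentioned above, a minimal sketch of one such check: take the list of files the model claims to have created and confirm each one exists on disk. The `missing_files` helper and the example paths are hypothetical; this is not the script from the skill described in the comment.

```python
import tempfile
from pathlib import Path


def missing_files(claimed: list[str], root: Path) -> list[str]:
    """Return the claimed paths (relative to root) that are not regular files on disk."""
    return [p for p in claimed if not (root / p).is_file()]


# Usage: the model claims two files, but only one was actually written.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "design.md").write_text("# Design\n")
    gaps = missing_files(["design.md", "plan.md"], root)
    print(gaps)  # ['plan.md']
```

Running a check like this after each step, instead of trusting the model's own report in markdown, turns a silent hallucinated step into an explicit failure.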