I spent a fortnight using Claude to create specs for every version of RAR, then another using gpt-5.5 to write compressors in Rust.
It's not fast and it's not pretty, but it works.

Discussion (51 comments, from Hacker News)
As mathematicians say, optimization is left as an exercise to the reader. You did the hard part.
How can you shout at Claude when it’s
1) foobaring, bamblabooing and fghrtawing all the time without telling you what's going on
2) when it finally does interact, it asks for a permission you granted 30 seconds ago with "yes, and do not ever ask me again until the heat death of the Universe"
3) and after all of that, it just spits out: "you're out of tokens, give up your liver or wait until Trump's next war"
Does it? How are you legally intending to use copyright to license this machine output? How would you know it's not encumbered in any way?
It wasn't even a disasm/pseudocode to formal spec flow, and then a separate human implementation. The same human has been in the loop throughout, and large parts of it were generated directly.
It's basically guaranteed tainted.
Edit: I should have skimmed a bit more patiently, there was in fact no "disasm/pseudocode + the human getting tainted" part to this apparently.
"This is copyright-encumbered and nonfree because it's a derivative work of the legacy RAR binaries" is a different argument (and seems like it depends on details of the setup that were somewhat glossed over in the post).
You can get these LLMs to generate copyrighted outputs both intentionally and accidentally. This is a known fact; therefore, if you're not checking the output to see if this has occurred then you're potentially generating legal risks for yourself and anyone who uses your code.
To not only ignore this for your own use case but to then release the code under a proclaimed license seems legally problematic if not ethically concerning.
If you did get sued for infringement, I can't imagine your defense would be that you find the argument tiresome. Honestly, do you think this would never happen? And if it did, how would you go about defending your actions here?
For actual correctness verification in the strong sense, you'd need to start from a specification written in a formal language so that it's machine-checkable, which, if I had to guess, not even win.rar GmbH has.
You know what I meant: How can we have confidence that this implementation of RAR is functionally identical to what it's based on? What would give me the confidence to use it in a critical piece of infrastructure?
You also know what I meant, since I spelled it out in more detail a comment below. But even though you're being facetious, yes, that really is the case. If it works it works. That's the bar for the vast, vast majority of software, demonstrated practical correctness. If you stumble into a bug, you log it as a defect and fix it. That's all that regular people ever have.
It's literally no different from, e.g., validating the NTFS driver that ships in the Linux kernel, or validating any other (re)implementation of anything. You just do a bunch of empirical testing and hope for the best. It is also why reimplementations always lag behind.
Hell, I'm 99% sure this is exactly what the actual vendor does too, or at least I sure hope they have tests. 'Cause they're sure as shit not using a formally verified compiler toolchain, meaning they definitely don't have a formal proof that even the official implementation is correct. At best, they only have empirical data too.
Because it's a defined format, you can do binary-exact comparisons between the input and output files. We already have an oracle in the form of proper RAR format software, so if the results are identical, you don't need to look any further for that specific case.
You can see a version of this that I did quite similarly, for postgresql wire format, here: https://github.com/pgdogdev/pgdog/tree/main/integration/sql
It validates that SQL run with the same setup, teardown, and test produces exactly identical results on raw PostgreSQL (the control) and on various configurations of PgDog, in both the text and binary formats. Ultimately, it's a 6-way multivariate test that should always yield binary-exact results.
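The oracle check mentioned above boils down to a byte-for-byte comparison. Below is a minimal Rust sketch, assuming the decoded bytes have already been produced elsewhere (for example, by extracting the candidate compressor's archive with the reference RAR software). `Mismatch`, `first_mismatch`, and `compare_files` are hypothetical names for illustration, not part of the author's project.

```rust
use std::fmt;
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical report of where two byte streams first diverge.
#[derive(Debug, PartialEq, Eq)]
pub struct Mismatch {
    pub offset: usize,
    pub expected: Option<u8>, // None means the expected stream ended here
    pub actual: Option<u8>,   // None means the actual stream ended here
}

fn show(b: Option<u8>) -> String {
    match b {
        Some(v) => format!("0x{:02x}", v),
        None => "EOF".to_string(),
    }
}

impl fmt::Display for Mismatch {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(
            f,
            "first difference at byte {}: expected {}, got {}",
            self.offset,
            show(self.expected),
            show(self.actual)
        )
    }
}

/// Returns the first position where `actual` differs from `expected`,
/// or `None` if both are byte-for-byte identical, including length.
pub fn first_mismatch(expected: &[u8], actual: &[u8]) -> Option<Mismatch> {
    let common = expected.len().min(actual.len());
    if let Some(i) = (0..common).find(|&i| expected[i] != actual[i]) {
        return Some(Mismatch {
            offset: i,
            expected: Some(expected[i]),
            actual: Some(actual[i]),
        });
    }
    if expected.len() != actual.len() {
        // One stream is a strict prefix of the other.
        return Some(Mismatch {
            offset: common,
            expected: expected.get(common).copied(),
            actual: actual.get(common).copied(),
        });
    }
    None
}

/// Compares two files on disk, e.g. an original input and the bytes the
/// reference tool extracted from the candidate's archive.
pub fn compare_files(expected: &Path, actual: &Path) -> io::Result<Option<Mismatch>> {
    let a = fs::read(expected)?;
    let b = fs::read(actual)?;
    Ok(first_mismatch(&a, &b))
}

fn main() {
    let original = b"hello rar";
    let decoded = b"hello raX";
    match first_mismatch(original, decoded) {
        Some(m) => println!("FAIL: {}", m),
        None => println!("OK: identical"),
    }
}
```

Reporting the first differing offset instead of a plain boolean makes a failing case much easier to debug. Note that two valid compressors can legitimately emit different archives for the same input, so the meaningful check is usually that the decompressed output matches the original input exactly, not that the archives themselves match.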
There's much more about correctness of a piece of software than: "produces the same output as the original on x test cases".
I'm not saying it's a bad implementation and, if anything, LLMs are much better at translating/porting existing code (and finding bugs) than at writing things unheard of.
You're basically saying, if I may make a pun: "rust me bro, it's correct".
Were you flagged for a cybersecurity violation?
You can draw your own conclusions as to what this says about the state of agentic development.
Kudos to the author. A fun read, thank you for sharing.
Maybe just cut the unprompted whining?
HN is better than most in this regard thanks to community flagging, but even then there's a lot of it. Ultimately, it would seem that the ratio you're describing skews a whole lot more towards the anti-AI side than towards the anti-anti-AI one (or towards a stalemate). Or rather, the latter sentiment is not necessarily common enough to thwart such comments, and so you see it expressed verbally instead.
One thing I've been curious about: is there any way to stop a RAR compression midway and continue it later?
For example, suppose I'm compressing a large file. Would this project make it possible to shut down the computer mid-compression and resume after restarting?
I would really love it if you could add this functionality!
I suppose the question is whether the author had ever entered into a contract limiting reverse engineering...