DE version is available. Content is displayed in original English for accuracy.
Advertisement
Advertisement
⚡ Community Insights
Discussion Sentiment
50% Positive
Analyzed from 1369 words in the discussion.
Trending Topics
#https#found#varinteger#com#github#leanstral#datrs#test#bug#more

Discussion (31 Comments)Read Original on HackerNews
> One such bug was in the sign function for zigzag decoding of the datrs/varinteger library. On input Std.U64.MAX, the expression (value + 1) overflowed, causing crashes in debug mode and silent corruption in release mode—an edge case that testing and fuzzing would typically miss.
In what way would this boundary condition case be considered something that "testing [...] would typically miss"? It's certainly something that bad tests would miss or not think about, but I find that (a) careful people and (b) ML coding systems are actually really good at "oh, I should test the extreme values". Especially for things that parse user input.
I'm curious if they found other bugs that were more interesting, but found them too hard to explain quickly.
--- edit
concretely, I made a very simple round-trip test with proptest, and got dozens of failures and this in less than a second:
It does speak to the benefits of using lean in that you don't need to be clever about the different examples you test.
Every property-based testing system (invented ca. 1980) will explore boundary values. The semantics (or lack thereof) of C and C++ can make this difficult to actually test for because the compiler is allowed to say "test passed" to any input leading to UB.
that library is: https://github.com/datrs/varinteger
it seems probably correct, as there's an identical issue filed on that repo a week before this was published: https://github.com/datrs/varinteger/issues/8 (is this a leanstral employee? they have almost no info and only very sparse activity. or did leanstral perhaps just pick up this issue?)
it's a tiny, surprisingly-poorly tested, long-untouched (8y) library: https://github.com/datrs/varinteger/blob/master/tests/test.r... that has about 1k downloads per day: https://crates.io/crates/varinteger [1] which seems rather low.
I don't think I'd consider that such a smashing success that it's worth bringing up as the sole example tbh. though automated detection is certainly useful. or is this a noteworthy accomplishment for this sub-field? I haven't played with proof-writing LLMs, but given the paucity of training data I wouldn't be surprised if they're a bit rough compared to general coding.
1: https://crates.io/crates/varinteger lists it as https://github.com/mafintosh/varinteger-rs which redirects to https://github.com/datrs/varinteger , so despite looking different at a glance it does appear to be the same library
However, Lean is currently gaining significant momentum as an alternative, particularly due to its capabilities as a general-purpose functional programming language.
Personally, I think something based on Hoare or separation logic would be more practical as it'd be easier to align requirements with specifications.
https://news.ycombinator.com/item?id=48738938
GitHub: https://github.com/henryrobbins/open-atp
Docs: https://open-atp.henryrobbins.com
I created this tool for my own research and have found it really helpful to benchmark different automated theorem provers (my experience so far has been that Claude Code + Codex still out-perform Leanstral). My genuine aim is to share that usefulness with others, not self promote!
My thought was: Good job, this is tasteful personable marketing for a product with genuine value. I wish more marketing were done this way. So I think it's totally fine to be talking about the cool thing you're working on. I for one found it interesting and added to the discussion.
Probably the most annoying part about Reddit and HN and X (although it let you mute people) is the abundance of "expert" opinions from people who aren't experts at all. You just end up with a bunch of false signals that you shouldn't even be listening to.
all in all I say invest in spreading the words via other channels, maybe even X is better and even the right time zone (besides US working hours, I find European time the worst statistically for sharing your work).
The best and the brightest from Europe have no incentive to build in Europe when they can do it in America and be compensated and treated far better
I've found that you can get wildly different quality results from these sorts of models due to seemingly insignificant differences in prompt construction. It would be much easier to guess at what it wants if I could just see some RL transcripts -- and so the model author is in a much better position to provide initial advice.
Congrats on the release regardless! Excited for the direction Lean + automated AI proofs are headed.
Disclosure: I work at OpenAI.
This is a little bit like someone pointing the moon and you look at the finger.
The formal proof domain goes way beyond just finding bugs.
It has tons of usages in term of functional safety, protocol validation, cryptography, etc...
The fact Mistral tackle this kind of problem is both smart and not so surprising.
Smart because it is niche enough that they do not front face the big competitors (yet).
No so surprising because the French labs have a well known and long time expertise with formal proof tools (Coq and all its Ocaml associated tools). It has been historically mainly pushed by the aerospace and train industries (Airbus, Dassault, Alsthom).
Honestly: Think twice before dragging your firm into what you say.
Disclaimer: I speak for myself. Not any firm I am associated with.
this sounds like a great tool to add to the toolbelt, as part of the "how do we handle all the code output from LLMs" problem