You Don't Know Jack About Formal Verification

eeatonphil • about 6 hours ago • 19 comments • Article on queue.acm.org

Discussion (19 Comments)

taeric•30 minutes ago

I'm not entirely sure what this is showing people don't understand? Especially when going with such silly, ill-defined concepts as "financial conservation". Just what?

Now model in that it was shipped, but an earthquake caused the delivery truck to be destroyed. Or it was shipped, but the person that ordered passed away before delivery and the estate is refusing to accept packages.

People will want to somehow treat an online order as similar to an in-store purchase. Does that mean that as soon as a customer takes an item through the door, the store is free of any and all obligations on the item?

The answers in all of these will have to be that there are processes in place to be executed. Some may require overrides on state of execution that have to be applied to get things back to a resolved status.

Now, do we want to make sure that normal execution of some code does not leave us in an unresolved status? Of course we do. And many people want to think they can find a way to model the world such that no contested states can exist. I have my doubts. But I welcome efforts to make it so that we surprise ourselves fewer times with some outcomes.

dfabulich•about 3 hours ago

Formal verification is still too limited to be useful for most app developers. The article gives an example of an e-commerce platform using it to prove the correctness of managing refunds, but then acknowledges:

> As of today, the formally verified core can handle most effect-free logic—invariants, transitions, conflict resolution. But the UI, network calls, and database interactions typically sit outside the verification boundary. Verification makes the core airtight but doesn’t guarantee end-to-end correctness.

So you can formally prove that your e-commerce refund management logic is correct, except for proving that you processed the refund. You can't even prove anything about recording the refund in your database, say nothing of proving anything about your interactions with your payment processor.

If your app is mostly tricky logic with just a bit of I/O, your app is very unusual, and it's almost certainly not an e-commerce app. E-commerce apps are mostly CRUD apps; I/O with the database, the UI, and third-party APIs (e.g. payment processors) is 99% of the code.

Even property-based testing is mostly unhelpful for e-commerce apps like these.

Instead, think of formal verification as a runtime performance improvement of property-based testing. If property-based testing is useful for your app (it probably isn't), then you may be able to convert some of your property-based tests into formal verifications.

But, honestly, you probably can't do it, not even with a high budget of tokens.

I'd love to be proven wrong, but the way to do it would be to formally prove the correctness of non-trivial open-source code with property tests. Perhaps you could formally verify significant chunks of Postgres! (But I doubt it.)

teach•about 2 hours ago

So much this.

I actually did take a formal verification course in college. Our final project was to use the techniques we'd been learning to verify some classic critical-section locking algorithm. I chose to verify an implementation of Lamport's bakery algorithm[0] in C (this was the 90s -- a lot of code was still being written in C).

The problem is that Lamport's algorithm makes an assumption that the "ticket number" is unbounded and any finite implementation in C will almost certainly use a value which is limited to 32 bits or so.

So I was able to formally verify that the algorithm fails to protect the critical section if enough processes are kept waiting to overflow the counter. :)

This probably just means that Lamport's algorithm isn't a great choice for such environments, but I'm still bummed that the professor gave me a B.

[0] https://en.wikipedia.org/wiki/Lamport%27s_bakery_algorithm
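
The overflow described above can be reproduced with a toy model. This is only a sketch, not the C implementation from the course project: `next_ticket` and `first_order_violation` are hypothetical names, and the ticket width is shrunk to 3 bits so the wrap-around shows up after a handful of arrivals instead of billions.

```python
def next_ticket(outstanding, bits):
    """Bakery rule: new ticket = 1 + max(outstanding tickets),
    stored in an unsigned integer that is `bits` wide."""
    mask = (1 << bits) - 1
    return (max(outstanding, default=0) + 1) & mask


def first_order_violation(bits, rounds):
    """Two customers alternate so the bakery is never empty and the
    counter never resets. Returns (round, waiting_ticket, new_ticket)
    the first time a newcomer's ticket is not greater than the ticket
    of the customer already waiting, or None if that never happens."""
    waiting = next_ticket([], bits)          # first customer takes ticket 1
    for r in range(1, rounds + 1):
        new = next_ticket([waiting], bits)   # newcomer arrives while one waits
        if new <= waiting:
            return (r, waiting, new)         # newcomer would jump the queue
        waiting = new                        # earlier customer is served
    return None


print(first_order_violation(bits=3, rounds=100))    # (7, 7, 0)
print(first_order_violation(bits=32, rounds=1000))  # None within 1000 rounds
```

With 32 bits the same wrap still exists; it just takes about 2^32 arrivals without the bakery ever emptying to reach it.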

nilkn•19 minutes ago

There's a lot of really important software out there where being able to easily verify effect-free core logic would certainly be very useful. An e-commerce web app is not a good example. Anything safety-critical -- aerospace, defense, medical devices, power generation, industrial machines -- already requires a certification process. Auto-generating proof evidence as part of the cert process (which generally requires a rigorous spec anyway) in the near future seems like a no brainer.

gf000•about 2 hours ago

Well, I'm someone who barely knows more than jack about formal verification, but in pretty much every case you have to have some kind of model that you are actually verifying.

How close that model sits to the real thing you have modeled is an important question, and you are free to be as close or distant as you want -- e.g. for verifying different properties of a programming language you might decide to not care about CPU instructions, registers, etc, and only care about the semantic model. This has plenty of use cases (e.g. checking whether a particular optimization is sound) where this "model mismatch" doesn't matter. It doesn't make formal verification useless in any way, shape, or form, imo.

Getting back to the "e-commerce refund management" -- you can absolutely have a model that does e.g. a particular database IO call that either succeeds or not. With such a model in place, you can have the rest of your codebase formally verified and know that 'with a properly working database it will always work correctly' [1]. Is that not a very significant and useful finding in and of itself? Would you be more confident in your end-to-end tested software than the above?

Especially that one can then separately test that particular call site as deeply as they want, to determine that the assumed property (it either returns success or fails) is sound.

[1] Given a correct specification, which is not easy to get right
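
As a rough illustration of the idea above, the database call can be modeled as something that returns success or failure, and the core logic checked against every combination of outcomes. This is exhaustive enumeration over a tiny model, not a proof, and every name here (`refund`, `invariant`, `check_all_outcomes`, the fields of `START`) is made up for the sketch.

```python
from itertools import product

# Hypothetical order state; amounts are integer cents.
START = {"paid": 50, "refunded": False, "credited": 0}


def refund(state, db_outcomes):
    """One refund attempt. `db_outcomes` yields booleans: the modeled
    result (True = success) of each database write the attempt makes."""
    outcomes = iter(db_outcomes)
    if state["refunded"]:
        return state                       # already handled, do nothing
    if not next(outcomes):                 # write 1: mark the order refunded
        return state
    marked = {**state, "refunded": True}
    if not next(outcomes):                 # write 2: credit the customer
        return marked
    return {**marked, "credited": state["paid"]}


def invariant(state):
    """Money is credited only for an order marked refunded,
    and never more than was paid."""
    credited = state["credited"]
    return credited <= state["paid"] and (credited == 0 or state["refunded"])


def check_all_outcomes(attempts=2):
    """Try every success/failure pattern; return a counterexample or None."""
    for outcomes in product([True, False], repeat=2 * attempts):
        state, feed = START, iter(outcomes)
        for _ in range(attempts):
            state = refund(state, feed)
            if not invariant(state):
                return outcomes
    return None


print(check_all_outcomes())  # None: no failure pattern breaks the invariant
```

The check says nothing about whether the real database call behaves like the model; that assumption is the part that would be tested separately at the call site.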

dfabulich•about 1 hour ago

Formal verification is a siren song. The siren sings, "bug-free code is possible in principle!" But it's a trap. Even with LLMs, bug-free code is impractical.

I argued that property-based testing is mostly unhelpful for e-commerce/CRUD apps, and that formal verification is a performance improvement on property-based tests.

In a property-based test, you identify some rule (an invariant) that you want to apply to your code. Then, you fuzz your app, testing it with autogenerated inputs, failing the test if the rule is broken at any point. In formal verification, you prove that the code always satisfies the rule, so you don't have to try millions of inputs.
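
A minimal sketch of that loop, using only the standard library. The rule here (total refunds never exceed the amount paid) and the function names are assumptions chosen for illustration; a library such as Hypothesis would also generate and shrink the inputs for you.

```python
import random


def grant_refund(paid, already_refunded, requested):
    """Code under test: grant at most what is still refundable.
    Amounts are integer cents."""
    remaining = paid - already_refunded
    return max(0, min(requested, remaining))


def check_property(trials=1000, seed=0):
    """Fuzz with random refund sequences. Returns a failing
    (paid, refunded, requested) triple, or None if the rule held."""
    rng = random.Random(seed)
    for _ in range(trials):
        paid = rng.randint(0, 10_000)
        refunded = 0
        for _ in range(rng.randint(1, 20)):
            requested = rng.randint(-100, 10_000)   # includes bogus negatives
            refunded += grant_refund(paid, refunded, requested)
            if not 0 <= refunded <= paid:           # the invariant
                return (paid, refunded, requested)
    return None


print(check_property())  # None: no counterexample in 1000 random trials
```

A passing run only says that no counterexample was found among the sampled inputs; a formal proof of the same rule would cover all of them.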

Whether you're doing property-based testing or formal verification, it's extremely difficult to think of any non-trivial business logic properties that should apply to CRUD apps, even if they could be written in English, translated perfectly into code, and verified formally, instantly.

An actual rule that should always be followed, inflexibly, such that a mathematical proof would be useful (and that actually matters to your business) is so rare in CRUD apps that I'm not sure I've ever seen one.

Even with general-purpose rules (the app should never crash, the app should not leak memory), the property-based fuzzers tend to find bugs that have never happened in production, and probably never will. It's rarely economical for an e-commerce app developer to spend time fixing those bugs, even if finding them cost nothing at all (which is not remotely true, even with LLMs).

And what about UI? Maybe you'd want a rule like: "The title of the product for sale should never overflow its container rectangle in the UI."

OK, well, what if the title is one very long word? But… none of the products you sell happen to contain any words that are 500 characters with no spaces. I guess you could add code to prevent that product from ever being created? (And ensure that data in the database will never allow product titles that violate your business rules… how, exactly?)

Formal verification shines where property-based testing is already useful. It's already useful for many software platforms. It's useful for databases, where reliability is essential. It's useful for parsers, particularly when you expect the end user to be attempting to send you hostile code.

But e-commerce apps? CRUD apps? Not so much.

inaseer•23 minutes ago

Have you looked at model-based testing? One way to think of it is as property-based testing for stateful systems, though that's underselling it a little. It's surprisingly easy to come up with models/specs for most stateful systems, including CRUD apps.

Source: I've modeled a number of CRUD like and non-CRUD like systems through the Accordant framework (https://github.com/microsoft/accordant)
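
To make the idea concrete, here is a generic sketch in plain Python. It does not use Accordant or reflect its API; `ListStore` and `run_model_test` are invented for the example. A plain dict acts as the model, and random command sequences are run against both the model and the system under test.

```python
import random


class ListStore:
    """System under test: a key-value store backed by a list of pairs."""

    def __init__(self):
        self._rows = []

    def put(self, key, value):
        self.delete(key)                    # keep at most one row per key
        self._rows.append((key, value))

    def get(self, key):
        for k, v in self._rows:
            if k == key:
                return v
        return None

    def delete(self, key):
        self._rows = [(k, v) for k, v in self._rows if k != key]


def run_model_test(steps=500, seed=0):
    """Apply the same random commands to the store and to a dict model.
    Returns (step, op, key) at the first disagreement, else None."""
    rng = random.Random(seed)
    store, model = ListStore(), {}
    for step in range(steps):
        op = rng.choice(["put", "get", "delete"])
        key = rng.randint(0, 5)             # few keys, so commands collide
        if op == "put":
            value = rng.randint(0, 99)
            store.put(key, value)
            model[key] = value
        elif op == "delete":
            store.delete(key)
            model.pop(key, None)
        if store.get(key) != model.get(key):
            return (step, op, key)
    return None


print(run_model_test())  # None: store and model agreed on every step
```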

toast0•about 2 hours ago

The first part of formal verification is getting a formal specification. I don't know about most developers, but I rarely get a written specification for anything I work on, and when I do, it's nowhere near what would be needed to turn it into a formal specification.

Anyway, the specification is subject to change at the drop of a hat, so putting a lot of effort into verifying it is foolish.

I do see value in formal verification of IPC/threading communication primitives (locks, semaphores, queues, whatevs), but then formal verification usually requires assumptions about hardware behavior and those aren't always correct, so. But I've never used formal methods outside exposure in an undergrad survey class, so I dunno.

harperlee•about 2 hours ago

I don't know a lot about formal verification, but:

> So you can formally prove that your e-commerce refund management logic is correct, except for proving that you /processed the refund/. You can't even prove anything about recording the refund in your database, say nothing of proving anything about your interactions with your payment processor.

You could say the same thing about the viability of functional programming on a CRUD webapp, but languages like Clojure have been used to great effect here. The fact that there are important, even fundamental, bits that you cannot verify doesn't take away from the value of eliminating whole dimensions of issues.

tikhonj•about 1 hour ago

Property based testing is useful for finding bugs even in these kinds of CRUD heavy apps. There can be a surprising number of bugs and unexpected behaviors in the interaction of multiple sub-systems or APIs, and one way to find those bugs is to come up with properties on traces of calls.

For example, I saw a paper on using metamorphic testing (a particular technique for defining properties to test) to find bugs in the web APIs of Spotify and YouTube[1]. I don't have time to reread the paper just now, but I remember that they found weird behavior in pagination of search results. Not a big deal in that particular case, but I've definitely seen internal APIs with behavior that could be similarly wrong but with a much larger real-world impact.

[1]: https://ieeexplore.ieee.org/document/8074764
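
A metamorphic property for pagination might look like the sketch below. The relation used here (changing the page size must not change the concatenated result) is an illustrative choice and is not claimed to be one from the paper; `search`, `all_pages`, and `pagination_relation_holds` are hypothetical names over an in-memory list rather than a real web API.

```python
def search(items, query, page, page_size):
    """Stand-in for a paginated search endpoint."""
    matches = sorted(i for i in items if query in i)
    start = page * page_size
    return matches[start:start + page_size]


def all_pages(search_fn, items, query, page_size):
    """Walk the pages until an empty one comes back."""
    results, page = [], 0
    while True:
        chunk = search_fn(items, query, page, page_size)
        if not chunk:
            return results
        results.extend(chunk)
        page += 1


def pagination_relation_holds(search_fn, items, query, sizes=(1, 2, 3, 7)):
    """Metamorphic relation: the concatenated result must be the same
    for every page size, including one page big enough to hold everything."""
    baseline = all_pages(search_fn, items, query, page_size=len(items) + 1)
    return all(all_pages(search_fn, items, query, s) == baseline for s in sizes)


catalog = ["red shoe", "blue shoe", "shoe rack", "hat"]
print(pagination_relation_holds(search, catalog, "shoe"))  # True
```

The test never needs to know the correct search result. It only compares two runs against each other, which is what makes the approach usable against an API whose expected output is hard to specify.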

Personally, I see property-based testing and formal specification as tools for design and debugging more than for full-on correctness. Even with AI models it's still really hard to fully prove something correct, but having even a partial logical specification can let you trade design time for debugging time and lets you find inconsistencies or potential edge-cases when you're initially figuring out a system, rather than waiting until you have a massive codebase to debug and redesign in production.

It's not a panacea and you still have to be careful at the boundary between your nicely modeled system and the real world, but, once I got the hang of working in that style, having some formal properties or partial logical specifications of the behavior I needed has been really nice to have, as much for saving effort as for ensuring correctness.

I've mostly worked in slightly different domains, but I've found property-based testing useful both as a tool to catch bugs and as a tool for communication. I spent a couple of years building and supporting a supply chain simulation at Target, where I frequently got requests for new metrics from the supply chain planning team. By teaching them how to specify either the whole metric or, at least, some of the expected behaviors of the metric as mathematical properties, it took far fewer back-and-forth conversations to correctly implement the metric in the simulation. We could then test these things by checking these properties over a bunch of random simulation traces. Day-to-day this saved a bunch of time in debugging small simulation mistakes. In the longer-term, having this test suite also let us rewrite the simulation code in a new style in Rust to drastically increase performance. All of this would have been possible without the set of properties; it would have just involved a whole lot more slow, tedious work.

microgpt•about 2 hours ago

If you have a set of axioms that Postgres works as designed, you can prove that your code updates the database. If you define "the refund was processed" to mean "the refunded column of the order is true" you can prove that.

thewillowcat•29 minutes ago

I've been thinking about formal verification a lot, recently. I've dabbled in it before, but it was clear that it was only used by a small research community, and the effort required to verify anything larger than toy code would be immense. I agree with the author that there is enormous potential to use AI to automate the annoying parts of the verification process. What's more, the current security environment, in which the tiniest security flaw can quickly be exploited, suggests that provably secure code might be the future.

Others are correct to point out that formal verification is too difficult to apply to many types of application code. But there are domains where it is applicable today, and the main reason it is not used there is that developers lack the time and know-how. For example, many file format parsers are exploitable, but they are simple enough that they could be formally verified.

JacobAsmuth•about 1 hour ago

Anyone interested in this should check out my Qed project I've been working on, a formally verified web frontend. https://github.com/JacobAsmuth/qed

pron•about 2 hours ago

> It’s no longer just for safety-critical systems with the budget for specialized proof engineers. It’s for anyone who has a property worth proving

... and the budget to pay the AI to prove it.

I have quite a bit of experience with formal verification, but I don't understand the claim made in the article. As an aside, AI's ability to reliably prove the correctness of significantly large programs is still theoretical at this point, but let's assume it's possible. The claim in the article is that writing 10,000 lines of proof to prove a 100-line program was very expensive, and that's why it isn't done. But this increase in cost continues with AI! Whether you pay people to write the proofs or you pay an LLM to write the proof, you still have to pay for it. If I run a software company, saying that "verification is the AI's problem" isn't much different from saying, "it's the engineers' problem." Either way I'm not doing the work myself, but I am paying for it.

If the premise is that writing proofs was 100x more expensive than testing, I see nothing in this article to even suggest why it wouldn't still be 100x more expensive when an LLM is doing the work.

(BTW, the reason there aren't many specialised proof engineers is because they aren't in high demand; they're not being paid that much more than other engineers at a similar level)

rurban•about 1 hour ago

> writing 10,000 lines of proof to prove a 100-line program was very expensive, and that's why it wasn't done.

We are not that silly. We are writing compilers (ie model checkers) which translate the source code to formal proofs. No cost at all, you just need to limit loop sizes and function call depths, to keep the cost of the proof down. And then extrapolate the little proof to the general proof.
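
The bounded exploration described above can be sketched as a breadth-first search over a small state machine. This is a toy written for this thread, not the output of any real model checker: two processes share a lock whose check and set are separate steps, and `find_violation` looks for a state with both in the critical section within a depth bound.

```python
from collections import deque

# State: (pc of process 0, pc of process 1, lock).
# pc 0 = waiting, pc 1 = saw the lock free, pc 2 = in the critical section.


def successors(state):
    """Yield every state reachable by one step of one process."""
    pcs, lock = state[:2], state[2]
    for i in (0, 1):
        pc = pcs[i]
        if pc == 0 and lock == 0:     # check: lock looks free
            new_pc, new_lock = 1, lock
        elif pc == 1:                 # set: take the lock (not atomic with check)
            new_pc, new_lock = 2, 1
        elif pc == 2:                 # leave the critical section, release
            new_pc, new_lock = 0, 0
        else:
            continue                  # blocked: lock is held
        new_pcs = list(pcs)
        new_pcs[i] = new_pc
        yield (new_pcs[0], new_pcs[1], new_lock)


def find_violation(max_depth):
    """Explore all interleavings up to max_depth steps. Returns
    (state, depth) for the first mutual-exclusion violation, else None."""
    start = (0, 0, 0)
    seen, queue = {start}, deque([(start, 0)])
    while queue:
        state, depth = queue.popleft()
        if state[0] == 2 and state[1] == 2:
            return state, depth
        if depth < max_depth:
            for nxt in successors(state):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, depth + 1))
    return None


print(find_violation(max_depth=3))  # None: the bound is too small
print(find_violation(max_depth=4))  # ((2, 2, 1), 4)
```

The two calls show why the bound matters: a clean result at depth 3 does not by itself say anything about depth 4, so the step from the bounded result to the general one needs its own argument.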

pron•about 1 hour ago

Whatever the cost multiplier is, I see no reason why that same multiplier won't remain with AI.

Personally, I don't think that picture is quite accurate. Yes, there is a high cost multiplier for small programs, albeit perhaps not so prohibitive. But for large programs, that multiplier is, for most intents and purposes, infinite, unless, perhaps, you have experts who know what's worth proving and what is not.

Anyway, I'd like to see that put to the test. Have an LLM write a 50-100KLOC program and prove all correctness properties - with the properties themselves approved by an expert human - and tell us what it cost. A colleague of mine stopped his AI proof experiment when he got an email from some functionary at the company to stop doing what he was doing with the model, because it was costing too much money.

expo98•about 3 hours ago

I had fun in a college class that used Dafny, building a pseudo digital wallet. It wasn't the main focus of the class, so I didn't get that much out of it.

03284782470•about 2 hours ago

ACM now stooping to the level of clickbait youtubers. Just great.

yunnpp•about 1 hour ago

You don't know jack, that's why you should subscribe to my ACM channel. As for me? I know two Jacks.