I'm not sure if the author is here, or if my comment attempt was successful. So, can I suggest you take a look at a third leg of the formal methods stool?
If you are familiar with C, check out Frama-C (https://frama-c.com/) and the WP and RTE plugins. The approach is based on Tony Hoare and EWD's axiomatic semantics (https://en.wikipedia.org/wiki/Hoare_logic). It does not have a good memory management story, as far as I know, but is very good for demonstrating value correctness (RTE automatically generates assertions for numeric runtime errors, for example) and many memory errors.
If you are familiar with Ada, check out SPARK (https://www.adacore.com/about-spark), which is similar to Frama-C but has a much better interface in the AdaCore GNAT toolkit and IDE.
Both work similarly: assertions in normal Ada or C code, as well as the code itself, are translated into SMT statements and fed to an SMT solver to find counterexamples, i.e. errors.
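As a rough sketch of what that looks like in practice (a toy function of my own, written with Frama-C's ACSL annotation syntax, not something from the article): WP turns the contract into proof obligations for the SMT solver, and RTE adds the obligation that the addition cannot overflow, which the precondition discharges.

    #include <limits.h>

    /* WP tries to prove the ensures clauses from the requires clause;
       RTE adds an assertion that x + 1 does not overflow, which the
       precondition x < INT_MAX takes care of. */
    /*@ requires x < INT_MAX;
        ensures \result == x + 1;
        ensures \result > x;
    */
    int increment(int x)
    {
        return x + 1;
    }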
I have some blog posts from several years ago about Frama-C: https://maniagnosis.crsr.net/tags/applied%20formal%20logic.h... (And I really should get back into it; it's a lot of fun.)

If you are not familiar with Ada or C, Dafny (https://dafny.org/) is another option, based on .NET and developed at Microsoft. It seems nigh-on perfect for this approach. (The language uses a garbage collector.) At the time I was looking, there was little documentation on Dafny, but that seems to have improved.
Sorry to hear you had issues with my comment system. Comments are invisible until I manually approve them, but I'm not seeing anything new in the database right now. Did you get any sort of error code? The system should let you know if e.g., the answer to the captcha was wrong.
I do mention SPARK at the end of the article. I'm not familiar with Frama-C so I'll check that and your blog posts out. Thanks for sharing!
Sorry I missed the mention of SPARK. TLA+, although I haven't fiddled with it as much, is more similar to Alloy than anything else. SPARK actually works on Ada code.
If I had specifications complete enough for any interesting program that formal methods would be worthwhile, I think it would just be easier to hit compile on that spec.
Yes, there is a subset of interesting bugs related to edge cases, but the really problematic bugs are when what the software does does not match what was expected, even if nobody could really articulate what was expected in the first place. The art of software engineering isn't the coding, it is asking enough questions to know what the actual problem to be solved is.
You don't need to prove everything against a complete spec; you can prove that things have particular properties you care about, from higher-level behaviors like "user data can only be accessed by the user" down to "there are no out-of-bounds array accesses."
You could prove some complex data-structure to be a correct implementation of some logical data-structure and then prove that your service won't deadlock.
There are a ton of interesting, impactful properties to prove that aren't "create a complete specification and prove the implementation refines it", and even that usage is, I think, more impactful for removing bugs and vulnerabilities than you're giving it credit for. Take something as simple as sorting: Java had a bug in its Timsort implementation for a decade, which was discovered when trying to verify it with formal methods: http://envisage-project.eu/wp-content/uploads/2015/02/sortin... Note too that if you compiled the simplest/easiest-to-understand spec for something like sorting you'd get something like bubble sort, not Timsort, though you are much more likely to want the latter.
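To make the spec-vs-implementation split concrete, here's a minimal sketch (my own toy example in plain C, with a runtime assert rather than a proof): the property you actually care about is stated once, independently of the algorithm, and any implementation, bubble sort, Timsort, or qsort, gets checked against it.

    #include <assert.h>
    #include <stdlib.h>

    /* The property of interest, stated independently of any algorithm:
       every adjacent pair of elements is in order.  (A full spec would
       also require the output to be a permutation of the input.) */
    static int is_sorted(const int *a, size_t n)
    {
        for (size_t i = 1; i < n; i++)
            if (a[i - 1] > a[i])
                return 0;
        return 1;
    }

    static int cmp_int(const void *x, const void *y)
    {
        int a = *(const int *)x, b = *(const int *)y;
        return (a > b) - (a < b);
    }

    int main(void)
    {
        int data[] = { 5, 3, 9, 1, 4 };
        size_t n = sizeof data / sizeof data[0];

        /* Whatever implementation we pick (qsort here, Timsort in Java)
           is checked against the same property. */
        qsort(data, n, sizeof data[0], cmp_int);
        assert(is_sorted(data, n));
        return 0;
    }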
I think the interesting part isn't in a single component's edge cases at all, but in how complex webs of many, many things interact.
I've yet to see a hard debugging problem that didn't boil down to something like, in the best of possible worlds:
"component A expected something to meet promise FOO when it calls component B, because that's obvious, right?" combined with
"component B just passes requests and routes results, dude, component C is messed up" combined with
"component C never promised that behavior and if you look at our spec it even says so!"
That's an example of your "does not match what was expected" case.
I think most of formal methods is an academic waste of time. Whether they take a more formal or a more practical approach, I see high value in contract tests and data flow analysis because they're more likely to catch the bugs that cross Conway's code/org boundaries.
What would be helpful is if formal methods were powerful enough to be able to prove contracts and assertions about expectations and invariants; however, all formal systems I've experienced are not powerful enough to do this at scale in any interesting program, not without spending way more time reconstructing the entire system in the formal world, with the attendant likelihood of introducing spec bugs in that reconstruction.
It is entirely different in the world of hardware, where the specification in many cases is the RTL and the thing being proved is the, supposedly identical, physical implementation.
That's actually what many formal tools excel at. Of course, you do have to define "never promised that behavior", which is extra work that nobody wants to do.
Yes. That's "design by contract", which is extremely useful for determining who's at fault.
This came out of aerospace, where it's taken seriously. If A won't interoperate with B, you check the interface spec. If A is out of compliance, A has to be fixed. If B is out of compliance, B has to be fixed. If you can't tell, the interface spec has to be fixed. This is why you can unplug Pratt and Whitney engines from an airplane and plug in Rolls-Royce engines.
This worked in the era when the buyer had more clout than the seller. Now, if A won't interoperate with B, it's the problem of whoever is smaller.
> This is why you can unplug Pratt and Whitney engines from an airplane and plug in Rolls-Royce engines.
Wait wut? Are you talking about their embedded software or the hardware itself? Because if you're talking about the hardware, I'm about to be dumbfounded that there are formal verifications of hardware, my mind is about to be blown.
You could think of many of the formal tools outside of dependently typed programming as design by contract plus a thingy that complains if the contract doesn't work.
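Roughly something like this (a made-up interface sketch in plain C with runtime asserts; a design-by-contract or formal tool would check the same obligations statically): the contract lives at the boundary between A and B, so when it fails you can tell which side broke its promise.

    #include <assert.h>
    #include <stddef.h>

    /* Hypothetical interface of component B.  The precondition is the
       caller's (A's) obligation, the postcondition is B's, so a violated
       assert identifies who is at fault instead of the blame game above. */
    size_t pick_slot(const int *table, size_t len)
    {
        assert(table != NULL && len > 0);   /* A must guarantee this */

        size_t slot = 0;                    /* ... B's actual selection logic ... */

        assert(slot < len);                 /* B must guarantee this */
        return slot;
    }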
This kind of, “Just what the heck in this pile of parts is impacting the other stuff and how is the other stuff impacting this part?” was precisely the angle we took with our V&V tools.
Formal methods have value in safety-related software. When you are developing software for airplanes, rail, automobiles, and nuclear power stations, any tool that might improve safety merits consideration.
One advantage of formal methods is in determining "what was expected" (including all the goofy edge cases) without having to burrow into the details of code.
> I think it would just be easier to hit compile on that spec
I have a hard time not being similarly skeptical about formal methods. They promise the ultimate holy grail of software development - release software with zero defects - but demand what appears to be years of study before you actually see results. On the one hand, there's lots of examples of techniques and tactics that promise amazing results if you "put in the work", even outside of software (like fitness and music) - but in over 30 years of software development practice, I've never met anybody who was applying formal methods at all, much less to achieve defect-free software. If it could deliver what it promised even with a "mere" five years of dedicated study, I'd expect it to be more widespread by now.
As someone who was working on this decades ago [1][2], here's a useful recap.
- Back then, CPU power for the prover was a big problem. That's been fixed.
We were too early in the days of a 1 MIPS VAX.
- Prover theory is much better. We were using the Nelson-Oppen prover, an ancestor of today's SMT solvers. Now that's routine technology.
- Loose languages are a problem. You have to nail down all "undefined behavior", and
either detect it and forbid it, or give it precise semantics.
- Integrate the verification annotation into the programming language.
Not as comments. It should be at least syntax and type checked when the
program is compiled, and it should use essentially the same syntax as the language to which it is attached. Otherwise the annotations get out of date and nobody uses them.
- Break big assertions into lots of little assertions. Don't AND them. This improves the diagnostics (there's a small sketch of this after the list).
- Debug the verification in the source language, by adding asserts. The
verification system usually treats each assert as a proof goal, and then assumes it is true for
the code that follows. You narrow down the problem into a small area in that way.
If you need to prove something hard, get to the point where you have "assert(a); assert(b);", and need to prove that A implies B. Then you use an offline prover.
Don't work on the code problems in a prover directly.
- You're not done until you get 100%. If you write "assert(false);", there is only one
error but the verification is totally meaningless.
- Don't get carried away with the formalism. Verification systems tend to be built
by people who think formalism is cool. That is a negative for getting work done.
- Undecidability and the halting problem are not issues. If your program is anywhere
near undecidable, it's broken. Microsoft took the position with their Static Driver Verifier that if the verifier can't decide termination easily, it doesn't get to be a signed kernel driver.
- Some things are hard to specify, and some things aren't. A database is an example of
a complicated system that's not too hard to specify. The specification is a full table
search of giant arrays. The implementation doesn't do it that way, but it's supposed to
behave as if it does. The other extreme would be a GUI program.
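A tiny illustration of the "lots of little assertions" and "narrow it down with asserts" points above (plain C, made-up names; a verifier treats each assert as a goal to prove and then as an assumption for the code below it):

    #include <assert.h>

    void check_invariants(int a, int b, int c)
    {
        /* One big conjunction gives one opaque failure:
               assert(a > 0 && b > a && c == a + b);
           Broken up, the failing conjunct identifies itself, and each
           assert that is proved becomes an assumption for the ones
           that follow. */
        assert(a > 0);
        assert(b > a);
        assert(c == a + b);
    }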
Can a button be formally verified (or fail verification) if, due to poor graphic design, it may or may not look like a button?
Another "other extreme": create the source for a new database manager that formally matches the behavior of another database manager, that you only have object code for. :)
Years of study are definitely not required to get up and running! I attended a 3 day workshop for TLA+ and spent ~2 weeks modeling a new project that my team was working on. I found lots of potential edge cases that would result in data inconsistencies without even having to write tests or think about how to articulate the issue in the first place, as the model checker is able to output the exact program flow that results in the error.
You still need to make sure your implementation matches the spec, but that’s an easier task than squashing concurrency bugs, let alone figuring out how to repro in the first place! The hope is that by writing the spec you avoid a good chunk of errors from the start instead of encountering them one by one.
I have met tons of folks who have used this in practice. It tends to be more applied in domains where errors can be costly; e.g., it won't be that useful in the JS/NodeJS world.
You might already be using proven-correct software somewhere in your stack without knowing it. You will most certainly be trusting your life to it when flying on an Airbus.
When the latest incarnation of ChatGPT was out, I had the same thought. However, I then became convinced there might be some synergy between both.
Imagine a programming system where you can specify little components using Hoare or Separation Logic, i.e. pre/postconditions. An engineer could spend his effort specifying little components, but code and proof generation can be handled by GPT. Actually, many tasks can be offloaded to a SAT/SMT solver. Lots of verification and synthesis efforts already do the last bit, offloading to SAT/SMT solver.
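As a toy illustration of what actually gets offloaded to the solver (my own example, in Lean 3 syntax, assuming mathlib's tactic library for linarith): the Hoare triple {0 ≤ x} y := x + 1 {0 < y} reduces, via the assignment rule, to a single arithmetic implication, exactly the kind of goal an SMT solver or decision procedure closes automatically.

    import tactic

    -- Verification condition for the toy triple {0 ≤ x} y := x + 1 {0 < y}:
    -- after applying the assignment rule, all that is left is a pure
    -- arithmetic implication, which linarith (or an SMT solver) discharges.
    example (x : ℤ) (h : 0 ≤ x) : 0 < x + 1 :=
    by linarith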
This would be a win-win. Formal methods would become cheaper to use, and software would become built using rigorous engineering.
It's a repeat of search & hope/trust (Google) vs. define & query (the semantic web), and yes, they are winning and will win. Fine-tuning your scope for extreme accuracy might be a less efficient strategy than fine-tuning your loader for extreme speed.
In most situations, a half-assed answer quickly will beat a good answer slowly.
Most.
""Software should be reliable.""
Reliability is not usually worth the effort. Most software is not used enough, or not important enough, to make any effort worthwhile. In almost all systems, errors are created at every step and propagate through the system routinely without causing problems.
Ship early, ship often. First to ship wins. Flood the zone.
Often phrased as "worse is better."
Like, hypothetically, imagine you spent two whole weeks designing and implementing a new programming language, which was guaranteed to be installed on millions of computers overnight. Any language which took three weeks to finish, like wasting a week testing, would be DOA.
Like how the (brood parasite) cuckoo chick hatches early and then murders all the other chicks.
But when the robot uprising comes our killbots will be unstoppable terminators and their killbots will explode the first time they encounter a set of stairs.
There are several different niches, and HW formal is likely using different tools and methods than SW formal, though there is overlap. Anyways there is room for a wide range of experience/skills/background in a variety of different areas.
Also, the main conference in the area is FMCAD, you can find a lot of related work there. More recently the conference has moved more towards software, but if you look at some of the older proceedings you can find a lot of hardware related stuff.
There is decent amount of overlap between formal for hardware and software, so if you study one you will likely have enough background to get started in the other.
Not at Apple. The following is about 10 years out of date; it would be interesting to know what's changed:
10 years ago formal had just started to be viable in run-of-the-mill semiconductor verification (as opposed to throwing a bunch of specialist PhD grads at the problem); JasperGold was the first tool I heard of that worked well enough. Pre-formal, the approach was to write a bunch of tests with input stimulators and validity checkers and/or assertions, and then work on the stimulators until you reached 100% coverage (signing off the cases that couldn't happen). That's not just line coverage: each branch of an if, each component of a boolean expression, and each bit of a signal must toggle, etc.
Reaching 100% coverage was a tedious exercise, so the first use of formal was as a supplement, to find the last, most difficult test cases that completed your coverage (or to prove that the cases were impossible). The next use was to try to prove directly that assertions or test failures could not occur, or to provide a counterexample. This is all based on the tool being able to read low-level Verilog or SystemC code.
What it couldn't do at the time, which would be interesting to know if it can now, is as follows: Often, you actually have a higher level model of the behavior of your system. The RTL code is written to have the same behavior (manually, because it takes expertise to choose RTL code that will work efficiently). Ideally, provers would simply prove that your RTL has the same behavior.
I wonder if AI systems like ChatGPT will be helpful in this area. Something that’s able to track all requirements of a system and understands the code enough to validate it against the requirements.
I've had the same thought. Formally proven code can give powerful security assurances (cf. Project Everest[1]), but it's also very, very labor-intensive. I've heard rules of thumb like, "100x as much time to prove the software correct as to write it in the first place."
If LLM systems are going to give us a virtual army of programmers, I think formally proving more systems software (device drivers, browser engines, etc.) would be a great use of their time.
CompCert is a C compiler proven correct by a very small team in a few years. It is used by Airbus for avionics software. GCC, after 25 years of work, still has bugs.
Yeah, I'm cautiously excited about how AI and FM might work together. I don't think LLMs can ever be trusted to verify programs themselves, but anything which can reduce the annotation overhead for programmers is a super useful thing!
I don't think a pure LLM approach will work (or maybe it will, who knows?). I am thinking more of a hybrid that combines an LLM or other AI with some hardcoded knowledge for this domain.
Most definitely. I've been playing with using ChatGPT to generate proof texts in Isabelle/HOL, since it lets me verify the correctness of the output before code generation.
To those wondering if there will be a successful "marriage" between popular incarnations of AI, like ChatGPT, and FM: no, at least not in the sense you would expect.
Sure, ChatGPT will successfully manage to find formal proofs that have been done before.
But synthesising proofs from scratch is NP-complete for the simplest proofs (or worse for more interesting cases), and LLMs are simply not clever enough to do it (except in cases where the proof has been seen before).
It could be possible to integrate an SMT solver with an LLM to make this work, but the core difficulty of the FM task will then be solved by the SMT solver.
When writing about a new topic (e.g., formal methods), it's generally better to keep other aspects of the presentation (e.g., code samples) simple and familiar to the reader.
Would have preferred the code samples to not be in Lisp (ostensibly also new to the reader, if formal methods are), and also more sparingly used.
I've always thought theorem proving to be about proving equality of the function and the result.
Lean to me is not only obscure but unapproachable by someone from a non-mathematical background.
Take this snippet, for instance:
theorem nat.add_comm : ∀ (n m : ℕ), n + m = m + n :=
λ (n m : ℕ),
nat.brec_on m
(λ (m : ℕ) (_F : nat.below (λ (m : ℕ), ∀ (n : ℕ), n + m = m + n) m) (n : ℕ),
nat.cases_on m
(λ (_F : nat.below (λ (m : ℕ), ∀ (n : ℕ), n + m = m + n) 0),
id_rhs (n + 0 = 0 + n) (eq.symm (nat.zero_add n)))
(λ (m : ℕ) (_F : nat.below (λ (m : ℕ), ∀ (n : ℕ), n + m = m + n) (nat.succ m)),
id_rhs (n + (m + 1) = nat.succ m + n) (eq.symm (nat.succ_add m n) ▸ congr_arg nat.succ (_F.fst.fst n)))
_F)
n
If we want to democratise formal methods and theorem proving, we should lean towards natural-language expressions of proof, not mathematical expressions and Greek characters.
That snippet is the result of printing out the internal structure of the proof, not how the proof was actually written. You'll notice that what a human actually wrote and would normally read is the code block mentioned directly above in TFA, which is much more sensible:
lemma nat.add_comm : ∀ n m : ℕ, n + m = m + n
| n 0 := eq.symm (nat.zero_add n)
| n (m+1) :=
suffices succ (n + m) = succ (m + n), from
eq.symm (succ_add m n) ▸ this,
congr_arg succ (add_comm n m)
Also, this example proof is explicitly about math (the commutativity of addition on natural numbers), so it's appropriate and not surprising that it uses mathematical notation.
I don't think you have any chance of approaching formal methods if the notation is a stumbling block.
The difficulty and verbosity of formal proofs is huge even with the very concise mathematical notation. And learning the notation itself is a very small stumbling block compared to learning the actual theorems and logic that you need to use.
So while I am not a fan of mathematical notation for general programming (which is often quite un-mathematical in fact) formal proofs are a place where this notation absolutely shines. I can't even imagine how much uglier the proof above would become if you had to replace `∀ (n : ℕ)` with something like `for any natural number n`.
I sympathize with your view, I genuinely do, but to write out formal methods using natural language expressions results in quite a few problems. For one, it results in incredibly long proofs, like this simple proof written out in natural language would take up pages.
Second, it makes it harder to identify patterns. Writing things out in a formalism lets you identify syntactic patterns and employ techniques like factoring, cancelling, seeing symmetries and other purely syntactic properties. When you write things out in natural language you lose the ability to see these things clearly. Being able to see purely formal syntactic properties helps reinforce your confidence in a proof or can help you identify potential flaws in it. It's like with programming styles and conventions, one thing I tell developers is an important reason for choosing one style over another is if it makes wrong code look wrong visually.
Third, and this is just a personal matter... it results in a type of culture of writing that I find is often pretentious and full of unnecessary baggage. When I read an academic paper I cringe at how bad most writing is and try my best to skip to the formal part of the paper.
Something else important to note: this proof of commutativity of addition on natural numbers is definitely _not_ how you normally write these proofs in Lean.
Here is a closer approximation to how someone would actually write the proof in Lean:
lemma add_comm (a b : ℕ) : a + b = b + a :=
begin
induction a with a IH,
rw (add_zero b),
rw (zero_add b),
trivial,
rw (add_succ b a),
rw (succ_add a b),
rw IH,
trivial
end
No, you can just type `\nat` and the Lean extension in VS Code will turn it into `ℕ`. Similarly, you can type many LaTeX macros, and they will render in Unicode. Examples: `\times` becomes `×` and `\to` becomes `→`, etc...
#include <limits.h>

/*@
// There is a partial match of the needle at location loc in the
// haystack, of length len.
predicate partial_match_at(char *needle, char *haystack, int loc, int len) =
\forall int i; 0 <= i < len ==> needle[i] == haystack[loc + i];
*/
/*@
// There is a complete match of the needle at location loc in the
// haystack.
predicate match_at(char *needle, int n, char *haystack, int h, int loc) =
\valid(needle + (0 .. n-1)) && 0 <= n < INT_MAX &&
\valid(haystack + (0 .. h-1)) && 0 <= h < INT_MAX &&
n <= h && loc <= h - n &&
partial_match_at(needle, haystack, loc, n);
*/
/*@
requires \valid(needle + (0 .. n-1)) && 0 <= n < INT_MAX;
requires \valid(haystack + (0 .. h-1)) && 0 <= h < INT_MAX;
requires n <= h;
assigns \nothing;
ensures -1 <= \result <= (h-n);
behavior success:
ensures \result >= 0 ==> match_at(needle, n, haystack, h, \result);
behavior failure:
ensures \result == -1 ==>
\forall int i; 0 <= i < h ==>
!match_at(needle, n, haystack, h, i);
*/
int
brute_force (char *needle, int n, char *haystack, int h)
{
int i, j;
/*@
loop assigns i, j;
loop invariant 0 <= i <= (h-n) + 1;
loop invariant \forall int k; 0 <= k < i ==>
!match_at(needle, n, haystack, h, k);
*/
for (i = 0; i <= h - n; ++i) {
/*@
loop assigns j;
loop invariant 0 <= j <= n;
loop invariant partial_match_at(needle, haystack, i, j);
*/
for (j = 0; j < n && needle[j] == haystack[j + i]; ++j);
if (j >= n) {
return i;
}
}
return -1;
}
One thing which perhaps is not captured well in the article: unlike reading or writing ordinary source code, where stepping through lines is a relatively uncommon task done only when debugging, with interactive theorem provers stepping through the proof is a given. He discusses the bottom-up and top-down approaches to solving goals, with tactics and otherwise, but typically in both cases you will be stepping through expressions to get a better understanding. Proofs generally aren't written to be read from the source text alone.
> Lean to me is not only obscure but unapproachable by someone from a non-mathematical background. ... If we want to democratise formal methods and theorem proving ...
For whom is that a goal? Are there really that many individuals at the intersection of "I need to use a theorem prover" and "I don't have formal mathematical training"?