Google "certified programming with dependent types", "program logics for certified compilers" and "software foundations class". Also, work through the Dafny tutorials, here: http://rise4fun.com/Dafny/tutorial
There are some ways in which these tools are not economical. There is currently a big gap. On one side of the gap, you have SMT solvers, which have encoded in them decades of institutional knowledge about generating solutions to formulas. An SMT solver is filled with tons of "strategies" and "tactics", which are also known as "heuristics" and "hacks" to everyone else. It applies those heuristics, and a few core algorithms, to formulas to automatically come up with a solution. This means that the behavior is somewhat unpredictable: sometimes a formula with 60,000 free variables is solved in under half a second; sometimes a formula with 10 takes an hour.
It sucks when that's in your type system, because then your compilation speeds become variable. Additionally, it's difficult to debug why compiling something is slow (and by slow, I mean sometimes it will "time out" because otherwise it would run forever), because you have to trace through your programming language's variables into the solver's variables. If a solver can say "no, this definitely isn't safe," most tools are smart enough to pull the reasoning for "definitely not safe" out into a path through the program that the programmer can study.
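To give a feel for what those queries look like at the solver level, here's a minimal sketch using Z3's Python bindings. The path facts and the "array bounds" framing are invented for illustration; a real verifier generates far bigger formulas:

```python
# Minimal sketch (pip install z3-solver): the kind of query a verifier hands to an SMT solver.
# Invented toy facts: a path where the loop check is "i <= n" instead of "i < n",
# and a safety assertion "i < n" (think: an in-bounds array access a[i] with length n).
from z3 import Ints, Solver, Not, sat

i, n = Ints('i n')
s = Solver()
s.add(0 <= i, i <= n, 0 <= n)   # facts collected along one path of the program
s.add(Not(i < n))               # can the safety assertion be violated here?

if s.check() == sat:
    # sat means "yes" and comes with a model -- the raw material a tool can
    # translate back into a concrete failing path for the programmer.
    print("assertion can fail, e.g.:", s.model())
else:
    print("assertion holds on this path")
```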
On the other end of the spectrum are tools like Coq and Why3. They do very little automatically and require you, the programmer, to specify in painstaking detail why your program is okay. For an example of what I mean by "painstaking": the theorem prover Coq could say to you, "okay, I know that x = y, and that x and y are natural numbers, but what I don't know is whether y = x." You have to tell Coq what axiom, from the already established axioms, will show that x = y implies y = x.
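A minimal illustration of that step, written here in Lean syntax (the Coq version is analogous): even symmetry of equality has to be invoked explicitly.

```lean
-- Even "x = y implies y = x" is an explicit step: you appeal to the symmetry lemma for equality.
example (x y : Nat) (h : x = y) : y = x := Eq.symm h
```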
Surely there's room for some compromise, right? Well, this is an active area of research. I am working on projects that try to strike a balance between these two design points, as are many others, but unlike the GP I don't think there's anything to be that excited about yet.
There are a lot of problems with existing work and applying it to the real world. Tools that reason about C programs in Coq have a very mature set of libraries/theorems to reason about memory and integer arithmetic, but the libraries they use to turn C code into Coq data structures can't deal with C constructs like "switch." Tools that do verification at the SMT level are frequently totally new languages, with limited or no interoperability with existing libraries, and selling those in the real world is hard.
It's unlikely that any of this will change in the near term because the community of people that care enough about their software reliability is very small and modestly funded. Additionally, making usable developer tools is an orthogonal skill from doing original research, and as a researcher, I both am not rewarded for making usable developer tools, and think that making usable developer tools is much, much harder than doing original research.
> This means that the behavior is somewhat unpredictable: sometimes a formula with 60,000 free variables is solved in under half a second; sometimes a formula with 10 takes an hour.
It sadly also depends a lot on the solver used and the way the problem was encoded in SMT. For a class in college I once tried to solve Fillomino puzzles using SMT. I programmed two solutions: one used a SAT encoding of Warshall's algorithm, and the other constructed spanning trees. On some puzzles the first solver required multiple hours whereas the second only needed a few seconds, while on other puzzles it was the complete opposite. My second encoding needed hours for a puzzle which I could solve by hand in literally a few seconds. SAT and SMT solvers are extremely cool, but incredibly unpredictable.
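As a tiny, made-up illustration of the "encoding matters" point (nothing to do with Fillomino), here are two logically equivalent ways of telling Z3 that at most one of a handful of booleans is true. Both are instant at this size, but the pairwise version already generates quadratically many clauses, and on large instances choices like this can be the difference between seconds and hours:

```python
# Two logically equivalent encodings of "at most one of these booleans is true".
from itertools import combinations
from z3 import And, AtMost, Bools, Not, Or, Solver

xs = Bools(' '.join(f'x{i}' for i in range(8)))

# Encoding 1: pairwise -- O(n^2) clauses, each forbidding one pair from both being true.
pairwise = And([Or(Not(a), Not(b)) for a, b in combinations(xs, 2)])

# Encoding 2: Z3's built-in cardinality constraint.
cardinality = AtMost(*xs, 1)

for name, at_most_one in [("pairwise", pairwise), ("cardinality", cardinality)]:
    s = Solver()
    s.add(at_most_one, Or(xs))   # at most one true, and at least one true
    print(name, s.check())       # both sat, but the encodings scale very differently
```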
Absolutely! For another example, Z3 changes what heuristics it has and which it prefers to use from version to version. What happens when you keep your compiler the same but use a newer Z3? Researchers that make these tools will flatly tell you not to do that.
It's frustrating because this stuff really works. Making it work probably doesn't have to be hard, but researchers that know both about verification and usability basically don't exist. I blame the CS community's disdain for HCI as a field.
Thanks for taking the time to write the suggestions and detail the pain points that exist at the moment.
I had heard about Dafny but hadn't seen the tutorial!
> Additionally, making usable developer tools is an orthogonal skill from doing original research, and as a researcher, I both am not rewarded for making usable developer tools, and think that making usable developer tools is much, much harder than doing original research.
When you're saying they're orthogonal, are you effectively saying that researchers generally don't have 'strong programming skills' (as far as actually whacking out code)? If so, how feasible would it be for someone who is not a researcher, but a good general software engineer, to work on the developer tools side of things?
There's more I could write about researchers and their programming skills, but to keep it brief: researchers aren't directly rewarded for being good programmers. It's possible to have a strong research career without really programming all that much. However, if you are a strong programmer, some things get easier. You aren't directly rewarded for it though. For an extreme counter-example, consider the researchers that improve the mathematics platform SAGE. Their academic departments don't care about the software contributions made and basically just want to see research output, i.e. publications.
I think that this keeps most researchers away from making usable tools. It's hard, they're not rewarded for making software artifacts, they're maybe not as good at it as they are at other things.
I think it's feasible for anyone to work on the developer tools side of things, but I think it's going to be really hard for whoever does it, whatever their background is. There are lots of developer-tool projects that encounter limited traction in the real world because the developers do what makes sense for them, and it turns out only 20 other people in the world think like them. The more successful projects I've heard about have a tight coupling between language/tool developers and active language users. The tool developers come up with ideas, bounce them off the active developers, who then try to use the tools and give feedback.
This puts the prospective "verification tools developer" in a tight spot, because there are only a few places in the world where that is happening nowadays: Airbus/INRIA, Rockwell Collins, Microsoft Research, NICTA/GD. So if you can get a job in the tools group at one of those places, it seems very feasible! Otherwise, you need to find some community or company that is trying to use verification tools to do something real, and work with them to make their tools better.
Compilers, in particular optimising compilers, are notoriously buggy; see John Regehr's blog. An old dream was to verify them. The great Robin Milner, who pioneered formal verification (like so much else), said in his 1972 paper "Proving compiler correctness in a mechanized logic" about all the proofs they left out: "More than half of the complete proof has been machine checked, and we anticipate no difficulty with the remainder". It took a while before X. Leroy filled in the gaps. I thought it would take a lot longer before we would get something as comprehensive as CakeML; indeed, I had predicted this would only appear around 2025.
> It sucks when that's in your type system
Agreed, and that is one of the reasons why type-based approaches to program verification (beyond simplistic things like Damas-Hindley-Milner) are not the right approach. Speedy dev tools are vital. A better approach towards program verification is to go for division of labour: use lightweight tools like type inference and automated testing in early and mid development, and do full verification only when the software and specifications are really stable, in an external system (= a program logic with associated tools).
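As a concrete instance of the "automated testing" half of that division of labour, here's a minimal property-based test in Python using the Hypothesis library; the function and the property are made up for illustration:

```python
# Lightweight, fully automatic checking: Hypothesis generates many random inputs
# and searches for a counterexample to the stated property (run with pytest).
from hypothesis import given, strategies as st

def my_reverse(xs):  # hypothetical function under test
    return xs[::-1]

@given(st.lists(st.integers()))
def test_reverse_twice_is_identity(xs):
    assert my_reverse(my_reverse(xs)) == xs
```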
> making usable developer tools is much, much harder than doing original research.
I don't really agree that the main remaining problems are of a UI/UX nature. The problem in program verification is that ultimately almost everything you want to automate is NP-complete or worse: typically (and simplifying a great deal) you need to show A -> B, where A is something you get from the program (e.g. a weakest precondition, or a characteristic formula), and B is the specification. In the best case, deciding formulae like A -> B is NP-complete, but typically it's much worse. Moreover, program verification of non-trivial programs seems to trigger the hard cases of those NP-complete (or worse) problems naturally. Add to that the large size of the involved formulae (due to large programs), and you have a major theoretical problem at hand, e.g. solve SAT in n^4, or find a really fast approximation algorithm. That's unlikely to happen any time soon.
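To make the A -> B shape concrete, here is a tiny hand-rolled verification condition discharged with Z3's Python bindings. The program and property are invented for illustration: for the statement x := x + 1 with postcondition x > 0, the weakest precondition is x + 1 > 0, and we check that the assumed precondition x >= 0 implies it by asking whether the negation is unsatisfiable:

```python
# Sketch: one verification condition A -> B, discharged with Z3 (pip install z3-solver).
# A = assumed precondition (x >= 0); B = wp(x := x + 1, x > 0) = (x + 1 > 0).
from z3 import Int, Implies, Not, Solver, unsat

x = Int('x')
A = x >= 0       # what we get to assume about the program state
B = x + 1 > 0    # weakest precondition of the statement w.r.t. the postcondition

s = Solver()
s.add(Not(Implies(A, B)))   # A -> B is valid iff its negation is unsatisfiable
if s.check() == unsat:
    print("VC holds: the precondition guarantees the postcondition")
else:
    print("counterexample:", s.model())
```

Even this toy VC is a validity check over the integers; scale the program up and A and B become the enormous formulae described above.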
We don't even know how to effectively parallelise SAT, or how to make SAT fast on GPUs. Which is embarrassing, given how much of deep learning's recent successes boil down to gigantic parallel computation at Google scale. Showing that SAT is intrinsically not parallelisable, or even just not GPUable (should either be true), looks like a difficult theoretical problem.
> as a researcher, I both am not rewarded for
I agree. But for the reasons outlined above, that is as it should be: polishing UI/UX is something the commercial space can and should do.
> the community of people that care enough about their software reliability is very small and modestly funded.
This is really the key problem.
Industry doesn't really care about program correctness that much (save for some niche markets). The VCs of my acquaintance always tell me: "we'll fund your verification ideas when you can point to somebody who's already making money with verification". For most applications, type-checking and solid testing can get you to "sufficiently correct".
You can think of program correctness like the speed of light: you can get arbitrarily close, but the closer you get, the more energy (cost) you have to expend. Type-checking and a good test suite already catch most of the low-hanging fruit that the likes of Facebook and Twitter need to worry about. As of 2017, for all but the most safety-critical programs, the cost of dealing with the remaining problems is disproportionate in comparison with the benefits. Instagram or WhatsApp or Gmail are good enough already despite not being formally verified.
Cost/benefit will change only if the cost of formal verification is brought down, or the legal frameworks (liability laws) are changed so that software producers have to pay for faulty software (even when it's not an Airbus A350 autopilot).
I know that some verification companies are lobbying governments for such legislative changes. Whether that's a good thing, regulatory capture, or something in between is an interesting question.
Another dimension of the problem is developer education. Most (>99%) of contemporary programmers lack the necessary background in logic to even think properly about program correctness. Just ask the average programmer about loop invariants and termination orders: they won't be able to give them even for 3-line programs like GCD. This is not a surprise, as there is no industry demand for this kind of knowledge, and it will probably only change with a change in demand.
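For what it's worth, here is the kind of answer that question is after, sketched as plain Python with the invariant and the termination measure written out as an assertion and comments (tool syntax varies; the reasoning is the point):

```python
# Euclid's GCD with the loop invariant and termination measure made explicit.
from math import gcd as reference_gcd  # used only to state the invariant below

def my_gcd(a: int, b: int) -> int:
    assert a > 0 and b > 0
    x, y = a, b
    while y != 0:
        # Loop invariant: gcd(x, y) == gcd(a, b) -- the pair changes, its gcd doesn't.
        assert reference_gcd(x, y) == reference_gcd(a, b)
        # Termination measure: y is a non-negative integer that strictly decreases,
        # since x % y < y whenever y > 0.
        x, y = y, x % y
    return x  # y == 0 here, and gcd(x, 0) == x
```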
Thanks for the long reply! I don't have the time to continue this. I mostly agree with what you say.
I do think that making verification tools easier is something that researchers could and should be thinking about. Probably not verification and logic researchers directly, but someone should be carefully thinking about it and systematically exploring how we can look at our programs and decide they do what we want them to do. I have some hope for the DeepSpec project to at least start us down that path.
I also have hope for type-level approaches where the typechecking algorithms are predictable enough to avoid the "Z3 in my type system" problem but expressive enough that you can get useful properties out of them. I think this is also a careful design space and another place that researchers lose because they don't think about usability. They just say "oh, well I'll generate these complicated SMT statements and discharge them with Z3, all it needs to work on are the programs for my paper and if it doesn't work on one of them, I'll find one it does work on and swap it out for that one." Why would you make a less expressive system if usability wasn't one of your goals?
I'd be interested in your (brief, if you have no time) suggestions on what kind of novel interfaces you have in mind. I have spent too much time on the command line to have any creative thoughts about radically different interfaces for verification.
Why isn't it economical yet?