Google "certified programming with dependent types", "program logics for certified compilers" and "software foundations class". Also, work through the Dafny tutorials, here: http://rise4fun.com/Dafny/tutorial
There are some ways in which these tools are not economical. There is currently a big gap. On one side of the gap, you have SMT solvers, which have encoded in them decades of institutional knowledge about generating solutions to formulas. An SMT solver is filled with tons of "strategies" and "tactics", which are also known as "heuristics" and "hacks" to everyone else. It applies those heuristics, and a few core algorithms, to formulas to automatically come up with a solution. This means that the behavior is somewhat unpredictable: sometimes a formula with 60,000 free variables is solved in under half a second; sometimes a formula with 10 takes an hour.
It sucks when that's in your type system, because then your compilation speeds become variable. Additionally, it's difficult to debug why compiling something is slow (and by slow, I mean sometimes it will "time out" because otherwise it would run forever), because you have to trace through your programming language's variables into the solver's variables. If a solver can say "no, this definitely isn't safe," most tools are smart enough to pull the reasoning for "definitely not safe" out into a path through the program that the programmer can study.
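To give a feel for what those queries look like at the solver level, here's a minimal sketch using Z3's Python bindings. The path facts and the "array bounds" framing are invented for illustration; a real verifier generates far bigger formulas:

```python
# Minimal sketch (pip install z3-solver): the kind of query a verifier hands to an SMT solver.
# Invented toy facts: a path where the loop check is "i <= n" instead of "i < n",
# and a safety assertion "i < n" (think: an in-bounds array access a[i] with length n).
from z3 import Ints, Solver, Not, sat

i, n = Ints('i n')
s = Solver()
s.add(0 <= i, i <= n, 0 <= n)   # facts collected along one path of the program
s.add(Not(i < n))               # can the safety assertion be violated here?

if s.check() == sat:
    # sat means "yes" and comes with a model -- the raw material a tool can
    # translate back into a concrete failing path for the programmer.
    print("assertion can fail, e.g.:", s.model())
else:
    print("assertion holds on this path")
```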
On the other end of the spectrum are tools like Coq and Why3. They do very little automatically and require you, the programmer, to specify in painstaking detail why your program is okay. For an example of what I mean by "painstaking": the theorem prover Coq could say to you, "okay, I know that x = y, and that x and y are natural numbers, but what I don't know is whether y = x." You have to tell Coq what axiom, from the already established axioms, will show that x = y implies y = x.
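A minimal illustration of that step, written here in Lean syntax (the Coq version is analogous): even symmetry of equality has to be invoked explicitly.

```lean
-- Even "x = y implies y = x" is an explicit step: you appeal to the symmetry lemma for equality.
example (x y : Nat) (h : x = y) : y = x := Eq.symm h
```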
Surely there's room for some compromise, right? Well, this is an active area of research. I am working on projects that try to strike a balance between these two design points, as are many others, but unlike the GP I don't think there's anything to be that excited about yet.
There are a lot of problems with existing work and applying it to the real world. Tools that reason about C programs in Coq have a very mature set of libraries/theorems to reason about memory and integer arithmetic, but the libraries they use to turn C code into Coq data structures can't deal with C constructs like "switch." Tools that do verification at the SMT level are frequently totally new languages, with limited or no interoperability with existing libraries, and selling those in the real world is hard.
It's unlikely that any of this will change in the near term because the community of people that care enough about their software reliability is very small and modestly funded. Additionally, making usable developer tools is an orthogonal skill from doing original research, and as a researcher, I both am not rewarded for making usable developer tools, and think that making usable developer tools is much, much harder than doing original research.
> This means that the behavior is somewhat unpredictable: sometimes a formula with 60,000 free variables is solved in under half a second; sometimes a formula with 10 takes an hour.
It sadly also depends a lot on the solver used and the way the problem was encoded in SMT. For a class in college I once tried to solve Fillomino puzzles using SMT. I programmed two solutions: one used a SAT encoding of Warshall's algorithm, and the other constructed spanning trees. On some puzzles the first solver required multiple hours whereas the second only needed a few seconds, while on other puzzles it was the complete opposite. My second encoding needed hours for a puzzle which I could solve by hand in literally a few seconds. SAT and SMT solvers are extremely cool, but incredibly unpredictable.
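As a tiny, made-up illustration of the "encoding matters" point (nothing to do with Fillomino), here are two logically equivalent ways of telling Z3 that at most one of a handful of booleans is true. Both are instant at this size, but the pairwise version already generates quadratically many clauses, and on large instances choices like this can be the difference between seconds and hours:

```python
# Two logically equivalent encodings of "at most one of these booleans is true".
from itertools import combinations
from z3 import And, AtMost, Bools, Not, Or, Solver

xs = Bools(' '.join(f'x{i}' for i in range(8)))

# Encoding 1: pairwise -- O(n^2) clauses, each forbidding one pair from both being true.
pairwise = And([Or(Not(a), Not(b)) for a, b in combinations(xs, 2)])

# Encoding 2: Z3's built-in cardinality constraint.
cardinality = AtMost(*xs, 1)

for name, at_most_one in [("pairwise", pairwise), ("cardinality", cardinality)]:
    s = Solver()
    s.add(at_most_one, Or(xs))   # at most one true, and at least one true
    print(name, s.check())       # both sat, but the encodings scale very differently
```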
Absolutely! For another example, Z3 changes what heuristics it has and which it prefers to use from version to version. What happens when you keep your compiler the same but use a newer Z3? Researchers that make these tools will flatly tell you not to do that.
It's frustrating because this stuff really works. Making it work probably doesn't have to be hard, but researchers that know both about verification and usability basically don't exist. I blame the CS community's disdain for HCI as a field.
Thanks for taking the time to write the suggestions and detail the pain points that exist at the moment.
I had heard about Dafny but hadn't seen the tutorial!
> Additionally, making usable developer tools is an orthogonal skill from doing original research, and as a researcher, I both am not rewarded for making usable developer tools, and think that making usable developer tools is much, much harder than doing original research.
When you're saying they're orthogonal, are you effectively saying that researchers generally don't have 'strong programming skills' (as far as actually whacking out code)? If so, how feasible would it be for someone who is not a researcher, but a good general software engineer, to work on the developer tools side of things?
There's more I could write about researchers and their programming skills, but to keep it brief: researchers aren't directly rewarded for being good programmers. It's possible to have a strong research career without really programming all that much. However, if you are a strong programmer, some things get easier. You aren't directly rewarded for it though. For an extreme counter-example, consider the researchers that improve the mathematics platform SAGE. Their academic departments don't care about the software contributions made and basically just want to see research output, i.e. publications.
I think that this keeps most researchers away from making usable tools. It's hard, they're not rewarded for making software artifacts, they're maybe not as good at it as they are at other things.
I think it's feasible for anyone to work on the developer tools side of things, but I think it's going to be really hard for whoever does it, whatever their background is. There are lots of developer-tool projects that encounter limited traction in the real world because the developers do what makes sense for them, and it turns out only 20 other people in the world think like them. The more successful projects I've heard about have a tight coupling between language/tool developers and active language users. The tool developers come up with ideas, bounce them off the active developers, who then try to use the tools and give feedback.
This puts the prospective "verification tools developer" in a tight spot, because there are only a few places in the world where that is happening nowadays: Airbus/INRIA, Rockwell Collins, Microsoft Research, NICTA/GD. So if you can get a job in the tools group at one of those places, it seems very feasible! Otherwise, you need to find some community or company that is trying to use verification tools to do something real, and work with them to make their tools better.
Compilers, in particular optimising compilers, are notoriously buggy; see John Regehr's blog. An old dream was to verify them. The great Robin Milner, who pioneered formal verification (like so much else), said in his 1972 paper "Proving compiler correctness in a mechanized logic" about all the proofs they left out: "More than half of the complete proof has been machine checked, and we anticipate no difficulty with the remainder". It took a while before X. Leroy filled in the gaps. I thought it would take a lot longer before we would get something as comprehensive as CakeML; indeed, I had predicted this would only appear around 2025.
> It sucks when that's in your type system
Agreed, and that is one of the reasons why type-based approaches to program verification (beyond simplistic things like Damas-Hindley-Milner) are not the right approach. Speedy dev tools are vital. A better approach towards program verification is to go for division of labour: use lightweight tools like type inference and automated testing in early and mid development, and do full verification only when the software and specifications are really stable, in an external system (= a program logic with associated tools).
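As a concrete instance of the "automated testing" half of that division of labour, here's a minimal property-based test in Python using the Hypothesis library; the function and the property are made up for illustration:

```python
# Lightweight, fully automatic checking: Hypothesis generates many random inputs
# and searches for a counterexample to the stated property (run with pytest).
from hypothesis import given, strategies as st

def my_reverse(xs):  # hypothetical function under test
    return xs[::-1]

@given(st.lists(st.integers()))
def test_reverse_twice_is_identity(xs):
    assert my_reverse(my_reverse(xs)) == xs
```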
> making usable developer tools is much, much harder than doing original research.
I don't really agree that the main remaining problems are of a UI/UX nature. The problem in program verification is that ultimately almost everything you want to automate is NP-complete or worse: typically (and simplifying a great deal) you need to show A -> B, where A is something you get from the program (e.g. a weakest precondition, or a characteristic formula), and B is the specification. In the best case, deciding formulae like A -> B is NP-complete, but typically it's much worse. Moreover, program verification of non-trivial programs seems to trigger the hard cases of those NP-complete (or worse) problems naturally. Add to that the large size of the involved formulae (due to large programs), and you have a major theoretical problem at hand, e.g. solve SAT in n^4, or find a really fast approximation algorithm. That's unlikely to happen any time soon.
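To make the A -> B shape concrete, here is a tiny hand-rolled verification condition discharged with Z3's Python bindings. The program and property are invented for illustration: for the statement x := x + 1 with postcondition x > 0, the weakest precondition is x + 1 > 0, and we check that the assumed precondition x >= 0 implies it by asking whether the negation is unsatisfiable:

```python
# Sketch: one verification condition A -> B, discharged with Z3 (pip install z3-solver).
# A = assumed precondition (x >= 0); B = wp(x := x + 1, x > 0) = (x + 1 > 0).
from z3 import Int, Implies, Not, Solver, unsat

x = Int('x')
A = x >= 0       # what we get to assume about the program state
B = x + 1 > 0    # weakest precondition of the statement w.r.t. the postcondition

s = Solver()
s.add(Not(Implies(A, B)))   # A -> B is valid iff its negation is unsatisfiable
if s.check() == unsat:
    print("VC holds: the precondition guarantees the postcondition")
else:
    print("counterexample:", s.model())
```

Even this toy VC is a validity check over the integers; scale the program up and A and B become the enormous formulae described above.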
We don't even know how to effectively parallelise SAT, or how to make SAT fast on GPUs. Which is embarrassing, given how much of deep learning's recent successes boil down to gigantic parallel computation at Google scale. Showing that SAT is intrinsically not parallelisable, or even just not GPUable (should either be true), looks like a difficult theoretical problem.
> as a researcher, I both am not rewarded for
I agree. But for the reasons outlined above, that is as it should be: polishing UI/UX is something the commercial space can and should do.
> the community of people that care enough about their software reliability is very small and modestly funded.
This is really the key problem.
Industry doesn't really care about program correctness that much (save for some niche markets). The VCs of my acquaintance always tell me: "we'll fund your verification ideas when you can point to somebody who's already making money with verification". For most applications, type-checking and solid testing can get you to "sufficiently correct".
You can think of program correctness like the speed of light: you can get arbitrarily close, but the closer you get, the more energy (cost) you have to expend. Type-checking and a good test suite already catch most of the low-hanging fruit that the likes of Facebook and Twitter need to worry about. As of 2017, for all but the most safety-critical programs, the cost of dealing with the remaining problems is disproportionate in comparison with the benefits. Instagram or WhatsApp or Gmail are good enough already despite not being formally verified.
Cost/benefit will change only if the cost of formal verification is brought down, or the legal frameworks (liability laws) are changed so that software producers have to pay for faulty software (even when it's not an Airbus A350 autopilot).
I know that some verification companies are lobbying governments for such legislative changes. Whether that's a good thing, regulatory capture, or something in between is an interesting question.
Another dimension of the problem is developer education. Most (>99%) of contemporary programmers lack the necessary background in logic to even think properly about program correctness. Just ask the average programmer about loop invariants and termination orders: they won't be able to give them even for 3-line programs like GCD. This is not a surprise, as there is no industry demand for this kind of knowledge, and it will probably only change with a change in demand.
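For what it's worth, here is the kind of answer that question is after, sketched as plain Python with the invariant and the termination measure written out as an assertion and comments (tool syntax varies; the reasoning is the point):

```python
# Euclid's GCD with the loop invariant and termination measure made explicit.
from math import gcd as reference_gcd  # used only to state the invariant below

def my_gcd(a: int, b: int) -> int:
    assert a > 0 and b > 0
    x, y = a, b
    while y != 0:
        # Loop invariant: gcd(x, y) == gcd(a, b) -- the pair changes, its gcd doesn't.
        assert reference_gcd(x, y) == reference_gcd(a, b)
        # Termination measure: y is a non-negative integer that strictly decreases,
        # since x % y < y whenever y > 0.
        x, y = y, x % y
    return x  # y == 0 here, and gcd(x, 0) == x
```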
Thanks for the long reply! I don't have the time to continue this. I mostly agree with what you say.
I do think that making verification tools easier is something that researchers could and should be thinking about. Probably not verification and logic researchers directly, but someone should be carefully thinking about it and systematically exploring how we can look at our programs and decide they do what we want them to do. I have some hope for the DeepSpec project to at least start us down that path.
I also have hope for type-level approaches where the typechecking algorithms are predictable enough to avoid the "Z3 in my type system" problem but expressive enough that you can get useful properties out of them. I think this is also a careful design space and another place that researchers lose because they don't think about usability. They just say "oh, well I'll generate these complicated SMT statements and discharge them with Z3, all it needs to work on are the programs for my paper and if it doesn't work on one of them, I'll find one it does work on and swap it out for that one." Why would you make a less expressive system if usability wasn't one of your goals?
I'd be interested in your (brief, if you have no time) suggestions on what kind of novel interfaces you have in mind. I have spent too much time on the command line to have any creative thoughts about radically different interfaces for verification.
Why isn't it economical yet?