I am really enjoying Lean 3 as a theorem prover; Lean 4 as a programming language is a really intriguing/enticing prospect. The "functional but in-place" paradigm is something quite new, as far as I can tell. For reference, see the "Counting Immutable Beans" paper[1] by the designers of the language, which details how the runtime uses reference counts to perform destructive updates in pure code whenever possible.
Lean has an array type, which the docs say is implemented much like a C++ vector or Rust Vec. But data types in functional programming languages expose an immutable interface, so what happens when someone updates the same array in two different ways? Is a new copy of the array made, is some variant of a persistent array used, or does the program just fail to compile?
Thanks. I assumed that option would be a major footgun (since accidental copies can be very expensive) but if it works for Swift, it can't be that bad.
A similar-but-different approach exists in (largely Dutch) research on uniqueness typing, which uses static guarantees to allow destructive updates wherever the compiler can prove a value is uniquely referenced. To my knowledge, efforts have been made to integrate that approach into GHC.
I think the novel thing is that this is done dynamically, which bypasses the static analysis and makes the technique applicable in situations where static analysis wouldn't help.
As I understand it, it still does some static analysis. The big idea behind the Perceus algorithm, I think, is that references are given (owned) at a function call instead of lent (borrowed). This enables numerous optimizations where memory can be statically reused. They also fall back to dynamic checks that an object has only one reference when it isn't statically known.
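As a concrete sketch (the function below is hypothetical, not from Lean's library): ownership means an argument arrives with its reference count already accounted for, so repeated updates can reuse the same memory:

```lean
-- `xs` is passed owned, not borrowed. If its reference count is 1 at
-- runtime, each `push` mutates the array in place; if the caller kept
-- another reference, the first `push` copies once and the later
-- pushes reuse that fresh copy.
def grow (xs : Array Nat) : Array Nat :=
  ((xs.push 1).push 2).push 3
```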
Are you referring to Linear Haskell[0] or some other effort? Linear Haskell can definitely express destructive updates, but IIRC it needs unsightly stuff like constructors which take continuations and method signatures along the lines of
length : Array a -o (Array a, Unrestricted Int)
which have to thread the unique value through them, since there's no notion of borrowing.
I'm talking about uniqueness typing. It's similar to linear types, but not the same. If you search the term, you can find a few research papers from Dutch universities, particularly around Clean, a programming language that uses uniqueness typing instead of monads for IO.
I can't claim to know the details, but my understanding is that the interaction of linear typing with dependent types is an open research problem (see the quantitative type theory papers, for example). It's also a burden on the user rather than the runtime, so it's a tradeoff.
Is it still a big burden if the theorem prover can automate checking that a function can be evaluated without reference counting or a garbage collector? I understand that it's a very hard problem, though.
I'm trying to understand why dependent types can't represent linearity since you can model Nat at type level in dependent types. Dependent types can do capability modeling at the type level and linearity (or affinity) seems like a capability. From this thread (https://www.reddit.com/r/haskell/comments/20k4ei/do_dependen...) it seems like it can be done. I'm looking for something more formal to explain the orthogonality (or simulatability) of linear types inside a dependent type system.
From my understanding, dependent types can model linearity, but you'd need to use that model type system rather than the 'native' type system (similar to how, for example, Bash can model OOP (e.g. using a mixture of naming conventions, associative arrays, eval, etc.), but isn't natively OOP). If we go down that route, we're essentially building our own programming language, which is inherently incompatible with the dependently-typed language we've used to create it (in particular: functions in the underlying language cannot be used in our linear-types language, since they have no linearity constraints).
A common example is a file handle: I can prove that the 'close' function will produce a closed handle, e.g.
close : Handle Open -> Handle Closed
I can prove that those handles point to the same file:
close : (f: Filename) -> Handle f Open -> Handle f Closed
I can prove arbitrary relationships between the input values and the output value. Yet nothing I do restricts the structure of a term, such as how many times a variable is used, to prevent e.g.
foo h = snd (close h, readLine h)
For that sort of control, we need to build the ability to track things into the term language itself; either by adding linear types (or equivalent) to the language, or building a new incompatible language on top (as I mentioned above).
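To make that concrete, here's a hedged Lean 4 sketch of the indexed-handle idea (all names are hypothetical, with a stub standing in for real IO): the state transition is visible in the types, yet the double use still type-checks:

```lean
inductive HState
  | opened
  | closed

structure Handle (s : HState) where
  fd : Nat

-- The open/closed state is tracked in the type...
def close (h : Handle HState.opened) : Handle HState.closed :=
  ⟨h.fd⟩

-- Stub standing in for real IO.
def readLine (_h : Handle HState.opened) : String :=
  "..."

-- ...but nothing forbids using `h` again after closing it:
def foo (h : Handle HState.opened) : String :=
  let _ := close h
  readLine h
```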
Cool. Actually I should know more about this since I took a lecture series with Krishnaswami on this topic. He combines the two type systems in a way that makes sense and can produce a programming language.
But yes, for the linearity control you'd need to model variables within the type system and indirect to a function handling h's resource relationship instead of using it directly.
Formally, I would like to compare linear types and dependent types and see whether each can encode the other. I don't know what the tools for that are.
I know Lean as a theorem prover. Can someone explain why it is now also a programming language? Is it so you can write a program, then prove its correctness in the same language? Or are there other synergies?
Yes to all. In Lean 3, they found it was great being able to write tactics in Lean itself, which has led to many nice tactics in mathlib. Lean 4 consolidates many lessons learned, and most of Lean itself is written in Lean now. In principle they'll be able to prove different parts of the Lean system are correct (eventually).
Lean has a C backend, so you can write in Lean, prove your program meets the spec, then output reasonably optimized (but impenetrable) C code.
I made this raytracer partly to see how efficient that code would be (I haven't gotten around to comparing it to anything yet, though), partly to try writing a real program in Lean rather than my usual theorem proving, and partly to better understand ray tracers.
In principle, one could prove this raytracer (once fully debugged :-) ) converges to the theoretical solution to the rendering equation, with arbitrary precision floats. This would be an ungodly amount of work to carry out right now in Lean 4, but it's in principle doable, I think.
I ended up writing one in the meantime -- check out the end of the readme. For this workload the difference seems to be a factor of 30. The C program gets away with no heap allocation and inlining the entire render loop, though Lean does do quite a lot of unboxed float arithmetic. I tried to optimize the Lean and only shaved 25% off the runtime (code in a branch).
I suspect that it could be made a bit faster if each thread had its own copy of the scene, since Lean handles reference counting of objects shared between threads differently.
Essentially: all programming languages are logics, and all logics are programming languages; all programs are proofs, and all proofs are programs; all (static) types are propositions, and all propositions are (static) types; and so on for every facet of computer programming and theorem proving.
It's usually pointless to apply this to existing languages/logics/etc., since they end up being pretty terrible. For example, in a dynamically typed language like Python there is only one static type; hence if we view it as a logic there is only a single proposition; all Python programs are proofs of that same proposition, including trivial programs like '42', and hence the logic is completely trivial.
More sophisticated languages can still end up being pretty trivial. For example, in Java the value 'null' is a member of every type; hence from a logical perspective 'null' is a proof of every proposition. Hence Java also collapses into triviality (from the programming perspective: no matter how complicated we make our class hierarchy, or how long we make the dependency chains between our constructors, users of our API can just give 'null' for every argument and Java's type checker will gladly accept it).
Even more 'hardcore' languages like Haskell, which don't have the 'null' problem (but do have a less-severe problem called "bottom", see http://chriswarbo.net/blog/2020-02-09-bottom.html ), end up being pretty trivial too. For example, Curry-Howard tells us that function types correspond to implication propositions, so 'T1 -> T2' is a function type with argument type T1 and return type T2, and also the proposition that T1 implies T2 (i.e. if T1 is true, then T2 is true).
At first glance this looks really useful, e.g. if we have a function with type 'CustomerRecord -> JSON' it's tempting to think of this as proof that CustomerRecords can be represented as JSON. Yet that's not what it says: in particular, the return type doesn't have any relationship to the input type. This function is actually stating that "if you give me any CustomerRecord, I can give you some JSON value". This is much less useful; e.g. we could implement this function by always returning the JSON value '"hello world"'.
Hence the Curry-Howard correspondence only tends to be useful for languages which were designed to take advantage of it. Lean is one example; others are Agda, Coq and Idris. In all of these languages, proofs and programs are the same thing; functions and implications are the same thing; conjunctions and tuples are the same thing; and so on. This lets us prove propositions about programs, programmatically generate proofs, and any combination of the two.
Incidentally, the way that these languages avoid being trivial is by having dependent types. These let us state relationships between the input and output types of a function, like '(x: Int) -> Even (Double x)' ("if x has type Int, then doubling x is even"), or between the elements of a tuple, like '(x: Int, Even x)' ("a value x of type Int, and a proof that x is even"). I wrote a bit about this at http://chriswarbo.net/blog/2017-11-03-dependent_function_typ...
The logic equivalent of a dependent function type is a universally quantified proposition (i.e. "for all x, ..."). The logic equivalent of a dependent tuple type is an existentially quantified proposition (i.e. "there exists an x such that..."). This tuple idea can also be extended to name/value records too.
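For instance, the 'Even (Double x)' example can be written out in Lean 4 (a sketch using my own definitions, not any library's):

```lean
-- `Even n` is a proposition indexed by a number.
inductive Even : Nat → Prop
  | zero : Even 0
  | step : Even n → Even (n + 2)

-- By Curry-Howard, this dependent function type is the universally
-- quantified proposition "for all x, 2 * x is even", and the body
-- is its proof.
theorem double_even (x : Nat) : Even (2 * x) := by
  induction x with
  | zero => exact Even.zero
  | succ n ih =>
    rw [Nat.mul_succ]    -- 2 * (n + 1) = 2 * n + 2
    exact Even.step ih
```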
This seems really cool, but can someone who knows both Lean and Haskell (the former is probably just a subset of the latter, lol) explain how this is materially different from a Haskell implementation of the same algorithm? I don't know where to look for the cool dependently-typed parts.
I did this to see how it was as a practical functional programming language, and I find I like the syntax and typeclass system more than Haskell's. One neat feature is that do notation has mut variables and for loops that get compiled to folds automatically. It also has Idris-style arrow notation for binding inside pure expressions. (It's also really easy to compile to C!)
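A quick sketch of what that looks like (hypothetical function; the `mut` assignments and the `for` loop get elaborated into a fold):

```lean
def sumSquares (xs : List Nat) : Nat := Id.run do
  let mut acc := 0
  for x in xs do
    acc := acc + x * x
  return acc

#eval sumSquares [1, 2, 3]  -- 14
```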
The Lean runtime implements the Perceus algorithm for functional in-place updating when possible. It's a very clever reimagining of reference counting from basic principles. The loop that combines all the results from all the threads takes advantage of this and doesn't allocate intermediate arrays.
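A hypothetical sketch of such a combining loop (not the raytracer's actual code): because the accumulator is uniquely referenced on every iteration, each `set!` updates it in place rather than allocating a new array:

```lean
def combine (results : List (Array Float)) (init : Array Float)
    : Array Float := Id.run do
  let mut acc := init
  for r in results do
    for i in [0:acc.size] do
      -- `acc`'s reference count is 1 here, so `set!` mutates in place
      acc := acc.set! i (acc.get! i + r.get! i)
  return acc
```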
Proof-carrying code is an interesting area for privacy and security. It would be nice to know that your differential privacy algorithm doesn't leak private information, or that your hashing algorithm doesn't expose side channels. Being able to keep the spec in line with the compiled code (so-called "deep spec" work) is super dope.
Can theorem proving languages actually prove that crypto code doesn't have side channels? That kind of thing isn't in most type systems. The only project I know that has thought about it is Cryptol.
[1]: https://arxiv.org/pdf/1908.05647.pdf