It is a pity Concurrent ML didn't take off. F# has a great library called Hopac that implements it, but it is 50 times less popular than its closest competitor Rx.
Also seconding that other post. Actor models and async concurrency are only useful if you need to send messages between machines, but otherwise you want to use synchronous concurrency as it is easier to deal with.
Another thing that could happen is a fundamental algorithmic advance that replaces backprop. The brain doesn't use it, and backprop has no answer for long-term credit assignment or continual learning. It works poorly with RL as well.
Better algorithms would require writing new ML libraries, which would weaken the stranglehold GPUs have on ML. New niches opening up would inevitably be bad for Nvidia, but good for upstarts.
Here is a demo of a backend for a Process-In-Memory chip. It demonstrates Spiral's ability to do heterogeneous programming. I want to build more backends for different kinds of devices, and I am posting this on HN to gauge the interest in this kind of work. While x86 devices have a ton of good languages, new and niche hardware tends to have poor language support and tooling, so I am hoping to fill that gap with Spiral. Whether you are a company with some new kind of hardware or a user of it who is tired of programming it in C, that is the pain I aim to alleviate. Going from a low-level language like C to a high-level one like Spiral would significantly boost your productivity.
The backend I am demoing here is for UPMEM, but now that I have it as a template, I could easily make similar backends for other kinds of devices assuming I have access to at least a simulator.
The whole point of programming is automation. Having a tool that would make programming much more efficient is not a failure condition for programmers.
Not related to the Aluminum Scandium Nitride memristors from the current paper, but I just found this article and it is fascinating. I've been a stealth fan of memristors for a long time, and it finally answers why HP's efforts failed in such a crazy fashion: they were using metal oxide memristors, and those rust easily.
Maybe if the devices from the current paper turn out to be good, they could finally act as the fuel for the next AI wave after deep learning? The paper looks good to my eyes, but I am not a materials expert.
If you have a function, there is a certain equivalence (in terms of compilation) where you can either allocate it as a closure at runtime, or emulate the effect of that by tracking it fully at compile time and inlining it at the site of use. If you do the latter, you can avoid the heap allocation that having a runtime closure would require.
I said it is an equivalence in terms of compilation because whether a function is allocated as a closure on the heap or tracked at compile time has no bearing on the correctness of the program. It only affects its memory allocation and performance profile.
If you are a C programmer, or working at a similar level of abstraction, this concern over heap allocations might seem academic, but to a functional programmer such as myself it matters a great deal, because we use function composition for all of our abstractions and don't want them to heap allocate as closures. The more abstract the code we are writing, the more inlining matters.
The way I am describing this is confusing, because inlining is not an event that happens. Rather, inlining is the default; tracking functions at compile time is the default. Heap allocating them and converting them into closures is the event, after which the compiler stops tracking a function's variables on an individual basis and keeps track of them at a lower resolution.
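Spiral isn't shown here, but the same split exists in Rust, so here is a rough analogy (my own sketch, not the author's code): a generic closure parameter is fully known to the compiler, monomorphized, and inlinable with no heap allocation, while a `Box<dyn Fn>` is the "event" where the closure becomes an opaque heap object.

```rust
// Generic parameter: the closure's concrete type is fully known at compile
// time, so the call is monomorphized and can be inlined -- no heap allocation.
fn apply_inline<F: Fn(i32) -> i32>(f: F, x: i32) -> i32 {
    f(x)
}

// Trait object: the closure is boxed on the heap and invoked through a
// vtable -- the compiler treats it opaquely instead of tracking its
// captured variables individually.
fn apply_boxed(f: Box<dyn Fn(i32) -> i32>, x: i32) -> i32 {
    f(x)
}

fn main() {
    let k = 10;
    // Same observable behavior either way; only the allocation and
    // performance profile differ.
    println!("{}", apply_inline(|x| x + k, 1)); // prints 11
    println!("{}", apply_boxed(Box::new(move |x| x + k), 1)); // prints 11
}
```

Both calls compute the same result; the difference is purely in how the compiler represents the function.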
> I would have thought the opposite of heap allocation was "stack allocation"?
      | Stack | Heap
------|-------|------
Full  |   x   |
Box   |       |  x
To get a full picture of how Spiral's partial evaluator sees functions and recursive unions, consider the table above. Spiral is designed so that it is very easy to go between the fully known and the boxed representations. You don't actually control heap and stack allocations explicitly; instead you shift the perspective of the partial evaluator through the dyn patterns and control flow.
It just so happens that a very natural way of generating code for fully known functions is to leave their variables as they are on the stack. If the function needs to be tracked at runtime, those variables are used to make a closure. The natural way of compiling things that are fully known (functions, unboxed unions) is to put them on the stack, while things that are boxed (closures, boxed recursive unions) go on the heap.
This is just on the F# backend. If I were compiling to LLVM, for example, I would not use the stack but its infinite supply of virtual registers instead. And non-recursive union types are compiled as structs, meaning they should be on the stack even in boxed mode.
This view of memory is more complicated than the usual heap-vs-stack one, because I am playing a game here where I attach a bunch of other concepts to it, but it is very useful and simplifies actual programming.
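The "boxed recursive unions go on the heap, non-recursive ones stay on the stack" distinction has a direct counterpart in Rust, so here is an illustrative sketch of mine (again an analogy, not Spiral code): a non-recursive enum is a plain stack value, while a recursive one must box its recursive occurrence.

```rust
// Non-recursive union: compiled as a plain tagged value, so it can live
// entirely on the stack.
enum Shape {
    Circle(f64),
    Square(f64),
}

// Recursive union: the recursive occurrence must be boxed (heap-allocated),
// otherwise the type would have infinite size.
enum List {
    Nil,
    Cons(i32, Box<List>),
}

fn area(s: &Shape) -> f64 {
    match s {
        Shape::Circle(r) => std::f64::consts::PI * r * r,
        Shape::Square(side) => side * side,
    }
}

fn sum(l: &List) -> i32 {
    match l {
        List::Nil => 0,
        List::Cons(x, rest) => x + sum(rest),
    }
}

fn main() {
    let l = List::Cons(1, Box::new(List::Cons(2, Box::new(List::Nil))));
    println!("{}", sum(&l)); // prints 3
    println!("{}", area(&Shape::Square(2.0))); // prints 4
}
```

In Spiral the partial evaluator decides between these two representations for you; in Rust you write the `Box` by hand.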
It sounds like you're using "inlining" to mean "stack-allocating the closure", but when I think of "inlining" I think of "expanding a function body into its call sites to elide function-call overhead". Maybe there's an implicit relationship between the two that I'm not understanding? I guess in order to stack allocate a closure you have to generate new function instances for each "type" of closure, and since these types rarely change between call sites we just inline them? Apologies if I'm just dense.
Regular functions (not heap-allocated closures) do not have a runtime footprint. Their bodies just get tracked by the partial evaluator, but never manifest unless they are applied. Their variables (usually stack-allocated) get tracked on an individual basis.
So it is not the case that functions are stack allocated. They are fully known.
A closure is something that is opaque to the partial evaluator. Closures do need to be heap allocated because they could be recursive data types.
> "expanding a function body into its call sites to elide function-call overhead"
If you do this, not only do you avoid the function call overhead, but you also avoid the overhead of heap allocating the function at runtime (converting it into a closure).
I am not sure how good I am at explaining this. I would appreciate somebody going over the docs and giving me feedback on them. The last time Spiral was posted on HN, there was some initial excitement for a few days and then it was crickets.
If you are willing to give it a try, I'd be happy to answer all your questions in the Spiral issues.
Sorry, that Spiral is an entirely different language that has nothing to do with mine! The only reason I linked to it is because somebody in the Reddit PL sub thread brought my attention to it.
Having been inspired by that video by Kevin Buzzard and finally finding something worth formalizing, I am trying out Lean at the moment. I have about 2-3 months of Coq experience, so I can say that even setting aside quotient types, Lean is much better designed than Coq. I can't vouch for how it will do at scale since I've only just started, but from what I can see, Lean fixes all the pain points I had with Coq while going through Software Foundations.
It has things like structural recursion (similar to Agda), dependent pattern matching (the biggest benefit of which is proper variable naming), Unicode, `calc` blocks, a good IDE experience with VS Code (it actually has autocomplete; I prefer it over Emacs, and the inbuilt CoqIDE is broken on Windows), mutually recursive definitions and types, and various other things I can't recall off the top of my head.
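To give a flavor of the `calc` blocks mentioned above, here is a toy example of my own in Lean 4 syntax (a minimal sketch, not taken from Software Foundations or the material being formalized):

```lean
-- A minimal `calc` block chaining two equalities. `h1` and `h2` are
-- hypotheses, and each `_` step continues from the previous line's
-- right-hand side.
example (a b c : Nat) (h1 : a = b) (h2 : b = c) : a = c :=
  calc a = b := h1
    _ = c := h2
```

The appeal is that the proof reads like the chain of equations you would write on paper, something Coq has no direct equivalent of.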
If I were to sum it up, the biggest issue with Coq is that it does not allow you to structure your code properly. This is kind of a big thing for me as a programmer.