What I wish I knew when learning OCaml (2018) (baturin.org)
117 points by cribbles on May 3, 2022 | 86 comments


ML really is a beautiful and under-appreciated family of languages. Haskell is a bit too supernatural for my taste, but ML and its derivatives hit the sweet spot between powerful type systems and natural syntax. As a Kotlin developer, I find reading OCaml code a breeze and the compiler is refreshingly snappy (much faster than Gradle). I just wish it had better developer tools.


I fully agree. F# is my favorite language, and it almost feels like a statically typed Scheme at times, just with built-in pattern matching and pipes thrown in. It's a really great way to program, and it's easy to go imperative or OOP when needed without friction.

The only thing missing from F# is an Elixir/Erlang/OTP-like process system (`MailboxProcessor` is pretty good though), but then again, every language except Elixir/Erlang/OTP is missing that.


I am not far enough along in my F# journey to have tried this, but you could technically pair it with Orleans for actor-oriented systems?

Love the language though I have spent only a week working with it. It seems to have a lot of things I liked in Kotlin.


F# does have the `MailboxProcessor`. It is quite simple and capable.

https://fsharp.github.io/fsharp-core-docs/reference/fsharp-c...

https://fsharpforfunandprofit.com/posts/concurrency-actor-mo...

I haven't looked at Orleans in a while. Last time I did, I came away with the viewpoint that it was very much designed to have a C# interface, which while manageable in F#, doesn't really provide for an idiomatic F# experience. I'm also not too sure, but I think it basically requires a cloud or distributed environment. Can it just be run on a local computer?

But even the `MailboxProcessor` doesn't do what Elixir and Erlang do on top of the BEAM VM. An instance of a BEAM VM is a single OS process, but the BEAM can handle millions of Erlang processes, which are not OS processes or even threads. They are their own thing that are extremely lightweight. The BEAM scheduler is very nice as well, such that it switches among all the running processes.

Even though I develop in Elixir, I am not yet a BEAM expert. So before I say something incorrect, I recommend taking a look at the talk The Soul of Erlang and Elixir by Sasa Juric.

https://www.youtube.com/watch?v=JvBT4XBdoUE

It fully encapsulates and describes what makes processes so special in Elixir and Erlang.

There is the Gleam language, which is a statically typed language on the BEAM VM. However, they moved away from the ML-dialect syntax, which is unfortunate.


... and also Caramel, which is an OCaml/Reason dialect, so you get to keep the OCaml syntax

I have never tried it though

https://caramel.run/manual/introduction.html


Maybe one day Fable will add an Erlang backend.


I like Rust for syntax, tooling, and ML ideas but I wish there was a "Rust-lite" that was garbage collected.


It's got its warts, but Ocaml (and ReasonML if you want a more familiar syntax) are right in that sweet spot for me.


I need to give ReasonML another shot. It's been a few years, but at the time it really only seemed to work if you were targeting BuckleScript and even then you had to write a bunch of Makefiles and jump through other weird hoops. Beyond that, it suffered from some other OCaml issues--multiple "standard" libraries, different packages using different async libraries, some gratuitously abstract libraries, etc.

I'm really hoping it breaks into the mainstream such that it gets the investment that other mainstream languages get (hopefully said investment will smooth out these rough edges).


With its shared-nothing-by-default thread model, Rust would make for an excellent GC'd language if it gave the ability to opt into GC on a per-thread basis, with a thread-local heap and GC. The shared-xor-mutable rule would probably need to be relaxed, though.

I don't know how practical it would be to implement in practice.


Gleam


Yeah, this is one I’m keeping my eye on.


how do you feel about nim?


Haven’t looked much at it yet.


that’s pretty much F# :)


As a long-time F# user and current Rust learner, I would say the similarities are mostly cosmetic.

First and foremost, F# is functional-first, while Rust isn't really functional. Rust takes some useful features from functional languages, to be sure, but the overall paradigm is more procedural. A lot of key functional idioms and design patterns don't really fly in Rust because it's difficult-to-impossible to make them play nice with its memory management model.

Rust has traits, but not OOP. F# has OOP but not traits or typeclasses.

F# is immutable by default, and discourages mutability. Rust is immutable by default, but embraces (and tames) mutability.


If F# had static, native compilation by default, a closer Cargo analog, and eschewed the OO stuff (probably mostly just there for C# interop?) and the OCaml syntax.


I tried OCaml a long time ago, and one of the things that really turned me off of it was all the inscrutable error messages. I went to #ocaml on Freenode for help, and when I had the error messages explained to me I asked how the person who explained them knew what they meant. He told me that the reason he knew was because he took a couple of semesters of type theory courses at his university. I didn't want to have to take a couple of semesters of type theory courses in order to be able to program effectively in this language. I hope the situation has improved since then.

The other thing I didn't like (which was something shared with other statically-typed languages) was feeling like I had to wrestle forever with the compiler to get my program to run. It just always felt so much easier to write programs in dynamically typed languages. Sure, my programs might have bugs in them, but I could iron them out over time, and my programs change so much anyway that pieces of buggy-but-working code in those dynamically typed languages might be replaced wholesale anyway before I even ran in to the bugs.. so the pace of prototyping in dynamically typed languages is much faster, in my experience.


I think both of these points can be alleviated with certain coding habits that come with practice. For instance, adding type annotations in some places will help with error messages. And using "assert false" (which has type 'a) will let you run an incomplete/broken program.
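A minimal sketch of the `assert false` habit, assuming a made-up `shape` type and `area` function (neither comes from the article). Because `assert false` has type `'a`, the unfinished branch type-checks and the rest of the program can already run:

```ocaml
(* Hypothetical example: shape and area are illustrative names only. *)
type shape =
  | Circle of float
  | Square of float
  | Triangle of float * float

let area (s : shape) : float =
  match s with
  | Circle r -> 3.14159 *. r *. r
  | Square side -> side *. side
  | Triangle _ -> assert false (* not written yet; type 'a unifies with float *)

(* Works as long as the unfinished branch is never reached. *)
let () = Printf.printf "%.1f\n" (area (Square 2.0))
```

Calling `area` on a `Triangle` would raise `Assert_failure` at runtime, so this is a prototyping aid rather than something to ship.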

As for the error messages requiring expertise in type theory, that sounds like an exaggeration, esp. when not using advanced features.


Hey, thanks for that "assert false" trick! I never knew that!

I don't know much about type theory but I did eventually manage to learn to interpret OCaml's type errors well enough to get things to compile. It still feels like driving a car by scraping it along a highway guardrail but it's often better than testing. I have to write twice as much code as in Python, it takes me longer to get it to run, and I still can't call it from C or Lua, but it runs twenty times as fast as Python and the process of thinking through the types helps me a lot to write code that works.

I keep hoping I'll internalize the type system enough that my programs run the first time the way they do in Python, but it hasn't happened yet.

I still often write a prototype in Python, though.


OCaml also has a built-in function failwith: string -> 'a, so you can print an error in those cases. If it's a code path you think will ever execute, failwith "some error here" is probably better than assert false.
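As a rough illustration, here is `failwith` in a branch that can actually be reached. The function `parse_level` and its inputs are invented for this sketch:

```ocaml
(* Hypothetical example: parse_level is an illustrative name only. *)
let parse_level (s : string) : int =
  match s with
  | "low" -> 1
  | "high" -> 2
  | other -> failwith ("unknown level: " ^ other) (* raises Failure with a message *)

let () =
  match parse_level "medium" with
  | n -> Printf.printf "%d\n" n
  | exception Failure msg -> print_endline msg
```

Unlike `assert false`, the `Failure` exception carries a message chosen by the programmer, which is easier to act on when the branch does run.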


"adding type annotations in some places will help with error messages."

I annotated the hell out of my programs, and completely avoided OCaml's type inference as much as I could because I saw that it could not guess what I meant. I still had tons of problems understanding the error messages.

Error messages like "This expression is of type X but an expression was expected of type X" were not uncommon, and super frustrating.


> I annotated the hell out of my programs, and completely avoided OCaml's type inference as much as I could because I saw that it could not guess what I meant.

The type checker and the type inference parts of the compiler are one and the same. If the type checker can’t infer what you mean you are most likely writing invalid code.

> Error messages like "This expression is of type X but an expression was expected of type X" were not uncommon, and super frustrating.

Ocaml is a strongly typed static language. As such your code has to respect type constraints. It’s very much a feature not a bug.

Still to be less defensive I too sometimes wish the error messages were more clear in highlighting why a certain type was expected but at least Ocaml provides tooling to live check the inferred type of any sub-expression while you edit your code.


"you are most likely writing invalid code"

Of course I'm writing invalid code: that's why I'm getting error messages. The problem is that I'm having trouble understanding why it's invalid. Error messages are supposed to help me here, but quite often they didn't.. they just led to more confusion... especially in complex code, where I needed help and clarity most.


> Ocaml is a strongly typed static language. As such your code has to respect type constraints. It’s very much a feature not a bug.

That's reasonable if the error message were "This expression is of type X but an expression was expected of type Y". But the worst error message in the OCaml interactive toplevel is when they're the same. This happened when you defined a new type named X (which, incidentally, must be lowercase), but there were still references to things of the old type X.

I think this has been fixed in recent versions of OCaml.
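The underlying situation can be sketched in a compiled file, using two modules to stand in for "before" and "after" a redefinition in the toplevel. The module names are made up for illustration:

```ocaml
(* Hypothetical stand-ins for two successive toplevel definitions of a type t. *)
module Old_session = struct
  type t = A
  let x = A
end

module New_session = struct
  type t = A (* same name, same shape, but a distinct type *)
  let show (_ : t) = "A"
end

(* New_session.show Old_session.x would be rejected by the type checker:
   Old_session.t and New_session.t are different types even though
   their definitions look identical. *)
let () = print_endline (New_session.show New_session.A)
```

In the toplevel both types are simply called `t`, which is why the old message appeared to compare a type with itself.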


You can get the same thing in Java essentially: two classes with the same name but loaded by different class loaders are incompatible. You’d get errors at runtime that you can’t cast X to X. I think they’ve made the exception messages more helpful in more recent Java versions, but in older versions the errors were very confusing to newbies because they didn’t mention the class loader difference which was the actual cause of them


In OCaml I think this could only happen with the interpreter (e.g., an interactive toplevel). At least today the compiler complains if you try to define two types with the same name in the same context.

The problem is that it would happen often in the interpreter when you were interactively trying stuff out, because when you're trying stuff out, you change the definitions of things.

Python actually sort of has the same problem not only when you use an interactive interpreter but even when you reload a module: the new class definitions don't modify the old one, they just get bound to the same name. So it's easy to end up with two alglayout.Vbox classes or two diff.Formula classes in the same interactive interpreter at the same time. But it's much less of a problem in Python because Python usually doesn't care what class things are, just what methods they define, so objects belonging to both classes can coexist peacefully. The usual exception is when you have an isinstance check somewhere.


Yeah, if you did this now you'd get something like "this expression has type t/1 but an expression was expected of type t/2" to disambiguate.


The fact that this problem took 25 years to fix is maybe the more damning thing. It wasn't a bug, exactly; it was a usability problem. Clearer demonstrations of development priorities could hardly be given.


That was mostly a man-power issue. The good news is that nowadays such nonsensical error messages are near the top of my personal development priority for OCaml.


During those 25 years OCaml gained native-code compilers for new architectures, labeled arguments, and polymorphic variants, among other things. How could it be mostly a manpower issue?


Work and time contributed by open source collaborators on their free or academic time cannot be magically converted from one subsystem to another. It is honestly very easy to have progress on the aspects that spark interest while some subsystems are starved of attention when there are no full-time developers working on a project.


That's true, but that doesn't mean there was no manpower to fix that error message; it meant that fixing it wasn't a priority to the people who were working on OCaml at the time. You're just offering an explanation for why it wasn't a priority: fixing it didn't spark their interest.


And the REPL should warn you that if you have a value of type "t/n" it is probably a left-over from a previous definition of type t in the session.


There is a balance to find between too few annotations and too many. My algorithm is to write few annotations by default, and my answer to error messages I do not understand is to add "obvious" annotations (e.g. type of arguments) to tell type inference what I expect. Rarely does that solve the underlying issue, but it often makes the error message much clearer.

With experience, I am more or less able to preemptively add the annotations that will probably help (e.g. to avoid accidentally polymorphic functions, or on the return type of mutually recursive functions, etc.).
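A small sketch of what an "accidentally polymorphic" function looks like and how an argument annotation pins it down. The names `same` and `same_int` are invented here:

```ocaml
(* Without annotations this is inferred as 'a -> 'a -> bool,
   because (=) is polymorphic. *)
let same a b = a = b

(* With "obvious" annotations the function only accepts ints, so a
   mistaken call is reported at the call site, against the type
   that was written down. *)
let same_int (a : int) (b : int) : bool = a = b

let () =
  assert (same "x" "x"); (* accepted: 'a is instantiated to string *)
  assert (same_int 1 1)
(* same_int "x" "x" would be rejected at compile time. *)
```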

I don't think I have ever gotten "This expression is of type X but an expression was expected of type X" outside of the toplevel with type re-definitions, and recent OCaml versions have a nice error message for this now:

    Error: This expression has type t/1 but an expression was expected of type
             t/2
           Hint: The type t has been defined multiple times in this toplevel
             session. Some toplevel values still refer to old versions of this
             type. Did you try to redefine them?
In general the error messages have been continuously improving in recent years.


> Error messages like "This expression is of type X but an expression was expected of type X" were not uncommon, and super frustrating.

That particular error message has been gone from OCaml for years now.


Ah! Didn't know the `assert false` trick. I believe `failwith "..."` will not always allow you to run the program


I guess inscrutable error messages are a "feature" of ML-like languages. I have not used OCaml yet, but the first few years with Haskell probably have been just as rough in terms of parsing type errors.

Haskell has a few niceties when trying to debug type errors nowadays. For example, typed holes allow one to splice `_` into code, and the compiler will tell you what type it expects in place of that hole, alleviating many issues I had previously. But in the same sense I'm in agreement with you, and I've been more productive when writing Guile/Chicken Scheme code for small tools.

Interestingly I've binged on OCaml videos recently, because I would find it interesting to work with it some day. From a distance what I like about it (aside from the functional language goodness) is that it supports objects, has row polymorphism, optional arguments, and imperative constructs (I sure like my mutation heavy for/while loops when prototyping code).
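For readers who have not seen these features, a brief sketch of optional arguments, row-polymorphic objects, and an imperative loop. All names (`greet`, `describe`, `cat`, `robot`, `sum_to`) are made up for illustration:

```ocaml
(* Optional argument with a default value. *)
let greet ?(greeting = "Hello") name = greeting ^ ", " ^ name

(* Row polymorphism: describe accepts any object that has a
   name method returning a string, whatever else it has. *)
let describe obj = "I am " ^ obj#name

let cat = object method name = "cat" method lives = 9 end
let robot = object method name = "robot" end

(* Imperative style: a mutable reference and a for loop. *)
let sum_to n =
  let total = ref 0 in
  for i = 1 to n do
    total := !total + i
  done;
  !total

let () =
  print_endline (greet "world");
  print_endline (greet ~greeting:"Hi" "world");
  print_endline (describe cat);
  print_endline (describe robot);
  Printf.printf "%d\n" (sum_to 10)
```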


I haven't actually used OCaml in any depth, but I have found that F# tends to have quite good error messages. In the Programming Languages Coursera course, I found that SML has pretty bad error messages.


TIL that Ocaml/SML are compiled into lambda expressions... I literally thought lambda calculus is only used in CS classes.

OCaml's syntax is pretty annoying but type inference is actually amazing.. I changed my mind over it, as previously I thought explicit type annotations are simpler. Turns out, it would be humanly impossible to explicitly annotate every piece of OCaml, just let the compiler do it for you


GHC's Core language is amazingly small https://gitlab.haskell.org/ghc/ghc/-/wikis/commentary/compil..., and it's just polymorphic lambda calculus (with coercions added a while ago).


Did you learn that from some other article? I couldn't find "lambda" in this page.

I think it's too much to say that all Standard ML compilers work one way or another. There are 6 major compilers and a number of minor ones.

They don't really follow the same approaches in general.


One of the intermediate representations used by the OCaml compiler is a lambda calculus (https://github.com/ocaml/ocaml/blob/trunk/lambda/lambda.mli#...). But yes, this is only the OCaml compiler.


At a higher level than that, it's cool that even in SML's already sparse surface-level syntax, quite a bit of it is just sugar for other parts: https://i.imgur.com/pkSg4xm.png


That may be the semantics of it and for simple compilers that may be a true transform. But as compilers get more mature they tend to specialize everywhere they can so the actual compiler doing this under the hood may or may not happen.


What do you not like about the syntax? I hear this often, and I vaguely remember not liking it myself at one point. But it was also my first ML and now I can't remember what specifically I found unpleasant.

I do think I've partially ascended into senior dev galaxy brain around syntax though. For the most part I think they're all fine and also all bad and also don't matter very much at all.


A good list. ML languages push you (kicking and screaming) into the pit of success.


Kicking and screaming? ML dialects, or at least F#, are usually a pleasure.


They're barely used in industry.


I wonder why this is the case. Ocaml syntax doesn't seem very arcane, it is not hell-bent on functional purism and pragmatically allows for imperative code. Performance seems quite reasonable.

What does it lack that prevented it from gaining wider acceptance? Surely buy-in from a large company was an issue, but even F# hasn't seen any significant uptake. Is it really the lack of curly brackets? That didn't stop Python...

Is it just that being labelled 'functional' was a huge stigma for a very long time?


The syntax is not great, the standard library is very sparse, and the error messages from the compiler and interpreter are terrible (and used to be worse). The OCaml team used to treat the native code compiler (ocamlopt) as a second-class citizen, but it's the implementation that matters for performance. Since the end of Dennard scaling about 15 years ago, OCaml's lack of multithreading has also been a pain point, one which is finally starting to improve in the last year or two.

I don't think being labeled "functional" was ever a huge stigma. Being labeled "Haskell" is a huge stigma in some circles, but OCaml's never been labeled "Haskell". Until about 10 years ago functional programming didn't become popular enough to have any kind of popular opinion, positive or negative, and the people who knew about it generally thought it would be nice to do more of it.

Mostly I think people use a programming language either because they already know it or it's the scripting language for an environment they have to use. Assembly was the scripting language for your CPU, especially in mainframe days. BASIC was the scripting language for personal computers. Visual Basic was the scripting language for WIMPs. VBA was the scripting language for Excel and Word (previously Excel had a table-based macro thing). sh was the scripting language for the Unix filesystem; C was the scripting language for the Unix system call interface, and a nice upgrade from assembly. JS was and is the scripting language for the browser. Lua is the scripting language for WoW, Roblox, and Minetest. Perl was the scripting language of the WWW, then PHP was, or Ruby if what you really want to script is Rails. SQL is the scripting language for your database. Objective-C was the scripting language for NeXTStep; now Swift is. MATLAB was the scripting language for EISPACK and later *PACK and BLAS, though it had some competition from IDL for a while. Now Python is the scripting language for Numpy (and thus BLAS), Matplotlib, and TensorFlow. Only a few popular languages are exceptions to this rule: Fortran, COBOL, C++, Java, R, Pascal, Golang, and C#.

OCaml? OCaml is the scripting language of ocamlyacc and Coq. If you write OCaml then it's probably because you like Coq. This may have been a public relations problem in the Anglosphere, especially before same-sex marriage.


All those things you mentioned in the first two paragraphs are fixed by F#, however, it is still not popular.

The reasons in that case are that .NET was not originally cross-platform, which was a major blunder for Microsoft, C#, and especially F#. The other is that people have some weird stigma against Microsoft, despite it not applying in many cases. Although they should have started off cross-platform, the transition from .NET Framework, to .NET Core, and now to .NET 5/6/7 has been really impressive. That along with GitHub and Visual Studio Code, it's pretty amazing how much progress Microsoft has made for developers.

I just wish it would bring more people to F#, as it's sort of a Goldilocks language since it hits so many sweet spots.


Not being labeled "Haskell" is fixed by F#?

And I don't know that F#'s syntax is better than OCaml's. Arguably it's worse. (I haven't tried using it so I don't know how the error messages are.)


Why do you think that the native compiler was treated as a second-class citizen? This is quite a strange take from my point of view. For instance, the OCaml native compiler was available on the M1 Macs two months after the M1 release. Similarly, all major CPU architectures (x86, ARM, PowerPC, RISC-V, s390x) have been supported for years.


20 and 25 years ago, the attitude (as I remember it) was that the native-code compiler was not very important because the bytecode interpreter was fast enough for most uses.

It's true that that is no longer the attitude. But if we want to understand why one language is more popular than another, we usually need to look at things that happened in the past, not just things that are happening right now. Even C's meteoric rise took 15 years from the first C compilers until it was the undisputed queen of programming languages, partly as a result of missteps made by the communities of ALGOL-68, ALGOL-W, Lisp, PL/I, SNOBOL, TRAC, FORTRAN, BCPL, and, perhaps most interestingly, MULTICS, 20 years earlier.

(It's also possible that my memory that the OCaml team treated the native-code compiler as a second-class citizen is incorrect, either because I misremember what they were saying or because I misinterpreted it.)


> ...if we want to understand why one language is more popular than another, we usually need to look at things that happened in the past, not just things that are happening right now

Figuring out why OCaml didn't get popular in the past is not a super useful exercise. It's more useful to understand what is needed to popularize it today.


Ah yes, it might be possible that during the time of Caml Light, the bytecode compiler was for a time considered to be fast enough. I am not sure that an ephemeral stance from 25 years ago on an ancestor language still really matters nowadays.


I'm not talking about Caml Light or even Caml Special Light, which is when the native-code compiler was initially added. 25 years ago was 01997. Objective Caml was released in 01996 (and renamed OCaml in 02011).

Projects like KDE, the GIMP, GNOME, Lucene, Jython, LLVM, Asterisk, Audacity, CMake, and Danger that got started in C, C++, or Java in the 01997–02002 period are still based on those languages today. OCaml would have been a reasonable language for all of these, though a new interpreter for Python in OCaml would have lacked Jython's key feature, the ability to easily script other things written in Java.

(Danger? Well, Danger as such ended in 02011, but Andy Rubin founded Danger in 01999 and left to found Android in 02003, which ended up making a Java-based smartphone much like Danger's Java-based smartphone, but less locked down.)

The fact that Java focused heavily on performance starting in 01997, when Sun bought Animorphic, meant that a lot of things were able to be written in Java in the 02000s, things that were previously just unthinkable. Lucene predates HotSpot by a bit, but the whole Hadoop ecosystem that grew out of it only makes sense in a HotSpot world. Minecraft obviously pushes performance heavily. SPARK is written in Scala, but OCaml would have been a good fit too—if Hadoop had gone that direction.


I have to ask: you're concerned about the year 9999 problem?


Hey, a man needs a hobby.


Then COBOL would be the scripting language for the mainframe, and C# would be the scripting language for Windows


The Win32 API is defined in terms of C functions. Not sure about Win64. At best C# is the scripting language for .NET libraries like WinForms.


It's not that there's stigma, but that it is probably more difficult to think functionally when most programming education teaches imperative methods.


Libraries!!! Python is "batteries included" while OCaml barely has insulation around electrical wires. EDIT: okay, that actually might be chicken-and-egg reasoning


If OCaml had been designed in the USA it would have been widely successful. Its main drawback was being mostly a French project.


> If Ocaml had been designed in the USA it would have been widely successful

I doubt that. There are plenty of American SML dialects and F# is about as American as it gets, but neither of these languages saw any commercial success.


Even if functional programming itself isn't used, a lot of its patterns are extremely useful even outside of functional programming.

Code returning side effects rather than having side effects, for example, is great, and I find myself returning to the principle in a lot of the stuff I design as an architectural principle.
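One way to picture "returning side effects" is a function that returns a description of what should happen, leaving a separate interpreter to carry it out. This is only a sketch; the `action` type, `register_user`, and `run` are hypothetical names:

```ocaml
(* Hypothetical example: effects are described as plain data. *)
type action =
  | Log of string
  | Send_email of string * string (* recipient, body *)

(* Pure: decides what should happen, but performs nothing. *)
let register_user (name : string) : action list =
  [ Log ("registered " ^ name); Send_email (name, "Welcome!") ]

(* Impure: the only place where effects are actually performed. *)
let run (a : action) : unit =
  match a with
  | Log msg -> print_endline ("[log] " ^ msg)
  | Send_email (recipient, body) ->
    Printf.printf "to %s: %s\n" recipient body

let () = List.iter run (register_user "alice")
```

Because `register_user` only builds a list, it can be tested by comparing values, with no mocking of the logger or the mailer.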


But look at their influence: TypeScript and Rust, not to mention Elm, PureScript and other less mainstream languages


+ Scala


TypeScript is very much a C++ descendant, IMO.


shameless plug: learn with ocaml by example[1]. It is still heavily work in progress though, please feel free to create a PR :)

[1]: https://o1-labs.github.io/ocamlbyexample/


Could someone please explain ? : type 'a list = 'a :: 'a list | []

The article says "::" is a Data Constructor. I can make sense of type 'a = Left of 'a | Right of 'a where Right and Left are the Data constructors but I don't see the link with the part I don't understand.


The definition of list is:

  type 'a list = [] | (::) of 'a * 'a list
When you write a list like:

  [x; y; z]
This is syntactic sugar for:

  x::y::z::[]
Which is syntactic sugar for:

  (::) (x, (::) (y, (::) (z, [])))
One can imagine using more ordinary constructor names instead:

  type 'a list = Nil | Cons of 'a * 'a list
And then the above would be:

  Cons (x, Cons (y, Cons (z, Nil)))
In OCaml, data constructor names may be either a capital letter followed by zero or more letters, digits, underscores, or apostrophes, or one of the following:

  []
  ()
  true
  false
  (::)
Type-directed constructor disambiguation means you can do funky things like:

  type 'a nonempty = (::) of 'a * 'a list
And write such a value just like you would a normal list.
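The desugaring steps above can be checked directly. The sketch below assumes a reasonably recent OCaml (one that allows `(::)` as a constructor name); `my_list`, `of_list`, and the `Nonempty` module are names invented for the example, with the module used only to keep the redefined `(::)` from shadowing the ordinary list constructor elsewhere:

```ocaml
(* The "ordinary constructor names" version of the list type. *)
type 'a my_list = Nil | Cons of 'a * 'a my_list

let rec of_list (l : 'a list) : 'a my_list =
  match l with
  | [] -> Nil
  | x :: rest -> Cons (x, of_list rest)

let () =
  (* All three spellings build the same value. *)
  assert ([1; 2; 3] = 1 :: 2 :: 3 :: []);
  assert ([1; 2; 3] = (::) (1, (::) (2, (::) (3, []))));
  assert (of_list [1; 2; 3] = Cons (1, Cons (2, Cons (3, Nil))))

(* A non-empty list type that reuses the (::) constructor name. *)
module Nonempty = struct
  type 'a t = (::) of 'a * 'a list

  (* Only one constructor exists, so this pattern is exhaustive. *)
  let head ((x :: _) : 'a t) : 'a = x
end

(* The outermost (::) is Nonempty's; the tail is an ordinary list. *)
let ne : int Nonempty.t = Nonempty.[1; 2; 3]

let () = assert (Nonempty.head ne = 1)
```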


Thanks for the detailed explanation. One more question though, when I write

    type 'a = 'a :: list 'a
It is syntactic sugar that allows me to get the * product type for free, right ?


That's actually invalid syntax. '::' can't be used infix in a type declaration. And type application is in reverse. So it's not list 'a but 'a list.


I think that is made up syntax from the article but maybe I’m wrong.


A list is either the empty list [] or

Two pieces of data (with the constructor ::) with 1 being the head of the list (type 'a) and one being the tail of the list (type 'a list, a recursive definition).

:: is an allowed identifier in OCaml. If that is not allowed, the typical names are Cons for :: and Nil for []


Thanks. I had not understood it could be recursive.

Do you know Cons' meaning ?

Also why does :: act as a separator... ? Is that something you can do with any Data constructor, I mean can I write type 'a = 'a Cons of 'a | Nil of 'a ?


Cons comes from Lisp: https://en.wikipedia.org/wiki/Cons . In practice a cons cell is a list node represented as a pair of pointers, one pointing to the contained element, the other to the next node (or null, or another element for degenerate lists).


You would write something like List a = Cons a (List a) | Nil

As :: doesn't start with a letter, it's infix by default. But basically a binary constructor. If you want to create the List [1,2,3] in the prefix notation you might be used to: ::(1,::(2,::(3,[])))


Here is the Wikipedia page: https://en.m.wikipedia.org/wiki/Cons


It is equivalent to the following:

  type 'a list = Cons of 'a * 'a list | Tail
...but with a fancy syntax for Cons and Tail.


Ok. So in your example the construction action (placing two things next to each other IIUC) is done by the * product type operator and Cons has no special meaning, whereas if I had used ::, I actually get the (::) name to refer to the left-hand side of the union and the construction action. Did I get that right ?


Sort of, I think?

`type 'a t = Cons of 'a * 'a t | Tail` is defining a type with a constructor `Cons` with two arguments and a constructor `Tail` with zero arguments. You write values of type `'a t` as `Cons (a, b)` and `Tail`.

`type 'a u = (::) of 'a * 'a u | []` is defining a type with a constructor `(::)` with two arguments and a constructor `[]` with zero arguments. You build values of type `'a u` as `(::) (a, b)` and `[]`.

The rest comes from the special syntax support in OCaml for the `(::)` and `[]` names. Namely, `a :: b` is parsed as `(::) (a, b)`, and `[a; b; ...; z]` is parsed as `a :: b :: ... :: z :: []`, and hence as `(::) (a, (::) (b, (::) (..., (::) (z, []))))`.
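To see that the special syntax follows the constructor names rather than the built-in list type, here is a sketch that wraps the `'a u` definition from above in a module (the module name `U` and the `length` function are additions for illustration, and a recent OCaml is assumed):

```ocaml
module U = struct
  (* A user-defined type that reuses the (::) and [] constructor names. *)
  type 'a u = (::) of 'a * 'a u | []

  (* Inside this module, the [] and :: patterns refer to u's constructors. *)
  let rec length = function
    | [] -> 0
    | _ :: rest -> 1 + length rest
end

(* With U opened locally, list syntax builds a value of type int U.u. *)
let n = U.(length [1; 2; 3])

(* The fully desugared prefix form builds the same kind of value. *)
let m = U.(length ((::) (1, (::) (2, []))))

let () = Printf.printf "%d %d\n" n m
```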


`|` is read as "or". `::` is syntax for what Lisp and Scheme call `cons`, which places an element onto either a list or the empty list, `[]`. The syntax `'a` is a type variable, which means that `'a` can be any type but it must all be the same type everywhere `'a` appears.

So this type definition means that `'a list` is a list of elements of type `'a`, and it's defined by being something of type `'a` consed onto either another list of type `'a` OR the empty list, represented by `[]`.


The golden rule is missing:

> Don't try to understand the error message except if you have no other choice.



