Your questions are probably mostly rhetorical, but I'll give a brief answer to them.
> Are we assigning here with the =?
Kind of: you are defining what the type `Tree a` is.
> What does the pipe symbol mean?
The pipe means "or" here. A tree is either a Leaf or a Node with two subtrees.
> Why pipe instead of another =?
You are building up a single type with the pipes. An extra = wouldn't really make sense when you think about building a type algebraically.
> Why the weird formatting?
The formatting is optional. You are free to put it all on one line if you want it that way. For the given example, I would probably make it a single line, but I'm not a Haskell veteran.
Python is more popular than Haskell, which is why it's easier to google.
Do note Haskell tutorials and communities abound, and you have excellent online tools such as Hoogle (in which you write the type of what you think you want and it responds with "these are functions with a similar type signature, with their documentation"). It's easy to google Haskell things, just not as easy as googling Python things :)
Do note the type definitions from the example are Haskell 101 and will be covered very early in almost every tutorial, for example Learn You a Haskell.
PS: it's not a "pipe operator" you're looking for. This isn't an operator at all! The "|" you're asking about is part of a type definition, and it means a union of alternatives (this type can be "this" or "that" or "this other thing"). If you think about it, this "union-or" is written the same as the bitwise-or from more popular languages :)
Because everyone knows what an enum is, and that <A> will be a generic type, from first glance. There's nothing to guess, apart from Box being some kind of pointer abstraction.
Looking at the second definition, it's not immediately apparent what 'a' is, and "Node a (Tree a) (Tree a)" looks like just a bunch of words separated by spaces. It has no apparent structure or meaning unless you're used to writing Haskell/ML/Lisp/etc.
> Because everyone knows what an enum is, and that <A> will be a generic type, from first glance
That's false. A programmer coming from Python or Go won't know this. Nobody who hasn't been exposed to the extremely arbitrary generics syntax in Java-like languages will know about <A>.
There is a huge readability problem right there staring at me: What does that Box<> do, and why? Rust is borrowing more and more of the obscurities of C++, and that's not a good thing...
There's nothing unreadable (and certainly nothing obscure conceptually) about 'Box<>'. It's just unfamiliar if you don't know Rust. But we'll get nowhere fast confusing readability with familiarity.
I'd be interested to know if anyone has done interesting conceptual and/or empirical work on readability. It seems like a very slippery and difficult concept to me. Readable to whom? Readable in the small or the large?
Readability and familiarity are not the same, but they are closely related. There is nothing inherently more or less readable in the Rust or Haskell tree example.
I think there is work on "readability", just in another context: It's called typography and orthography. And I think the gist of it is: Do it like everybody else does. First and foremost, strange and unfamiliar equals unreadable.
> I think there is work on "readability", just in another context: It's called typography and orthography.
I think that's a very different case. Maybe some analogies might be drawn between some of that work and some of the lower-level aspects of reading code (related to syntax noise etc), but code readability, if it's a defensible concept at all, is a far more complex and layered phenomenon than letter & word recognition.
The first thing a researcher would need to establish is whether or not readability even exists as a natural kind apart from familiarity. I don't know the field, so this might already have been pursued somewhere.
> There's nothing unreadable [...] it's just unfamiliar if you don't know Rust
Agreed. Note the same applies to Haskell's syntax :)
People confuse "readable" with "based on my knowledge of Java and C, I can't make heads or tails of this notation without reading a tutorial first", which in my opinion is not a sensible conclusion.
It's not entirely sensible, but it is understandable. I don't think most programmers are truly aware of how much they know, and how deeply automatic their recognition of programming constructs has become.
Box puts something on the heap and returns a pointer to that thing. Here it is necessary because otherwise the Rust compiler wouldn't be able to determine the size (in memory) of a Tree.
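To make the size argument concrete, here is a minimal sketch. The exact Rust snippet under discussion isn't quoted in this part of the thread, so this `Tree` definition is my assumption of its shape, and `size` and `singleton` are hypothetical helpers added for illustration.

```rust
// Assumed shape of the tree type discussed in the thread.
// If the two children were plain `Tree<A>` instead of `Box<Tree<A>>`,
// a Tree would have to contain two whole Trees inline, and the compiler
// would reject the definition (error E0072, recursive type has infinite size).
// A Box is just a pointer, so its size is known up front.
#[derive(Debug)]
enum Tree<A> {
    Leaf,
    Node(A, Box<Tree<A>>, Box<Tree<A>>),
}

// Hypothetical helper: count the Node values by matching on both cases.
fn size<A>(tree: &Tree<A>) -> usize {
    match tree {
        Tree::Leaf => 0,
        // `left` and `right` are `&Box<Tree<A>>`; they deref to `&Tree<A>`.
        Tree::Node(_, left, right) => 1 + size(left) + size(right),
    }
}

// Hypothetical helper: a node whose two children are both empty.
fn singleton<A>(value: A) -> Tree<A> {
    Tree::Node(value, Box::new(Tree::Leaf), Box::new(Tree::Leaf))
}
```

For example, `Tree::Node(1, Box::new(singleton(2)), Box::new(Tree::Leaf))` is a tree with two values, and `size` of it is 2.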
Calling it Box is confusing. unique_ptr would be an improvement for the name, or maybe HeapRef. The problem is exactly that Rust chose a misleading name, Box (boxed types are something entirely different in most languages), instead of the obvious C(++)/Java-like _ptr, ref, * or & notation/convention.
I thought the name was pretty clear; when I saw it in some list of different kinds of Rust pointers, I knew what it was immediately.
It doesn't matter if some people are confused, because you can just explain what it is in 3 seconds. What's important for such a ubiquitous type is that the name is short.
I always found the use of "enum" for things that are not really enumerable in a useful way to be very confusing. Or are Rust "enum"s enumerable in some subtle way that I don't recognise? Is it just some vestigial term that now has no relation to its original meaning? At least in C, "enum"s are enumerable because they are just integers.
Rust's "enums" look more like tagged unions to me. I guess the tag is enumerable? Although I also don't understand why Rust called tagged unions "enums."
They enumerate a finite set of disjoint cases, so in some sense they are an enumeration. But the real reason is, of course, that sum types can be seen as a generalization of C enums, so the syntax was chosen to maximize familiarity.
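A small sketch of that generalization, in case it helps. `Color`, `Shape`, `discriminant_of` and `area` are made-up names for illustration, not anything from the thread.

```rust
// A fieldless enum behaves much like a C enum: each variant has an
// integer discriminant, starting at 0 unless specified otherwise.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Color {
    Red,
    Green,
    Blue,
}

// Same keyword, but here each case also carries data (a sum type).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Shape {
    Circle(f64),    // radius
    Rect(f64, f64), // width, height
}

// Fieldless enums can be cast to an integer, as in C.
fn discriminant_of(c: Color) -> i32 {
    c as i32
}

// Data-carrying enums are consumed by matching on the case.
fn area(s: Shape) -> f64 {
    match s {
        Shape::Circle(r) => std::f64::consts::PI * r * r,
        Shape::Rect(w, h) => w * h,
    }
}
```

The first enum is the C-style one; the second uses the same syntax with payloads attached to each case.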
They enumerate a possible set of valid values. Hence “enumeration.”
They are tagged unions, but sometimes, the tag doesn’t exist. Or rather, invalid parts of values can be used so that the tag isn’t an extra bit of data, but instead is built into the same space. “Tagged union” gets too deep into only-mostly-accurate implementation details to be a good name.
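A quick way to see the "tag built into the same space" case is to compare sizes. This is only a sketch; the two function names are hypothetical. The `Option<Box<T>>` layout is documented by the standard library, while the `u64` comparison simply follows from `u64` having no invalid bit patterns to reuse.

```rust
use std::mem::size_of;

// Box<T> can never be null, so Option<Box<T>> can represent None as the
// null pointer. No separate tag is stored.
fn option_box_is_pointer_sized() -> bool {
    size_of::<Option<Box<i32>>>() == size_of::<Box<i32>>()
}

// Every bit pattern of a u64 is a valid value, so Option<u64> has to
// store its tag somewhere else and ends up larger than a plain u64.
fn option_u64_needs_extra_space() -> bool {
    size_of::<Option<u64>>() > size_of::<u64>()
}
```

Both functions return `true`.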
They seem to enumerate a possible set of valid structures which can hold arbitrary values. I guess it's just so different from C enums I'm having trouble understanding why the name was repurposed. It's probably less different from C++/C#/Java enums (I know at least one/some of those languages have more complicated enums than C).
Sure, tagged union implies a particular implementation that may not always be required, but it's conceptually easy to understand and doesn't have the historical baggage. (Maybe a better description would be strongly typed union? I want to make clear I'm unfamiliar with Rust and just guessing based on the syntax presented.) I think the biggest problem with "tagged union" (or the even longer "strongly typed union") is that it just isn't a good keyword name — it's two (or three) words and fairly long. No one wants to type out 'tagged_union' and from that sense, 'enum' is better. I don't have a better suggestion for you, and IIRC Rust 1.0 has now frozen the language to some extent.
They can be any data type; just a name, a struct, a tuple struct, or even another enum.
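For instance, one enum can mix all of those forms. `Event`, `Direction` and `describe` below are hypothetical names made up for this sketch.

```rust
// Another enum, used as a payload below.
enum Direction {
    North,
    South,
}

enum Event {
    Quit,                     // just a name
    Click { x: i32, y: i32 }, // struct-like variant with named fields
    Key(char),                // tuple-like variant
    Move(Direction),          // variant wrapping another enum
}

// Matching has to cover every case, including the nested enum's cases.
fn describe(event: &Event) -> String {
    match event {
        Event::Quit => "quit".to_string(),
        Event::Click { x, y } => format!("click at ({}, {})", x, y),
        Event::Key(c) => format!("key {}", c),
        Event::Move(Direction::North) => "move north".to_string(),
        Event::Move(Direction::South) => "move south".to_string(),
    }
}
```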
I think also, likewise, “union” sounds strange unless you have C experience. Many of our users do not know C, and so that name doesn’t help them either.
I would say the second one is more intuitive to me, it reads like I have a Tree type and it can be either a Leaf or a Node.
To me, the Rust example doesn't make it immediately obvious that it's an either/or situation, other than that this must be how an enum works (I know enums from other languages).
Literally the punctuation ("<>{}()") and lexical structure are more familiar to anyone who's written any C-family language (C, C++, Java, ...). I say this as a C programmer with more or less equal (near-zero) Rust and Haskell experience.
Given that GP already stated they don't know enough Haskell to translate all the examples, it seems pretty clear to me that by "familiar" they mean "in a language I'm familiar with".
The word “should” can also be “used to indicate what is probable” (according to the OED). I think that's the way GP intended to use it. As in, "why is it probable that the syntax is more familiar?"
is obviously nested (a Node contains an A and two Trees) and the trees are optional (they are contained in a Box, whose purpose is obvious without even knowing the language)
| Node a (Tree a) (Tree a)
might or might not be nested, given the cheerful taste for currying and juxtaposition without punctuation that prevails in Haskell syntax, and it isn't obvious what purpose the parentheses serve (just grouping ?) and whether the trees are optional.
To be obvious that it’s nested, you need to know that < and > are used as approximations for ⟨ and ⟩. You need to know that they’re brackets, not operators.
You need to know that “enum” means that commas in the next section are different than usual, but only one layer deep—check nesting carefully. You need to know about type parameters in either version.
I had a similar experience learning my first ML: I couldn’t even tell how many words each word would gobble up, because I didn’t know the reserved words yet. Syntax highlighting helped, and it’s not a problem after a week or two. It’s no worse than figuring out what’s a binary vs unary operator in C and its descendants.
Also: it’s not obvious to me that Box means optional/nullable. I’d expect it to mean a required non-null pointer to a heap element.
Children of a non-leaf tree node must be optional because otherwise the node would be forced to have both children.
It's therefore obvious that Box means optional; otherwise there would be a bare Tree<A> to represent a mandatory reference.
I don't know Rust (though I'm familiar with Java & C) and there's nothing obvious about Box. In fact, I'm just learning from your comment that it means optional.
(Box does not mean optional; the parent is wrong. Option is the type for optional. Box is basically a malloc, placing a value in the newly allocated memory, and then a call to free automatically when it goes out of scope. The box itself is a pointer to the heap. It’s not allowed to be null: in some sense, the opposite of optional.)
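To illustrate the difference, here is a sketch of what a tree with genuinely optional children might look like. This is not the definition from the thread: `Node` and `in_order` are hypothetical, and the point is only that optionality has to be spelled out with `Option`.

```rust
// Hypothetical alternative without a Leaf case: a missing child is None.
// Box alone would force both children to exist; Option is what makes
// them optional, and Box is still needed so the type has a known size.
struct Node<A> {
    value: A,
    left: Option<Box<Node<A>>>,
    right: Option<Box<Node<A>>>,
}

// Collect the values left to right, visiting only children that exist.
fn in_order<A: Clone>(node: &Node<A>, out: &mut Vec<A>) {
    if let Some(left) = node.left.as_deref() {
        in_order(left, out);
    }
    out.push(node.value.clone());
    if let Some(right) = node.right.as_deref() {
        in_order(right, out);
    }
}
```

In the enum version from the thread, the "no child here" case is presumably expressed by the Leaf variant instead, which is why Box alone is enough there.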
How is the Haskell example not "nested" according to your definition? It contains an "a" and two Trees, just like the Rust example. Nothing is optional. (There might be a Maybe or similar type, possibly hidden behind a type name, but it would still not be optional.)
Currying makes no sense in type definitions. It's like saying that in Java you aren't sure if "String name" will run something, "given Java's cheerful taste for running things".
To me, it's obvious the parentheses in "(Tree a)" are grouping things, which is the most immediate (and correct) interpretation, but I'll agree this is more debatable.