
Why should

    enum Tree<A> {
        Leaf(A),
        Node(A, Box<Tree<A>>, Box<Tree<A>>),
    }
be more "familiar" syntax than

    data Tree a
      = Leaf
      | Node a (Tree a) (Tree a)

?


I know what enum is, but data for me is just 1 and 0.

Are we assigning here with the =? What does the pipe symbol mean? Why pipe instead of another =? Why the weird formatting?

To me Haskell's look is too off-putting. With the Rust example I have a good guess what the resulting object will look like.

But I know it's just a learning experience. Once I know Haskell, your example will probably look more elegant to me. I just don't get it from looking at it :)


Your questions are probably mostly rhetorical, but I'll give a brief answer to them.

> Are we assigning here with the =?

Kind of: you are defining what the type `Tree a` is.

> What does the pipe symbol mean?

The pipe means "or" here. A Tree is either a Leaf or a Node holding a value and two subtrees.

> Why pipe instead of another =?

You are building up a single type with the pipes; an extra = wouldn't really make sense when you think about building a type algebraically.

> Why the weird formatting?

The formatting is optional. You are free to put it all on one line if you want it that way. For the given example, I would probably make it a single line, but I'm not a Haskell veteran.
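For readers who know Rust better than Haskell, the "either/or" reading can be sketched with the Rust enum quoted at the top of the thread. This is only an illustration: `describe` is a made-up helper name, not part of any library.

```rust
// The Rust definition quoted at the top of the thread.
enum Tree<A> {
    Leaf(A),
    Node(A, Box<Tree<A>>, Box<Tree<A>>),
}

// Hypothetical helper: a value of type Tree<A> is exactly one of the
// alternatives, and `match` has to handle each of them.
fn describe<A>(tree: &Tree<A>) -> &'static str {
    match tree {
        Tree::Leaf(_) => "leaf",
        Tree::Node(_, _, _) => "node",
    }
}

fn main() {
    let leaf = Tree::Leaf(1);
    let node = Tree::Node(2, Box::new(Tree::Leaf(3)), Box::new(Tree::Leaf(4)));
    println!("{} {}", describe(&leaf), describe(&node));
}
```

Leaving out one arm of the `match` is a compile error, which is the same "one of these alternatives, nothing else" idea that the pipe expresses in the Haskell definition.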


The formatting is no weirder than Python's.

You could ask many of the same questions of Python's non-C-like, non-Java-like syntax:

- What is "def"?

- Why the weird formatting? (And unlike Haskell, Python's tends to be stricter!)

- Why do I need to write ":" after some lines but not others? It doesn't work like the semicolon in C-like languages!

- What's with the "if __name__ == '__main__'" weirdness I see in some Python programs?

- What's this weird [f(x) for x in ...] syntax? It doesn't look like anything in C. What's with the brackets anyway?

Etc.

Yet Python with its "weird" syntax and constructs is a hugely popular language...



Python is more popular than Haskell, which is why it's easier to google.

Do note Haskell tutorials and communities abound, and you have excellent online tools such as Hoogle (in which you write the type of what you think you want and it responds with "these are functions with a similar type signature, with their documentation"). It's easy to google Haskell things, just not as easy as googling Python things :)

Do note the type definitions from the example are Haskell 101 and will be covered very early in almost every tutorial, for example Learn You a Haskell.

PS: it's not a "pipe operator" you're looking for. This isn't an operator at all! The "|" you're looking at is part of a type definition, and it means a union of alternatives (this type can be "this" or "that" or "this other thing"). If you think about it, this "union-or" is written the same as the bitwise-or from more popular languages :)


Hence Eich’s famous « I was under marketing orders to make it look like Java » :)


Well, the marketing folks were right after all :)


Haskell: A Tree ="IS" a Leaf |"OR" a Node

In the Rust syntax ',' means both OR and AND.

A Tree {"IS" a Leaf ,"OR" a Node

A Node ("IS" an A ,"AND" a Box ,"AND" a Box.


Because everyone knows what an enum is, and that <A> will be a generic type, from first glance. There's nothing to guess, apart from Box being some kind of pointer abstraction.

Looking at the second definition it's not immediately apparent what 'a' is and "Node a (Tree a) (Tree a)" seems just like a bunch of words concatenated by spaces, it has no apparent structure or meaning, unless you're used to writing Haskell/ML/Lisp/etc.


> Because everyone knows what an enum is, and that <A> will be a generic type, from first glance

That's false. A programmer coming from Python or Go won't know this. Nobody who hasn't been exposed to the extremely arbitrary generics syntax in Java-like languages will know about <A>.


One needs to know Rust to actually understand that Box is Rust's way to do heap allocation.


There is a huge unreadability right there staring at me: What does that Box<> do, and why? Rust is borrowing more and more of the obscurities of C++, and that's not a good thing...


There's nothing unreadable (and certainly nothing obscure conceptually) about 'Box<>'. It's just unfamiliar if you don't know Rust. But we'll get nowhere fast confusing readability with familiarity.

I'd be interested to know if anyone has done interesting conceptual and/or empirical work on readability. It seems like a very slippery and difficult concept to me. Readable to whom? Readable in the small or the large?


Readability and familiarity are not the same but closely related. There is nothing inherently more or less readable in the rust or haskell tree example.

I think there is work on "readability", just in another context: It's called typography and orthography. And I think the gist of it is: first and foremost, do it like everybody else does, because strange and unfamiliar equals unreadable.


> I think there is work on "readability", just in another context: Its called typography and orthography.

I think that's a very different case. Maybe some analogies might be drawn between some of that work and some of the lower-level aspects of reading code (related to syntax noise etc), but code readability, if it's a defensible concept at all, is a far more complex and layered phenomenon than letter & word recognition.

The first thing a researcher would need to establish is whether or not readability even exists as a natural kind apart from familiarity. I don't know the field, so this might already have been pursued somewhere.


> There's nothing unreadable [...] it's just unfamiliar if you don't know Rust

Agreed. Note the same applies to Haskell's syntax :)

People confuse "unreadable" with "based on my knowledge of Java and C, I can't make heads or tails of this notation without reading a tutorial first", which in my opinion is not a sensible conclusion.


It's not entirely sensible, but it is understandable. I don't think most programmers are truly aware of how much they know, and how deeply automatic their recognition of programming constructs has become.


Box puts something on the heap and returns a pointer to that thing. Here it is necessary because otherwise the Rust compiler wouldn't be able to determine the size (in memory) of a Tree.
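A minimal sketch of the size argument, using the tree from the top of the thread. The `count` helper and the choice of `u64` are assumptions made up for the illustration.

```rust
use std::mem::size_of;

// Without Box, `Node(A, Tree<A>, Tree<A>)` is rejected by the compiler
// (error E0072: recursive type has infinite size).
enum Tree<A> {
    Leaf(A),
    Node(A, Box<Tree<A>>, Box<Tree<A>>),
}

// Hypothetical helper: counts the values stored in the tree.
fn count<A>(tree: &Tree<A>) -> usize {
    match tree {
        Tree::Leaf(_) => 1,
        Tree::Node(_, left, right) => 1 + count(left) + count(right),
    }
}

fn main() {
    // A Box is pointer-sized no matter how large the tree behind it is,
    // so the size of a Node is known at compile time.
    assert_eq!(size_of::<Box<Tree<u64>>>(), size_of::<usize>());

    let tree = Tree::Node(1u64, Box::new(Tree::Leaf(2)), Box::new(Tree::Leaf(3)));
    println!("{}", count(&tree));
}
```

Because each child is stored behind a fixed-size pointer, a `Node` has a finite size even though the tree it refers to can be arbitrarily deep.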


Box<T> is a type, so it’s confusing to say it returns a pointer. It holds a pointer.

GP should note that C++’s unique_ptr isn’t an obscurity.


Calling it Box is confusing. unique_ptr would be an improvement for the name, or maybe HeapRef. The problem is exactly that Rust chose a misleading name, Box (boxed types are something entirely different in most languages), instead of the obvious C(++)/Java-like _ptr, ref, * or & notation/convention.


Boxed types in Java are basically the same thing: a heap allocated version of an otherwise stack-allocated type.


I thought the name was pretty clear; when I saw it in some list of different kinds of Rust pointers, I knew what it was immediately.

It doesn't matter if some people are confused, because you can just explain what it is in 3 seconds. What's important for such a ubiquitous type is that the name is short.


> instead of the obvious C(++)/Java-like _ptr, ref, * or & notation/convention

That would be very misleading since Box represents a heap-allocated owned value


The syntax and terminology (enums, structs, etc.) are more familiar to C++ / C# / Java users.


I always found the use of "enum" for things that are not really enumerable in a useful way to be very confusing. Or are Rust "enum"s enumerable in some subtle way that I don't recognise? Is it just some vestigial term that now has no relation to its original meaning? At least in C, "enum"s are enumerable because they are just integers.


Rust's "enums" look more like tagged unions to me. I guess the tag is enumerable? Although I also don't understand why Rust called tagged unions "enums."


They enumerate a finite set of disjoint cases, so in some sense they are an enumeration. But the real reason is, of course, that sum types can be seen as a generalization of C enums, so the syntax was chosen to maximize familiarity.


Cases aren't values, though. a,b,c is an enum type over enumerable data. int, char, float is an enum kind of enumerable types.


They enumerate a possible set of valid values. Hence “enumeration.”

They are tagged unions, but sometimes, the tag doesn’t exist. Or rather, invalid parts of values can be used so that the tag isn’t an extra bit of data, but instead is built into the same space. “Tagged union” gets too deep into only-mostly-accurate implementation details to be a good name.
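A small sketch of the "tag built into the same space" case, using only `std::mem::size_of`. The choice of `Box<u32>` and `u32` as example types is an assumption for the illustration.

```rust
use std::mem::size_of;

fn main() {
    // Box<u32> can never be null, so Option<Box<u32>> can use the null
    // pointer to represent None and needs no separate tag.
    assert_eq!(size_of::<Option<Box<u32>>>(), size_of::<Box<u32>>());

    // Every bit pattern of u32 is a valid value, so here the
    // Some/None distinction has to take extra space.
    assert!(size_of::<Option<u32>>() > size_of::<u32>());
}
```

Both values are enums with two cases, but only the second one pays for a separate tag in memory.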


They seem to enumerate a possible set of valid structures which can hold arbitrary values. I guess it's just so different from C enums I'm having trouble understanding why the name was repurposed. It's probably less different from C++/C#/Java enums (I know at least one/some of those languages have more complicated enums than C).

Sure, tagged union implies a particular implementation that may not always be required, but it's conceptually easy to understand and doesn't have the historical baggage. (Maybe a better description would be strongly typed union? I want to make clear I'm unfamiliar with Rust and just guessing based on the syntax presented.) I think the biggest problem with "tagged union" (or the even longer "strongly typed union") is that it just isn't a good keyword name — it's two (or three) words and fairly long. No one wants to type out 'tagged_union' and from that sense, 'enum' is better. I don't have a better suggestion for you, and IIRC Rust 1.0 has now frozen the language to some extent.

Thanks for trying to explain, I appreciate it.


They can be any data type; just a name, a struct, a tuple struct, or even another enum.

Likewise, I think “union” sounds strange unless you have C experience. Many of our users do not know C, and so that name doesn’t help them either.

In the end names are hard.

Happy to, you’re very welcome :)


C, or a little set theory. :)


> In the end names are hard.

Indeed! Thanks again.


I would say the second one is more intuitive to me, it reads like I have a Tree type and it can be either a Leaf or a Node.

With the Rust example, it isn't immediately obvious to me that it's an either/or situation, other than that this must be how an enum works (I know enums from other languages).


Literally the punctuation ("<>{}()") and lexical structure is more familiar to anyone who's written any C-family language (C, C++, Java, ...). I say this as a C programmer with more or less equal (near-zero) Rust and Haskell experience.


Because of the memory safety in Rust, it’s now clear those left and right child nodes could be missing since there’s no null in Rust.


Given that GP already stated they don't know enough Haskell to translate all the examples, it seems pretty clear to me that by "familiar" they mean "in a language I'm familiar with".


There's no reason why it should be more familiar. But it is more familiar to the huge number of developers who are used to ALGOL/C family languages.


The word “should” can also be “used to indicate what is probable” (according to the OED). I think that's the way GP intended to use it. As in, "why is it probable that the syntax is more familiar?"


    Node(A, Box<Tree<A>>, Box<Tree<A>>),
is obviously nested (a Node contains an A and two Trees) and the trees are optional (they are contained in a Box, whose purpose is obvious without even knowing the language)

    | Node a (Tree a) (Tree a)
might or might not be nested, given the cheerful taste for currying and juxtaposition without punctuation that prevails in Haskell syntax, and it isn't obvious what purpose the parentheses serve (just grouping?) or whether the trees are optional.


To be obvious that it’s nested, you need to know that < and > are used as approximations for ⟨ and ⟩. You need to know that they’re brackets, not operators.

You need to know that “enum” means that commas in the next section are different than usual, but only one layer deep—check nesting carefully. You need to know about type parameters in either version.

I had a similar experience learning my first ML: I couldn’t even tell how many words each word would gobble up, because I didn’t know the reserved words yet. Syntax highlighting helped, and it’s not a problem after a week or two. It’s no worse than figuring out what’s a binary vs unary operator in C and its descendants.

Also: it’s not obvious to me that Box means optional/nullable. I’d expect it to mean a required non-null pointer to a heap element.


Children of a non-leaf tree node must be optional because otherwise the node would be forced to have both children. It's therefore obvious that Box means optional; otherwise there would be a bare Tree<A> to represent a mandatory reference.


I don't know Rust (though I'm familiar with Java & C) and there's nothing obvious about Box. In fact, I'm just learning from your comment that it means optional.


(Box does not mean optional, the parent is wrong. Option is the type for optional. Box is basically a malloc, placing a value in the newly allocated memory, and then a call to free automatically when it goes out of scope. The box itself is a pointer to the heap. It’s not allowed to be null; in some sense, it’s the opposite of optional.)
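To make the difference concrete, here is a hedged sketch of what a tree with genuinely optional children might look like in Rust. The `Node` struct and `child_count` helper are hypothetical and are not the definition from the top of the thread.

```rust
// Hypothetical variant of the tree in which children really are optional:
// Option says "may be absent", Box only says "lives on the heap".
struct Node<A> {
    value: A,
    left: Option<Box<Node<A>>>,
    right: Option<Box<Node<A>>>,
}

// Hypothetical helper: how many children are actually present.
fn child_count<A>(node: &Node<A>) -> usize {
    node.left.is_some() as usize + node.right.is_some() as usize
}

fn main() {
    let leaf = Node { value: 1, left: None, right: None };
    let parent = Node { value: 2, left: Some(Box::new(leaf)), right: None };
    println!("{} has {} child(ren)", parent.value, child_count(&parent));
}
```

In the original `Node(A, Box<Tree<A>>, Box<Tree<A>>)` there is no `Option`, so both subtrees must always be present; a tree only ends where a `Leaf` is used.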


How is the Haskell example not "nested" according to your definition? It contains an "a" and two Trees, just like the Rust example. Nothing is optional. (There might be a Maybe or similar type, possibly hidden behind a type name, but it would still not be optional.)


> given the cheerful taste for currying

Currying makes no sense in type definitions. It's like saying that in Java you aren't sure if "String name" will run something, "given Java's cheerful taste for running things".

To me, it's obvious the parentheses in "(Tree a)" are grouping things, which is the most immediate (and correct) interpretation, but I'll agree this is more debatable.


They are clearly grouping "Tree a", but do they mean that it is optional?


Why would you assume it's optional? In which popular programming language do parentheses mean "this is optional"? When you read a formula such as

    (x + 1) / 2
do you assume part of it is optional?



