Why is the author talking about null in the intro, which implies using pointers and thus boxed objects, and then running benchmarks on integers? That makes no sense to me.
Because it's a benchmark on Optionals? In Java an Optional<Long> requires boxing, in Rust it does not. You'd expect a "sufficiently smart compiler" to detect this and avoid needless boxing after inlining and escape analysis but clearly that is not the case.
Note that "Long" in Java can be null because it is boxed, "long" (lowercase) however cannot be null, but it also can't be Optional<long>. Java sucks :)
Rust doesn't have optionals built-in. The language has no special support for them (beyond the try operator); just like Java, Rust's optional type is provided by the standard library, but it could be trivially implemented yourself and your implementation would have the same behaviour and performance characteristics. It's literally just:
enum Option<T> {
Some(T),
None,
}
What makes Rust fast here is that it has value types and can optimise them.
Even the try operator (?) is these days backed by just an unstable trait that you are welcome to implement for your own types. Once that's stabilized, it's no different from how types can implement PartialEq and get == and !=.
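As a quick illustration (my own minimal sketch, with a made-up function name), `?` already works on Option out of the box: it returns None from the enclosing function early.

```rust
// The ? operator on Option: if the expression is None, return None
// from the enclosing function immediately; otherwise unwrap the value.
fn first_even_times_ten(xs: &[i64]) -> Option<i64> {
    let first = xs.iter().copied().find(|x| x % 2 == 0)?;
    Some(first * 10)
}

fn main() {
    assert_eq!(first_even_times_ten(&[1, 3, 4]), Some(40));
    assert_eq!(first_even_times_ten(&[1, 3, 5]), None);
}
```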
The extra edge beyond not needing boxing comes from niches: if T doesn't occupy all possible bit representations, Rust will squeeze None into one of the unused bit values. Several standard library types have such niches, but today there is no stable way to make your own yet.
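A minimal sketch of the niche optimisation in action (sizes assume a typical 64-bit target):

```rust
use std::mem::size_of;
use std::num::NonZeroU64;

fn main() {
    // u64 uses all 2^64 bit patterns, so Option<u64> needs an extra
    // discriminant (padded up to alignment): 8 + 8 = 16 bytes.
    assert_eq!(size_of::<Option<u64>>(), 16);
    // NonZeroU64 leaves 0 unused; None slots into that niche, so
    // Option<NonZeroU64> is the same size as NonZeroU64 itself.
    assert_eq!(size_of::<NonZeroU64>(), 8);
    assert_eq!(size_of::<Option<NonZeroU64>>(), 8);
    // References are never null, so Option<&T> is pointer-sized too.
    assert_eq!(size_of::<Option<&u64>>(), size_of::<&u64>());
}
```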
> Several standard library types have such niches, but today there is no stable way to make your own yet.
However, note that if a type has suitable niches, enums will automatically take advantage of them. So that's more of a concern with T than with a bespoke Option; that'll work OOTB.
Yes, Rust's Guaranteed Niche Optimisation says if you make this:
enum Foo<T> {
Nasty,
Nice(T),
}
... and T has a niche, it is guaranteed that Foo<T> has the same size as T and Nasty just slots into the niche.
In practice, other fancier things will get optimised, but Rust doesn't guarantee exactly what will or will not be handled. E.g. if the niche has room for four things and I make an enum with four extra plain variants, the guarantee doesn't apply, but that'll probably still work.
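A sketch of the guaranteed case using the Foo enum above (any T with a guaranteed niche, like a reference, works):

```rust
use std::mem::size_of;

#[allow(dead_code)]
enum Foo<T> {
    Nasty,
    Nice(T),
}

fn main() {
    // &u8 can never be null, so Nasty slots into the null niche:
    // Foo<&u8> is guaranteed to be exactly pointer-sized.
    assert_eq!(size_of::<Foo<&u8>>(), size_of::<&u8>());
    // u8 has no niche, so Foo<u8> needs a separate discriminant byte.
    assert_eq!(size_of::<Foo<u8>>(), 2);
}
```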
• #[rustc_diagnostic_item = "Option"]: the compiler knows about Option for the purpose of improving its error messages. I’m not sure how this is used, and am not looking it up now.
• #[lang = "None"] and #[lang = "Some"]: lang items allow the compiler to hook things up, like knowing the + operator maps to the core::ops::Add trait. https://github.com/search?q=repo%3Arust-lang%2Frust+%28optio... suggests that these lang items are only being used by Clippy, to provide better diagnostics. So if you use your own Option type, you won’t get Clippy lints related to Option.
• I suppose there are also the #[stable] attributes, which can’t be used outside the standard library, and which cause visible changes in generated documentation. When talking about strict accuracy, I guess that counts!
Anyway: for practical purposes this is just minor diagnostics and documentation stuff, not actual functionality, about which there is nothing special.
Since you mentioned the try operator, might as well look at the trait implementations on Option too, https://doc.rust-lang.org/std/option/enum.Option.html#trait-.... There are a few things marked “This is a nightly-only experimental API.”, which you can implement yourself if you’re willing to take that stability risk; they’re all linked to try_trait_v2, which is what backs the try operator, `?`. Once it, or its successor, is stabilised, we’ll effectively be back to there being absolutely nothing special about Option, as you’ll be able to implement those traits for your own Option type as well.
Yeah I was on my phone thinking, isn't Optional implemented as something else? and yeah it's an enum. oops :)
Which in the Rust world probably needs just one byte, or in most cases just a spare bit pattern, in a register for state.
I don't think Java will be that optimized with value types. Even if the heap allocation is gone, there's still probably going to be a sizable object header?
The current proposal deliberately doesn't define anything about where and how those things are going to be stored, but the idea behind it all (and plenty of JVM classes have been designed with this in foresight) is that a value type can be flattened (and each value-type field recursively so). But do note that this is only a possibility: the JVM might decide that flattening it all would have a negative impact due to the increase in size, and opt for storing object references instead.
But the important part is losing identity: even if a field a few layers deep won't be flattened, the same data doesn't have to be represented the same way at every place. Instead of copying only the reference, the JVM is now free to copy the value. This also trivializes things like escape analysis (we can just stack-allocate this value class; if it does turn out to escape, just copy it to the heap).
> Note that "Long" in Java can be null because it is boxed, "long" (lowercase) however cannot be null, but it also can't be Optional<long>. Java sucks :)
I think using primitive types as generics is something that makes Java less ergonomic than C# (where they’re called unmanaged types), whether it is considered justified or necessary.
To say Java sucks because of this is a bit much. To say Java sucks because you can’t avoid null is definitely warranted. (You can say good things about Java, and not being able to opt out of nulls is not one of them.)
But `Optional` could have been a value type from the start and had effectively zero overhead, especially if it were specialized for primitive types. There are 8 primitive types, so supporting them all with a value-type optional would not have been the end of the world, even if it was only a language-level optimization (e.g. the optional becomes a 96-bit to 128-bit type and the compiler is responsible for ensuring primitives are wrapped/unwrapped specially).
GNU Trove is a collection library that focuses on optimizing for primitive types and is significantly faster than the Java collections, which require boxing.
There is OptionalLong in the standard library, though. Java just can’t do generic specialization as of yet, which is necessary for Optional<long> to be a different implementation (+ value types of course)
But `OptionalLong` is a heap allocated object. I’m suggesting a wider primitive that can be passed on the stack (value plus flags to indicate state) bypassing any need for allocation. In Java this can only be provided by the compiler (without resorting to some programming and value convention).
Well, Java doesn’t specify that it has to be heap allocated, and it will in fact not allocate that object if the producer function can be inlined and escape analysis deems it safe (which happens surprisingly often).
But here is another option if you really want to avoid that allocation (besides of course using ByteBuffer and similar which is always a possibility): https://news.ycombinator.com/item?id=35133577
It doesn’t specify, but in practice (at least when I was last using it, around Java 11), it is. Stack allocation has very different semantics than single-allocation buffers, though I’m not sure I follow your logic.
What is the reason for not making OptionalLong a 72-bit (or larger, if you care about alignment) primitive value while keeping object semantics at the language level? Someone who thinks they have an OptionalLong object is already looking at a minimum of 112 bits for the class pointer and value in the non-empty case, or add another 96 bits onto that if it’s an `Optional<Long>`. What’s missing with the value type is shared references to the same instance, but for an immutable optional to an immutable long, that doesn’t make much difference in practice. That’s the only drawback I can see. In practice, how often is it important to have identity properties for boxed primitives? That’s probably already caused more bugs than it’s brought benefits.
> Java's language philosophy is simple - everything must be an object.
Except primitive types like long in this case, which are not objects.
This was a performance-consistency tradeoff made in the early 90s. It made sense at the time and now doesn't make sense to some people, but that's ok. I wouldn't say Java sucks because of that either. Now type erasure, that's a different topic.
The lack of optimisation here isn’t a dealbreaker. The fact that Optional<T> can itself be null, because it’s a reference type, makes it a less safe implementation of optionals. That’s why the newer standard library uses static methods in a lot of places.
Rust doesn't have optionals built into the language except insofar as Option<T> is defined in the standard library. The difference is that Rust allows you to define new value-types, whereas Java has a small fixed set of "primitive" value-types.
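For example (a sketch with a made-up Point type; sizes assume a typical 64-bit target), any user-defined struct in Rust is a value type, and wrapping it in Option still involves no heap allocation:

```rust
use std::mem::size_of;

// A user-defined value type: stored inline, no object header, no heap.
#[derive(Clone, Copy, PartialEq, Debug)]
struct Point {
    x: i64,
    y: i64,
}

fn main() {
    assert_eq!(size_of::<Point>(), 16);
    // Option<Point> just adds a discriminant (padded to alignment);
    // the whole thing still lives on the stack or in registers.
    assert_eq!(size_of::<Option<Point>>(), 24);
    let p: Option<Point> = Some(Point { x: 1, y: 2 });
    assert_eq!(p.map(|p| p.x + p.y), Some(3));
}
```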
> The task was to compute a sum of all the numbers, skipping the number whenever it is equal to a magic constant. The variants differ by the way how skipping is realized:
> 1. We return primitive longs and check if we need to skip by performing a comparison with the magic value directly in the summing loop.
> 2. We return boxed Longs and we return null whenever we need to skip a number.
> 3. We return boxed Longs wrapped in Optional and we return Optional.empty() whenever we need to skip a number.
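For comparison (my own sketch, not the article’s code, with a made-up magic constant), variant 3’s shape costs nothing extra in Rust, because Option<i64> is a plain stack value with no boxing involved:

```rust
const MAGIC: i64 = 7; // made-up stand-in for the benchmark's magic value

// Variant 3's shape: return an optional, empty when the number is skipped.
fn next(x: i64) -> Option<i64> {
    if x == MAGIC { None } else { Some(x) }
}

fn main() {
    // Sum 0..10, skipping the magic value: 45 - 7 = 38.
    let sum: i64 = (0..10).filter_map(next).sum();
    assert_eq!(sum, 38);
}
```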
And the only one that truly would make sense would of course be Optional<long>, i.e. the optional primitive long...
First having to declare the value in the one type of the four that makes the least sense, then praying that the compiler optimizes away the allocation of not one but TWO(!) objects in order to represent "maybe a number", is basically why I ragequit Java almost 20 years ago.
20 years ago there were no generics, so you couldn’t have implemented it that way. You could have written a class OptionalLong { long value; boolean isSet; } at the time and that would have only a single allocation overhead. Alternatively, have an array of longs and a boolean array marking which ones are set, with a trivial wrapper object over that for essentially zero overhead.
Java’s tradeoffs favour maintainability in huge teams over multiple years, with relatively fast performance even if you write your code very naively, plus top-notch tooling, observability, etc. In the rare case where you have to optimize hot loops, you can allow yourself less readable code like I mentioned.
> 20 years ago there were no generics, so you couldn’t have implemented it that way. You could have written a class OptionalLong
That was the actual reason, yeah. Basically having to make IntList and so on.
> Java’s tradeoffs are maintainability in huge teams over multiple years with relatively fast performance
When I switched to C# in 2003, it was very young. Since then generics have been bolted on and so on, but I didn't find any hit to maintainability due to this. What I do think was sad, though, is that when generics were bolted on (and value types were obviously there all along), the APIs didn't immediately include some easy and obvious wins like Option<T>. Those have since been reimplemented ad nauseam.