The flaw in Java's design is that String is a reference type. Logically, it's a value, and should be treated as such (and sharing the underlying data instead of copying it should be an implementation detail).

The problem is that in Java, there are no user-defined value types, only primitives; and primitives can't have methods. So if String were a primitive, you'd have to write "String.length(s)" etc. Also, all other Java primitives are basically fixed-size bit sequences that are interpreted in one way or another, but that wouldn't be the case for strings.

.NET/C#, though, doesn't get that excuse. It totally could have defined String as a struct with an internal char[] field, and then there would be no question of value/reference equality for its ==. But that would also mean that you couldn't use null for strings - and they didn't have nullable value types back then. I suspect that, plus the overall mindset carried over from Java, is what won the day.

Side note: there's no hard and explicit distinction between reference and primitive types in C itself, nor in C++. If Python is "everything is an object reference", and Java is "everything is an object reference except for primitives that are values", then C++ is "everything is a value, including object references". Thus, there's no ambiguity with == in C++ - it always compares values, it's just that sometimes those values are pointers.

I sometimes think that the implicitness of object references that is so common in OO languages, and that I believe was introduced by Smalltalk, is a mistake. It's interesting that Simula-67 didn't have it - although it was very Java-like otherwise, having only primitive values and references to objects (i.e. no objects as values, which C++ does have), it distinguished the two consistently by using distinct equality operators (= vs ==), and even distinct assignments (:= vs :-).

Or, alternatively, treat everything as an object reference, but make == do implicit dereferencing as well, as Python and VB do; and provide a completely distinct operator for reference equality, such as "is". Python has a problem here, though: it gives every class a default implementation of == that compares references, and so == ends up used as reference equality in practice sometimes. It would be better to have no default for == at all, just as there's no default for the other comparison operators; this is what VB does.
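To make the Python point concrete, here's a minimal sketch. The class names Plain and Valued and the helper ordering_is_undefined are made up for illustration. A class that doesn't define __eq__ silently gets reference comparison from ==, while < has no default and raises.

```python
class Plain:
    """Hypothetical class with no __eq__: == falls back to comparing references."""

    def __init__(self, value):
        self.value = value


class Valued:
    """Hypothetical class that opts in to value equality by defining __eq__."""

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        if not isinstance(other, Valued):
            return NotImplemented
        return self.value == other.value


def ordering_is_undefined(left, right):
    """Return True if `left < right` raises TypeError (no default ordering)."""
    try:
        left < right
    except TypeError:
        return True
    return False


a, b = Plain(1), Plain(1)
x, y = Valued(1), Valued(1)

default_eq_is_identity = (a == b) == (a is b)  # both False: same data, different objects
value_eq_ignores_identity = (x == y) and (x is not y)
no_default_ordering = ordering_is_undefined(a, b)
```

So a == b on Plain quietly means "same object", which is exactly the accidental reference equality described above, whereas a < b fails loudly instead of guessing.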

It might also be best to stop talking about value and reference types altogether - what's actually important is the presence or absence of identity. Then everything can be an object reference, but not all object references can be compared for equality (and in practice, under the hood, the implementation can just skip the references and copy the data itself - or not, depending on mutability and size).
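As a rough sketch of the identity distinction in Python terms (the Money and Account classes are invented for the example, and this is only one way to approximate the idea): a frozen dataclass behaves like an identity-less value, while a class declared with eq=False keeps identity as its only notion of sameness.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Money:
    """Hypothetical identity-less value: equal fields mean interchangeable instances."""

    amount: int
    currency: str


@dataclass(eq=False)
class Account:
    """Hypothetical entity with identity: equal fields do not make two accounts the same."""

    owner: str
    balance: Money


price_a = Money(5, "EUR")
price_b = Money(5, "EUR")

acct_1 = Account("alice", price_a)
acct_2 = Account("alice", price_b)

values_are_equal = price_a == price_b               # compared field by field
values_hash_alike = hash(price_a) == hash(price_b)  # usable interchangeably as dict keys
entities_are_distinct = acct_1 != acct_2            # eq=False keeps the identity-based ==
```

Python still lets you ask price_a is price_b, of course; under the scheme above that question simply wouldn't be available for something like Money, and whether the two names share storage would be up to the implementation.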