What behaviors are undefined in rust? Oh wait nobody knows, since it has no stan...

jcranmer · on Aug 17, 2023

* Reading uninitialized memory

* Violating pointer provenance

* Out-of-bounds pointer accesses (though unlike C, I think, it's legal to make a pointer go out-of-bounds with wrapping pointer arithmetic, bring it back in-bounds, and use it)

* Use-after-lifetime

* Storing trap representations in variables

* Having two mutable references to the same memory location

* Data races

Not an exhaustive list, and C has most of these (even the second-to-last one, although change "two mutable references" to "two restrict pointers"). Of course, C itself doesn't have an exhaustive list (Annex J.2 is not, in fact, an exhaustive list).
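
A minimal sketch of the out-of-bounds point above, assuming the pointer is moved with `wrapping_add`/`wrapping_sub` (with plain `add`/`offset`, leaving the allocation is already UB by itself). The function name `out_and_back` is made up for illustration:

```rust
/// Moves a pointer far outside its allocation and back again using wrapping
/// arithmetic, then reads through it. `out_and_back` is a hypothetical name.
fn out_and_back(data: &[u32; 4]) -> u32 {
    let base: *const u32 = data.as_ptr();
    // wrapping_add is a safe method: computing an out-of-bounds address this
    // way is not UB as long as the out-of-bounds pointer is never dereferenced.
    let far = base.wrapping_add(1000);
    // Come back in bounds: base + 1000 - 998 == base + 2.
    let back = far.wrapping_sub(998);
    // SAFETY: `back` points at data[2], inside the allocation that `base`
    // was derived from, so it still carries the right provenance.
    unsafe { *back }
}
```

Calling `out_and_back(&[10, 20, 30, 40])` reads the third element. Dereferencing `far` instead of `back` would be UB.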

JonChesterfield · on Aug 17, 2023

Pointer provenance is a nice example. A block of memory cannot be read as an array of simd types sometimes and scalar types otherwise. It can't contain atomic values which are operated on using non-atomic operations during program startup before you spawn any threads.

There were proposals to let one mmap existing structures but I don't know if any landed. Usually done with reinterpret_cast and hoping that the rule violation doesn't break you.

Pointer provenance does make most application code faster but other times it opens a performance gap that you have to step outside of C++ to close. Compiler extensions, switching off the analysis, changing language.

agalunar · on Aug 17, 2023

> A block of memory cannot be read as an array of simd types sometimes and scalar types otherwise.

As far as I can tell, it is currently the case that, using raw pointers, this is not actually undefined behavior (but I never entirely trust my conclusions on these matters).

"&mut T and &T follow LLVM’s scoped noalias model" [1][referring to 2 and 3] but I am fairly sure this does not currently apply to raw pointers, and "provenance is implicitly shared with all pointers transitively derived from the original pointer through operations like offset, borrowing, and pointer casts." [4]

[1] https://doc.rust-lang.org/reference/behavior-considered-unde...

[2] https://llvm.org/docs/LangRef.html#pointeraliasing

[3] "noalias" under https://llvm.org/docs/LangRef.html#parameter-attributes

[4] https://doc.rust-lang.org/core/ptr/index.html
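
A rough sketch of what I mean, assuming only raw-pointer reads and a made-up function name `read_both_ways`. The same bytes are read once as a `u8` and once as a `u32` through pointers derived from one original pointer; treat it as an illustration of my reading of the docs, not as an authoritative statement of the rules:

```rust
/// Reads the same memory as a scalar byte and as a wider integer through raw
/// pointers that share one provenance. `read_both_ways` is a hypothetical name.
fn read_both_ways(bytes: &[u8; 8]) -> (u8, u32) {
    let p: *const u8 = bytes.as_ptr();
    // SAFETY: `p` points at 8 initialized, readable bytes.
    let first = unsafe { p.read() };
    // SAFETY: the cast pointer is derived from `p`, so it keeps provenance over
    // all 8 bytes; read_unaligned does not require u32 alignment, and every
    // bit pattern is a valid u32.
    let word = unsafe { p.cast::<u32>().read_unaligned() };
    (first, word)
}
```

The `u32` value depends on the target's endianness, so compare it against `u32::from_ne_bytes` rather than a hard-coded number.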

Also excellent are

https://faultlore.com/blah/fix-rust-pointers

https://www.ralfj.de/blog/2018/07/24/pointers-and-bytes.html

https://www.ralfj.de/blog/2020/12/14/provenance.html

https://www.ralfj.de/blog/2022/04/11/provenance-exposed.html

It seems likely you'd already be familiar with these; I'm just putting them out there for anyone interested.

JonChesterfield · on Aug 17, 2023

LLVM can represent various aliasing relationships, modulo some risk of C++ inspired bugs in some passes. They might all be stamped out now. I remember a bug report about one that was open for many years.

I'm happy to hear Rust can (probably) represent the same relationships LLVM can. C++ cannot, at least as of about two years ago when I last looked through the corresponding papers. All it can express is that different types do not alias, where atomic_int and int are different types.

tialaramex · on Aug 18, 2023

No, LLVM definitely still has big problems. https://github.com/llvm/llvm-project/issues/45725 is an example, the symptom in Rust is that you can write what is in effect a pointer comparison in which LLVM ends up claiming that two things are different, although they are also identical...

angiosperm · on Aug 17, 2023

Use of mmap itself is undefined in the language.

POSIX provides a definition that programs rely on, instead. Implementers are allowed to define literally anything the union of all standards leaves undefined.

JonChesterfield · on Aug 17, 2023

Mmap itself is alright. You've got a void* from somewhere, that's OK. You can placement new into it to make objects.

What isn't allowed is casting it to a hashtable type and then using it as such. Because there is no hashtable instance anywhere, and specifically not there, so you've violated the pointer aliasing rules.

The obvious fix is to guarantee that placement new doesn't change the bytes, perhaps only for trivially copyable types or under a similar constraint. I didn't see the proposals in that direction land but also didn't see them fail, so maybe the newer standard permits it.

LegionMammal978 · on Aug 17, 2023

As I understand it, that's precisely what std::start_lifetime_as<T>() does: it effectively performs a placement new to create a T object, except that it retains the existing bytes at the address. It only works with implicit-lifetime types (i.e., scalars, or classes with a trivial constructor), though, so it probably wouldn't work with your hash table example, except perhaps for an inline hash table.
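
For comparison with the thread's original topic, here is a loose Rust analogue of "treat the existing bytes as a T". This is not the C++ API, only a sketch under assumptions: `Header` and `read_header` are made-up names, the type is `#[repr(C)]` with only integer fields (no padding, every bit pattern valid), and the value is copied out rather than referenced in place:

```rust
/// A hypothetical plain-data header, as might sit at the start of a mapped file.
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq)]
struct Header {
    magic: u32,
    len: u32,
}

/// Copies a `Header` out of the front of `bytes` without modifying the bytes.
/// Returns `None` if the buffer is too short.
fn read_header(bytes: &[u8]) -> Option<Header> {
    if bytes.len() < std::mem::size_of::<Header>() {
        return None;
    }
    // SAFETY: the length check guarantees size_of::<Header>() readable bytes;
    // Header is two u32 fields with no padding, so any bit pattern is a valid
    // value; read_unaligned has no alignment requirement.
    Some(unsafe { bytes.as_ptr().cast::<Header>().read_unaligned() })
}
```

Like the implicit-lifetime restriction, this only works for plain data. A hash table holding pointers or a `Vec` could not be recovered this way.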

JonChesterfield · on Aug 17, 2023

Superb! Looking through https://en.cppreference.com/w/cpp/memory/start_lifetime_as, this appears to be the right thing. It also has volatile overloads (which it looks like placement new still does not). This doesn't appear to be implemented in libc++ yet but that seems fixable, it'll go down the same object construction logic placement new does. Thank you for the reference, that'll fix some ugly edge cases in one of my libraries.

angiosperm · on Aug 18, 2023

To call mmap, you are calling a function that is not in the collection of translation units that makes the program. Libraries are beyond the Standard. Include files not listed in the Standard, likewise. So you rely on POSIX, there.

For objects got by casting void* to a known type, you rely on the compiler being unable to prove that the objects didn't exist already, somewhere in the program. Pray the compiler doesn't get smart enough to notice no constructor for that type is linked, meaning that you couldn't have made that object.

klankerzz · on Aug 18, 2023

Can't you also just say that mmap() is a magical function that you don't know what it does?

For all the compiler knows, mmap() could just be a:

  static Hash_Table h; return (void *)&h;

And make that the rule for all externally defined functions.

gpderetta · on Aug 18, 2023

Indeed. That's the way to reason about correctness of opaque functions.

proto_lambda · on Aug 17, 2023

There is no undefined behaviour in Safe Rust. You're right about Unsafe Rust of course.

lionkor · on Aug 17, 2023

The ultimate "the code is the documentation" is "the compiler is the language spec".

thesuperbigfrog · on Aug 17, 2023

>> The ultimate "the code is the documentation" is "the compiler is the language spec".

Rust has great potential to become a replacement for C and C++, but the lack of a language specification is a shortcoming that needs to be addressed for it to see wider adoption, especially for safety-critical systems.

If the Rust compiler does something surprising, people will ask, "Is this a bug?" and without a spec the answer becomes the language developers or the community asking, "What should the compiler do in this situation?".

It makes sense because the correct behavior (whatever that is) has not been defined, but it has a feeling of "we are making this up as we go along" because there is no formalized answer defined. While this approach is fine for running your website or building a command line tool, it is not acceptable for safety-critical software. If the software breaks and people die, the "we are making this up as we go along" approach is not acceptable because it has too much risk.

lionkor · on Aug 17, 2023

I fully agree, and it's definitely a strange feeling coming from C++ to not have a single, complete and extensive spec to read up on if all else fails.

I want to like Rust, but it's already a kitchen sink on par with C++ in complexity and misused quirks (not to mention macros, which hide complexity just like C macros did), and the lack of a committee and spec makes it very difficult to trust that it won't get more and more features as time goes on (becoming like C++, in only the bad ways).

I understand they have an RFC process, but that's not enough for a language which is now so commonplace in discussion (usually in the form of "if you did it in Rust, this problem wouldn't exist", which is often even true).

tialaramex · on Aug 17, 2023

> a single, complete and extensive spec to read up on if all else fails

Did you try using the "single, complete and extensive spec" ? What for and how successful was that ?

The ISO C++ standard was first published in 1998, so, about 25 years ago. One of the things it says, even in the C++ 23 standard that's likely to be published later this year, is that some input files have Undefined Behaviour during parsing.

But, wait a minute, Undefined Behaviour is a runtime property. Parsing isn't a runtime activity. This "complete" specification clearly was never even proofread. Which makes a kind of sense, it's an enormous sprawling document, why would anybody properly read it. But, if they actually don't, what's the point ?

The fix for this - hopefully to land in C++ 26 - is P2612, named "UB? In my lexer?" because it's been so long that even "It's more likely than you think" memes https://knowyourmeme.com/memes/its-more-likely-than-you-thin... are now dad jokes. But don't focus on this particular minor bug, which is not a big deal, focus on what it means about the value of the specification.

jcranmer · on Aug 17, 2023

> But, wait a minute, Undefined Behaviour is a runtime property. Parsing isn't a runtime activity.

So it turns out you're wrong here. UB can also be intentional extension points, and these aren't implementation-defined behaviors because honestly I don't want to track down 25-year-old documents to figure out what's going on here. This use of UB in the standard has diminished greatly in the past few decades (although there are still remnants of it kicking around, e.g., the lexer UB), and the extra focus on nasal demon aspects of UB from ~15 years ago really obscures this nature of UB.

One annoying thing about UB is that it is actually several different concepts with the same name. In addition to the aforementioned use, it can also refer to behavior that can go haywire in ways really impossible to constrain (buffer overflow is the classical example here). Or it can refer to intentional optimization instructions (e.g., restrict and strict aliasing). Or it can refer to axioms you need to have hold or else you have no clue how to think about semantics (pointer provenance, data races). Or, incorrectly but depressingly common, lay people can use it to refer to what the specification considers implementation-defined behavior (e.g., size of data types). Working out which kind of UB people are referring to when they use the term is frustrating at best, and frequently people are using one kind of UB to justify how all kinds should be handled. (Annoyingly, some of those people are committee members.)

tialaramex · on Aug 18, 2023

> because honestly I don't want to track down 25-year-old documents to figure out what's going on here

When it was written these weren't 25 years old. So this seems like a poor rationale. The answer is that they should just have written that it's ill-formed and they didn't. That's a completely understandable mistake, but it's telling that it wasn't fixed for so long that people grew up, had kids, and the kids are writing the proposal to fix it. As to the idea of multiple "kinds" of UB, the standard defines this term exactly once.

There are a few things that I expect to see from Bjarne, Herb and WG21 generally that will mean they've finally figured out the true nature of the problem. When / if I see those things they may begin work to get C++ to where it'd need to be to stay relevant - not "relevant" the way COBOL is relevant, but relevant the way C++ still is in 2023. Meanwhile they're gliding, losing momentum.

Firstly, and this is the biggest hurdle, that the problem is Cultural. Yes, Rust has some nicer technology, but that's not enough: the technology supports a Culture. You could build C++'s culture with Rust's technology, but that's worse, so don't waste your time doing that.

Next, though, the most important of the technical insights, and an unwelcome one if you spent your life on the C++ language: there are two choices of what to do about Rice's Theorem and C++ chose wrong. It will need to fix that, and the fix isn't cheap because it's a broad change to the entire language standard. If you have no stomach for that fix, it's likely actually better to announce that unsafety is your intent, and wrestle with the consequences as they are, than to pretend you don't need the fix to get safety, which is false.

What I mean here is, suppose I wrote a program which I say is safe, but the compiler can't see why it's safe. In Rust that's simple: the program doesn't compile. In C++, though, the program compiles, and if I'm correct it's safe, but if I'm wrong it has Undefined Behaviour (actually it's a bit worse, but that'll do in context). Henry Rice showed that we have no choice in these rich high-level languages (which want non-trivial semantic properties of software): such programs will definitely exist. C++ allows this to happen a lot, because the consequence is that the program compiles anyway. Rust works hard to avoid it where possible, because there the consequence is that a correct program won't compile, which is undesirable.
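
A small sketch of that trade-off, with a made-up function `bump_first_two`. The commented-out version is sound (the two elements are disjoint), but the borrow checker cannot see that and rejects it, so safe Rust makes you prove the disjointness through an API like `split_at_mut`:

```rust
/// Increments the first two elements while holding two mutable references at
/// once. `bump_first_two` is a hypothetical name. Panics if `v` has fewer
/// than two elements.
fn bump_first_two(v: &mut [i32]) {
    // Sound, but rejected at compile time (error E0499, second mutable borrow):
    //     let a = &mut v[0];
    //     let b = &mut v[1];
    //     *a += 1;
    //     *b += 1;
    //
    // Accepted: split_at_mut hands back two slices the compiler knows are disjoint.
    let (left, right) = v.split_at_mut(1);
    left[0] += 1;
    right[0] += 1;
}
```

The equivalent C++ with two pointers into the same array compiles either way. Whether it is correct is left to the programmer.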

lionkor · on Aug 18, 2023

> Did you try using the "single, complete and extensive spec" ? What for and how successful was that ?

Yes, I did. I used the ISO standard of the version of C++ I was using (17 or 20, don't remember) to look up how variables are initialized if they aren't explicitly initialized, and it turns out the standard has a very clear definition of e.g. variables in a function which are of a class type have their default ctor called, or something like that.

So it was very successful. No idea what your UB rant is about.

Dylan16807 · on Aug 18, 2023

> https://knowyourmeme.com/memes/its-more-likely-than-you-thin...

I like that your post exposes a bug in HN's parsing. 61 character URL gets turned into 60 characters plus 3 periods.

lionkor · on Aug 19, 2023

The href is still correct in their comment, while in yours, since you copied it, it's not.

Dylan16807 · on Aug 19, 2023

The href is correct but the link shortener made it longer, so I call that a bug.

I can only quote the visible text because of how the site works. Maybe I should have put it in a code block so it wouldn't link.

iknowstuff · on Aug 17, 2023

Rust macros don't hide anything. They're hygienic and clearly annotated when used.

mike_hock · on Aug 17, 2023

Rust macros are a crutch to work around the language's shortcomings. It's just a better crutch than C's.

duped · on Aug 18, 2023

I'm seeing Rust adoption accelerate largely because time is spent improving the language and implementation rather than bemoaning a spec.

> without a spec the answer becomes the language developers or the community asking, "What should the compiler do in this situation?".

No, they ask "what does the RFC say"

> If the software breaks and people die, the "we are making this up as we go along" approach is not acceptable because it has too much risk.

The spec does not define the software. The software is as the software does. Having or not having a spec doesn't protect from bugs - people do.

What you're talking about is covering one's ass, not specification.

thesuperbigfrog · on Aug 18, 2023

>> The spec does not define the software. The software is as the software does. Having or not having a spec doesn't protect from bugs - people do.

>> What you're talking about is covering one's ass, not specification.

They are related.

In safety-critical software, bugs can cause people to die. Without a spec, no one will use Rust for safety-critical software. It would be too risky and no company would accept that level of risk.

For example if software that controls an airplane is written in Rust and an error occurs during flight, what happens? The software can't just panic and crash or the airplane might crash.

The Ferrocene project (https://ferrous-systems.com/ferrocene/) is working on producing a safety-critical Rust specification (https://github.com/ferrocene/specification) because having a language specification matters for safety-critical work.

duped · on Aug 18, 2023

How does a spec fix bugs?

AnimalMuppet · on Aug 18, 2023

It doesn't fix bugs (in the language/compiler). It documents exactly what the language is supposed to do, and therefore what the language user can count on. Without a spec, all you can count on is what you can experimentally determine that the language does (and even then, you can't be sure that it will do that in all situations).

This actually reduces bugs in applications, because it means that the app writers now know what the language will actually do, and can write their code accordingly. Without a spec, they will too often have a cargo cult understanding of what the language does, and so their code won't do what they intend it to.

If there's a spec, and the compiler/language doesn't do what the spec says, now you can definitively say that it's a bug in the language. That can still cause bugs in your app. But at least you can now definitively say that the language implementation is at fault, and demand that they fix it, and you can agree with the language authors on what "fixed" means.

duped · on Aug 18, 2023

So what you're saying is, a spec is no better than documentation and design documents.

It doesn't sound materially helpful or like it saves lives - it seems like a contrived requirement.

AnimalMuppet · on Aug 18, 2023

You do not have the right to put words in my mouth, or to claim that your twisted version is what I was saying.

A spec is more detailed and more precise than (other) documentation and design documents. ("Other" because the spec is itself part of the documentation, and one of the design documents.) For the safety-critical software itself, you would demand a full, formal spec, not just "documentation". (At least, if you wouldn't, then others would, and they are right to do so.)

But if you demand that for the software, doesn't it make at least some sense to ask it of the compiler? And even if you don't think it makes sense for the compiler, it seems reasonable that the standard libraries of the language should face the same requirements as the subroutines that are part of the safety-critical software.

iknowstuff · on Aug 17, 2023

>a shortcoming that needs to be addressed for it to see wider adoption, especially for safety-critical systems.

This seems like just a hunch of yours that does not seem to be reflected by the real world.

thesuperbigfrog · on Aug 17, 2023

>> This seems like just a hunch of yours that does not seem to be reflected by the real world.

What safety-critical systems are written in Rust?

Where can I buy a validated Rust toolchain for safety-critical work?

Ferrocene is an effort to build a safety-critical Rust, but it is not done yet:

https://ferrous-systems.com/blog/ferrocene-update/

mjw1007 · on Aug 17, 2023

The good news is that the Rust project has recently agreed to write a specification, and has a budget to hire an editor for it.

The less good news is that it's likely to take a long time before anything resembling a complete description gets written.

You can follow its status at https://github.com/rust-lang/rust/issues/113527

thesuperbigfrog · on Aug 17, 2023

>> The good news is that the Rust project has recently agreed to write a specification, and has a budget to hire an editor for it.

This is awesome to hear. Following that issue . . .