> To summarize, enforcing [Rust's] borrowing rule in C++ is unfortunately not so simple because there is a lot of existing code that creates multiple non-const pointers or references to the same object, intentionally violating the borrowing rule. At this point we don’t have a plan of how we could incrementally roll out the borrowing rule to existing C++ code, but it is a very interesting direction for future work.
This is actually possible for C++, if we add a concept of pure functions: functions that don't modify anything indirectly reachable from arguments (or globals). Basically:
* Annotate those parameters as "immutable" or as coming from the same immutable "region". Anything reached through them would also have that immutable annotation.
* Anything created during the call would not have that annotation.
The type system, basically a borrow checker, could keep them separate and make sure we don't modify anything that came from outside the function.
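To make that concrete, here's a rough C++ sketch of what it might look like. The [[imm]] and [[pure_region]] attributes are purely hypothetical placeholders for illustration; no compiler implements them today:

    #include <string>
    #include <vector>

    struct Item { std::string name; int score; };

    // Hypothetical: [[pure_region]] would promise this function modifies nothing
    // reachable through its parameters (or globals); [[imm]] would mark a parameter
    // as part of the caller's pre-existing, temporarily immutable region.
    int best_score(std::vector<Item>& items /* [[imm]] */) /* [[pure_region]] */ {
        int best = 0;                 // locals live in a fresh, mutable region
        for (Item& it : items) {      // reading the immutable region is fine
            if (it.score > best) best = it.score;
            // it.score = 0;          // plain C++ allows this; the hypothetical checker would not
        }
        return best;
    }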
We're currently adding this to Vale [0]. It's a way to blend shared mutability with borrow checking, and it gives us the optimization power and memory safety of borrow checking without all the constraints and learning-curve problems.
That's not the issue here: there are plenty of schemes that could be used to enforce the rule, including just copying Rust's if you wanted.
The issue is that a bunch of existing code actively holds and uses multiple mutable references to the same objects. It would simply not be able to adopt the chosen scheme, regardless of how it's spelled.
That's what this "region borrow checking" solves: if a pure function (or block) regards all previously existing memory as one immutable region, it doesn't matter if there was any mutable aliasing happening before, because everything inside that region is shared and immutable now.
Don't get me wrong, it's not the only missing piece for C++; C++ lets pointers escape to other threads which might modify the objects while our thread considers them immutable. Vale solves this by isolating threads' memory from each other (except in the case of mutexes or Seamless Concurrency). Luckily, there are plenty of schemes that can solve that particular problem for C++.
If I had infinite time I would love to figure out how to implement this into C++, after the proof-of-concept in Vale. It's a fascinating topic, and an exciting time in the memory safety field, full of possibilities =)
> > a bunch of existing code actively holds and uses multiple mutable references
> if a pure function (or block) regards all previously existing memory as one immutable region
Yes, but existing code doesn't do that. So you couldn't leverage this against existing code which is intentionally mutating shared memory. Outside of a rewrite it must continue to do so in order to function.
Unless you meant that new functions could be annotated as pure in order to avoid future errors in code not yet written?
I'm not so sure a mandatory system would go over well for C++ since many who use it want shared mutability. I'd be fine with an opt-in system of the sort D seems to be headed towards but I'm not really interested in the tight constraints imposed by Rust. I'm not writing OS kernels or crypto routines over here.
> Unless you meant that new functions could be annotated as pure in order to avoid future errors in code not yet written?
Precisely. New functions can be written to treat all pre-existing memory as immutable, and we can be confident that that data won't change.
For example, if the pure function sees a unique_ptr<MyThing> somewhere in pre-existing memory, we'll know that that MyThing existed at the time of the pure function call, and will keep existing until the end of the pure function call; nobody can destroy it in-between.
Also, it's surprising how many functions in C++ are already effectively pure, and can add the annotation with little (or zero) refactoring. But we don't have to do this; the benefit for new code is enough.
> I'm not so sure a mandatory system would go over well for C++ since many who use it want shared mutability
This is opt-in; we don't have to annotate all our functions as pure, and we can still use shared mutability freely outside of pure functions. We can then hand a shared-mutable blob of data to a pure function, which will then treat it as a shared-immutable blob of data.
That's why I like this approach: one can write an entire program without it, and one can start using it whenever they want. It composes well like that.
Also, if one wants to leverage the region borrow checker outside of pure functions, that's possible too. In Vale, even when not in pure functions, one would make use of `iso` objects (little isolated sub-regions in an otherwise shared-mutable region, similar to Pony's `iso`) and treat those as immutable or mutable as one desires. I suspect it could work in C++ too, but nobody's tried it so I can't say for certain.
no, `const` means you can't modify the data through that `const` variable (excepting shenanigans), not that it's immutable. It can be rather confusing.
> no, `const` means you can't modify the data through that `const` variable (excepting shenanigans), not that it's immutable.
But that's the whole point, isn't it? I mean the selling point of these lifetime annotations is to allow developers to specify that data within a thread cannot be modified by the code running in that thread.
No, const can be added to objects at any declaration or callsite, meaning there can be many const- and non-const references to the same object within the same scope/thread/program/address space/execution context.
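A minimal illustration, in ordinary C++ with no annotations at all:

    #include <iostream>

    int main() {
        int value = 1;
        const int& ro = value;  // const view of `value`
        int& rw = value;        // non-const alias to the very same object

        rw = 2;                   // mutates the object "behind" the const reference
        std::cout << ro << "\n";  // prints 2: const never meant "immutable"
    }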
I think you are discussing different things, and in the process missing the whole point.
It's one thing for an object to be immutable throughout its life cycle. That's immaterial to this discussion.
It's an entirely different thing that the same object cannot be changed within specific contexts.
If you want to ensure that a thread has read access to an object and it cannot be changed accidentally then passing a const reference to that object already ensures that. That's pretty much the whole point of const.
I think you are missing that const is insufficient for safe concurrency; it requires programmer discipline to ensure there are no shared mutable references.
Const also has no bearing on lifetimes. Constexpr/consteval do, but those objects always have static lifetime so it's sort of irrelevant.
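A small sketch of why that discipline matters. Nothing in the types below stops the writer thread from mutating the vector the reader is traversing through a const reference; the race is deliberate, to show the hazard:

    #include <thread>
    #include <vector>

    // The const& promises *we* won't modify `v` through it; it says nothing
    // about other references to the same vector held elsewhere.
    void reader(const std::vector<int>& v) {
        long sum = 0;
        for (int x : v) sum += x;   // races with the writer below: undefined behavior
        (void)sum;
    }

    int main() {
        std::vector<int> data(1000, 1);
        std::thread t1(reader, std::cref(data));
        std::thread t2([&data] { data.push_back(2); });  // non-const access to the same object
        t1.join();
        t2.join();
    }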
> Vale solves this by isolating threads' memory from each other (except in the case of mutexes or Seamless Concurrency).
Given the whole point of threads is concurrency while sharing a memory space, and given we already have semantics on which data is thread local and how the ownership of heap memory should be handled, what's the point of that?
To me it sounds like some people fail to understand how the complexity they keep piling onto C++ is slowly killing it through the sheer volume of cognitive load they're adding.
No, that is not what this "region borrow checking" solves! Both Rust and your "pure functions" are perfectly capable of enforcing this on new code; that is not the point.
Neither of them can address the line you quoted from the RFC. The existing code they're talking about is not pure. It uses multiple non-const pointers to the same object, at the same time, and temporarily carving out immutable access in between those uses is not the hard part- Rust's scheme handles that just fine already.
> > if a pure function (or block) regards all previously existing memory as one immutable region, it doesn't matter if there was any mutable aliasing happening before, because everything inside that region is shared and immutable now.
Pure functions don't usually need any of this. If they are handed a pointer it has to be to immutable data or not shared by another thread.
> C++ doesn't have a concept of immutable data, or data not shared by another thread, which is part of the challenge.
Does it really need it, though? In the real world, C++ apps already use higher-level constructs to handle this sort of semantics. For instance, Qt offers its moveToThread member function to specify thread affinity.
Sorry for silly question, but I have been away from C++ for a long time now. How do these immutable function parameters properties differ from const parameters?
In D, we discovered both const and immutable annotations were required. Immutable means the object never changes. Const means the object cannot be altered via the const reference, but can be altered by mutable reference to the same object. This makes optimization based on immutability possible, it also means immutable objects can be shared among multiple threads without synchronization.
Both const and immutable attributes are transitive, meaning they are "turtles all the way down." People who are used to C and C++'s non-transitive const find it a bit difficult to get used to; people who have used functional languages find it liberating, and it makes for much easier to understand code.
> Const means the object cannot be altered via the const reference, but can be altered by mutable reference to the same object.
Given that this discussion is already going deep into the implementation level, isn't it possible to flag an object as immutable if it's declared and initialized as a const object, or is a const reference to said const object?
It's because const pointers aren't "deeply" immutable. For example, if we have a const Car*, we can reach into it to grab a (non-const) Engine* through which we can modify things.
If there was an "imm" keyword in C++ which acted "deeply", that would get us pretty far towards our goal here. However, we'd then find ourselves in cases where we need to (for example) cast from an imm Engine* back to a (non-const) Engine*, often for values returned from functions. That's what this new "region borrow checker" concept would solve.
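Spelled out, the shallow-const problem from the Car/Engine example looks like this (ordinary C++, nothing hypothetical):

    struct Engine { int rpm = 0; };

    struct Car {
        Engine* engine;  // the pointer is part of the Car; the Engine it points to is not
    };

    void tamper(const Car* car) {
        // car->engine = nullptr;   // rejected: the pointer member is const through `car`
        Engine* e = car->engine;    // ...but it still points to a non-const Engine
        e->rpm = 9000;              // legal C++: const did not propagate "deeply"
    }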
> For example, if we have a const Car*, we can reach into it to grab a (non-const) Engine* through which we can modify things.
Isn't that already deep within undefined behavior territory?
That's hardly an adequate example, given that it spells an egregious fault, the kind that's covered by any intro to C++ course with clear indications that it's nonsense.
A const object can still have a const pointer to a non-const object. It's a source of much confusion and I've seen such errors often in our own code base.
Immutability is not so simple in C++ as it might first appear.
A const object can also still have mutable data. const_cast can just remove the 'const' attribute entirely, but even ignoring that "abuse" there's also the 'mutable' keyword which allows fields to be modifiable on const objects.
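Both escape hatches in one small example (plain C++; the Counter type is just illustrative):

    struct Counter {
        mutable int hits = 0;   // `mutable`: writable even through a const path
        int value = 42;

        int get() const {
            ++hits;             // fine: mutable fields are exempt from const
            // ++value;         // error: non-mutable field in a const member function
            return value;
        }
    };

    void observe(const Counter& c) {
        c.get();                            // mutates c.hits despite the const reference
        const_cast<Counter&>(c).value = 0;  // compiles; UB only if the original object was defined const
    }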
The rationale for "mutable" is similar to the one for interior mutability in Rust. It's actually really hard to define what it means for something to be "constant" in a systems language, and that's why Rust did not consider deeper sorts of immutability or functional purity.
Not irritating at all. You are saying that you can't swap the pointed at object for another one AND that you can't change the pointed at object. Two very different things.
You have four possible combinations, all with different meanings.
It's also why I always insist on putting const to the right of the declaration part the const refers to. The C++ standard allows const to the left IF it is the first const. But I think this flexibility is kind of garbage....
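For reference, the four combinations, written with const to the right of what it applies to:

    int main() {
        int x = 0, y = 0;

        int*             p0 = &x;  // mutable pointer to mutable int
        int const*       p1 = &x;  // mutable pointer to const int:  *p1 = 1 is an error
        int* const       p2 = &x;  // const pointer to mutable int:  p2 = &y is an error
        int const* const p3 = &x;  // const pointer to const int:    neither is allowed

        p1 = &y;   // OK: the pointer itself is mutable
        *p2 = 1;   // OK: the pointee is mutable
        (void)p0; (void)p3;
    }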
True, but I am a little sensitive to a multiplicity of keywords on a line.
Agree that the flexible placement isn't great. The trouble is most folks use the flexibility to bind the const in the wrong direction, myself included. Then putting it "right" looks "wrong".
This reminds me of the -Wlifetime proposal, which provides similar checks but requires annotation of ownership at struct level (hence the check only applies to new structs):
I tend to think that the ROI for large legacy codebases is in static analysis and instrumented tooling like the sanitizers.
I’m grudgingly coming around to the idea that Rust is probably a rock-solid FFI story away from being a serious part of my kit (pybind11-level good, not like anything we have now).
But there is this in-between place where first-class linear/affine typing could be bootstrapped off of move semantics.
FWIW the Rust-C FFI is very solid. Binding to more complex languages is in various degrees of progress. For C++ I have heard really good things about https://cxx.rs/ (but never had the need to try it). wasm_bindgen is already very good for binding to JS and I have heard people having lots of success writing Python and Ruby libraries in Rust (with some manual glue on the scripted side).
I agree that the language-level C-FFI is quite good; the `bindgen`/`cbindgen` stuff is OK, but not amazing.
I have a fair amount of experience with `cxx`, and even with `autocxx`. These tools show a lot of promise but are nowhere in the ballpark of `pybind11` in features, maturity, flexibility, or ease-of-use. It's early days on the Rust <-> C++ journey, which I appreciate might not be a huge concern for a lot of folks, but I depend on a lot of existing C++. My in-house thing ("alloy") is a big mess of `bindgen`/`cbindgen` and pain. It works OK, but it's super counterintuitive at best that I budget like 5x the time to hoist something core up into Rust as I do for Python.
This is where I stand right now, too. I recently published an overview of Rust <-> C++ interoperability[0], and cxx is the most effective tool available IME, but OTOH even cxx is not there yet (mainly missing bindings for Option and iterators). However the space, especially with autocxx, is evolving rapidly.
Interesting! You're correct that it was not on my radar, don't really know why.
Looking at the repository, it looks like the output is a bunch of extern "C" functions, though. Does that mean that one must call e.g. destructors by hand? How are C++ exceptions handled? Is this sound in the presence of non-relocatable types (e.g. std::string)? If so, how? Also, this only handles the C++ -> Rust direction, right?
Anyway, thank you for letting me know of this project!
Note that IME, pyo3 was nowhere near as stable and reliable as pybind (or boost::python), at least at the time I last used it (2y ago?). I remember soundness issues around inheritance, and I got a bona fide segfault without any unsafe. Their packaging tool, maturin, was already insanely good by then, though.
Looks like at least some of the authors (if not all) work for Google and have compiler backgrounds. I wonder if this is a strategic investment from Google to make the cxx-like approach more robust? The Chrome team showed some interest in using Rust, but I'm not sure if any significant code has been written in Rust since then. The benefit of using Rust beyond a well-isolated library might be quite limited without this kind of lifetime annotation, while the interop itself is additional cognitive overhead for programmers.
I think it makes the most sense for Google, considering the ridiculous amount of C++ code. Even Spanner alone is larger than most companies' codebases, and rewriting it in Rust would take decades, especially because of the paradigm mismatch (C++ embraces shared mutability, Rust rejects it).
It also makes sense because in Spanner, doing any minor change required many months of review, because it was such critical infrastructure. Refactoring was a non-starter in a lot of cases.
So, a more gradual approach, just adding annotations that don't themselves affect the behavior of the program (just assist in static analysis) makes much more sense.
Rust does not reject shared mutability. Rather, it is explicitly reified via the Cell<>, RefCell<>, etc. patterns. Even unsafe shared mutability ala idiomatic C++ is just an UnsafeCell<> away.
Interior mutability is used all over the place. It's absolutely necessary. The thing is you are forced to be clear about the runtime safety mechanism you're using. In general, it gets abstracted away behind a safe API.
Indeed, it's present under the hood; we are operating on a CPU after all, which treats all of RAM as one giant shared-mutable blob of data. It will always be there, under some abstraction.
The point I'm trying to communicate is that in practice, for various reasons, Rust programs do not use shared mutable access to objects to the same extent that C++ programs do. For example, C++ programs use observers and dependency injection (the pattern, not the framework) to hold member pointers to mutable subsystems, and we just don't often see that in Rust programs. This is the paradigm mismatch I'm highlighting: rewriting a C++ program in Rust often requires a large paradigm shift. The pain is particularly felt when making them interoperate in a gradual refactoring.
This is IMO one of the bigger reasons that big rewrites to new languages fail, and why new languages benefit from being multi-paradigm, so that there's no paradigm mismatch to impede migrations.
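To make the "member pointers to mutable subsystems" point concrete, a minimal sketch of that style (the Logger/NetworkClient/Database names are just illustrative); in safe Rust this shape typically pushes you toward a redesign or interior mutability, while in C++ it's the default idiom:

    // Several components hold non-owning pointers to one mutable subsystem
    // and call into it freely.
    struct Logger {
        const char* last = nullptr;
        void log(const char* msg) { last = msg; }   // mutates shared state
    };

    struct NetworkClient {
        Logger* logger;                              // injected, shared, mutable
        void send() { logger->log("send"); }
    };

    struct Database {
        Logger* logger;                              // another mutable alias to the same Logger
        void query() { logger->log("query"); }
    };

    int main() {
        Logger logger;
        NetworkClient net{&logger};
        Database db{&logger};
        net.send();
        db.query();   // both components mutate the single shared Logger
    }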
> * Serverless full-text search with Cloudflare Workers, WebAssembly, and Roaring Bitmaps *
> "Edgesearch builds a reverse index by mapping terms to a compressed bit set (using Roaring Bitmaps) of IDs of documents containing the term, and creates a custom worker script and data to upload to Cloudflare Workers"
All those alternatives you mentioned are nowhere close to being on the level of Spanner in reliability or performance, particularly in a high contention scenario.
Interestingly, UnsafeCell only allows half the shared mutability story. It allows mutation through immutable references, but it does not allow mutable references to alias. So Rust can't (yet) express all the patterns in C++.
That's backwards. If you can mutate through a shared reference, then there is no need to alias `&mut T`s in the first place! It is C++ that is missing the ability to express unique references like `&mut T`.
> If you can mutate through a shared reference, then there is no need to alias `&mut T`s in the first place!
Unfortunately that is not true. See [1] for an example in core where mutable aliasing is needed (async generators). In this case they just added a compiler hack [2][3], but this really needs proper support in Rust. Something like “AliasCell”.
I'm familiar with this issue. But it's a self-imposed problem, not a fundamental limitation relative to C++: the "same people" decided (at different points in time, to be fair) to make the `Future` trait use `&mut Self`, and to let specific futures hold self references.
This is a very different issue than the actual ability to express shared mutability in the language. With enough foresight, Rust could have "simply" used `&Self` and `UnsafeCell` for futures (and the current compiler hack with `!Unpin` is basically just replacing `&mut Self` with `&UnsafeCell<Self>` under the hood).
> We are designing, implementing, and evaluating an attribute-based annotation scheme for C++ that describes object lifetime contracts. It allows relatively cheap, scalable, local static analysis to find many common cases of heap-use-after-free and stack-use-after-return bugs. It allows other static analysis algorithms to be less conservative in their modeling of the C++ object graph and potential mutations done to it. Lifetime annotations also enable better C++/Rust and C++/Swift interoperability.
> This annotation scheme is inspired by Rust lifetimes, but it is adapted to C++ so that it can be incrementally rolled out to existing C++ codebases. Furthermore, the annotations can be automatically added to an existing codebase by a tool that infers the annotations based on the current behavior of each function’s implementation.
> Clang has existing features for detecting lifetime bugs [...]
Is this doc talking about the same feature that this post is about? It sounds like the doc is talking about a different, less-precise analysis that the post cites as prior art.
I think this is a good idea, mostly because I've been of the opinion that much of the rust "safety" could be done with a linter pass in C++. These annotations should help solve the edge cases that aren't deterministic.
I've not generally been a big fan of where C++ has been going in the last couple of revisions, mostly because it seems the newer features aren't fully thought out when it comes to how they interact with other features (reminds me of JavaScript in that regard), leaving a bunch of new footguns (looks at lambdas).
I think i've said this here before, that C++ needs its own version of "use strict;" that basically kills off some of the syntactically odd corner cases that lead to those footguns.
Sounds awesome. After the initial hurdle of getting used to lifetimes in rust it was really an enjoyable time and I would love to see the same feature in c++.
Maybe a C++ person can help me out, I am staring at this C++ translation of Rust's elision rules:
> If there are multiple input lifetimes but one of them applies to the implicit this parameter, that lifetime is assigned to all elided output lifetimes.
In Rust we have self rather than this in methods, but importantly we sometimes don't take a reference here, and that's still a method, and you still have the self parameter, but the lifetime elision rules don't apply. They don't apply because if you've actually got self, not some reference to self, the lifetime of self is going to end when the function finishes, so returning things with that lifetime is nonsense.
This can make sense in Rust for transformative methods. If there's a method on a God that turns them into a Mortal, the God doesn't exist any more when that method exits, the Mortal is the return type, and if you just drop it then I guess sucks to be them. (In Rust, and I think in C++ you can label something as must-use and cause a warning or error if the programmer forgets to use it).
It seems though, as if this C++ rule would hit such methods too. Can you distinguish in C++ between "a reference to me" and "me" when it comes to methods? If you had a method on a God class in C++ that transforms the God into a Mortal and returns the Mortal, what does that look like? Does this elision rule make sense to you for such a scenario?
C++ does not currently have such methods: all `this` parameters are always by (some kind of) reference. (This doesn't stop you from moving out of `*this`, but C++ moves require the source object to remain valid, and I'm not sure I've ever seen that in the wild anyway!)
This will change in C++23 with "explicit this", but then the `this` parameter will no longer be implicit and that translated elision rule will no longer apply either.
So in today's C++ your example might look something like this:
    struct God {
        Mortal transform() && { // Take an rvalue reference for `*this`, I guess?
            return Mortal{ /* Move some fields out of `*this` probably? */ };
        }
    };
The elision rule does apply here even though we're moving out of `*this`, but again we are still taking a reference and the God object remains valid afterward anyway.
With "explicit this" you could instead write something like this:
    struct God {
        Mortal transform(this God self) { ... }
    };
Now there's no reference and the rule ceases to apply. (Though in most cases you are still going to be leaving a valid, but now at least unrelated, God object behind; the only way to avoid that is to construct it directly in argument position so it stays a prvalue.)
That only seems tangentially related to me, but it is interesting, haha.
Specifically, that sounds like it would cause use-after-frees: if you actually `delete this` and then `new (this) Bla`, you're writing to memory that's already been returned to the allocator. Maybe you meant `this->~Bla()` rather than `delete this`? Or a custom `operator delete`?
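For illustration, here's the destructor-plus-placement-new variant I mean, shown from the outside; it assumes Bla is default-constructible and (pre-C++20) has no const or reference members, so the old pointer can keep referring to the replacement object:

    #include <new>

    struct Bla { int state = 0; };

    void rebuild_in_place(Bla* p) {
        p->~Bla();        // end the old object's lifetime; the storage stays ours
        new (p) Bla{};    // construct a fresh Bla in the same storage
    }
    // By contrast, `delete p; new (p) Bla{};` would construct into memory that
    // has already been handed back to the allocator, i.e. a use-after-free.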
It interoperates with a tiny subset of the C++ language; it is miles away from interoperating with the libraries, SDKs, IDEs, GPGPU, FPGAs, and tons of other tooling used in the C++ ecosystem.
C++ developers tend to be very conservative. I still meet shops that are genuinely discussing whether they should use C++98 or try this new modern thing, C++11. Lately that's getting more and more rare, fortunately.
Even if your team is more progressive, there is likely a huge mountain of still-active legacy code that nobody wants to rewrite. There, a more incremental approach is still the way to go.
Otherwise yes, for a greenfield project, Rust is likely superior in many other ways: not just in this area of borrowing, but also in build, syntax, and lots of other features.
10 years from now, there will be nobody on earth able to say "I know C++", because each codebase will use a completely different set of a bazillion optional features, making each codebase look like an entirely different language from the others.
This might sound kinda weird, but I really like reading about coding languages and with c++ there is so much to discover that I really like that. It's like buying an expensive car with a lot of options and even after years you discover new buttons. If you like reading about low level code and implementations of recent new techniques in computer science, then you'll love c++, there is always so much going on. I've learned almost all coding languages and they've become simple and boring, articles about them feel like reading a children's book. It's definitely not easy to learn all features, but it also doesn't become boring so fast.
I know that feeling. There is joy in reading a good technical manual. Enjoyment in seeing a complex system built up. How the parts support each other and intermix to produce something complex and intricate. Once you have read it all and have in your head a complete mental image of how it works, when you can open the manual to any page and not feel lost, there is additional enjoyment and a sense of completeness.
I remember that feeling. I've given up on ever experiencing it with C++. Too much of C++ feels like complexity for the sake of fixing other complexity, which makes it harder to get excited about learning it. Maybe I'm just getting old.
> 10 years from now, there will be nobody on earth able to say "I know C++", because each codebase will use a completely different set of a bazillion optional features, making each codebase look like an entirely different language from the others.
I mean, this is kind of why C++ is successful. It's a toolbox that'll do anything and you can write high level or low level code.
> but maybe thats how we'll finally get rid of c++
Lol, we're never getting rid of it. There's too much C++ software; will Rust replace it all? Probably not.
Bad analogy. Flash was never the core technology of the web, and it was proprietary to one company, which made it much easier to kill when that one company lost interest. It will take a long time for C++ to fade away, because of all of the useful code written in it.
If people actually got to make their own choice on that I'd strongly guess that Flash wouldn't have ever gone away. Browsers declared HTML5 was going to replace Flash long before it was actually capable of doing so (and arguably still isn't).
But it only took ~4 "deciders" to kill Flash regardless of what anyone else wanted, which was Apple, Microsoft, Google, and Mozilla (and eventually Adobe decided it didn't care about Flash anymore either). Nobody so centrally "owns" C++ such that there could be a concerted deprecation effort that anyone would actually care to listen to & respect. Even if the C++ committee itself decided to kill C++, and got G++, MSVC, and Clang on board, which is extremely unlikely, would anyone even care that much or just keep using the last release of the compilers with support until the end of time? Kinda like they do for FORTRAN. And COBOL. And etc...
Right. And a lot of the content is on Myspace, which is pretty much baked in to how everybody lives now. If you tried to launch some sort of rival service, even if it was as good as Myspace why would anybody join it? Next thing somebody's going to talk about handheld computers again, as if you could convince ordinary people to carry a computer around with them. Not likely.
More Seriously: You shouldn't believe any of this nonsense about how we're permanently locked in to something that's less than a century old. There are plenty of people still alive today who were born in a world that not only didn't have C++ it didn't have programmable computers at all.
> each codebase will use a completely different set of a bazillion optional features, making each codebase look like an entirely different language
The issue is each codebase has its own DSL?
That is pretty close to the case with all non-C++ langs: industry settles on a handful of frameworks which are syntactically similar, but each has its own gotchas.
I've never met anyone who knows the entirety of C++ today. Some people kept up with C++11 maybe, but anything beyond that is quickly getting into the territory where there's bits of the language that are truly arcane to even very experienced devs.
Even then, what does it mean to "know C++"? Do you mean the spec? Specific implementations?
IMHO the language and implementation have been far from simple for almost a decade now. I don't think that's necessarily a negative though.
> but maybe thats how we'll finally get rid of c++.
I feel like you didn't read your own comment here. What you've described is a world that is littered with so many C++ variants that it will be impossible to eradicate them.
The world runs on C/C++ code. There are more than 5 million C++ developers out there and that number is increasing. So it doesn’t look as if C++ will go away any time soon.
Not at all. This is all about retrofitting existing code with more static checking. Nothing here creates an incentive to write new code in C++ instead of Rust.
What happens if you make an annotation mistake? Could the compiler-generated code then have a security vulnerability, just like when you use undefined behaviour and the compiler is then allowed to optimize safety checks away?
No: the idea is that if you make an annotation mistake, you will get an error, because the program will not match its annotations.
Lifetimes are not like an `unreachable` UB operation. Instead they are just a description of cross-function information, about what both the caller and callee are allowed to assume.
You could technically (both in Rust and under this proposal) do the same checks without them, if you had access to the entire program at once. However, this would be more expensive, and probably give you less-localized error messages (a lot like C++ template errors, for similar reasons).
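Not the new annotation scheme from the post, but Clang's existing [[clang::lifetimebound]] attribute gives a feel for how such annotations are checked rather than trusted; a sketch:

    #include <string>

    // The attribute says the returned reference is tied to these arguments' lifetimes.
    const std::string& shorter(const std::string& a [[clang::lifetimebound]],
                               const std::string& b [[clang::lifetimebound]]) {
        return a.size() < b.size() ? a : b;
    }

    void demo() {
        const std::string& r = shorter(std::string("temp"), std::string("orary"));
        // Clang warns here (-Wdangling): both temporaries die at the end of the
        // full-expression, so `r` dangles. The annotation enables the diagnosis;
        // it doesn't silently change what the code does.
        (void)r;
    }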
[0]: https://verdagon.dev/blog/seamless-fearless-structured-concu...