Does the Bronze Garbage Collector Make Rust Easier to Use? (arxiv.org)
109 points by ingve on Dec 23, 2021 | 180 comments


> A key tradeoff is that Bronze does not guarantee thread safety

Not having data races is one of the key benefits of using Rust, “fearless concurrency” and all that. So by throwing away a key guarantee of Rust (no aliasing mutable references), it can become easier for learners to program in. It becomes a bit of an apples and oranges situation at that point.


Yes pretty much like taking a statically typed language, making compile time type annotations optional and claiming victory that it's easier for beginners. Now they would get runtime exceptions instead.


It's still valuable to have a language that has better semantics and a more modern standard library than C++, even if it didn't get the benefits of strict memory management. That's still a Useful Thing.


Once you introduce a garbage collector there are plenty of other languages that provide that while still having expressive type systems and modern features, like Kotlin or C#.


Yeah, Rust's entire value proposition is that it doesn't lock you into GC


Rust's ownership system has benefits beyond memory management.

If I'm in a situation where having a GC is OK and there is no "major library ecosystem benefit" (or similar) for one of the languages, I would still choose Rust over Python, JS, TS, Java, Kotlin, Dart, Scala (probably C# too, but I don't know, as I haven't used it).

The borrow checker costs you time to learn once, but once you are fairly familiar with it, it normally won't cost you much time (if any). Sure, there are still situations in which it can be tricky. But most of the time they are pretty clear and you can just throw a Clone/Rc/Arc at it and it's normally just fine (the borrow checker is still useful even with managed pointers/collections like Rc/Arc; in a certain way it makes them less error-prone to use, especially for more complex types like some thread-safe, Cow-optimized managed pointer type you might find in a library).


Or… OCaml.


Isn't that language called D?


There are two forms of optional type annotations:

1. Using a placeholder (let/var/val/auto/...) -- e.g. in modern C#/Java/C++/Kotlin -- and letting the compiler figure out the type, but keep the actual type known at compile time. This will give you compile time errors when using the wrong types, and keeps the variables of a fixed type.

2. Effectively making all types variant types that can hold any value and can change their type -- e.g. in JavaScript/Python/Ruby -- such that they are dynamically typed. This can lead to runtime errors.

For languages like C#, Java and the derivatives, they have the concept where all objects are instances of a common type. If you use this -- especially in collections -- you can also get runtime errors. As long as you stick to the generic versions of these, the compiler will enforce the type safety.

The statically typed languages have been making types optional where the compiler can deduce them to avoid redundancy and duplication. If there is an ambiguity, the compiler will emit a compile error. This is the best of both worlds -- type safety without the noise of annotating types everywhere.
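
A minimal Rust sketch of the distinction above (Rust is only my choice of illustration language, and `inferred_sum` and `describe` are made-up names). Inference keeps the type fixed at compile time, while a `dyn Any` value plays roughly the role of the common `Object` type and moves the check to run time:

```rust
use std::any::Any;

// Inferred but static: the compiler works out the element type from usage.
// It is still fixed at compile time, so pushing a &str would not compile.
fn inferred_sum() -> i32 {
    let mut values = Vec::new(); // inferred as Vec<i32>
    values.push(1);
    values.push(2);
    values.iter().sum()
}

// "Common base type" style: the value is type-erased, so a wrong guess
// about its contents only shows up at run time.
fn describe(item: &dyn Any) -> String {
    if let Some(n) = item.downcast_ref::<i32>() {
        format!("int {}", n)
    } else if let Some(s) = item.downcast_ref::<String>() {
        format!("string {}", s)
    } else {
        "unknown".to_string()
    }
}

fn main() {
    println!("{}", inferred_sum());
    let bag: Vec<Box<dyn Any>> = vec![
        Box::new(7_i32),
        Box::new(String::from("hi")),
        Box::new(1.5_f64),
    ];
    for item in &bag {
        println!("{}", describe(item.as_ref()));
    }
}
```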


My point here was that this was equivalent to using Object type in Java/C#. Here the data race guarantees of rust type system are elided due to this GC.


Rather it's like taking a statically typed language and making all type annotations inferrable.


If they were made inferrable, it wouldn't cause runtime exceptions. Here the data race guarantees of rust type system are elided instead.


True, yet in the context of the experiment that part of the language was not used, so they could compare the two approaches.

I think this research highlights an important strategy to make all software safer.

A quote from the discussion section:

> Encouraging adoption of safer languages by reducing stress.


It appears in this case that they're encouraging adoption of safer languages by making them unsafe and unsound: https://users.rust-lang.org/t/bronze-gc-and-aliasing-problem...


Usually, you don't start learning a new language with multithreaded operations..

In that sense, adding a GC to Rust for single-threaded programming is almost useless: it's not helping that much and you're not learning.


> Usually, you don't start learning a new language with multithreaded operations..

When I started with Rust I was not writing stuff with Box/Rc/Arc... or even explicit lifetimes. I'm not saying that I was cloning everything, but for simple stuff you can go a long way with just moving and simple borrowing.


Depends if it's your first language, probably.

For an already experienced programmer, it's hard not to think about using multiple threads, even when just starting in the language. I know my few first Rust programs were using multithreaded constructs and I found that especially easy to do safely in Rust (when I started, there was already crossbeam and rayon, both making a lot of things easier).


Due to the type of software I do, that is exactly how I look at a new language: how it handles concurrency, what kind of synchronization primitives it offers, how it manages lifecycle etc. etc. If it does not provide enough facilities in a comprehensible way then it is essentially useless to me. I do not learn languages just for the f.. of it.


> Usually, you don't start learning a new language with multithreaded operations..

Why not? I mean, maybe not your first ever programming language, but - why should you not get used to doling out work to all available threads to begin with?


Because multi-threaded programs are largely unnecessary for solving many problems (as the popularity of python has aptly demonstrated).


If you already have the required experience for writing working multithreaded programs, then learning Rust isn't going to be an issue!

If you don't or are unsure about that, then stick to simpler forms of programming or use type-hinted Python instead.

This would be the recommendation I would make to anyone asking me that question!


> Not having data races is one of the key benefits of using Rust

Exactly. Lifetimes, borrowing, etc. are the complexity of thread safety. It's like saying that airplanes are easy to fly, if you replace the airplane with a car.


Doesn't it seem useful to have one language that can be used at several levels of sophistication?


Depends on the tradeoffs.

Having different levels of abstraction within the same language can be really useful.

But you don't want to overcomplicate the language to support that, especially if it makes operating at different levels more complicated.

In this case, it seems like an odd tradeoff to make. The less sophisticated user is exactly the same kind of user that is more likely to foot-gun themselves with a memory or thread-safety issue. The "right" answer is probably a thread-safe garbage collector, but that has its own set of usability and implementation tradeoffs.


How does Bronze overcomplicate Rust as a language? It's an optional library, not a language extension.

I think it's often pretty easy to not footgun yourself with thread or memory issues, because these issues simply don't exist in wide swaths of application.


It's locking you into single-threaded usage, but you might very easily run into a library which requires Send bounds, as it uses e.g. rayon internally.

And for the cases where having managed pointers are better, Rc/Arc are often (not always) good enough.


Also, it seems to be fundamentally unsound. The borrow checker isn't limited to making sure memory is freed correctly and multi-threading is safe; it also affects assembly generation! (And it is used to make sure that all kinds of things work correctly, even without multi-threading.)

The library seems to allow creating multiple mutable refs, which is instant UB in Rust, i.e. having two mutable refs is already UB even if you only use one at a time. While this is just a PoC, it means you can't implement it as a library! Only as a compiler extension, which furthermore means a lot of optimizations must behave differently if the extension is used, which is quite painful to maintain and prone to introducing compiler bugs.


Thank you for mentioning UB. I like Rust quite a bit, but once you trigger UB, things get nasty really, really fast. It feels like you’re using gcc at -O5 (not a real thing). It’s not like C/C++ in which you can sort of infer the consequences of UB on a single platform with enough experience.

It is greatly helped by compiler warnings, but still, UB in Rust can be downright brutal.


It's kinda the same in C/C++ as far as I can tell.

They use the same backends in LLVM/GCC and the codegen options aren't even that different, mainly that Rust uses the noalias attribute a lot.

> C/C++ in which you can sort of infer the consequences of UB on a single platform with enough experience.

UB in C/C++ allows the compiler to arbitrarily rewrite your code, and it sometimes does so, all depending on compiler version, enabled compiler flags, optimization level and "seemingly arbitrary" factors (i.e. some optimization passes detect or fail to detect optimization opportunities, sometimes for quite surprising reasons).

So I don't think you can "sort of infer the consequences" on a single platform in practice. (Maybe with optimizations disabled, but even then it can be tricky.)


There are similarities but also significant differences. Safe Rust has no UB, and so this appears less often in general. There’s also just straight up different semantics; famously there was a soundness bug in safe Rust because a loop with no side effects is defined behavior in Rust (you get an infinite loop) and UB in C++ (the compiler can elide it). LLVM added a new syntax to handle this specific case, eliminating the UB in other languages. But this kind of thing does happen.


The borrow checker does not affect code generation. Lifetimes are completely elided when codegen'ing.

What _does_ affect code generation is Rust's rules about borrowing and so.

If borrowck were affecting codegen, stuff like `RefCell` wouldn't be possible.
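
For readers who haven't used it, a small sketch of what `RefCell` does, assuming nothing beyond the standard library (the helper name `second_mut_borrow_refused` is made up). The "one mutable borrow at a time" rule is still enforced, just by a run-time flag rather than by the compile-time checker:

```rust
use std::cell::RefCell;

// Returns true if a second mutable borrow is refused while the first
// guard is still alive. RefCell tracks outstanding borrows at run time.
fn second_mut_borrow_refused(cell: &RefCell<Vec<i32>>) -> bool {
    let first = cell.borrow_mut();
    let refused = cell.try_borrow_mut().is_err();
    drop(first); // release the first borrow explicitly
    refused
}

fn main() {
    let cell = RefCell::new(vec![1, 2, 3]);
    println!("refused while borrowed: {}", second_mut_borrow_refused(&cell));
    // Once the first guard is gone, a new mutable borrow succeeds.
    cell.borrow_mut().push(4);
    println!("{:?}", cell.borrow());
}
```

Using plain `borrow_mut()` for the second borrow would panic instead of returning an error.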


Sure,

But in the end, most times when people (outside of the Rust community) speak about Rust's borrow checking, it's about the combination of lifetime checking, "pointer" aliasing rules and the (not fully specified) memory model, i.e. its ownership model.

And while lifetimes get eliminated at some point in compilation, this happens after making sure they are correct, which enables Rust to use a "pointer" aliasing model for its references which is subtly different from C pointers in ways which affect code generation.

I.e. they implicitly affect code generation through what the Rust language/compiler can do or rely on (like often being able to mark &mut as noalias).

I.e. if you write unsafe code assuming lifetimes are just about making sure memory is freed correctly, you will produce unsound code.


This reminds me of C# and their introduction of the unsafe keyword. GC for everything until you need to manage your own pointers for speed or something.


Data races across threads, that is. Rust's type system does nothing to prevent data races across processes.


That's not entirely correct, or at least misleading. Rust will provide the same guarantees for variables in memory shared between processes as for variables in memory shared between threads. But you have to make sure that any locking datastructures you're using (like mutexes or read-write-locks) are able to work across processes.

(There are limits, though: if you map the same physical addresses to different virtual addresses, Rust can't help you. However, that is independent of threads/processes, because you can also do that in single-threaded programs.)


Which is a different story than just asserting fearless concurrency no matter what, also misleading.

Hence why I try to make a point that comes with a footnote.

Rust is after all supposed to target all kinds of system programming scenarios.


> Which is a different story than just asserting fearless concurrency no matter what, also misleading.

Frankly, you're being a bit disingenuous. Nobody claimed that Rust can or will solve all conceivable concurrency problems. "Fearless concurrency" is generally understood to mean "...within a single program", not "...across different processes/machines/networks". By the time you understand what interprocess shared memory is, you're well able to correctly interpret Rust's "fearless concurrency" slogan.


Understood by most in the Rust community, not by others.

Most outside of the community aren't aware that nomicon points out exactly this.

By the way, there are also ways to cause havoc within a single program, example using a file as backing store being accessed by multiple threads concurrently, or accessing database data without transactions.

My goal is not to bash Rust, rather to trigger discussions around these kind of problems.


Both those situations are race conditions, but neither are data races. Rust only prevents data races, which is a specific kind of race condition, but it does not prevent race conditions in general.
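
A sketch of that distinction in safe Rust, under the assumption that a deliberately forced interleaving is acceptable for illustration (`lost_update` is a made-up name). Every access goes through a `Mutex`, so there is no data race, yet the program still loses an update:

```rust
use std::sync::{Arc, Barrier, Mutex};
use std::thread;

// Two threads each do "read the counter, later write back read + 1".
// The Barrier forces both reads to happen before either write, so one
// increment is lost even though every access is properly locked.
fn lost_update() -> u32 {
    let counter = Arc::new(Mutex::new(0_u32));
    let barrier = Arc::new(Barrier::new(2));

    let handles: Vec<_> = (0..2)
        .map(|_| {
            let counter = Arc::clone(&counter);
            let barrier = Arc::clone(&barrier);
            thread::spawn(move || {
                // The lock guard is released at the end of this statement.
                let seen = *counter.lock().unwrap();
                // Both threads have read 0 by the time this returns.
                barrier.wait();
                *counter.lock().unwrap() = seen + 1;
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }
    let result = *counter.lock().unwrap();
    result
}

fn main() {
    // Two "increments" ran, but the counter ends at 1.
    println!("final counter: {}", lost_update());
}
```

Holding the lock across the whole read-modify-write would fix it; the compiler has no way to know that this was the intent.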


It is the "in general" I care about, and I think it gets too little discussion in the community, because just like in some RIR threads, the details get lost in the discussion.


When can data races across processes happen?

Are you talking about databases, services or IO and such?


I guess the simplest example is shared memory between processes.

Even Python has it: https://docs.python.org/3/library/multiprocessing.shared_mem...


Access to raw memory is locked behind the unsafe keyword though. Rust officially already does not guarantee any safety in that scenario even within 1 process.


There is always unsafe at some level in the standard library.

The point is that it doesn't protect the user of a crate that only exposes a fully safe API, unless they do digging to validate overall architecture safety.


>Even Python

It's not "even". Python specifically has it because it has no real threading.


Python does have real threading. The `threading` module provides os-level threads and synchronization primitives. The only difference between this and multithreading in C or Java is that CPython's GIL prevents more than one thread executing bytecode at a time. This prevents parallelism, but not concurrency.

Note this does not mean that python code is thread-safe by default. At most, you can theoretically rely on bytecode operations to be atomic, which means you'll need to synchronize multi-threaded code with mutexes, semaphores and higher-level synchronization constructs.


Python has cooperative threading. It's the same threading model used in the Erlang VM, Julia and many other dynamically typed languages. But preemptive threading vs. cooperative threading is orthogonal to whether data races can happen. Java threads are preemptive but data races can still happen.


The Erlang VM does preemptive scheduling.


No it doesn't.


While this is technically true, it's quite misleading. The VM itself uses cooperative scheduling, but the Erlang compiler emits something akin to yields appropriately, such that the net effect is preemptive scheduling. You can break it by calling a NIF that doesn't do the yields appropriately, but that's not the norm.


Preemptive threading means that the running thread can be paused by the system without ANY cooperation from the thread itself. Emphasis on "any". Python works roughly the same as Erlang. Every N bytecodes it checks if a thread is waiting to run and switches to that. Both are variants of cooperative threading.


SharedMemory is a new thing in Python. Not even supported by all 3.x versions.


But this is specifically about Rust.

What data races between processes, other than Disk/IO, databases, or external services, can a Rust program have?

I explicitly exclude the whole category of external services, since that is "by design" really. And the whole reason for ACID, global mutexes, transactions and CRDTs.


That and any kind of data structure that can be shared via IPC mechanisms, some of them even transparent for the processes.


Environment variables.

Locales.

Quite a few other POSIX bits, really.


It is not possible to have a data race with environment variables across multiple processes. Every process has its own copy of environment variables (in fact they have their own copy of the entire environment).

I'm not sure what data race is possible across processes with locales, that's too vague of a claim to make.


One type of locales I know are the LC_ env vars. So there the "ENV is a copy" applies too.

Another would be to read and write into locale files, such as JSON. But then the same applies as with any database or IO: this is inherently race-condition-prone and that is by design.

Maybe grandparent is thinking about locales in many web frameworks, that is, some global var which should not be shared across users. So that if you set `Locale.current = "EN_GB"` that applies for any (email) notifications, errors, files, responses or such, being sent out during that request/response and during any jobs that request/response may spawn. In e.g. Rails this "somewhat global var" is a Frankenstein, but works surprisingly stably, actually.


It would be interesting to see if one can come up with a solution based on custom `Send`/`Sync`-like traits.

Of course it will require nightly since auto traits are not stable.


This study, along with their GC library, is fundamentally flawed: The Bronze garbage collector doesn't just provide automatic memory management, it also allows shared mutable references! This undermines one of the most critical principles of Rust, one which is necessary for many of Rust's safety guarantees: There is never more than one live mutable reference to the same memory location. As soon as you create two mutable references to the same memory location, you get undefined behavior, even if you don't dereference them.

That's why the entire approach is flawed: They didn't study "Rust + GC", but effectively created a new language. See [1] for more details.

Personally, as someone who uses Rust professionally, I believe optional GC would be a great addition to Rust! The compiler would provide support for GC, and the actual GC runtimes would be library-based, similar to how async/await is handled. However, this GC would merely replace Arc/Rc, not Mutex/RefCell/Cell.

[1] https://users.rust-lang.org/t/bronze-gc-and-aliasing-problem...


I absolutely agree. There’s prior art in that direction. It’s more or less abandoned now, but Microsoft developed an extension to C++ that integrated the .NET runtime with its garbage collector, called C++/CLI. The way it worked is that in addition to C++’s native pointers and references (Foo* and Foo&), you had a new kind of garbage collected reference, written Foo^. There were some annoying limitations that a GC built directly for this use case might have solved better, but mostly, this worked really well.

And I think Rust would be a much better target language for such a system than C++ was. For example, one of the interesting problems was that the .NET GC is compacting — it will move objects around to improve cache locality and simplify the allocation algorithm. To interact with ordinary C++ code, you need native pointers into GC’d objects. But it was only safe to take such a pointer while you told the GC to temporarily “pin” the object in memory, otherwise, your pointer could be randomly invalidated. Forgetting that step was a very evil source of nondeterministic memory bugs. It’s not hard to see how the borrow checker would help there.


It isn't abandoned at all, it was one of the milestones for .NET Core 3.1 release and is currently up to date with C++17.

Much easier than dealing with P/Invoke declarations or dealing with COM over RCW/CCW layer.


If you add a garbage collector to Rust, you neutralize one of its strength: the controlled lifetime of objects. You also make memory leaks easier.

I'm convinced Rust is easier to write without the management of lifetimes in a short time span. However, what about long term maintenance, and application stability?

Rust with a GC probably has its uses (compiled language with strong error checking), but one has to know what ship they are boarding.

It also makes it quite closer to D, an "easier C++", compiled language with a GC.

(I'm saying this as someone who tried Rust, and mostly uses GC'd languages)

edit: thanks for the replies below. I hadn't thought about this. I guess one should better read my message as "If you remove Rust's lifetime management, …"


Rust’s type system doesn’t guarantee that object lifetimes are bounded from above (guaranteed to be destroyed), just that they are bounded from below (guaranteed to exist when used). This is why you have stuff like crossbeam—elaborate workarounds for a lack of linear types in Rust.


It doesn't guarantee it under all circumstances, like if your computer shuts off obviously destructors don't get called. But you can rely on the behavior to be deterministic.


You can rely on the behavior to be deterministic, but the Rust type system provides no way of relying on this determinism. For example, if you have a stack variable and pass a reference to a short-lived thread, Rust will just throw an error, because there is no way for Rust to prove that the lifetime of the borrow is short enough (bounded from above) because Rust does not have linear types.

I want it to be exactly clear what I'm saying... While you (the programmer) may understand when an object is destroyed, and Rust (the compiler) will agree with you, that information is not present in the type system and you cannot use it in your programs. In this aspect, Rust has the same problems as C++... but where C++ still lets you write code, Rust will just throw an error and force you to use "unsafe" here.


> For example, if you have a stack variable and pass a reference to a short-lived thread, Rust will just throw an error, because there is no way for Rust to prove that the lifetime of the borrow is short enough (bounded from above) because Rust does not have linear types.

Rust has scoped threads for this use case. Generally this uses closures as opposed to lifetimes to "bound borrows from above" in a safe way.


Rust has crossbeam, but it is not a complete solution to the problem. Scoped threads were removed.


Would you mind elaborating?

It was removed from the standard library ages ago, but `crossbeam::scope` appears to still be there in the newest release of crossbeam and the git repo.
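
For reference, a minimal sketch of the scoped-thread pattern being discussed. It uses `std::thread::scope`, which returned to the standard library in Rust 1.63 (after this thread was written); `crossbeam::scope` has a broadly similar shape. `parallel_sum` is a made-up example function:

```rust
use std::thread;

// Sums the two halves of a borrowed slice on two threads. The scope
// joins every thread spawned inside it before `scope` returns, which is
// what lets the threads borrow `data` without `'static` or `Arc`.
fn parallel_sum(data: &[u64]) -> u64 {
    let (left, right) = data.split_at(data.len() / 2);
    thread::scope(|s| {
        let l = s.spawn(|| left.iter().sum::<u64>());
        let r = s.spawn(|| right.iter().sum::<u64>());
        l.join().unwrap() + r.join().unwrap()
    })
}

fn main() {
    let data = vec![1_u64, 2, 3, 4, 5];
    println!("sum = {}", parallel_sum(&data));
    // `data` is still usable here: the borrows ended with the scope.
    println!("len = {}", data.len());
}
```

The "bound from above" comes from the closure: the threads cannot outlive the call to `scope`, so the borrow cannot either.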


Using a scope for spawning threads is a solution to a narrow slice of a much larger problem.


Yes, that's correct. I just want to be clear that it's not like destructors in rust are unreliable in the way that finalizers are in Java.


To be clear, I definitely wasn't talking about that.

Since you can't use the type system to guarantee that an object is destroyed, there are a few things in Rust that are a major pain to get working. Like passing a reference to a short-lived thread.


It's not that hard in my experience. You generally do it by using a guarded scope where the user provides a callback and doesn't manage the resource (like a thread) themselves.


Sure, if that works for you, it's easy. Stuff like crossbeam::scope only solves a subset of the problem, though. When that solution doesn't work for you, it's a pain.

Using a scope is, strictly speaking, less powerful and more annoying to work with than a more general system of linear types. It's like Go's defer, or C#'s using/IDisposable... they work in certain scenarios, but there's a percentage of the time where a lexical scope or function scope doesn't match the lifetime of your object, or can't match the lifetime of your object.


Yep, agreed.


> I'm convinced Rust is easier to write without the management of lifetimes in a short time span.

Yeah that's the problem with all these studies that use students writing new code for a few hours.

99% of coding isn't like that - it's reading other people's code (including code you wrote ages ago and can't remember), trying to figure out how it works, following complex call chains, debugging etc.

If you only look at Advent of Code type problems you'll come up with all sorts of strange conclusions, like static types don't make a difference, comments aren't useful at all, it doesn't matter if you exclusively use single character variable names, goto is totally harmless, etc. etc.

Though in this particular case I guess they didn't have much choice because there are no existing large programs written using Bronze.


I'm not 100% sure, but I think the OP was talking about the overhead of lifetime management for objects that don't live for a very long time. Or maybe objects that don't have "interesting" lives.

One example in my head is individual strings that need to be constructed dynamically but live for the lifetime of the application (better to leak them with Box::leak or lazy_static! than pollute all code with lifetimes).

Another is writing in lifetimes for single purpose objects that live on the stack and might only ever get passed by ref into a single function and then get destroyed soon after.

Lifetimes are super important in Rust and are a core part of the language, but in such degenerate cases they take up a lot of programmer effort for little benefit. In my head an "automatic" solution such as GC /could/ have a home in the language. Perhaps this would make Rust a slightly better fit for really complex monolithic GUI apps (word processing, spreadsheets, CAD) where full GC would be performance-onerous but data lifetimes would be too complex for Rust's strictly ordered lifetime concept.


We were particularly interested in learnability for this study, which CAN be assessed in a study with students.


Garbage collection and controlled object lifetimes are important and orthogonal concerns. You can have both at the same time. C# for instance lets you add a Dispose() method for the times when you want to free the resources now. There are even ways to ensure statically that objects aren't used after the close/dispose is called. (Of which I would find the paper, but my google-fu is failing me right now ... https://okmij.org/ftp/Haskell/regions.html and https://arxiv.org/abs/1803.02796 I think)


Tbh, I'm not sure about this GC-purism. I've encountered tons of high-end C++ that's absolutely full of smart pointers, which is a form of GC. Rust has Rc<> and Arc<> as well. Having a ref-counted GC instead of a tracing one doesn't seem like a step up in 'purity' to me.


Ref-counting provides consistent performance - the costs are amortized smoothly over the operations. They aren't doing all this work for "purity".


Performance of ref-counting is proportional to the size of the dead objects. Performance of tracing gc is proportional to the size of the alive objects. In other words, there are many real-world situations in which ref-counting pauses are much longer than tracing gc pauses.


Ref counting can also lead to memory leaks with circular references.


> It also makes it quite closer to D, an "easier C++", compiled language with a GC.

That's exactly why I loved D over C++ and Rust and had the feeling that I was not fighting the compiler from day one: the GC, auto types, lack of annoying C preprocessor.


D also has the good characteristic of not needing to download the entire Earth and compile half of it before starting a project. You just apt install ldc2 or apt install gdc and there you go (pun not intended).

This actually made me choose D over Rust for my last project.


> those who [used the GC] required only about a third as much time (4 hours vs. 12 hours).

This makes me happy to use D instead of Rust. D has a @safe subset.


Can you use lifetimes for most objects, and use GC where it's "hard"? Like, canonical example with double linked lists.


You can do this in Rust today, with the standard library's reference counter


Also, one of the easiest ways to leak memory in Rust. My understanding is that substructural typing systems are just not quite powerful enough to represent the semantics of double-linked lists, but I don't understand the math myself.

Obviously it won't leak memory if you do it correctly, but Rust will happily let you leak memory in an Rc<T>.
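
To make that concrete, a small sketch with made-up names (`Node`, `new_node`, `dropped_nodes`) that counts destructor runs. With a strong back-link the two nodes keep each other alive and nothing is dropped; with a `Weak` back-link both are freed:

```rust
use std::cell::{Cell, RefCell};
use std::rc::{Rc, Weak};

// A node that records its own destruction in a shared counter.
struct Node {
    drops: Rc<Cell<u32>>,
    next: RefCell<Option<Rc<Node>>>, // strong link
    prev: RefCell<Weak<Node>>,       // weak back-link
}

impl Drop for Node {
    fn drop(&mut self) {
        self.drops.set(self.drops.get() + 1);
    }
}

fn new_node(drops: &Rc<Cell<u32>>) -> Rc<Node> {
    Rc::new(Node {
        drops: Rc::clone(drops),
        next: RefCell::new(None),
        prev: RefCell::new(Weak::new()),
    })
}

// Links two nodes and reports how many destructors ran once the local
// handles went out of scope.
fn dropped_nodes(strong_back_link: bool) -> u32 {
    let drops = Rc::new(Cell::new(0));
    {
        let a = new_node(&drops);
        let b = new_node(&drops);
        *a.next.borrow_mut() = Some(Rc::clone(&b));
        if strong_back_link {
            // a -> b -> a: a reference cycle, so neither count reaches zero.
            *b.next.borrow_mut() = Some(Rc::clone(&a));
        } else {
            // The weak pointer does not keep `a` alive.
            *b.prev.borrow_mut() = Rc::downgrade(&a);
        }
    }
    drops.get()
}

fn main() {
    println!("strong cycle: {} nodes dropped", dropped_nodes(true));
    println!("weak back-link: {} nodes dropped", dropped_nodes(false));
}
```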


If you really do want to leak memory, Rc<T> is overcomplicated - just call `Box::leak` and throw away the output.
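
A minimal sketch of the intentional version (`leaked_greeting` is a made-up helper). The string is built at run time, leaked once, and from then on is a plain `&'static str`:

```rust
// Builds a string at run time and leaks it on purpose, yielding a
// `&'static str` that can be passed around without lifetime parameters.
// The allocation is never freed.
fn leaked_greeting(name: &str) -> &'static str {
    let owned = format!("hello, {}", name);
    Box::leak(owned.into_boxed_str())
}

fn main() {
    let greeting: &'static str = leaked_greeting("world");
    println!("{}", greeting);
}
```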


Rc<T> is the preferred way to introduce unintentional memory leaks.


I’m saddened to think that because this industry has an obsession with making programming accessible to the lowest common denominator, expressive and innovative languages like Rust will fall by the wayside, and we’ll all be writing code in languages like Go and Python in the future because they’re “easy” and even high schoolers can learn them in a couple days.

It reminds me of the longbow vs crossbow.

One is a technology that anyone can learn to use easily, the other is one that takes longer to master, but is a more efficient weapon in the hands of a skilled soldier.

The analogy isn’t perfect but the parallels are there.

http://www.thebeckoning.com/medieval/crossbow/cross_l_v_c.ht...


There is absolutely no evidence to support this assertion.

Instead there is plenty of evidence that currently there is a lot of innovation in all niches of the programming ecosystem - from low level [Rust] to high level, and a lot of coder interest in all the options.

Also talking about "lowest common denominator" comes across as somewhat snobby - many world-class coders use high-level languages when appropriate. It's a matter of choosing the right tool for the job, not dumbing down.


> There is absolutely no evidence to support this assertion.

Consider the popularity of Javascript and Electron for Desktop applications.


I think that's more a function of "We want to access all demographics without paying to properly comply with the interface design guidelines of each platform individually. What 'write once, run everywhere including the browser' solution have you got for us?"


That wouldn't explain why even companies like Microsoft use webtech in their OS configuration GUI these days.


Good point. It's probably fundamentally a cost-cutting measure in both cases, where Microsoft just found it cheaper and easier for similar reasons.

(eg. Win32 isn't a convenient API, Qt has license fees while Electron does not, Microsoft has a history of making successor APIs that even they don't have proper confidence in, etc.)


And yet much of Google was built with Python for a very significant chunk of its life. Netflix and Uber were/are heavily leveraging JavaScript for core workloads. The machine learning ecosystem is largely dominated by Python. Stripe, GitHub, and Shopify are still investing heavily in Ruby. Almost none of this is/was for the sake of accessibility to beginners.

Your example cherry picks exactly one instance, ignoring the context of that choice. Such an assertion requires more than just anecdotal musings and ignores the tradeoffs that those teams made when choosing a platform.


Because it’s a first class development experience, cross-platform by default, V8 is really really fast, and toolkits that support “build your own toolkit” are non-existent outside the web.

I don’t want to make a “Windows app”, “A Mac app”, or a “QT app”. I want a rectangle and a high-level enough interface to build my own widgets. People who are single mode users obviously hate that every app looks and behaves different but most people are multi-mode and don’t care.

JS is only easy at the most superficial level and gets to be a mess once you go beyond small apps.


> Because it’s a first class development experience [...]

I'm going to have to disagree with that.

> People who are single mode users obviously hate that every app looks and behaves different but most people are multi-mode and don’t care.

I am going to have to strongly disagree with that. If that were true nobody would ever have wanted to do theming, which was a big deal before developers forgot how to make software flexible enough to support it.

> JS is only easy at the most superficial level and gets to be a mess once you go beyond small apps.

Oh, you mean like BASIC? A language designed specifically to be easy to learn? Lots of applications were made in BASIC because it was easy to get into and people just kinda dealt with the mess. Visual Basic especially has a reputation. Sound familiar?


> It's a matter of choosing the right tool for the job, not dumbing down.

In some cases, but there is definitely a trend of dumbing down stuff in the name of accessibility or ease of use. As long as the GUI does not limit me and I can still tinker with various stuff under, say, "Advanced", I am okay-ish.


Every time I hear the term "lowest common denominator" applied to programmers, I reflect on the number of network-accessible remote exploits that provide root access via the Linux kernel over the past three decades.

And yet the folks maintaining the network stack in the Linux kernel are not anywhere near what I or almost anyone else would consider the lowest common denominator programmer.

Sometimes you just have to accept that ALL humans are inherently bad at this, only some are worse than others at various points in their lives. The security and reliability of the product of our output should not be so largely dependent upon our ability to get a full night's sleep, reasonable working hours, and lack of personal conflict. To be sure, we should all strive for those things, but perhaps the choice of tools is correct in prioritizing mitigation of difficulties and vulnerabilities from meatspace.

Sometimes that's a borrow checker. Other times it's a garbage collector. And every once in a while, we need to break out an unsafe block (or the entire C language) to get a job done.

Considering how much more expensive people are than software or hardware, the trade off for easier/simpler access is increasingly straightforward.


While I strongly agree, this doesn't seem to match the sentiment of the paper, which prioritizes a lower barrier to entry for new programmers at the cost of safety checks that help prevent humans from making mistakes at runtime.

It seems to me if our conclusion is humans, no matter how skilled, are flawed, we should be willing to do absolutely anything to lower the barrier to entry except removing checks on correctness.


What makes you say that? What safety or correctness features do you think a garbage collector in Rust eschews? On the contrary, gc should make the language safer since it obviates the need for the "unsafe" escape hatch.


See https://users.rust-lang.org/t/bronze-gc-and-aliasing-problem.... The GC allows use-after-free bugs that are compile-time errors in Rust.


But the gc is there to test the usability cost of Rust's memory management scheme. A properly written gc would of course not allow for use-after-free bugs.


The problem is that this study is relatively worthless, because the API used by this GC inherently introduces UB. If you have a correct API, the usability tradeoff might be very different.


Can you please point to some results in the study that would have been different if the tracing gc hadn't suffered from use-after-free bugs?


To phrase what my sibling commenter said, but in a different way:

This paper is an experiment to see if a particular API could make using Rust easier. This specific API is not sound. This means that the paper's results are kind of irrelevant in a strict sense; this API cannot work in Rust, so it being easier is kind of a moot point. Maybe someone can take their API, make it sound, and then try again, and that's certainly good. But it is a serious flaw in the methodology.

Specifically, this API call is unsound:

  GcRefCell<T>::as_mut(&self) -> &mut T

It's not about the internals of the GC. It's about the API that it exposes to users.
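For contrast, here is a minimal sketch of what a sound shape for this kind of API can look like, using only standard-library types. It does not use Bronze; `GcLike` and `bump_both` are hypothetical names chosen for illustration. Instead of returning a bare `&mut T` from `&self`, the cell hands out a guard and checks exclusivity at run time:

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Hypothetical stand-in for a shared, GC-like handle (std types only, not Bronze).
type GcLike<T> = Rc<RefCell<T>>;

/// Tries to increment the values behind both handles.
/// Returns Err if `a` and `b` point at the same cell, instead of creating
/// two live `&mut` references to one value (which would be undefined behavior).
fn bump_both(a: &GcLike<i32>, b: &GcLike<i32>) -> Result<(), String> {
    // First exclusive borrow: the RefCell records it at run time.
    let mut first = a.try_borrow_mut().map_err(|e| e.to_string())?;
    // Second exclusive borrow: fails if `b` is the same cell as `a`.
    let mut second = b.try_borrow_mut().map_err(|e| e.to_string())?;
    *first += 1;
    *second += 1;
    Ok(())
}
```

Calling `bump_both(&x, &x.clone())` returns an error and leaves the value untouched. That check is exactly the friction an `as_mut(&self) -> &mut T` signature skips.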

The author has some ideas: https://github.com/mcoblenz/Bronze/issues/2#issuecomment-939... These ones are straightforward, and would probably work, but they would also introduce some friction, and removing friction is the entire point of the exercise. Is it too much? Maybe! It also maybe isn't. You'd need another study to figure that out.


> This paper is an experiment to see if a particular API could make using Rust easier.

According to the author: "The Bronze project is exploring the usability costs of the restrictions that Rust imposes."

That does not require a sound gc to explore. Maybe it would have been different had the goal been to see whether a gc can be bolted onto Rust. Yes, that might be difficult given Rust's semantics and how it deals with pointers. But that doesn't seem to be the goal of the author's research.


Again, it's not about the GC being sound, it's about the API that's being used. There's (imho) not a lot of value in figuring out if an API that can't be used is usable.

There are other GC crates that offer sound APIs. Research using those sounds very promising and good!


Then explain to me why it matters. If none of the toy examples in the research actually required multiple mutable references to the same objects, then why would it matter that the compiler allowed it? I think you missed the point of the research: to "explor[e] the usability costs of the restrictions that Rust imposes."


If they didn’t require it, then that would significantly strengthen the paper! They could have not exposed that API and it would be much more useful. I suspect that they probably did need it though. An immutable GC would only be useful by adding an interior mutability wrapper, which is exactly the kind of friction the paper is trying to avoid.

It matters because the rest of the Rust universe follows this pattern. If they coded their own standard library (and any other allowed libraries) and modified the compiler to not miscompile this usage, then that exploration would be valid. But you’re also going to get friction from miscompiles and fighting with other APIs that do have this property. So it would be more useful to either go all-out on this idea that Rust’s uniqueness properties are a bad thing, or fix the soundness issues. The halfway step simply confounds too many issues to be truly useful, in my opinion.

I agree that exploring the space is a good idea and useful. My issue is with the methodology, not the concept.


The article does include a thorough description of the tasks. Afaict, none of them requires multiple mutable references to the same objects. It would be strange if any task did as it then wouldn't be implementable in plain Rust.

I think you are missing the point of this research. The author has, given his large sample size, persuasively demonstrated that Rust's borrow checker is confusing to newbies. Data from 428 students is a lot, and an order of magnitude more than many other programming language usability studies. This is in my opinion interesting research, even though I understand why Rust fans don't like his results!


> Afaict, none of them requires multiple mutable references to the same objects

They ask to store objects ("turtles") into a single Vec. Two turtles from that Vec can breed to create a child turtle stored in that same vec. Parents must retain a reference (a real ref, not an index or other workaround) to their children, meaning that children have multiple refs. Children can become parents themselves, so all the turtles are mutable.

There you have it: multiple mutable references to the same object. With proper Rust you'll need some kind of RefCell to implement this (convoluted) design. The runtime check will ensure a runtime panic if you try to mutably borrow the same object twice through different handles (trying to breed a turtle with itself). With BronzeGc the compiler will believe that they are different objects, and UB-optimize accordingly.
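A rough sketch of that plain-Rust design, assuming `Rc<RefCell<_>>` handles. The `Turtle` fields and the `new_turtle` and `breed` helpers are made up for illustration and are not taken from the study's task code. It uses `try_borrow_mut` so the self-breeding case becomes an error value rather than a panic:

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Shared, mutable handle to a turtle (illustrative; not the study's types).
type TurtleRef = Rc<RefCell<Turtle>>;

struct Turtle {
    name: String,
    /// Real references to children, not indices into the Vec.
    children: Vec<TurtleRef>,
}

fn new_turtle(name: &str) -> TurtleRef {
    Rc::new(RefCell::new(Turtle { name: name.to_string(), children: Vec::new() }))
}

/// Breeds the turtles at indices `a` and `b` (panics if out of range),
/// stores the child in the same Vec, and records it in both parents.
fn breed(pen: &mut Vec<TurtleRef>, a: usize, b: usize) -> Result<TurtleRef, String> {
    let mother = Rc::clone(&pen[a]);
    let father = Rc::clone(&pen[b]);
    let child = {
        let mut m = mother
            .try_borrow_mut()
            .map_err(|_| "first parent is already borrowed".to_string())?;
        // If `a == b`, both handles point at one RefCell and this borrow fails.
        let mut f = father
            .try_borrow_mut()
            .map_err(|_| "a turtle cannot breed with itself".to_string())?;
        let child = new_turtle(&format!("{}-{}", m.name, f.name));
        m.children.push(Rc::clone(&child));
        f.children.push(Rc::clone(&child));
        child
    };
    pen.push(Rc::clone(&child));
    Ok(child)
}
```

Children can become parents later because every turtle sits behind the same kind of handle. The cost is the explicit borrow step at each mutation.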


I don’t think you understand why, given that, as I and several other people have stated, the issue isn’t with the results. It’s with the methods. I have no opinion either way about the results, because as far as I can tell, they result from an incoherent premise, which means the conclusion is also incoherent.


> An immutable GC would only be useful by adding an interior mutability wrapper, which is exactly the kind of friction the paper is trying to avoid.

This would be the most idiomatic approach, yes. They could use a code transformation to automate the 'wrapping' and present an intuitive, frictionless interface to the developer.


Totally, and that’s one of the ideas floated in the issue. I think depending on how it could be done it may not even lose out on ergonomics in most cases, but I haven’t worked through it myself.


The problem isn't that the GC suffered from use-after-free. The problem is that the GC allowed the user to hold two mutable references to the same object, which is UB in Rust (and a compile-time error without unsafe code). The problem with the API is that, without some pretty fundamental changes to the language, the only cases where the GC version compiles and plain Rust does not are cases where the user is invoking undefined behavior. Removing the UB is also pretty much impossible, because if you allow these references you remove one of the main tools the Rust compiler uses to optimize code.


Having simultaneous mutable references to an object is UB in Rust because the memory manager cannot detect use-after-free errors. What you are talking about is not a feature but a limitation of Rust's borrow checker. But when using a gc, multiple mutable references are not a problem at all, so there is no reason to prevent them. This has almost nothing to do with performance. Restricting the number of mutable references to one does not mean that the compiler can emit significantly faster code.


> Having simultaneous mutable references to an object is UB in Rust

This is correct.

> because the memory manager cannot detect use-after-free errors.

... this is not. It is UB because the language declares it UB. It is absolutely to the core of the design of the language itself. All Rust code relies on this property to work. Something that breaks it is in fact broken, regardless of any other aspect of the program.


Quoting the article:

> those who [used the GC] required only about a third as much time (4 hours vs. 12 hours).

Surely you can devote a chunk of your newfound time to finding use-after-free bugs.


If that worked, people wouldn't still be finding use after free bugs in commonly used C libraries. The whole reason to use Rust is that it turns most of C's UB into compile time errors so that your code isn't horribly broken in the first place.


I don't know how to find use-after-free bugs in arbitrary code, no matter how much time I devote to the task. I just can't keep enough state in my head at once.


What does "use after free" mean in a garbage collected environment? Isn't the purpose of GC to automatically free after last use?


It means the GC design is fundamentally broken.


> A key tradeoff is that Bronze does not guarantee thread safety

Data races are currently prevented by the borrow checker. Any GC in order to provide equivalent correctness would need to do so as well.


If you read the paper, they needed to throw out the thread safety from Rust's borrow checker in order to make the GC work. That is a massive surface area for bugs they are opening back up to make the language easier for beginners.


I just added a clarification to the README about this. The main issue is that the current implementation keeps track of roots with a shadow stack technique (https://llvm.org/docs/GarbageCollection.html#using-llvm-gcwr...), which is not thread-safe. It wasn't worth the engineering work for this particular study since the tasks didn't require more than one thread. A practical implementation, of course, would need to be thread-safe.


I'm sure the GC implementation itself could be made thread safe. But the paper mentions that Bronze lets you have multiple mutable references to an object. Doesn't this open the door to the user's code having data races, whereas it would be safe by default in vanilla Rust?
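As a point of comparison, this is a small sketch of what "safe by default" looks like in vanilla Rust. The `parallel_count` function is a hypothetical example, not something from the paper. Shared mutation across threads has to go through a `Send + Sync` wrapper such as `Arc<Mutex<_>>`; handing an `Rc<RefCell<_>>` to `thread::spawn` is rejected at compile time because `Rc` is not `Send`:

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Spawns `threads` workers that each add 1 to a shared counter
/// `increments` times, then returns the final value.
fn parallel_count(threads: usize, increments: u64) -> u64 {
    let counter = Arc::new(Mutex::new(0u64));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..increments {
                    // The lock gives this thread exclusive access for one update.
                    *counter.lock().unwrap() += 1;
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    let total = *counter.lock().unwrap();
    total
}
```

An API that lets every reference mutate freely has no equivalent of that compile-time rejection, which is the concern raised here.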


Because their gc is a proof-of-concept created specifically for this research. It doesn't even deallocate memory. Most likely, they didn't have their students write any threaded code so thread-safety wasn't a concern. For a production ready tracing gc, of course they would add thread safety, it's not a big problem.


No, this is the fundamental tradeoff they made to make the Bronze'd Rust easy to use:

> Rust permits only one mutable reference to a value at a time... With Bronze, mutation is permitted through all references to each garbage-collected object, with no extra effort. A key tradeoff is that Bronze does not guarantee thread safety; as in other garbage collected languages, it is the programmer’s responsibility to ensure safety.

Allowing mutability anywhere is what fundamentally makes Bronze easier to learn, and more error-prone.


The paper's author has already explained to you how thread safety can be achieved in a production-ready gc. This is not something that is particularly difficult to engineer and is quite orthogonal to whether one chooses to use tracing gc, ref counting or Rust-like borrow checking.


But if equipping Rust with a GC makes it as easy as Python, Rust could become the language of choice because you have the low entry barrier and the option of making your code much more robust within the same language. That seems like a big win for everyone.

I think there can be an unproductive temptation to use a language as gatekeeper of a culture. If you want a culture with shared values about how to do things, there are plenty of ways to do that other than keeping the language from accommodating anyone outside of that culture. And your culture will be more accessible to people who genuinely belong in it, if it's visible from the places they already live (e.g. the Rust language).

The languages you dislike are popular because they make it easy for everyone to contribute, and as a result they have a bigger community and, perhaps even more importantly, a greater breadth and depth of libraries and packages than competitors. Rust could be competitive with these languages if it had a low floor and a high ceiling of admissible code quality and sophistication.


Appealing in theory, but not necessarily attainable in practice.

I use Rust to write CPython extensions... it's a well-known fact that GCs are solitary creatures and you're in for a lot of hassle if you want two different GCs to play nicely together.

I use Rust for all sorts of things because I like the high floor Rust sets for the crates I depend on, and I pair that with being more likely to rewrite or find an alternative for a dependency that uses `unsafe` in a context I don't feel it's merited.

A lot of what the borrow checker brings is fundamentally about requiring the programmer to be more precise about what they intend, rather than patching the cracks with extra CPU/RAM or waiting until runtime and hoping it won't come up.

My main concern here is the risk of Rust running into a more minor version of what happened to D, where you effectively had two language ecosystems and the one with a GC and all the libraries wasn't an alternative to C or C++ and wasn't a very appealing alternative to Java.


I'm not a great programmer, but having used Rust and Python/Scala/Java, I have a hard time believing anyone can be as productive in Rust as they would be in a GCed language for many/most applications. Say you're building a typical backend or prototyping some linear algebra/ML: the memory management is a very low-level detail that is likely not a concern and takes focus away from the higher level.


I am much more productive in Rust vs Python and Java, it's not even close. Memory management in Rust is trivial, it takes up virtually 0% of my time.


Fear not! Experts will always be inclined to make tools that match their skill level. You can still buy longbows today!


> ...indicated that ownership, borrowing, and lifetimes were primary causes of the challenges that...

I thought this was kinda obvious. Reasoning about borrowing and lifetimes is hard if you've never done it before. But the real question is how long it takes users to learn it and what we can do to improve that. I wonder how close the plain rust users' times would get to bronze users over time as they keep doing similar tasks.


>> I wonder how close the plain rust users' times would get to bronze users over time as they keep doing similar tasks.

This is the real question for me. The rest is just implementation detail. Someone is (usually) paying for the code to be written. Are they getting more value for less spend over time?

When you look at it through this lens though, there tend to be bigger overriding factors a rust compiler (or any other choice of language) can’t really help with - e.g. is the design & architecture of the code heading in the “ball of mud” direction. I.e. is change velocity going to reduce over time?

With a garbage collector, yeah you most likely are getting more productivity out of a developer. You’ve delegated one of the hardest problems (when can I free this thing?) to the computer. Because of the way Rust is designed, it doesn’t protect against memory leaks: it protects against use-after-free, but not against failing to release memory that is no longer needed. Unfortunately, it’s not guaranteed that just adding a garbage collector will help with this. With a sufficiently good developer though, they should be able to move faster than if you make them Arc<> & lifetime everything themselves. It’s specifically the lifetime bit that consumes time / developer productivity. Using Arc<> isn’t hard or arduous in any way.
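A small sketch of the leak case mentioned above, using `Rc` (the same reasoning applies to `Arc`). The `Node` type and both helper functions are hypothetical. Two nodes that hold strong references to each other are never freed in safe Rust, while making one direction `Weak` breaks the cycle:

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

struct Node {
    strong_peer: RefCell<Option<Rc<Node>>>,
    weak_peer: RefCell<Weak<Node>>,
}

fn new_node() -> Rc<Node> {
    Rc::new(Node {
        strong_peer: RefCell::new(None),
        weak_peer: RefCell::new(Weak::new()),
    })
}

/// Links two nodes with strong references in both directions.
/// Returns a Weak handle so the caller can observe whether `a` was freed.
fn leaky_pair() -> Weak<Node> {
    let a = new_node();
    let b = new_node();
    *a.strong_peer.borrow_mut() = Some(Rc::clone(&b));
    *b.strong_peer.borrow_mut() = Some(Rc::clone(&a));
    // `a` and `b` go out of scope here, but each keeps the other alive.
    Rc::downgrade(&a)
}

/// Same shape, but the back edge is Weak, so both nodes are freed on return.
fn freed_pair() -> Weak<Node> {
    let a = new_node();
    let b = new_node();
    *a.strong_peer.borrow_mut() = Some(Rc::clone(&b));
    *b.weak_peer.borrow_mut() = Rc::downgrade(&a);
    Rc::downgrade(&a)
}
```

`leaky_pair().upgrade()` still yields a live node after the function returns, whereas `freed_pair().upgrade()` yields `None`. A tracing collector would reclaim both shapes.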


The thing is, Rust's ownership system doesn't "just" provide memory handling; it provides reliable, hard-to-get-wrong resource handling for all kinds of resources.

To some degree, for me this is *more* important than the memory handling aspects.

For example, the whole "collection modified while iterating over it" situation can easily become a nightmare, especially for new programmers. It also happens in non-multi-threaded code all the time (if there are some "cross-cutting" concerns). In many GCed languages/libraries, running into this situation is "safe" but unspecified. Worse, if it happens to work in some situations (it often does), it's not rare for programmers to rely on this behaviour. That is, until it gets subtly broken due to other code changing, or the library changing. In Rust this problem is not a thing: the borrow checker prevents you from running into it by accident. If you need it, there are still ways to get it, but now with well-defined characteristics. So while this is initially more work, it is much, much less bug-prone and easier to maintain.
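A minimal sketch of that situation in Rust (the function names are made up for illustration). The naive loop is shown as a comment because it does not compile; the two functions are common well-defined alternatives:

```rust
/// Removes negative values in place.
fn drop_negatives(values: &mut Vec<i32>) {
    // Rejected by the borrow checker: the iterator holds a shared borrow
    // of `values` while `remove` needs an exclusive one.
    //
    // for (i, v) in values.iter().enumerate() {
    //     if *v < 0 { values.remove(i); }
    // }

    // `retain` expresses the same intent with defined behavior.
    values.retain(|v| *v >= 0);
}

/// Appends the double of every element that was present at the start.
fn append_doubles(values: &mut Vec<i32>) {
    // Finish reading first, then mutate: the snapshot makes the order explicit.
    let doubled: Vec<i32> = values.iter().map(|v| v * 2).collect();
    values.extend(doubled);
}
```

Both versions make the programmer state up front whether the loop sees the original contents or the updated ones.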

And that's just one example.

The borrow checker might be more initial work, but it often makes you write better, less buggy, easier to maintain code independent of memory management.

That holds as long as you don't get over-obsessed with never cloning, never using Rc/Arc, etc.; if you do, you are in for pain. Being over-obsessed with only using borrows is a form of premature optimization. (As a side note, even with clones/Rc/Arc the borrow checker still helps you, as using them is not the same as replacing it with a GC.)


The short lifetime of most objects should be implicit from scope and obvious to the programmer. It's usually easier that way, and certain GC-based languages ended up adding special syntax to add that level of control for objects when the programmer wants fine-grained control over the destructor.

And on the other hand, optional GC is something non-GC languages can end up adding too, like C++. Garbage collection is required (or convenient) for some problems, cleaning up objects that would be leaked by reference counting or weaker strategies.

EDIT: deemphasised universality of this rule, not really important to my point anyway: that GC/non-GC memory management are more orthogonal than they appear from language feature discussions, in theory. I wasn't trying to imply literally "all" such languages add these features, I have just noticed a number of relevant cases.


Huh? Which languages are you referring to? Javascript, Lua, Python and ruby are all GC based languages with (as far as I know) no special syntax for short lived objects. Does Go have syntax like that? The only GC language I can think of with special stack variables is C#. Are there others? Does Java do this now?

And on the other side, I’ve still never seen a production C or C++ codebase which used a garbage collector. The closest I’ve seen is refcounting - but that’s a very different beast compared to what V8, .net, Go, etc do internally to garbage collect objects.

The only languages I’ve seen which mix GC and non-GC code are D, Nim (I think), with an honourable mention for Obj-C / Swift’s ARC. But again, I’m not sure ARC belongs in the same category as the generational GCs we see in the JVM and others.

With a few obscure exceptions, it seems to me that the gc/ non-gc line is still pretty firm for most programming languages.


> The only GC language I can think of with special stack variables is C#. Are there others? Does Java do this now?

There are tons of languages that allow that, even if many of them failed to gain mainstream adoption.

- Mesa/Cedar (https://yahnd.com/theater/r/youtube/z_dt7NG38V4/)

- CLU

- Modula-2+

- Modula-3

- Oberon

- Oberon-2

- Oberon-07

- Active Oberon

- Component Pascal

- Oberon-V

- D

- Nim

- Swift (RC is a GC algorithm, chapter 5, https://gchandbook.org/)

- Eiffel

- C++/CLI

- Unreal C++

- Linear Haskell (getting there still WIP)

- Pony

I kept Nim and Swift for completeness, as you already mentioned them before.


Note that "short lived objects" and "stack allocated objects" aren't exactly the same set. Common Lisp has a dynamic-extent declaration, which can be used to stack allocate [1]. Go and Java implementations perform escape analysis. There can be some wins if stack allocatability is tracked at runtime, rather than compile-time [2]; Cliff Click reported roughly doubling the number of stack-allocated objects by using escape detection rather than analysis [3].

At least the Inkscape vector editor uses the Boehm garbage collector [4]. There are several programming language implementations which use Boehm and C, but one wonders if it is still a "C codebase" then.

[1] http://www.lispworks.com/documentation/HyperSpec/Body/d_dyna...

[2] Henry Baker, CONS should not CONS its arguments https://www.cs.tufts.edu/~nr/cs257/archive/henry-baker/cons-...

[3] https://youtu.be/5uljtqyBLxI?t=791

[4] http://inkscape.gitlab.io/inkscape/doxygen/namespaceInkscape...


Thank you. I didn’t know about Inkscape, or anything about Common Lisp. Javascript (in V8) also has special detection & processing for very short lived objects. I appreciate the links!

But I was specifically responding to / confused by this claim in the GP comment:

> GC-based languages all ended up adding special syntax to add that level of control for objects when the programmer wants fine-grained control over the destructor. (Emphasis mine)

And as you say, Java, V8 and Go do this with heuristics, not syntax. And I’m not sure if Ruby or Python do this sort of escape / generational analysis at all. On the flip side, it looks like Lua has added something like this, and C# has stack-allocated structs.

Lua + C# + CommonLisp falls very short of the “all GC languages” claim.


> And I’m not sure if Ruby or Python do this sort of escape / generational analysis at all.

It would almost certainly be implementation specific (as it is with JS really; V8 is not a language).

I would assume pypy does escape analysis, I’m almost certain cpython does not.


Okay I have edited the post so it does not say "all".


Lua for instance has specific syntax added for it: https://www.lua.org/manual/5.4/manual.html#3.3.8

Languages tend to have workarounds or conventions for this kind of code where there isn't special syntax for it: https://stackoverflow.com/a/865272 (Python)

Also just because C or C++ codebases don't use GC doesn't mean it's not useful. I've never used it either, but I can think of situations where I would use it. Like I've not used most of the algorithms I learned at uni but I can still think of where I might use them.


If you replace "short-lived objects" with "short-lived resources" then Go has defer, Python has context managers.


Is Go’s defer used to optimize the GC? I thought it was just syntax sugar to make sure people wouldn’t forget to close file handles and things like that..?


It is for deterministically releasing resources based on scope (although Go's defer is tied to function scope for reasons I can't understand). Python's context manager is similar. Much like destructors in C++.

You can probably say the same about heap-allocated memory, but GC'd languages make the choice that non-deterministic release of heap allocations is fine. But there are resources where you can't play the same game (open files, connections, locks) and you need a way to deterministically close them.
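For reference, the Rust counterpart to defer, context managers, and C++ destructors is the `Drop` trait. The sketch below uses a hypothetical `Resource` type that writes to an in-memory log in place of a real file or lock, so the release order is observable:

```rust
use std::cell::RefCell;
use std::rc::Rc;

type Log = Rc<RefCell<Vec<String>>>;

/// Hypothetical resource that records when it is opened and closed.
struct Resource {
    name: &'static str,
    log: Log,
}

impl Drop for Resource {
    // Runs deterministically when the value goes out of scope.
    fn drop(&mut self) {
        self.log.borrow_mut().push(format!("close {}", self.name));
    }
}

fn open(name: &'static str, log: &Log) -> Resource {
    log.borrow_mut().push(format!("open {}", name));
    Resource { name, log: Rc::clone(log) }
}

/// Opens two nested resources and returns the recorded event order.
fn run() -> Vec<String> {
    let log: Log = Rc::new(RefCell::new(Vec::new()));
    {
        let _file = open("file", &log);
        {
            let _lock = open("lock", &log);
            log.borrow_mut().push("work".to_string());
        } // `_lock` is released here, at the end of its block
        log.borrow_mut().push("more work".to_string());
    } // `_file` is released here
    let events = log.borrow().clone();
    events
}
```

Unlike Go's defer, the release point here is the end of the enclosing block rather than the end of the function.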


defer is tied to function scope because that is the use case it was designed for. Go also has finalizers that execute right before deallocation.


I can think of only Common Lisp, which has an optional optimization hint that a specific object won't get passed outside of the current stack (declaring something DYNAMIC-EXTENT, to be specific). Mind you, it's only a hint, and there were specific techniques for writing "nonconsing" code (in Lisp lingo, "non-allocating", thus not creating garbage).


Short lifetime is not usually implicit from scope, in languages with GC. Function return values are the problem and solving this problem is the primary ergonomic of GC which in turn affects the style of code written.

If you allocate a value to return, you don't in general know how it will be deallocated because that depends on the caller. The caller may use it for a short time (so stack allocation could work) or store it in the heap where it will have indefinite lifetime. This is why it's not implicit from the scope.

Data flow compounds the problem: if references to arguments to the function are stored in the allocated return value, then the return value extends their lifetime, but again, the writer of the function doesn't know by how much.

Simple refactoring which abstracts code into more reusable functions will obscure local reasoning about allocation.

Working around these problems in the absence of GC or lifetime analysis means adopting various ad-hoc fixes, like calling conventions which pass the location for the return value as a hidden parameter, or smart pointers which dynamically track ownership, or coding conventions like caller always allocates, or global conventions like zone allocation.

The big programming ergonomic side-effect of the presence of GC is the ability to more freely write and refactor functions that allocate and freely reuse them elsewhere. Conversely, code written without GC will be less factored into reusable functions.

Rust's lifetime management is somewhere in the middle. It requires particular conventions and if you're able to follow the conventions then ergonomics approach that of GC. But if you want cyclic data structures, or to simply pass data which might enable the construction of cyclic data structures, you need to think harder.
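One way to picture the "conventions" in question is this sketch with hypothetical names. When a return value stores references to its arguments, Rust makes the function's author write that dependency into the signature. The alternative is to return owned data and pay for a copy:

```rust
/// A view that borrows from the input text; it cannot outlive that text.
struct Highlight<'a> {
    text: &'a str,
    word: &'a str,
}

/// The `'a` in the signature tells every caller that the result keeps
/// `text` borrowed for as long as the result is in use.
fn first_word<'a>(text: &'a str) -> Highlight<'a> {
    let word = text.split_whitespace().next().unwrap_or("");
    Highlight { text, word }
}

/// Returning an owned String removes the lifetime dependency,
/// at the cost of an allocation and a copy.
fn first_word_owned(text: &str) -> String {
    text.split_whitespace().next().unwrap_or("").to_string()
}
```

With a GC neither signature would need to mention this, which is the ergonomic difference described above.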


didn't they remove GC from C++?


I honestly don't know, maybe. There are libraries for it too though. https://hboehm.info/gc/


> To see whether Bronze could make Rust more usable, we conducted a randomized controlled trial with volunteers from a 633-person class, collecting data from 428 students in total.

Well, this paper starts off with a bang.

A quick search on github returns this repo [1] by mcoblenz who I assume is the first author Michael Coblenz. Unfortunately nothing but the readme has been updated in 10 months and I couldn't find a link to the latest source with a skim of the paper and bibliography.

[1] https://github.com/mcoblenz/Bronze


This (unfortunately) is rather typical in academia: finish the paper, publish, and move on to the next research topic. It would be a shame if the version of Bronze related to the paper was never pushed, but I'm sure sending him a quick email will resolve that.


I think providing a garbage collector as training wheels as the authors are proposing would be counterproductive to helping beginners truly learn Rust.


Manish Goregaokar has a wonderful blog post on garbage collectors in Rust, the current implementations, the trade-offs and where they might be worthwhile: https://manishearth.github.io/blog/2021/04/05/a-tour-of-safe...


There are a lot of Rust GC approaches here, makes me wonder why the study author didn't use one (or more) of the existing sound GCs. Remove a glaring flaw of the study, and spend less time writing throw-away code.


> [the borrow checker] makes Rust relatively hard to learn and use.

I would say the borrow checker makes rust harder to learn, but it does not make it harder to use once you actually grok it. When that time comes, the borrow checker fades into the background. When I got to that point with Rust, it became much easier to use than other languages for me (others I use being C++ and C).


From https://crates.io/crates/bronze_gc : "The Bronze garbage collector for Rust. This version only includes the API for creating and using GC references; it does not actually collect anything. For experimental purposes only." (my emphasis)

This seems to be at odds with the paper, which describes it as mark-sweep?


From Github:

> This implementation is experimental. The 'main' branch has a collector, but it only works in limited cases is not general enough to work with YOUR code. The 'API-only' branch has the collector disabled; be aware that you will eventually run out of memory. However, the present version is suitable for experimentation and prototyping.


That's sort of disappointing. Why do people insist on "releasing" and marketing before the product is even finished nowadays?


Because we’ve all bought into the lie that branding is the most important part of product development.


I don't think Bronze is actually intended to be used outside of the context of the study. My impression is that it exists purely to make it possible to conduct a study to quantify the overhead of constructing code that can pass the borrow checker.

It's possible the team behind this may have hoped to use the study as part of a grant proposal to get funding to build out a more robust GC for rust.


Is Bronze memory safe?



That’s an evil problem to have. Let’s say you derive two references to a GC’d object (this can easily happen accidentally in a graph-like data structure) and pass them both as parameters to a function. I’ve looked at disassembled Rust code before. Given how aggressively the compiler optimizes based on the guarantees around & and &mut, I’d be surprised if this system compiles any non-toy programs correctly at all.
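A small illustration of the guarantee in question, using only safe standard-library code (the function names are hypothetical). With `&mut` parameters the compiler may assume the two arguments never alias. With `&Cell` parameters aliasing is allowed and every read has to observe the latest write:

```rust
use std::cell::Cell;

/// `a` and `b` are guaranteed distinct, so the optimizer is free to
/// treat the final read of `*a` as the constant 1.
fn exclusive(a: &mut i32, b: &mut i32) -> i32 {
    *a = 1;
    *b = 2;
    *a
}

/// `a` and `b` may be the same cell; the result depends on whether they alias.
fn shared(a: &Cell<i32>, b: &Cell<i32>) -> i32 {
    a.set(1);
    b.set(2);
    a.get()
}
```

Safe Rust refuses to compile `exclusive(&mut x, &mut x)`. An API that manufactures two `&mut` to one object gets around that refusal while the optimizer keeps assuming the arguments are distinct. In that case what `exclusive` returns is no longer defined.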


Is a study based on students the best way to measure the productivity of a language? It would be interesting to see what Rust experts could do with and without it, for example.


Why would you want to add garbage collection to rust? Doesn't that undermine one of its key value propositions?


Rust is already extremely easy to use.


Why do we need to make everything easier for beginners? Are the beginners supposed to stay beginners forever?


Valid point, but if a good system deters beginners by being too complex and hard to use - adoption will suffer. We want good systems to have high adoption.


I was fortunate to see Michael present this recently, and found the presentation convincing.

I may be a bit sheltered, but this was my first exposure to a study with so many volunteers completing involved coding tasks to assess language features.

I find Figure 3 to be particularly convincing: http://shitmyself.com/image/figure3.png


> http://shitmyself.com/image/figure3.png

Why are you posting a link to a page which requires authentication?


If you refuse to provide authentication, the page will tell you the authentication to provide. Not sure why this mechanism is in place.


It is basically a captcha to keep out crawler bots.

I added an exception for images and then overwrote the config with a previous version... -_-

Should be fixed now.


Sorry about that, I fixed the configuration and it shouldn't require auth anymore.



