pornel · 2024-12-24T18:59:39 1735066779

A borrow checker that isn't "viral all the way down" allows use-after-free bugs. Pointers don't stop being dangling just because they're stashed in a deeply nested data structure or passed down in a way that [[lifetimebound]] misses. If a pointer has a lifetime limited to a fixed scope, that limit has to follow it everywhere.

The borrow checker is fine. I usually see novice Rust users create a "viral" mess for themselves by confusing Rust references with general-purpose pointers or reference types in GC languages.

The worst case of that mistake is putting temporary references in structs, like `struct Person<'a>`. This feature is incredibly misunderstood. I've heard people insist it is necessary for performance, even when their code actually returned an address of a local variable (which is a bug in C and C++ too).

People want to avoid copying, so they try to store data "by reference", but Rust's references don't do that! They exist to forbid storing data. Rust has other reference types (smart pointers) like Box and Arc that exist to store by reference, and can be moved to avoid copying.
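A minimal sketch of that distinction, using hypothetical `PersonRef`, `Person`, and `SharedPerson` types invented for illustration (they are not from any real codebase). It assumes the goal is only to avoid copying the text of the name:

```rust
use std::sync::Arc;

// Hypothetical types for illustration only.

// Borrowing struct: it cannot outlive the `String` it points into.
struct PersonRef<'a> {
    name: &'a str,
}

// Owning struct: moving it moves a pointer, length and capacity, not the text.
struct Person {
    name: String,
}

// Shared ownership: cloning the `Arc` bumps a reference count, no deep copy.
struct SharedPerson {
    name: Arc<str>,
}

fn make_person() -> Person {
    let name = String::from("Alice");
    // Returning `PersonRef { name: &name }` here would not compile,
    // because `name` is dropped at the end of this function.
    Person { name } // moved out, the heap buffer is not copied
}

fn main() {
    let owned = make_person();
    let view = PersonRef { name: &owned.name }; // fine: a temporary view in a scope
    println!("{}", view.name);

    let shared = SharedPerson { name: Arc::from(owned.name.as_str()) };
    let other = SharedPerson { name: Arc::clone(&shared.name) };
    println!("{} {}", shared.name, Arc::strong_count(&other.name));
}
```

The `<'a>` version is only useful as a short-lived view; the other two are the ones that store data "by reference".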

germandiago · 2024-12-24T23:56:30 1735084590

> Pointers don't stop being dangling just because they're stashed in a deeply nested data structure or passed down in a way that [[lifetimebound]] misses

This is the typical conversation where it is shown what Rust can do by shoehorning: if you want to borrow-borrow-borrow from this data structure and reference-reference-reference from this function, then you need me.

Yes, yes, I know. You can also litter programs with globals if you want. Just avoid those bad practices. FWIW, references break local reasoning in lots of scenarios. But if you really, really need that borrowing, limit it to the maximum and make good use of smart pointers when needed. And you will not have this problem.

It looks to me like Rust is sometimes a language looking for problems so it can give you the solution. There are patterns that are just bad or not advised in most of your code and hence not a problem in practice. If you code by referencing everything, then Rust's borrow checker might be great. But your program will be a salad of references all around, which is bad in itself. And do not get me started on the refactorings you will need every time you change your mind about a reference deep somewhere. Because Rust is great, yes, you can do that cool thing. But at what cost? Is it even worth it?

I also see all the time people showing off the Send+Sync traits. Yes, very nice, very nice. Magic abilities. And what? I do my concurrent code by sharing as little as possible all the time. So the patterns of code where things can be messed up are quite localized.

Because of this, the borrow checker is basically something that gets a lot in the way but does not add a lot of value. It might have its value in hyper-restricted scenarios where you really need it, and I cannot think of a single scenario where that would be really mandatory and really useful for safety except probably async programming (for which you can do structured concurrency and async scopes still in C++ and I did it successfully myself).

So no, I would say the borrow checker is a solution looking for problems because it promotes programming styles that are not clean from the get-go. And it is only in this style that the borrow checker actually shines.

Usually the places where the borrow checker is useful have alternative coding patterns or lifetime techniques, and for the few where you really want something like that, the code spots are probably small and reviewable anyway.

Also, remember that Rust gives you safety from interfaces when you use libraries, except when it does not, because it basically hides unsafe underneath, and that makes it as dangerous as any C or C++ code (in theory). However, it should be easier to spot the problems, which leads to more safety in practice. But still, this is not guaranteed safety.

The borrow checker is a big toll in my opinion and it promotes ways of coding that are very unergonomic by default. I'd rather take something like Swift or even Hylo any day, if it ever reaches maturity.

nostradumbasp · 2024-12-25T13:25:37 1735133137

In general I view the borrow checker as a good friend looking over my shoulder so I don't shoot myself in the foot in production. 99 times out of 100 when the borrow checker complains it's because I did something stupid/wrong. 0.99 times out of 100 I think the borrow checker is wrong when I am in fact wrong. 0.01 times out of 100 the borrow checker fumbles on a design pattern it maybe shouldn't so I change my design. Usually my life is way better for changing the design after anyways.

The thing is, you don't need to have refs of refs of refs of refs of refs. You can clone once in a while or even use a smart pointer. You'll find in 99.99% of cases the performance is still great compared to a GC language. That's a common issue for certain types of people learning how to write Rust. I can't think of any application that needs everything to be a reference all the time in Rust.

As far as "mandatory" goes for choosing a language: we can all use ASM or C and write everything from scratch. It's a choice. Nothing is mandatory. No one is saying you HAVE to use Rust. Lots of people are saying "when I use it my life is way better", and that's different. There was a recent post here where people said they don't use IDEs with LSP or autocomplete. A lot of people are going to grimace at that, but no one is saying they can't do that.

pornel · 2024-12-25T15:09:09 1735139349

> I also see all the time people showing off the Send+Sync traits. Yes, very nice, very nice. Magic abilities. And what? I do my concurrent code by sharing as little as possible all the time. So the patterns of code where things can be messed up are quite localized.

They check whether your code really shares as little as you think, and prevent nasty-to-debug surprises.

The markers work across any distance, including 3rd party dependencies and dynamic callbacks, so you can use multi-threading in more situations.

You're not limited to basic data-parallel loops. For example, it's immensely useful in web servers that run multi-threaded request handlers that may be calling arbitrary complex code.
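As a rough sketch of what that check looks like in practice (the `handle_request` and `run_handlers` names are hypothetical, and a real web server would use a framework and a thread pool rather than raw `thread::spawn`):

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Hypothetical request handler; it could call arbitrarily complex code.
fn handle_request(id: u32, hits: &Mutex<u32>) -> u32 {
    let mut guard = hits.lock().unwrap();
    *guard += 1;
    id * 2
}

fn run_handlers(n: u32) -> u32 {
    let hits = Arc::new(Mutex::new(0u32));
    let workers: Vec<_> = (0..n)
        .map(|id| {
            let hits = Arc::clone(&hits);
            // `thread::spawn` requires the closure to be `Send`.
            // Swapping `Arc` for `Rc` here is rejected at compile time,
            // because `Rc` is not `Send`.
            thread::spawn(move || handle_request(id, &hits))
        })
        .collect();
    for w in workers {
        w.join().unwrap();
    }
    let total = *hits.lock().unwrap();
    total
}

fn main() {
    println!("{}", run_handlers(4));
}
```

The same `Send` bound is checked no matter how deep in a dependency the non-thread-safe type is buried.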

> places where the borrow checker is useful have alternative coding patterns

There's a popular sentiment that smart pointers make the borrow checker unnecessary, but that's false. They're definitely helpful and often necessary, but they're not an alternative to borrow checking.

Rust had smart pointers first, and then added borrowing for all the remaining cases that smart pointers can't handle or would be unreasonable to use.

Borrowing checks stack pointers. It checks interior pointers to data nested inside of types managed by smart pointers (so you don't have to wrap every byte you access in a smart pointer). It allows functions to safely access data inside unique_ptr without moving it away or switching to shared_ptr. It prevents using data protected by a lock after the lock has been unlocked. It prevents referencing implicitly destroyed temporary objects. It makes types like string_view and span not a footgun.
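Two of those cases in a small sketch, with a hypothetical `Config` type and helper functions made up for illustration; `Box` stands in for `unique_ptr` here, which is only a rough analogy:

```rust
use std::sync::Mutex;

// Hypothetical data, for illustration only.
struct Config {
    name: String,
    retries: u32,
}

// Borrows the boxed data without taking ownership of the `Box`
// (the rough analogue of reading through a `unique_ptr` without moving it).
fn name_len(cfg: &Config) -> usize {
    cfg.name.len()
}

fn bump_retries(shared: &Mutex<Config>) -> u32 {
    let mut guard = shared.lock().unwrap();
    // Interior pointer into the locked data; its lifetime is tied to `guard`.
    let retries: &mut u32 = &mut guard.retries;
    *retries += 1;
    let result = *retries;
    drop(guard); // unlock
    // Using `retries` after this point would be a compile error:
    // the reference cannot outlive the guard it was borrowed from.
    result
}

fn main() {
    let boxed = Box::new(Config { name: String::from("demo"), retries: 0 });
    println!("{}", name_len(&boxed)); // `boxed` is still usable afterwards
    let shared = Mutex::new(*boxed);
    println!("{}", bump_retries(&shared));
}
```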

> the borrow checker is basically something that gets a lot in the way

This is not the case for experienced Rust users.

Borrow checker is a massive obstacle to learning and becoming fluent in Rust. However, once you "get" it, it mostly gets out of the way.

Once you internalise when you can and can't use borrowing, you know how to write code that won't get you "stuck" on it, and avoid borrow checking compilation errors before they happen. And when something doesn't compile, you can understand why and how to fix it. It's a skill. It's not easy to learn, but IMHO worth learning more than C++'s own rules, Core Guidelines, UB, etc. that aren't easy either, and the compiler can't confirm whether you got them correct.

germandiago · 2024-12-25T17:50:01 1735149001

> Borrowing checks stack pointers. It checks interior pointers to data nested inside of types managed by smart pointers (so you don't have to wrap every byte you access in a smart pointer). It allows functions to safely access data inside unique_ptr without moving it away or switching to shared_ptr. It prevents using data protected by a lock after the lock has been unlocked. It prevents referencing implicitly destroyed temporary objects. It makes types like string_view and span not a footgun.

I understand part of the value the borrow checker brings. Actually, my complaint is more about having a full borrow checker that makes everything viral than about the analysis itself. For example, Swift and Hylo do some borrow-checking analysis, but they do not extend that to data structures, and they use reference counting (with elision, I think) and value semantics.

The problem with the borrow checker is not the analysis. It is the virality. Without the virality you cannot express everything. But with the amount of borrow checking that can be done through other conventions (as in Hylo/Swift) and leaving out a part of the story I think things are much more reasonable IMHO.

There are so many ways to work around, or just review, the code in the remaining spots (assuming only a handful of cases are left) that I question the value of a fully viral borrow checker built to represent so many situations, which on top of that promotes references everywhere and breaks local reasoning. It also sets the bar higher for any refactoring in many situations.

> Borrow checker is a massive obstacle to learning and becoming fluent in Rust. However, once you "get" it, it mostly gets out of the way.

This is just not true for many valid patterns of code. For example, data-oriented programming seems to be a nightmare with a borrow checker. Linked structures are also something that is difficult. So it is not only "getting it", it is also that for certain patterns it is the borrow checker who "gets you", in fact, "kidnaps you away" from your valid coding patterns.

> but IMHO worth learning more than C++'s own rules, Core Guidelines, UB, etc

I admit I am more comfortable with C++; it is my comfort zone. But there are middle solutions like Swift or (very experimental) Hylo that are worth a try IMHO. A full, embedded borrow checker with lifetime annotations is a big ergonomics problem that brings value if you abuse references, but when you do not, the value of the borrow checker is lower. Same for escaping references several levels up... why do it? I think it is just better to try to avoid certain coding patterns. Not because of Rust itself. Just as general coding style in any language...

> that aren't easy either, and the compiler can't confirm whether you got them correct.

Not all as of today, but a subset, yes: there are linters. Also, there is an effort to incrementally increase the value of many analyses. It will never be as perfect as Rust's, I am sure of that. But I am not particularly interested in that either. What I would be more interested in is whether, with what can be fixed and improved, the delivered software has the same lifetime-related defect rate as Rust. This is counter-intuitive because it looks like the better the analysis, the better the outcome, but here two factors also come into play IMHO:

  1. not all defects are evenly distributed. This means that if the things that can be lifetime-checked cover a big share of the typical lifetime problems in C++, even if not all kinds that Rust handles can be checked, it can get statistically very close.
  2. once the spots for unsafe code are more localized, I expect the defect rate to decrease more than linearly, since now the attention is focused on fewer code spots.

Let us see what comes from this. I am optimistic that the results will be better than many people predict. Those predictions look too academic to me: they do not take into account other factors, such as defects clustering in a few spots and the reduction of the surface humans have to inspect because it cannot be verified to be safe.

pornel · 2024-12-24T11:47:18 1735040838

It is an abstraction, but the safety requirements are neither enforced nor abstracted away.

C's type system can't communicate how long pointers are valid for, when and where memory gets freed, when the data may be uninitialized, what are the thread-safety requirements, etc. The programmer needs to know these things, and manually ensure the correct usage.

pornel · 2024-12-21T12:15:45 1734783345

It's annoying that there's interest in these stats mainly as an argument against renewable energy, not from the perspective of wildlife preservation. Just those particular birds are precious, not the others killed by other man-made structures, pollution, and habitats destroyed by expansion of agriculture.

I'd like to see not just more precise numbers of birds lost to wind energy, but the environmental and societal costs of not having the wind energy. Fuel extraction and processing has its environmental impact too. Lack of affordable energy (fuel poverty) costs human lives too. How many human lives are harmed to save a bird from a windmill?

mcv · 2024-12-23T09:54:39 1734947679

Yeah, it's kinda weird, the kind of people who are suddenly pretending to be into wildlife preservation. If they were honest about it, they'd also look into bigger bird killers like high-rise buildings, power lines, cars, domestic cats. Also, climate change disrupting ecosystems is unlikely to be good.

But it's probably a good idea to build wind farms outside major migratory routes.

pornel · 2024-12-21T11:15:59 1734779759

I think there was little traction in curl, because Rust users can just use hyper directly.

https://lib.rs/curl is used in 1000 packages, https://lib.rs/hyper is in 21,000 packages.

Curl is big and supports lots of protocols, but Rust doesn't need a one-stop shop for all of them. HTTPS covers the majority of uses, and Rust has separate packages for mail, ftp, websockets, etc.

usr1106 · 2024-12-22T06:57:56 1734850676

Hmm, wasn't it the other way round? Curl using a Rust library (hyper) instead of Rust programs using curl?

Disclaimer: Just reading TFA, not an active Rust programmer.

pornel · 2024-12-22T10:35:20 1734863720

Yes, but curl-with-hyper needed Rust programmers to finish and maintain the integration with the C codebase, and couldn't find anyone interested enough, which I assume is because Rust users don't need curl.

aragilar · 2024-12-22T11:30:00 1734867000

It sounds like a lack of funding (i.e. the grant ran out) was the real issue. Given how high profile curl is, this raises the question of how sustainable rewrite-in-rust efforts driven by grants (or other short-term funding) are, if they don't have an existing rust community to take advantage of the grant.

pornel · 2024-12-22T13:53:42 1734875622

This wasn't a rewrite-in-Rust effort, and I think that's the problem. Nothing valuable from curl has been rewritten in Rust.

Only some existing Rust code has been added to curl, but the Rust ecosystem already has a better, safer way of using that code.

Curl is not planning to ever require Rust, so the rewrites are limited only to optional components, and can't fully guarantee that the code is safe. The Rust components are required to expose an unsafe unchecked C interface for the rest of curl. C compilers are unable to enforce the safety invariants of the interface, like the Rust compiler would in a program fully written in Rust.

aragilar · 2024-12-23T10:30:04 1734949804

Probably I misread the original announcement, but I got the impression this was a pilot to adding more rust to curl (and is exactly what a rewrite would start to look like)?

pornel · 2024-12-26T23:44:28 1735256668

The original announcement explicitly said it's not a rewrite, multiple times: https://daniel.haxx.se/blog/2020/10/09/rust-in-curl-with-hyp...

> A rewrite of curl to another language is not considered

> This is not converting curl to Rust.

> Don’t be fooled into believing that we are getting rid of C in curl by taking this step.

Curl plans to live and die with C: https://daniel.haxx.se/blog/2017/03/27/curl-is-c/

This has been reaffirmed recently in https://daniel.haxx.se/blog/2024/08/06/libcurl-is-24-years-o...

> There was never any consideration to use another language than C for the library […] Not then, not now.

Not only is curl rejecting the possibility of being rewritten in Rust, it's also committed to supporting a completely Rust-free curl, because they pride themselves on supporting a lot of retro/niche platforms that Rust doesn't exist on.

IshKebab · 2024-12-22T16:07:37 1734883657

I think you're right. Curl is a rich source of C era vulnerabilities (memory safety, weak typing, etc.). Anyone using Rust has already decided they don't want anything to do with that.

pizlonator · 2024-12-22T16:11:09 1734883869

Except when they use `unsafe`, which they do, a lot.

Philpax · 2024-12-22T17:17:55 1734887875

Can you back up that claim? In my experience, the vast majority of Rust is safe, or built on communally-audited safe abstractions over unsafe code.

pizlonator · 2024-12-22T17:46:42 1734889602

Using communally audited abstractions over unsafe code means you’re using unsafe a lot.

If there was some way to prove that the abstraction is safe, then that would be fine. But the inadequacy of communal auditing is the reason why C has security issues.

Philpax · 2024-12-22T18:47:57 1734893277

The area of Rust code that is unsafe is much, much smaller than the amount in equivalent C code, making it much more tractable to audit. I won't pretend that it's perfect, but it's not remotely comparable to C.

pizlonator · 2024-12-23T00:51:55 1734915115

There’s no easy bound on the set of code you’d have to audit to confirm that even one use of unsafe is in fact safe.

burjui · 2024-12-29T19:52:40 1735501960

It's literally THE unsafe part of the code. It's the only part of code that can invoke UB.

  fn do_something() {
      unsafe { ... }
  }

  // Somewhere in the program
  do_something();

Doesn't matter where "do_something" is used and how much. The only possibly problematic part of this code is the unsafe block. You only audit it.

meltyness · 2024-12-22T18:08:52 1734890932

But if you can manually identify an invariant inside an abstraction, it can greatly improve performance for callers/users. Additionally, tools like Kani use comprehensible macros to facilitate automatically proving safety of `unsafe` code. Not to mention the built-in linting, package management, docs, and FP that rust/std provides. A lot has been said about unsafe Rust, but the most basic libc tools require the whole cascade of upstream callers to check safety; it's basically backwards from the ground up from a resources and an outcomes perspective.

IshKebab · 2024-12-22T21:52:31 1734904351

Use of unsafe is very rare (except for FFI to C where it's unavoidable). I've written tens of thousands of lines of Rust and used `unsafe` exactly once.

burjui · 2024-12-29T20:18:26 1735503506

Exactly the same experience: my toy compiler project has 10477 lines of Rust code, and there is only one line with unsafe:

  let is_tty = unsafe { libc::isatty(libc::STDERR_FILENO) } != 0;

Here's the source, just in case: https://github.com/burjui/rambo/

In fact, there are exactly two "unsafe" blocks in all of my Rust projects, and the second one is not even needed anymore because of the language and ecosystem improvements, but the project is basically abandoned, so I'm probably not gonna fix it. There's just no need for unsafe in the vast majority of code.
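As an aside, a terminal check like the `isatty` line above can nowadays be written without `unsafe` or `libc`. A minimal sketch, assuming Rust 1.70 or newer (where `std::io::IsTerminal` is available); the `stderr_is_tty` helper name is made up for illustration and is not taken from the linked project:

```rust
use std::io::{self, IsTerminal};

// Safe equivalent of `isatty(STDERR_FILENO)` using only the standard library.
fn stderr_is_tty() -> bool {
    io::stderr().is_terminal()
}

fn main() {
    println!("stderr is a tty: {}", stderr_is_tty());
}
```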

I don't know where Rust critics get their statistics; probably from picking their noses, judging by their arguments. Most don't seem to even have read the official docs, the bare minimum required to form any conclusions at all about the language. I guess they don't read much Rust code and think there is no way we can write even semi-decently performing high-level code without resorting to unsafe hacks or calling C, because that's the way it's done in other languages.

methou · 2024-12-21T23:09:41 1734822581

Yeah, Rust is mainly for developers, and curl is for sysadmins and their derivatives.

b5n · 2024-12-22T20:03:08 1734897788

_lib_curl

https://curl.se/libcurl/

https://curl.se/docs/companies.html

xign · 2024-12-29T00:50:08 1735433408

I have worked in other large companies that use libcurl and they aren't even listed above. It's pretty much the de facto way to do HTTP requests in C-land unless you really want to write your own backend for some reason. The world still primarily runs on C/C++, not Rust.

meltyness · 2024-12-22T18:16:33 1734891393

I could see Rust catching on as a shell scripting language too, honestly. Well-documented, typed abstractions over shell utilities are really quite nice.

colejohnson66 · 2024-12-23T01:06:24 1734915984

I disagree. I would hate to wrangle a borrow checker just to write a simple script.

burjui · 2024-12-29T20:32:22 1735504342

Why is it always the borrow checker that bugs the haters? You don't even see it work most of the time, because instead of using references everywhere, like most C++ers are conditioned to do, not only can you move values (which doesn't necessarily mean actually performing memory operations), but the compiler also checks that you don't use the variable from which the value has been moved. And even if you do use references everywhere, they don't usually cause any problems, unless you decide to put more than one reference in an often-used struct. I often use both shared (RO) and exclusive (RW) references as function arguments and rarely have to specify lifetimes manually, because automatic lifetime elision done by the compiler is sufficient in most cases.
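A small sketch of those three things (a move, a shared reference, and an exclusive reference) with no lifetime annotation written anywhere. The function names are hypothetical and chosen only for illustration:

```rust
// Hypothetical functions for illustration only.

// Takes ownership: the caller's variable is moved from and unusable afterwards.
fn consume(words: Vec<String>) -> usize {
    words.len()
}

// Shared (read-only) reference; the returned lifetime is elided,
// the compiler infers it from the single reference argument.
fn first_word(text: &str) -> &str {
    text.split_whitespace().next().unwrap_or("")
}

// Exclusive (read-write) reference, again with no explicit lifetime.
fn shout(text: &mut String) {
    text.make_ascii_uppercase();
}

fn main() {
    let mut line = String::from("hello borrow checker");
    shout(&mut line);
    let first = first_word(&line).to_string();

    let words = vec![first, line];
    let count = consume(words);
    // println!("{:?}", words); // compile error: `words` was moved into `consume`
    println!("{count}");
}
```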

colejohnson66 · 2024-12-30T18:20:25 1735582825

Who said I’m a hater? That was a very aggressive response. All I said is that I think Rust is not a good language for a scripting system. In scripting, I’m not always writing something “correct”, but good enough. Mutability everywhere helps do so, but the borrow checker gets in the way of that.

meltyness · 2024-12-23T01:20:16 1734916816

I've reached some mechanical sympathy with it, I think.

You get some intuition for which methods to use (many structs have a pretty broad palette), and then you can mostly just ignore it; quickly write your algorithm and let the compiler walk you through any oversights in the way you may have coded processing tasks.

It also helps to know that the docs support search by type signature, since functions operate on or return owned, borrowed, or mutably borrowed values, and to generally just think of a borrow as a pointer, as a heuristic.

It makes more sense than I do, usually.

project2501a · 2024-12-22T15:57:13 1734883033

That is quite the arbitrary distinction. There are plenty of sysadmins that code and/or hold compsci degrees.

croemer · 2024-12-22T16:52:13 1734886333

Not if you ask meta.stackoverflow.com

pornel · 2024-12-21T02:08:44 1734746924

Rust's ownership model is close enough for translating C. It's just more explicit and strongly typed, so the translation needs to figure out what a more free-form C code is trying to do, and map that to Rust's idioms.

For example, C's buffers obviously have lengths, but in C the length isn't explicitly tied to a pointer, so the translator has to deduce how the C program tracks the length to convert that into a slice. It's non-trivial even if the length is an explicit variable, and even trickier if it's calculated or changes representations (e.g. sometimes used in the form of a one-past-the-end pointer).
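A sketch of what the translated side can look like once the length has been recovered. The C signatures in the comments are assumed for illustration, and `sum` and `skip_leading_zeros` are hypothetical names:

```rust
// Hypothetical translation target, for illustration only.
// C original (assumed): int sum(const int *buf, size_t len);
// or the one-past-the-end form: int sum(const int *begin, const int *end);

// The slice carries pointer and length together, so indexing is bounds-checked.
fn sum(buf: &[i32]) -> i32 {
    buf.iter().sum()
}

// A C-style "advance the pointer, shrink the length" loop maps to re-slicing.
fn skip_leading_zeros(mut buf: &[i32]) -> &[i32] {
    while let Some((&0, rest)) = buf.split_first() {
        buf = rest;
    }
    buf
}

fn main() {
    let data = [0, 0, 3, 4];
    let tail = skip_leading_zeros(&data);
    println!("{} {}", tail.len(), sum(tail));
}
```

Both C representations (pointer plus length, or begin and end pointers) collapse into the same `&[i32]` type, which is why the translator has to work out which variables belong together.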

Other C patterns like `bool should_free_this_pointer` can be translated to Rust's enum of `Owned`/`Borrowed`, but again it requires deducing which allocation is tied to which boolean, and what's the true safe scope of the borrowed variant.
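The standard library already ships one such enum, `std::borrow::Cow`, with exactly those two variants. A minimal sketch, where the C struct in the comment and the `normalize` function are hypothetical examples rather than output of any actual translator:

```rust
use std::borrow::Cow;

// Hypothetical translation of a C struct such as
//   struct name { const char *ptr; bool should_free_this_pointer; };
// `Cow` is the standard library's enum with `Borrowed` and `Owned` variants.
fn normalize(input: &str) -> Cow<'_, str> {
    if input.contains(' ') {
        // Needs a new allocation, which the `Owned` variant frees on drop.
        Cow::Owned(input.replace(' ', "_"))
    } else {
        // No allocation: borrows from `input`, and the lifetime records that.
        Cow::Borrowed(input)
    }
}

fn main() {
    for s in ["no_spaces", "has a space"] {
        match normalize(s) {
            Cow::Borrowed(b) => println!("borrowed: {b}"),
            Cow::Owned(o) => println!("owned: {o}"),
        }
    }
}
```

The boolean disappears: which variant is live decides whether anything is freed, and the `'_` lifetime is the "true safe scope" of the borrowed case.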

smolder · 2024-12-21T06:04:09 1734761049

It's not that simple. In fact it's impossible in some cases if you don't sprinkle unsafe everywhere and defeat the purpose. Rust's restrictions are there so that it can be statically analyzed to guarantee safety. The superset of all allowable C program behaviors includes lots of things that are impossible to guarantee the safety of through static analysis.

Formally verified C involves sticking to a strict subset of the capabilities of C that is verifiable, much like Rust enforces, so it makes sense that programs meeting that standard could be translated.

pizlonator · 2024-12-21T02:47:59 1734749279

Rust’s ownership model forbids things like doubly linked lists, which C programs use a lot.

That’s just one example of how C code is nowhere near meeting Rust’s requirements. There are lots of others.

orf · 2024-12-21T03:10:46 1734750646

> Rust’s ownership model forbids things like doubly linked lists, which C programs use a lot.

It’s literally in the standard library

https://doc.rust-lang.org/std/collections/struct.LinkedList....
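A short usage sketch of that type through its safe interface (the `build` helper is a made-up name for illustration):

```rust
use std::collections::LinkedList;

fn build() -> LinkedList<i32> {
    let mut list = LinkedList::new();
    list.push_back(2);
    list.push_back(3);
    list.push_front(1); // O(1) at both ends
    list
}

fn main() {
    let mut list = build();
    // Walk it backwards, which needs the `prev` links.
    let reversed: Vec<i32> = list.iter().rev().copied().collect();
    println!("{reversed:?}");
    list.pop_back();
    println!("{}", list.len());
}
```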

quuxplusone · 2024-12-21T03:49:59 1734752999

But it's not in C's standard library. So the exercise isn't merely to auto-translate one language's standard library to another language's standard library (say, replacing C++ std::list with Rust LinkedList) — which would already be very hard. The exercise here is to auto-identify-and-refactor idioms open-coded in one language, into idioms suited for the other language's already-written standard library.

Imagine refactoring your average C program to use GLib for all (all!) of its data structures. Now imagine doing that, but also translating it into Rust at the same time.

Animats · 2024-12-21T04:55:18 1734756918

> The exercise here is to auto-identify-and-refactor idioms open-coded in one language, into idioms suited for the other language's already-written standard library.

That's what LLMs are for - idiom translation. You can't trust them to do it right, though.

[Pan et al. 2024] find that while GPT-4 generates code that is more idiomatic than C2Rust, only 61% of it is correct (i.e., compiles and produces the expected result), compared to 95% for C2Rust.

This problem needs both AI-type methods to help with the idioms and formal methods to ensure that the guessed idioms correctly capture the semantics.

A big advance in this project is that they can usually translate C pointer arithmetic into Rust slices. That's progress on one of the hardest parts of the problem. C2Rust did not do that. That system just generates unsafe raw pointer arithmetic, yielding ugly Rust code that replicates C pointer semantics using function calls.

DARPA is funding research in this area under the TRACTOR program. Program awards in April 2025, so this is just getting started. It's encouraging to see so much progress already. This looks do-able.

fuhsnn · 2024-12-21T06:14:58 1734761698

>That's what LLMs are for - idiom translation. You can't trust them to do it right, though.

Optimizing C compilers also happen to be good at idiom recognition, and we can probably trust them a little more. The OP paper does mention a future plan to use clang as well: >We have plans for a libclang-based frontend that consume actual C syntax.

If such a transformation can be done at the IR level, it might be more efficient to go: C-IR > idiom transform to Rust-IR > run safety checks in Rust-IR > continue compilation in C-IR or Rust-IR, or combine both for better optimization properties.

swiftcoder · 2024-12-21T07:40:06 1734766806

I'm definitely bullish on this angle of compiling C down to LLVM assembly, and then "decompiling" it back to Rust (with some reference to the original C to reconstruct high-level idioms like for loops)

immibis · 2024-12-21T07:10:19 1734765019

Actually, LLMs are for generating humorous nonsense. Putting them in charge of the world economy was not intended, but we did it anyway.

dhosek · 2024-12-21T07:40:18 1734766818

Given that in my (small, employer-mandated) explorations with Copilot autocompletions it’s offered incorrect suggestions about a third of the time and seems to like to also suggest deprecated APIs, I’m skeptical about the current generation’s ability to be useful at even this small task.

LeFantome · 2024-12-21T17:23:57 1734801837

Have you seen O3?

If your experience with something less than half as good as state-of-the-art is that it worked 66% of the time, I am not sure why you would be so dismissive about the future potential.

glouwbug · 2024-12-21T15:59:49 1734796789

Sure but it takes two copilots to fly a plane

saghm · 2024-12-21T06:05:25 1734761125

Oh god, I can't even imagine trying to have formally-verified LLM-generated code. It's not surprising that even incremental progress for that would require quite a lot of ingenuity.

CodesInChaos · 2024-12-21T12:56:02 1734785762

Why does C2Rust produce so much incorrect code? Getting 5% wrong sounds terrible, for a 1:1 translation to unsafe Rust. What does it mis-translate?

https://dl.acm.org/doi/pdf/10.1145/3597503.3639226

> As for C2Rust, the 5% unsuccessful translations were due to compilation errors, the majority of them caused by unused imports.

I'm rather confused by what that's supposed to mean, since unused imports cause warnings, not errors in Rust.

singron · 2024-12-21T03:33:09 1734751989

This implementation uses unsafe. You can write a linked list in safe rust (e.g. using Rc), but it probably wouldn't resemble the one you write in C.

In practice, a little unsafe is usually fine. I only bring it up since the article is about translating to safe rust.

orf · 2024-12-21T03:36:29 1734752189

Safe rust isn’t “rust code with absolutely 0 unsafe blocks in any possible code path, ever”. Rc uses unsafe code every time you construct one, for example.

Unsafe blocks are an escape hatch where you promise that some invariants the compiler cannot verify are in fact true. If the translated code were to use that collection, via its safe interfaces, it would still be “safe rust”.

More generally: it’s incorrect to say that the rust ownership model forbids X when it ships with an implementation of X, regardless of if and how it uses “unsafe” - especially if “unsafe” is a feature of the ownership model that helps implement it.

andrewflnr · 2024-12-21T04:01:36 1734753696

No one here is confused about what unsafe means. The point is, they're not implemented by following Rust's ownership model, because Rust's ownership model does in fact forbid that kind of thing.

You can nitpick the meaning of "forbids", but as far as the current context is concerned, if you translate code that implements a doubly linked list (as opposed to using one from a library) into Rust, it's not going to work without unsafe. Or an index-based graph or something.

oneshtein · 2024-12-21T04:56:32 1734756992

It's easy to implement doubly linked lists in safe Rust. Just ensure that every element has one OWNER, to avoid «use after free» bugs, or use a garbage collector, like a reference counter.
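A minimal sketch of the reference-counted variant, with hypothetical `Node` and `List` types written for illustration. It assumes owning `Rc` links forward and non-owning `Weak` links backward, and it pays for that with reference counts and runtime `RefCell` checks, so it is not how the standard library's list is implemented:

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

// Hypothetical minimal list, for illustration only.
struct Node {
    value: i32,
    next: Option<Rc<RefCell<Node>>>,   // owning link: each node owns its successor
    prev: Option<Weak<RefCell<Node>>>, // non-owning back link, so no reference cycle
}

struct List {
    head: Option<Rc<RefCell<Node>>>,
    tail: Option<Weak<RefCell<Node>>>,
}

impl List {
    fn new() -> Self {
        List { head: None, tail: None }
    }

    fn push_back(&mut self, value: i32) {
        let node = Rc::new(RefCell::new(Node { value, next: None, prev: self.tail.clone() }));
        match self.tail.as_ref().and_then(Weak::upgrade) {
            Some(old_tail) => {
                old_tail.borrow_mut().next = Some(Rc::clone(&node));
            }
            None => {
                self.head = Some(Rc::clone(&node));
            }
        }
        self.tail = Some(Rc::downgrade(&node));
    }

    // Walks the `next` links from the head.
    fn values(&self) -> Vec<i32> {
        let mut out = Vec::new();
        let mut cur = self.head.clone();
        while let Some(node) = cur {
            out.push(node.borrow().value);
            cur = node.borrow().next.clone();
        }
        out
    }

    // Walks the `prev` links from the tail.
    fn values_reversed(&self) -> Vec<i32> {
        let mut out = Vec::new();
        let mut cur = self.tail.as_ref().and_then(Weak::upgrade);
        while let Some(node) = cur {
            out.push(node.borrow().value);
            cur = node.borrow().prev.as_ref().and_then(Weak::upgrade);
        }
        out
    }
}

fn main() {
    let mut list = List::new();
    for v in [1, 2, 3] {
        list.push_back(v);
    }
    println!("{:?} {:?}", list.values(), list.values_reversed());
}
```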

Unlike C++ or Rust, C has no references, only pointers, so the developer must release memory manually at some arbitrary point. This is the problem and source of bugs.

saghm · 2024-12-21T06:15:04 1734761704

While I might agree that it's easy if you use a reference counter, this is not going to be as performant as the typical linked list written in C, which is why the standard library uses unsafe for its implementation of stuff like this. If it were "easy" to just write correct `unsafe`, then it would be easy to do it in C as well.

Note that the converse to this isn't necessarily true! People I trust way more than me to write unsafe Rust code have argued that unsafe Rust can be harder than writing C in some ways due to having to uphold certain invariants that don't come up in C. While there are a number of blog posts on the topic that anyone interested can probably find fairly easily by googling "unsafe Rust harder than C", I'll break my usual rule of strongly preferring articles to video content to link a talk from youtube because the speaker is one of those people I mention who I'd trust more than me to write unsafe code and I remember seeing him give this talk at the meetup: https://www.youtube.com/watch?v=QAz-maaH0KM

bonzini · 2024-12-21T11:17:46 1734779866

> unsafe Rust can be harder than writing C in some ways due to having to uphold certain invariants that don't come up in C.

Yes, this is absolutely correct and on top of this you sometimes have to employ tricks to make the compiler infer the right lifetime or type for the abstraction you're providing. On the other hand, again thanks to the abstraction power of Rust compared to C, you can test the resulting code way more easily using for example Miri.

imtringued · 2024-12-21T11:34:10 1734780850

I don't really see it as a big "owning" of Rust that a complex pointer heavy structure with runtime defined ownership cannot be checked statically. Almost every language that people use doubly linked lists in has a GC, making the discussion kind of meaningless.

So C and C++ are the exceptions to the rule, but how do they make it easy to write doubly linked lists? Obviously, the key assumption is that the developer makes sure that node->next->prev == node and node->prev->next == node (ignoring nullptr).

With this restriction, you can safely write a doubly linked list even without reference counting.

However, this isn't true on the pointer level. The prev pointers could be pointing at the elements in a completely random order. For example tail->prev = head, head->prev = second_last and so on. So that going backwards from the tail is actually going forwards again!

Then there is also the problem of having a pointer from the outside of the linked list pointing directly at a node. You would need a weak pointer, because another pointer could have requested deletion from the linked list, while you're still holding a reference.

If you wanted to support this generic datastructure, rather than the doubly linked list you have in your head, then you would need reference counting in C/C++ as well!

What this tells you, is that Rust isn't restrictive enough to enforce these memory safe contracts. Anyone with access to the individual nodes could break the contract and make the code unsafe.

oconnor663 · 2024-12-21T07:02:39 1734764559

More important than whether you use a little unsafe or a lot, is whether you can find a clean boundary above which everything can be safe. Something like a hash function or a block cipher can be piles and piles of assembly under the covers, but since the API is bytes-in-bytes-out, the safety concerns are minimal. On the other hand, memory-mapping a file is just one FFI function call, but the uncontrollable mutability of the whole thing tends to poison everything above it with unsafety.
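
As a small illustration of such a boundary (a sketch only, with a hypothetical `checksum` function standing in for a real hash): the body uses `unsafe`, but the signature is bytes in, number out, so no caller can violate the invariant the `unsafe` block relies on.

```rust
// Hypothetical toy checksum; not a real hash function.
pub fn checksum(data: &[u8]) -> u32 {
    let mut sum: u32 = 0;
    let mut i = 0;
    while i < data.len() {
        // SAFETY: the loop condition guarantees `i < data.len()`.
        let byte = unsafe { *data.get_unchecked(i) };
        sum = sum.wrapping_mul(31).wrapping_add(u32::from(byte));
        i += 1;
    }
    sum
}
```

Everything the `unsafe` block depends on is established inside the function, so code above it stays safe.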

pizlonator · 2024-12-21T04:32:33 1734755553

Good luck inferring how to use that from some C programmer’s deranged custom linked list.

C programmers don’t do linked lists by using libraries, they hand roll them, and often they are more complex than “just” a linked list. Lots of complex stuff out there.

Rusky · 2024-12-21T21:25:28 1734816328

Rust's ownership model has two aspects:

- A dynamic part specifies what is actually allowed, and totally supports doubly linked lists and other sorts of cyclic mutable data structures.

- A static part conservatively approximates the dynamic part, but is still flexible enough to express the interfaces and usage of these data structures even if it can't check their implementations.

This is the important difference over traditional static analysis of C. It enables `unsafe` library code to bridge the dynamic and static rules in a modular way, so that extensions to the static rules can be composed safely, downstream of their implementation.

Rust's strategy was never for the built-in types like `&mut T`/`&T` to be a complete final answer to safety. It actually started with a lot more built-in tools to support these patterns, and slowly moved them out of the compiler and runtime and into library code, as it turned out their APIs could still be expressed safely with a smaller core.

Something like Fil-C would be totally complementary to this approach: not only could the dynamic checking give you stronger guarantees about the extensions to the static rules, but the static checks could give the compiler more leverage to elide the dynamic checks.

oneshtein · 2024-12-21T04:50:17 1734756617

Rust's ownership model doesn't forbid doubly linked lists. It forbids doubly owned lists, or, in other words, «use after free» bugs.

jerf · 2024-12-21T04:05:06 1734753906

That's a classic example of an argument that looks really good from the 30,000 foot view, but when you're actually on the ground... no, basically none of that beautiful idea can actually be manifested into reality.

bloppe · 2024-12-21T02:48:56 1734749336

Is this sarcastic? There's a reason why the lifetime checker is so annoying to people with a lot of C experience. You absolutely cannot just use your familiar C coding styles in Rust.

orf · 2024-12-21T03:14:10 1734750850

You’ve misread the comment.

The ownership model is close enough, but the way that model is expressed by the developer is completely arbitrary (and thus completely nuts).

pornel · 2024-12-21T01:13:23 1734743603

Because it's not merely mutable, it's exclusive. You get a static guarantee that, for as long as you can use this reference, this is the only reference in the entire program that can mutate this memory.

This is automatically thread-safe, without any locks. It's guaranteed that there can't be any side effects that could affect this memory, no matter what code you call. You don't need any defensive coding copying the memory just in case. It optimizes well, because it's certain that it won't overlap with any other region.

C++ doesn't have that kind of strong no-alias guarantee. Even memory behind const pointers can be mutated by something else at distance. The closest equivalent is C's restrict pointers, but they're more coarse-grained, and aren't checked by the compiler.
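
A minimal sketch of what that exclusivity buys, using hypothetical function names. In `add_twice` the compiler may assume `total` and `amount` never overlap, and in `double_in_parallel` two threads mutate disjoint halves of one slice with no locks.

```rust
// `total` is exclusive, so `amount` cannot point at the same memory.
// A call like `add_twice(&mut x, &x)` is rejected at compile time.
pub fn add_twice(total: &mut i32, amount: &i32) {
    *total += *amount;
    *total += *amount;
}

// `split_at_mut` hands out two non-overlapping exclusive slices,
// so each thread can mutate its half without synchronization.
pub fn double_in_parallel(data: &mut [i32]) {
    let mid = data.len() / 2;
    let (left, right) = data.split_at_mut(mid);
    std::thread::scope(|s| {
        s.spawn(|| left.iter_mut().for_each(|x| *x *= 2));
        s.spawn(|| right.iter_mut().for_each(|x| *x *= 2));
    });
}
```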

pornel · 2024-12-19T12:04:59 1734609899

It should be possible (it'd need to also save memory map), but for some reason Rust's standard library wants to resolve human-readable paths at runtime.

Additionally, Rust has absurdly overly precise debug info.

Even set to minimum detail, it's still huge, and still keeps all of the layers of those "zero-cost" abstractions that were removed from the executable, so every `for` loop and every arithmetic operation has layers upon layers of debug junk.

External debug info is also more fragile. It's chronically broken on macOS (Rust doesn't test it with Apple's tools). On Linux, it often needs to use GNU debuginfo and be placed in system-wide directories to work reliably.

exDM69 · 2024-12-19T12:32:46 1734611566

> (it'd need to also save memory map)

Typically the memory map is only required when capturing the backtrace. When outputting, the stack frames' addresses are given/stored/printed relative to the binary file sections (with the load-time address subtracted). E.g. SysRq+l on Linux. This occurs at runtime, so saving the memory map in addition to the relative addresses is not necessary.

Not sure if this is viable on all the platforms that Rust supports.

> but for some reason Rust's standard library wants to resolve human-readable paths at runtime.

Ah, I see that Rust's `std::backtrace::Backtrace` is missing any API to extract address information and it does not print the address infos either. Even with the `backtrace_frames` feature you only get a list of frames but no useful info can be extracted.

Hopefully this gets improved soon.

> External debug info is also more fragile.

I use external debug info all the time because uploading binaries with debug symbols to the (embedded) devices I run the code on is prohibitively expensive. It needs some extra steps in debugging but in general it seems to work reliably at least on the platforms I work with. The debugger client runs on my local computer with the debug symbols on disk and the code runs under a remote debugger on the device.

I'm sure there are flaky platforms that are not as reliable.

pornel · 2024-12-19T11:50:57 1734609057

Safety is not an extra feature à la carte. These concepts are all interconnected:

Safety requires unions to be safe, so unions have to become tagged enums. To have tagged enums usable, you have to have pattern matching, otherwise you'd get something awkward like C++ std::variant.
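
For instance (a sketch with a hypothetical `Value` type), the tag and the payload travel together, and `match` is the only way to reach the payload, so the wrong variant can never be read:

```rust
// Hypothetical tagged union of two payloads.
pub enum Value {
    Int(i64),
    Text(String),
}

pub fn describe(v: &Value) -> String {
    // The compiler rejects this `match` if a variant is left unhandled.
    match v {
        Value::Int(n) => format!("int {n}"),
        Value::Text(s) => format!("text {s}"),
    }
}
```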

Borrow checking works only on borrowed values (as the name suggests), so you will need something else for long-lived/non-lexical storage. To avoid GC or automatic refcounting, you'll want moves with exclusive ownership.

Exclusive ownership lets you find all the places where a value is used for the last time, and you will want to prevent double-free and uninitialized memory, which is a job for RAII and destructors.
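
Roughly, with a hypothetical `Resource` type that counts how often its destructor runs: passing it by value moves ownership, the callee holds the last use, and the destructor runs exactly once at that point.

```rust
use std::cell::Cell;
use std::rc::Rc;

// Hypothetical resource that records when its destructor runs.
pub struct Resource {
    pub released: Rc<Cell<u32>>,
}

impl Drop for Resource {
    fn drop(&mut self) {
        self.released.set(self.released.get() + 1);
    }
}

// Takes ownership: the caller can no longer use the value after this call,
// so a second release (double-free) cannot be written.
pub fn consume(r: Resource) {
    let _ = &r; // last use; `r` is dropped when the function returns
}
```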

Static analysis of manual malloc/realloc and casting to/from void* is difficult, slow, and in many cases provably impossible, so you'll want to have safely implemented standard collections, and for these you'll want generics.

Not all bounds checks can be eliminated, so you'll want to have iterators to implement typical patterns without redundant bounds checks. Iterators need closures to be ergonomic.
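
A small sketch of the difference, with hypothetical function names. Note the two are not strictly equivalent: the indexed version panics if `b` is shorter than `a`, while `zip` stops at the shorter slice.

```rust
// Indexed version: `b[i]` is bounds-checked on each iteration unless the
// optimizer can prove `i < b.len()`.
pub fn dot_indexed(a: &[u32], b: &[u32]) -> u32 {
    let mut total = 0;
    for i in 0..a.len() {
        total += a[i] * b[i];
    }
    total
}

// Iterator version: `zip` yields only pairs that exist, so no index check
// is needed, and the closure keeps the per-element logic inline.
pub fn dot_zip(a: &[u32], b: &[u32]) -> u32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}
```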

…and so on.

Every time you plug a safety hole, it needs a language feature to control it, and then it needs another language feature to make this control fast and ergonomic.

If you start with "C but safe", and keep pulling that thread, nearly all of Rust will come out.

AlotOfReading · 2024-12-19T18:04:49 1734631489

I've experienced that frustration myself several times and tried to do "rust but simpler". I just recently failed attempt #4 at not reinventing rust but worse. Attempt #2 ended with Erlang but worse, which was a pleasant surprise.

mpweiher · 2024-12-23T00:28:01 1734913681

Hm...Smalltalk is also safe.

So there are obviously different ways of addressing this.

pornel · 2024-12-23T12:07:49 1734955669

Bash is also memory-safe.

Almost every programming language is memory-safe, but Smalltalk and most high-level dynamically typed languages require a fat runtime, and have a much higher overhead than C or Rust.

For languages that can be credible C and C++ competitors, the design space is much smaller.

mpweiher · 2024-12-25T09:02:23 1735117343

The id-subset[1] of Objective-C is also memory safe, just like the Smalltalk it copied.

And Objective-C is a credible C competitor, partly because it is a true superset of C, partly because you can get it to any performance level you want (frequently faster than equivalent practical C code [2]) and it was even used in the OS kernel in NeXTStep.

Now obviously it's not done, as it is a true superset and thus inherits all of C's non-safety, and if you were to just use the id-subset that is memory safe, you wouldn't be fully competitive.

However, it does show a fairly clear path forward: restrict the C part of Objective-C so that it remains safe, let all the tricky parts that would otherwise cause non-safety be handled by the id-subset.

That is the approach I am taking with the procedural part of Objective-S[3]: let the procedural part be like Smalltalk, with type-declarations allowing you to optimize that away to something like Pascal or Oberon. Use reference counting to keep references safe, but potentially leaky in the face of cycles. Optional lifetime annotations such as weak can be used to eliminate those leaks and to eliminate reference counting operations. Just like optional type declarations can reduce boxing and dynamic dispatch.

[1] https://blog.metaobject.com/2014/05/the-spidy-subset-or-avoi...

[2] https://www.amazon.com/gp/product/0321842847/ref=as_li_tl?ie...

[3] https://objective.st/

pornel · 2024-12-15T15:42:51 1734277371

If the image is watermarked, you can't remove it that way. Watermarks easily survive uniform noise higher than humans can tolerate. Watermark data is typically stored redundantly in multiple locations and channels, so uniform noise mostly averages itself out, and cropping won't do much. Watermarks often add signal in a different color model than RGB and in a heavily transformed domain of the image, so you're not adding noise along the "axis" of watermark's signal.

For similarity search, it also won't do much. Algorithms for this look for dozens of "landmarks", and then search for images that share a high percentage of them. The landmarks traditionally were high-contrast geometric features like corners, which wouldn't be affected by noise. Nowadays, landmarks can be whatever a neural network learns to pick when trained against typical deformations like compression and noise.

notfed · 2024-12-15T19:50:06 1734292206

Which common photo apps are implanting watermarks of this kind?

pornel · 2024-12-11T01:02:48 1733878968

SQLite also has some runtime checks that are expected to be always true or always false, and solves that by using a custom macro that removes these branches during branch coverage testing.

The same would be possible in Rust. Everything that could panic has a non-panicking alternative, and you could conditionally insert `unreachable_unchecked()` into error-handling branches to remove them. That wouldn't be the most idiomatic, but SQLite's solution is also a custom one.
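
A rough sketch of how that could look, under loud assumptions: the `NonEmpty` type and the `coverage` Cargo feature are hypothetical, and this is not how SQLite or any Rust crate actually does it. The non-panicking `first()` replaces `[0]`, and the defensive branch is compiled out only when the hypothetical feature is enabled.

```rust
use std::hint::unreachable_unchecked;

// Hypothetical wrapper whose constructor guarantees at least one element.
pub struct NonEmpty(Vec<u8>);

impl NonEmpty {
    pub fn new(v: Vec<u8>) -> Option<Self> {
        if v.is_empty() { None } else { Some(NonEmpty(v)) }
    }

    pub fn first(&self) -> u8 {
        // `first()` is the non-panicking alternative to `self.0[0]`.
        match self.0.first() {
            Some(&b) => b,
            None => {
                if cfg!(feature = "coverage") {
                    // Assumed coverage build: tell the compiler this branch
                    // cannot happen so it disappears from branch coverage.
                    // SAFETY: `new` rejects empty vectors and the field is private.
                    unsafe { unreachable_unchecked() }
                } else {
                    0 // defensive fallback, never expected to run
                }
            }
        }
    }
}
```

This is only sound as long as the invariant really holds; if it ever breaks, the coverage build has undefined behavior instead of a fallback.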