Hacker News

Does the "Rust standard of quality" for these crucial crates include "no unsafe code"?

"Vec" currently needs unsafe code, because Rust doesn't have the expressive power to talk about a partially initialized array. Everything else with unsafe code is an optimization. Often a premature one. Maps should be built on "Vec", for example.




I would actually hope for the opposite. If your application needs unsafe code, it would be better for that code to be isolated into a "blessed" (i.e. widely vetted by those with the relevant expertise) library with a safe interface, rather than rolling your own.


Why do you say premature? Maps, for example, being such a crucial and widely-used data type, should be optimized as much as possible. There's nothing premature about using `unsafe` to build a highly-efficient Map.


Let's see the benchmarks justifying the use of "unsafe" for maps. Maybe there's a better way to do it without much of a performance penalty. It's a good way to find out what optimizations the compiler is missing. It may even turn out that unsafe code written early is no longer a performance win, since the Rust compiler is getting better at optimizing out redundant subscript checks.

When you start looking through Rust libraries, "unsafe" turns up way too often.


(The hashmap that is used in the winning rust benchmarksgame entry is 100% safe, fwiw)

> When you start looking through Rust libraries, "unsafe" turns up way too often.

You keep making this claim without substantiation. Yes, there is some level of unnecessary unsafe, but certainly not "way too often". I recall going through all the crates in my .cargo and finding very little unnecessary unsafe, and showing you the audit: https://news.ycombinator.com/item?id=13280347

Please stop throwing around this claim without substantiation.


Do you think there is some room for education here? Would it be useful for Crates.io to show some "coverage" metric? e.g. "unsafe lines = 5%" or "unsafe instructions = 0.01%"

I think that would be a pretty easy way to put this convo to bed yea?

Have there been talks about this before? Is it worth me opening an issue/RFC on Cargo?


I don't think it's worth making changes like that to satisfy random misconceptions unless they are widespread.

There are good reasons for having an unsafe-code flag on a crate. This is not one of those.

This has been discussed in the past and the main concern IIRC is that this may make people afraid to use crates that contain unsafe code even if it is well audited. There's a finer balance you want to strike here.


> winning rust benchmarksgame entry

C is winning that benchmark, around 2x faster than Rust.

http://benchmarksgame.alioth.debian.org/u64q/regexredux.html


Wrong benchmark. I'm talking about k-nucleotide, which tests hashmaps.

http://benchmarksgame.alioth.debian.org/u64q/knucleotide.htm...

(And I was only really talking of the comparison of rust v rust, not rust v c, i.e. that ordermap is faster than the built in hashmap despite not using unsafe)


Ah, sorry, you were writing about hashmaps while I was writing regexes in my IDE in the background and reading this thread from time to time; that's probably why I checked the regex benchmark instead of the hashmap one. I even recently implemented this benchmark in a language not listed on the benchmarks game to compare speeds. The brain can work in interesting ways!


We've seen this unsubstantiated denial before on YC, at "Why I’m dropping Rust"[1] and "Rust sucks if I fail to write X".[2] I once started going through the Rust library packages and listed uses of "unsafe". Try doing that.

The previous discussions made a few things clear:

- The big design-level problems with data structure safety are 1) partially initialized arrays, and 2) backlinks. The first is needed for growing arrays in Vec, and the second is needed for doubly-linked lists and some kinds of trees. It's very hard to handle either of those in safe Rust. A very small number of packages need unsafe code for those functions, and those should be tightly controlled.

- Foreign code remains a problem, and is inherently unsafe when calling unsafe languages. The most downloaded Rust crate is "libc". What could possibly go wrong there? Was it really a good idea to import unsafe "strcpy" into Rust?

    pub fn strcpy(dst: *mut c_char, src: *const c_char) -> *mut c_char;
- There's a lot of use of "unsafe" code that's not strictly necessary. You find unsafe code like "from_utf8_unchecked", which then turns up in the JSON decoder at (https://github.com/rust-lang-deprecated/rustc-serialize/blob...). There are no comments on the safety of that. Is there some way to create bad JSON, get bad UTF-8 into a string, and cause trouble further upstream? I don't know, but somebody "optimized" there, and created a potential problem.

I could give many more examples.

Most of these problems are fixable. They're not inherent in Rust. Fixing them is important to Rust's credibility. If Rust is going to replace C++, which it should, the holes have to be plugged. It only takes one hole to create a security vulnerability.

Denial is not a river in Egypt.

[1] https://news.ycombinator.com/item?id=12474445 [2] https://news.ycombinator.com/item?id=13670366


> which then turns up in the JSON decoder at (https://github.com/rust-lang-deprecated/rustc-serialize/blob...). There are no comments on the safety of that.

Maybe there's a reason it's in a repository labeled "deprecated", namely that it's deprecated. The replacement serialization framework, serde, has exactly one appearance of "unsafe", in a function that is only compiled when you explicitly enable the "unstable" feature flag and that in fact appears to be a reasonably safe use of unsafe.


The thing is, you cannot decide which crate someone will depend on, regardless of how it is labeled.


But you can read the dependency list or even the code and decide whether to use it or not?


Not in the presence of binary dependencies.

Also, the number of CVEs in FOSS projects shows that even the process of code review for patches isn't enough.


You mean, in projects written in languages that are unsafe from beginning to end rather than in small blocks?


I agree, just stating that plain code review isn't enough.

Those patches also come in small blocks.


> I once started going through the Rust library packages and listed uses of "unsafe". Try doing that.

That's ... exactly what I did? Talk about denial. I went through the libraries in my .cargo folder (which filters for libraries that actually get used, not just random libraries out there). I linked you to that audit in the comment.

Yes, libstd contains a lot more unsafe, but that's kind of the raison d'etre of libstd -- to contain OS-abstractions and very common internally-unsafe abstractions. It's better to have one unsafe implementation of Vec (in a library with a lot of eyeballs on it) than to have ten that the ecosystem relies on.

> which then turns up in the JSON decoder at

rustc-serialize is deprecated. It was deprecated before 1.0, and went into maintenance mode. It was still kinda-maintained because of the difficulty of using serde on stable, but now it's basically going to be full maintenance mode.

Also, that line is trivially safe to execute on untrusted input; a `char` is always a valid Unicode scalar value, so encoding it produces valid UTF-8. Yes, a comment would be nice, but like I said, maintenance mode.

It is possible to do what that function does in safe Rust today, but it needs an API that probably didn't exist ~two years ago when that library was actually relevant. I'm fixing it.
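For what it's worth, the stable `char::encode_utf8` API appears to cover exactly this case without any `unsafe`:

```rust
fn main() {
    // `char::encode_utf8` writes the character into a caller-provided
    // byte buffer and hands back a &str -- no `unsafe` involved.
    let mut buf = [0u8; 4];
    let s: &str = 'é'.encode_utf8(&mut buf);
    assert_eq!(s, "é");
    assert_eq!(s.len(), 2); // U+00E9 encodes to two UTF-8 bytes
}
```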

> The most downloaded Rust crate is "libc". What could possibly go wrong there? Was it really a good idea to import unsafe "strcpy" into Rust?

libc is just bindings (to everything in libc). You still need to use unsafe to call those functions (everything in `extern "C"` is unsafe to call even if not marked explicitly as such). This is not a valid example.
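To illustrate: even the most benign libc binding can only be invoked inside an `unsafe` block, so the caller, not the binding crate, takes on the proof obligation. A sketch (declaring `strlen` directly here rather than pulling in the `libc` crate; Unix-flavored):

```rust
use std::ffi::CString;
use std::os::raw::c_char;

extern "C" {
    // The kind of raw binding the `libc` crate provides.
    fn strlen(s: *const c_char) -> usize;
}

fn main() {
    let c = CString::new("hello").unwrap();
    // SAFETY: `c` is a valid NUL-terminated string that outlives the call.
    let n = unsafe { strlen(c.as_ptr()) };
    assert_eq!(n, 5);
}
```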

> A very small number of packages need unsafe code for those functions, and those should be tightly controlled.

which is pretty true already? The "partially initialized arrays" problem is generally handled by just using Vec. Yes, you need unsafe to write vec, but then you can just build things out of it.

Backlinks turn up rarely -- if perf isn't involved folks just use Weak safely, but otherwise people use petgraph or something.

> I could give many more examples.

Go ahead then. You literally never have, and none of these examples are valid "unnecessary unsafe" for reasons I gave above. I did substantiate with an audit. I'm genuinely interested in finding places where we have too much unsafe code, because I'd like to avoid depending on those crates and/or fix them.

> Most of these problems are fixable. They're not inherent in Rust. Fixing them is important to Rust's credibility.

Sure. And folks are always looking to improve this. But this doesn't mean that there's some widespread problem of unsafe being used too much in Rust. I'm not denying that this shouldn't be improved, I'm denying your allegations that `"unsafe" turns up way too often`.


> rustc-serialize is deprecated

It's the fourth most downloaded crate.[1] Somebody may have "deprecated" it, but the users aren't paying attention. It's not listed as "deprecated" on its own Cargo page.[2] There's a weak note about deprecation on its Github page, but users don't look there.[3] It has 2,950,353 downloads, probably because other crates are pulling it in.

Will it be in the new set of "approved" packages? Or will all the crates that use it be fixed?

Again, denial. Not seeing many links from the "everything is OK, don't need to look at this" crowd.

[1] https://crates.io/ [2] https://crates.io/crates/rustc-serialize [3] https://github.com/rust-lang-deprecated/rustc-serialize


> It's the fourth most downloaded crate.[1] Somebody may have "deprecated" it, but the users aren't paying attention.

The most downloaded stat appears to be for the most downloaded of all time. Considering the difficulty of using serde on stable until recently, it's not surprising that it has been downloaded so much and would continue to be as projects move away from it.

> It's not listed as "deprecated" on its own Cargo page.

This is true, however as you said the GitHub page and the docs page both mention that. Unless one already knows how to use the crate, they are probably going to check the docs and see the deprecation. Although having a deprecation notice on crates.io does seem prudent.

> Again, denial. Not seeing many links from the "everything is OK, don't need to look at this" crowd.

You kind of skipped over the part where the parent poster explained why it wasn't even a big deal in the first place. It really feels like you're grasping at straws to make a point and I don't see why. No one seems to be denying that a lot of unsafe code is potentially bad, but there is really no solid proof that there is an unreasonable amount of unsafe usage. In fact, the parent has mentioned and linked to a list of such usages. If you're going to make such an extraordinary claim I think it behooves you to provide more than just a handful of examples.


Whether or not rustc-serialize is deprecated is kind of a red herring. It was already pointed out that the usage there is trivially provably safe. The deprecation of rustc-serialize is really only relevant here because it explains why the crate is in maintenance mode and therefore why that particular unsafe block hasn't been replaced with a newer safe API.


Number of downloads is a pretty terrible metric because nearly all downloads from a package repository come from automated/CI builds, not people. Depending on what's consuming a package, something which runs tests more frequently could easily inflate the download count for any of its dependencies.


> Foreign code remains a problem, and is inherently unsafe when calling unsafe languages. The most downloaded Rust crate is "libc". What could possibly go wrong there? Was it really a good idea to import unsafe "strcpy" into Rust?

And what would you prefer a tool like Corrode use? Or programmers manually doing an initial "no refactoring" conversion pass from C or C++ to Rust? Should they all use their own ad-hoc libc bindings when converting to Rust? That sounds like harder-to-audit busywork with no upside. Or perhaps it should be impossible to incrementally convert a C codebase, instead being forced to do an up-front monolithic rewrite? Although my experience has been that that leads to more debugging, bugs, pain, and abandoned rewrites.

This isn't to say that improvements cannot or should not be made, but a reminder that perfect is the enemy of the good. Yes, you should really really really not use strcpy in new code if you can possibly avoid it - and you can. And you should refactor it away in existing code ASAP. But that argument probably extends to libc in Rust code in general. Being able to test-compile if your codebase builds yet when you remove libc sounds way better than trying to hunt down and remove all your ad-hoc defines.

And as a stepping stone, I'd suspect the good outweighs the bad here. Not because there will never be a security vulnerability in Rust code because of an unrefactored strcpy in Rust (there very well may be one if there hasn't already!) - but because it will encourage conversion, and thus subsequent cleanup and refactoring, that might not have otherwise happened.

> There are no comments on the safety of that. Is there some way to create bad JSON, get bad UTF-8 into a string, and cause trouble further upstream? I don't know, but somebody "optimized" there, and created a potential problem.

A quick look at blame shows this originally was applied to a buffer that was encode_utf8(...)ed. The refactoring (to remove a libunicode dependency) has made the safety less clear, yes. I could see the use for a method:

  let buf = v.to_str_with(&mut buf);
Ultimately, this is just shifting the "problem" from json.rs to str/mod.rs, but the latter was going to have to deal with unsafe code anyways, as we're getting into the details of "just how is str implemented anyways": https://doc.rust-lang.org/1.4.0/src/core/str/mod.rs.html#125 . Although perhaps there's something to be said for keeping that to the bare essentials as well...

> Most of these problems are fixable. They're not inherent in Rust. Fixing them is important to Rust's credibility. If Rust is going to replace C++, which it should, the holes have to be plugged. It only takes one hole to create a security vulnerability.

I encourage you to submit pull requests (at least for non-deprecated projects) if you haven't already.

I feel the need to restate the obvious, although I'm sure you're aware already: There will be one hole. In fact, there will be two holes. I dare say there will be no less than three security vulnerabilities, even - for mistakes are inherent to programming. Perfect safety is not an achievable goal - only better safety. Worse - some of the people using Rust are going to have a higher tolerance for unsafe than you. But at least they're no longer using C though, right?

(I'm afraid I'm still using C and C++.)


(I will mention that shifting unsafe to libstd is a good thing, this way you have more eyeballs on it and it's reusable, with less chance of people messing it up elsewhere. This is why a large fraction of what's in libstd is unsafe stuff so that people don't end up implementing their own.)


Unsafe rarely turns up in any of the libraries I've used. Where are you getting this from?


> Everything else with unsafe code is an optimization.

You need unsafe code to invoke a syscall. Doing I/O is not an "optimization".
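For example, even a basic write to stdout bottoms out in an `unsafe` foreign call. A sketch, declaring the POSIX `write` directly (Unix-specific; normally you'd go through `std::io` or the `libc` crate):

```rust
extern "C" {
    // POSIX write(2). Every foreign function is unsafe to call.
    fn write(fd: i32, buf: *const u8, count: usize) -> isize;
}

fn main() {
    let msg = b"hello\n";
    // SAFETY: fd 1 is stdout; `msg` is valid for `msg.len()` bytes.
    let n = unsafe { write(1, msg.as_ptr(), msg.len()) };
    assert_eq!(n, msg.len() as isize);
}
```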


I suspect removing unsafe is outside of the scope of Rust.

Rust tries to help you as much as possible but recognizes that it sometimes gets in the way and provides an escape hatch. The idea is to minimize and abstract the use of the escape hatch so it's easily auditable.

I wish they hadn't named the escape hatch "unsafe". I think I heard the name came from PL theory but it makes it sound scarier than it is.

I remember hearing something about wanting to improve documentation around unsafe for how to best use it without running into problems.


IMO it's good that it's scarier than it actually is :)

http://doc.rust-lang.org/doc/stable/nomicon/ is the documentation for unsafe. I do plan on improving it heavily, but I'm very busy right now, and I'd like to wait for some of the unsafe semantics stuff to be pinned down so that I can go in full depth when I write this.


Thanks for your work

> I'd like to wait for some of the unsafe semantics stuff to be pinned down so that I can go in full depth when I write this.

That's the work I was thinking of.


https://github.com/bluss/ordermap is 100% safe rust and pretty fast

I believe the stdlib one could be refactored, but I'm not sure. There are a lot of optimizations in that one.

IMO "no unsafe code" is a stretch. "no unnecessary unsafe code" is what it should be.


That "ordermap" is nice. Could that replace std::collections::hash_map as the main supported map crate?

It imports std::collections::hash_map, which has unsafe code, but only for RandomState, which does not. If RandomState were pulled out of hash_map and moved to a crate with no unsafe code, that dependency could be removed and it would be safe crates all the way down.

IMO "no unsafe code" is a stretch. "no unnecessary unsafe code" is what it should be.

The author of the code doesn't get to determine "unnecessary". That should require extensive and hard to get justification.


> Could that replace std::collections::hash_map as the main supported map crate?

No, it is more specialized for certain kinds of loads.

> It imports std::collections::hash_map, which has unsafe code, but only for RandomState, which does not. If RandomState were pulled out of hash_map and moved to a crate with no unsafe code, that dependency could be removed and it would be safe crates all the way down.

That's a very vacuous distinction.

This is all in the same crate (std) anyway, it's just a different module.

> The author of the code doesn't get to determine "unnecessary".

The auditors (proposed in this blog post) do.

I've audited unsafe in the past to ensure our dependencies are fine, there are some pretty reasonable ways to define "unnecessary" here.


Sure, it could replace the stdlib hash map. It would regress performance for some usage and improve it for others. (And this makes it an unlikely candidate; pure improvements are a lot easier to get in.)


I think the presence of unsafe code makes it more important that the lib goes through this process, rather than less. Provided the unsafe code is necessary, of course.

That unsafe code is filling a want or need, so otherwise people are going to use crates that haven't gone through the process, or even worse, roll it themselves.


> Does the "Rust standard of quality" for these crucial crates include "no unsafe code"?

> Everything else with unsafe code is an optimization. Often a premature one.

Rc, Arc and Box require unsafe code for the obvious reasons that are not premature optimisations. Any attempt to use a syscall (look at the nix crate which wraps libc with safe APIs) requires unsafe code.

Unsafe doesn't mean "this code is bad". It is true that unsafe code should be treated very carefully, but its purpose is to let a sufficiently clever human implement safe code that the insufficiently clever compiler cannot verify.

Really it should've been called trustme rather than unsafe. ;)


The roughly equivalent annotation in D is called "@trusted", but IMHO "unsafe" is better because it sounds scarier :-)

https://dlang.org/blog/2016/09/28/how-to-write-trusted-code-...


In Rust, 'unsafe code' is really 'partially compiler check exempt'. It does not mean the code is unsafe. You can manually verify the 'unsafe' parts.

It is very unfortunately named.


It's unsafe in roughly the same way that Rust developers call C and C++ unsafe, right? The advantage is that it's a special, relatively uncommon mode, and it's explicitly marked. But I don't think you can minimize the reality that it's unsafe without also minimizing one of the major selling points of the language.


The fact that 10% of your code isn't verified by the compiler and instead needs extra review and tests doesn't eliminate the benefit of the 90% that is verified by the compiler.


Unsafe blocks are typically as small as necessary, so the burden of performing extensive manual verification is small.


Yes, but those few exemptions mean that you can effectively do anything. If you needed `unsafe {}` for something, that block definitely needs more scrutiny and should be avoided if possible.


Right. That's the road to buffer overflow exploits.

The most recent CERT advisory reporting a buffer overflow exploit was April 17th, 2017.[1] About one per week is reported, year after year. Others not reported are probably being exploited. Rust can stop that, but one "unsafe" declaration can break Rust's safety.

Hash maps need to be 100% safe code. They're complicated, and involve elaborate calculations that output subscripts.

[1] http://www.kb.cert.org/vuls/id/676632


It's pretty easy to audit a couple lines of unsafe in a library.

It is not easy to do the same with a C library where it's basically prone to overflow issues everywhere.

These are vastly different issues. Yes, we should totally be strict on unsafe code. No, it is not the end of the world when the stdlib hashmap uses unsafe code. Unsafe is designed exactly for this purpose, dealing with the innards of safe abstractions. It does it well, and the hashmap code is pretty ok here.

(Sure, if it has a design which could be done in safe Rust, it should, but I suspect with the way the robin hood stuff works it might not. this has been on my list of things to do when I get more time, primarily to just learn about the hashmap impl, but also to improve it if possible)


The main reason it requires unsafe is for memory layout optimizations. Logically, a hash table is an array of buckets, each of which is either a (hash, key, value) triplet or empty. The naive representation would be Vec<Option<(usize, K, V)>>, but using Option would significantly increase the size of each bucket: it's only a one-byte tag to differentiate between Some and None, but alignment constraints would round the overhead up to at least the alignment of usize, typically 8. Instead, HashMap stores just (usize, K, V), and considers a bucket empty whenever 0 is stored in the hash slot; if 0 ever comes up as an actual hash value, it's changed to a different value before being stored.
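The overhead is easy to check; on a typical 64-bit target the `Option` tag rounds a 24-byte bucket up to 32 bytes (exact sizes are target-dependent):

```rust
use std::mem::size_of;

fn main() {
    // Naive bucket: the one-byte Some/None tag gets padded to a full word,
    // since (usize, u64, u64) has no spare niche for the discriminant.
    let naive = size_of::<Option<(usize, u64, u64)>>();
    // Tagless bucket: hash == 0 marks the slot empty instead.
    let packed = size_of::<(usize, u64, u64)>();
    println!("naive = {} bytes, packed = {} bytes", naive, packed); // 32 vs 24 on x86-64
}
```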

To the extent I've described it, that might be possible to accomplish in Rust without unsafe code, using a safe abstraction around NonZero (an unsafe type which the compiler assumes will never be zero, allowing it to automatically use a zero value as a marker if the type is found in an enum, such as Option). However, HashMap goes a step further: it actually stores all the hashes contiguously in one array, and the (K, V) pairs in another. The nth index in the hashes array and in the (K, V) array together represent a single bucket, but storing them separately makes HashMap's typical access patterns somewhat nicer on the CPU's cache. However, this means that slots in the (K, V) array can contain either valid data (of arbitrary types, which might contain pointers etc.) or garbage, depending on a value stored somewhere completely different. There's no good 'automatic' way to make that safe.
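A stripped-down sketch of that split layout (a hypothetical `Table` type, not the real std implementation) shows where the `unsafe` obligation lives:

```rust
use std::mem::MaybeUninit;

// Hashes live in one array, entries in another. A slot in `entries` is
// only valid when the matching hash is nonzero -- an invariant the
// compiler cannot check, hence the `unsafe` read.
struct Table<K, V> {
    hashes: Vec<usize>, // 0 means "empty"
    entries: Vec<MaybeUninit<(K, V)>>,
}

impl<K, V> Table<K, V> {
    fn get(&self, i: usize) -> Option<&(K, V)> {
        if self.hashes[i] == 0 {
            return None;
        }
        // SAFETY: a nonzero hash implies entries[i] was initialized.
        Some(unsafe { self.entries[i].assume_init_ref() })
    }
}

fn main() {
    let mut t = Table { hashes: vec![0, 7], entries: Vec::new() };
    t.entries.push(MaybeUninit::uninit());
    t.entries.push(MaybeUninit::new(("key", 42)));
    assert!(t.get(0).is_none());
    assert_eq!(t.get(1).unwrap().1, 42);
}
```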

I think the use of unsafe code here is fine - it's not that hard to audit, and if you don't audit you're in trouble anyway (hash table DoS is also a security risk) - but in the long term, it would be very interesting if Rust could integrate an optional theorem prover, with the ability to prove arbitrarily complex code safe, given sufficient annotations...


> It's pretty easy to audit a couple lines of unsafe in a library.

You have to audit more than just the lines within the unsafe block. For example, I discovered a buffer overflow in a Rust library this week that was caused by an integer overflow outside the unsafe block[1]. The unsafe code itself was written correctly.

[1] https://github.com/RustSec/advisory-db/blob/master/crates/ba...


Sounds to me like the unsafe block had the wrong scope. If safe Rust called a function with a buffer, and that buffer was too small, and that function internally used unsafe code to write to the buffer, then that function is leaking the unsafety past the `unsafe {}` scope, which is incorrect. So that function should be marked as `unsafe`, and the calling code (which calculates the buffer) is then responsible for ensuring that it's invoking the unsafe function with an appropriately-sized buffer.
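Sketched out, that scoping looks like this (hypothetical function, for illustration): the length check moves to the caller, and `unsafe fn` makes the obligation explicit in the signature:

```rust
/// Writes `vals` into the buffer behind `buf`.
///
/// # Safety
/// `buf` must be valid for at least `vals.len()` writes; nothing here checks.
unsafe fn fill_unchecked(buf: *mut u32, vals: &[u32]) {
    for (i, &v) in vals.iter().enumerate() {
        // Relies on the caller-checked length contract above.
        unsafe { buf.add(i).write(v) }
    }
}

fn main() {
    let mut buf = vec![0u32; 3];
    let vals = [1, 2, 3];
    // The caller now owns the length check that prevents the overflow.
    assert!(vals.len() <= buf.len());
    unsafe { fill_unchecked(buf.as_mut_ptr(), &vals) };
    assert_eq!(buf, vec![1, 2, 3]);
}
```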


Not necessarily. This isn't a question with a single answer within the community; there are reasons to tightly scope unsafe, and there are reasons to scope it to the full extent of unsafe-affecting invariants.

Scope your unsafe however you want, but check the invariants when auditing, and make sure they don't escape the module.


I'm aware. When I say "audit a line of unsafe" that includes all the invariants that it relies on, which could be outside the block.

This is still not that hard.


> Often a premature one.

Can you share the ecosystem wide analysis that you did that led to this conclusion?



