It's pretty easy to audit a couple lines of unsafe in a library. It is not easy ...

comex · on May 5, 2017

The main reason it requires unsafe is for memory layout optimizations. Logically, a hash table is an array of buckets, each of which is either a (hash, key, value) triplet or empty. The naive representation would be Vec<Option<(usize, K, V)>>, but using Option would significantly increase the size of each bucket: it's only a one-byte tag to differentiate between Some and None, but alignment constraints would round the overhead up to at least the alignment of usize, typically 8. Instead, HashMap stores just (usize, K, V), and considers a bucket empty whenever 0 is stored in the hash slot; if 0 ever comes up as an actual hash value, it's changed to a different value before being stored.

To the extent I've described it, that might be possible to accomplish in Rust without unsafe code, using a safe abstraction around NonZero (an unsafe type which the compiler assumes will never be zero, allowing it to automatically use a zero value as a marker if the type is found in an enum, such as Option). However, HashMap goes a step further: it actually stores all the hashes contiguously in one array, and the (K, V) pairs in another. The nth index in the hashes array and in the (K, V) array together represent a single bucket, but storing them separately makes HashMap's typical access patterns somewhat nicer on the CPU's cache. However, this means that slots in the (K, V) array can contain either valid data (of arbitrary types, which might contain pointers etc.) or garbage, depending on a value stored somewhere completely different. There's no good 'automatic' way to make that safe.

I think the use of unsafe code here is fine - it's not that hard to audit, and if you don't audit you're in trouble anyway (hash table DoS is also a security risk) - but in the long term, it would be very interesting if Rust could integrate an optional theorem prover, with the ability to prove arbitrarily complex code safe, given sufficient annotations...

agwa · on May 5, 2017

> It's pretty easy to audit a couple lines of unsafe in a library.

You have to audit more than just the lines within the unsafe block. For example, I discovered a buffer overflow in a Rust library this week that was caused by an integer overflow outside the unsafe block[1]. The unsafe code itself was written correctly.

[1] https://github.com/RustSec/advisory-db/blob/master/crates/ba...

lilyball · on May 5, 2017

Sounds to me like the unsafe block had the wrong scope. If safe Rust called a function with a buffer, and that buffer was too small, and that function internally used unsafe code to write to the buffer, then that function is leaking the unsafety past the `unsafe {}` scope, which is incorrect. So that function should be marked as `unsafe`, and the calling code (which calculates the buffer) is then responsible for ensuring that it's invoking the unsafe function with an appropriately-sized buffer.

Manishearth · on May 5, 2017

Not necessarily. This isn't a question with a single answer within the community; there are reasons to tightly scope unsafe, and there are reasons to scope it to the full extent of unsafe-affecting invariants.

Scope your unsafe however you want, but check the invariants when auditing, and make sure they don't escape the module.

Manishearth · on May 5, 2017

I'm aware. When I say "audit a line of unsafe" that includes all the invariants that it relies on, which could be outside the block.

This is still not that hard.