Hacker News new | past | comments | ask | show | jobs | submit login
The Rust Libs Blitz (rust-lang.org)
433 points by aturon on May 5, 2017 | hide | past | favorite | 122 comments



This is something Haskell could really benefit from. Largely just through writing documentation for common libraries.

A post was recently on the frontpage of HN about using Haskell in production [1] that divided the common documentation experience between "hard" and "soft" docs. Far too often with Haskell you only get the 'hard' docs where you get descriptions of functionality and functions but it lacks why (and cohesively how) you would want to use the various functionality.

This makes a strong assumption you are already deeply familiar with the usecase and implementation concept.

This may apply to Rust as well. Rust will likely attract experienced developers, much like Haskell, where in most cases a decent level of code quality would be anticipated. But one of the hardest things to get right as an OSS developer is documentation. You're often so busy with the burden of maintenance that the explanatory side gets sidelined. Especially as a library and the underlying language evolves. So I hope this is a priority focus during their reviews.

[1] https://news.ycombinator.com/item?id=14266462


Agreed. I feel like the state of documentation of Haskell libraries in general could be much better.

There was actually a thread on /r/haskell a few months ago [0] in response to an HN comment. In that thread, a few libraries were mentioned as examples of having amazing docs.

For instance, Gabriel Gonzalez writes a Tutorial module for his libraries, which introduces the basics of the library and shows you the big picture. The tutorial walks you through different pieces of the library, how to use them, and how they come together.

It's hands down one of the most effective ways of documenting I've ever seen and I'd love to see other communities' take on it.

To point out a specific example, you should check out Turtle's tutorial [1].

[0]: https://www.reddit.com/r/haskell/comments/5khts3/on_haskell_...

[1]: https://hackage.haskell.org/package/turtle-1.3.3/docs/Turtle...


I've seen this problem in the Elixir library ecosystem as well. Regardless of the package, people do see the value in contributing to the commons with PRs for "hard docs" (because they themselves will probably need to read them again later, and don't want to get tripped up again by reading false docs.) But nobody wants to contribute "soft docs" to anything but the language core.

It feels like the real issue is that anyone who considers themselves a writer, as well as a programmer, tends to have a blog; and that people who have blogs are incentivized to write "soft docs" as tutorial blog posts for their blogs, instead of as encyclopedia-style or cookbook-style additions to the given library's docs.

I wonder if a programming language could adopt a Code of Conduct discouraging tutorial blog-posts in favor of soft-doc PRs...


I don't think blog posts lack merit.

There's only so much you can put in docs, and docs can only really encompass one way of learning things.

Blog posts help teach things in other ways, and having that diversity of approaches is great.

This is why, for example, I love Julia Evans' (http://jvns.ca/) work. Most of the blog posts don't really uncover undocumented stuff. They just explore the topic in a novel way, which some people may find more accessible.

Rust is open to importing blog posts into the docs; this has happened a few times (once to one of my own posts!). So I'm happy that there are blog posts out there.

That said, if you're writing a blog post, it's good to see if there are bits than can be put into the docs too.


A CoC that against economic incentives is unlikely to work out.

Instead, they should work to change the incentives -- likely by paying more attention to them as an organization, talking about people who contribute them on the project's page and blog the same as they do for major technical work, and possibly even bringing on maintainers focused on (soft) docs.

Trading attention and prestige for content might work, shaming the people who produce content for not giving you the credit for free is unlikely to.

I don't think projects give soft docs the credit they deserve, so why are people so surprised that people develop that content elsewhere, where they're recognized for it?


Soft docs are harder and take longer, and as said elsewhere are not appreciated enough. If I'm writing an Elixir library for (what I think) my own use, I'll stop at typespecs, docstring with an example, and a short description.

I'll only start doing the tutorial doc and/or blog post if others have interest in the library. There's just not enough time :/


The thing is, it's not that there aren't a lot of blog posts. There are! There's usually at least one blog post about any library I care to use. But it's only a blog post—it's something someone wrote at some point—and so it's not corrected for misapprehensions the author (who is usually not the author of the library) had about the idiomatic way to use the library, nor is it kept up to date with changes in the library like soft docs would be.

I don't think there's a dearth of people willing to write blog posts. There's just a dearth of people willing to do the equivalent job to what Wikipedia editors do for general-purpose articles: collating those "primary sources" into a single, coherent, up-to-date overview.


Yes, this is very true of most rust docs that I've seen as well. Lots of libraries have the auto generated docs describing the functions and types usually at a per-module level but fewer have anything higher-level


Most of the time I find that the lib's github's README.md contains the introduction and overview that I want to understand how the library works. Unfortunately it's not in the docs themselves, but I hear that the Rust team is working on improving that.

See for instance the "image" crate docs: https://docs.rs/image/0.13.0/image/

A simple API doc, I don't know where to start. But then if I go to the repo I find a nice intro with some example code: https://github.com/PistonDevelopers/image/blob/master/README...


Yes, and that's definitely something we hope to address with the Blitz, both by beefing up top-level library docs, and through the Cookbook.


I really love seeing articles like this come out about Rust.

It's language design the way it should be: incorporating the cutting edge ideas from academia while still striving to cater to beginners; drawing on the strengths of other languages communities to build out good library and solutions to package management; designing everything in the open, and constantly seeking feedback from their users.

It's a great blend of theoretical CS, HCI, computer systems, and application development, and it's always fun to hear about what they're up to.


> I really love seeing articles like this come out about Rust.

Yeah, me as well. It's a perfect marriage of academic know how and pragmatism. Very exciting year for Rust.


I'm going to use this opportunity to second the suggestion that people consider taking this year's Rust community survey: https://blog.rust-lang.org/2017/05/03/survey.html . I know it's mentioned in the post, but I figure the number of people reading the comments is much larger than the number who actually click through the link. :P And even if you don't or have never used Rust, we still value your feedback!


heads up: On page 15 it feels like the questions are not written from a user-centric point of view any more, while the previous ones were excellent. "The Rust project" has an unclear or at least less significant meaning to me as a user than probably to someone working on increasing Rust adoption, which I gather the first 2 questions on p. 15 are about? ("Rust project" could also mean "that thing you're working on using Rust", to me as user/dev. Usually you folks call it "Rust language" or something with "community" etc.) Great survey otherwise, just about to finish it!


Thanks for the reminder, done!


Thanks, I just took it.


May I suggest the bytes crate [0]? It's one of those small libraries providing a key building block (mutable and immutable byte buffers), and is a dependency of tokio-io and any other crate which implements tokio-io's Encoder/Decoder traits.

[0] https://crates.io/crates/bytes


Yes, thank you for the suggestion. I agree that's a good candidate and will note it on thread.


I do wish cargo packages were namespaced a la Github. Squatting on usernames is one thing, but package and project names are often the only way you hear about something. cargo react-svg might be a terrible project or a good quality one maintained by facebook, but you wouldn't know from the name. Because of the name, it'll be at least somewhat downloaded if that's a common need. It makes grouping by org difficult too.


I wrote cargo-esr[1], an alternative tool for searching crates, with the purpose of narrowing down good choices.

Feedback welcome.

[1] https://github.com/rust-alt/cargo-esr


Yeah I agree. It would be unfortunate if for example I were to register a crate called 'json' or 'http' or whatever but make it shit.


The article also links the state of rust survey. I visited the survey with intent to answer it but the first question "Do you use Rust?" only has the following three alternatives for an answer:

- "Yes"

- "No, I stopped using Rust"

- "No, I've never used Rust"

When making a survey the alternatives for the answers are very important. I feel that none of these alternatives apply to me. That's bad. Unfortunate because I would have liked to participate in the survey.

I've done a little bit of beginner programming in Rust in order to try and learn the language. However I haven't yet used it to implement anything actually useful so I wouldn't say "yes, I use Rust". It's been a while since last I did something in Rust but I wouldn't say "no, I stopped using Rust" because to me that implies that I have decided that Rust is not for me, which is not something that I feel, I want to use it, I just keep pushing it down on the list of things to do because other more immediate desires and problems keep popping up.


So, just take YES and move on? Surveys are about trends not about absolutes.


I would have thought he should say NO, or maybe that he quit given his situation. So maybe he could roll a dice* for the answer - but that wouldn't really provide information. One more slot wouldn't hurt.

*"roll a die" for those who prefer more atavistic conventions - see research on Latin plurals rendered in English over time and with regard to frequency of use.


I'm in the same boat but answered no. I'll go with the grandparent coment and suggest that an option indicating that you've tinkered and dabbled with it and no more. (Sorry, a more appropriate wording doesn't come to mind at the moment)


> i use it in production > i use it for hobby projects > i have played with it > i have never used it

Clojure survey had something along these lines


I'm also in the category of having tried it a couple of times, but never hit the velocity to be able to write anything myself.


There’s a countervailing mindset which, in its harshest terms, says “the standard library is where code goes to die”

This can be addressed in a language with sufficient annotation and good parser tools. In some future language, there should be a unification between the version control, the de-facto codesharing site, language/library versions, and syntax-driven tools to automatically rewrite code.

It should be possible to "publish" a language and its libraries such that any breaking changes will automatically be updated when you switch library versions. (This should also be applicable to Entity-Relation diagrams and Object-Relational mappings -- those can be treated as a versioned library.)


The problem is that the number of possible combinations of versions grows rapidly as you add versions. This makes testing harder, as it spreads the community thin - everyone's using a different combo than everyone else.

To avoid that, you need to standardize on a blessed set of versions to be tested together, much like assembling a release of a Linux distro.

People will still swap in alternate versions of libraries occasionally, but keeping things mostly standard and a few cherry-picks is still better than everyone choosing differently.


The problem is that the number of possible combinations of versions grows rapidly as you add versions.

The point is not to let people hang out in whatever obscure snowflake version-set they want to. The point is to make migration going forward as painless as possible. However, that expectation is not so much about the tooling as it is about the developer/language community.

To avoid that, you need to standardize on a blessed set of versions to be tested together, much like assembling a release of a Linux distro.

Yes, there should be this! However, you will still have some stragglers and outliers -- this is what the historical reality shows us. The point of such tooling is precisely to minimize the pool of stragglers, not to maximize them!


To avoid that, you need to standardize on a blessed set of versions to be tested together, much like assembling a release of a Linux distro.

That's pretty much the Stackage/LTS Haskell model. Seem to be working pretty well there.


> In some future language, there should be a unification between the version control, the de-facto codesharing site, language/library versions, and syntax-driven tools to automatically rewrite code.

You actually get a lot of that in Perl 6, which identified modules not only by name, but by version and authority (so you can choose to load author Bar's version of Widget instead of author Foo's version.


Swift is probably the closest thing to this today, where new versions of the language ship with a migrator tool.


> The product of this process will be a mature core of libraries together with a set of API guidelines that Rust authors can follow to gain insight into the design of Rust and level up crates of their own interest.

Is there a plan to make these things statically checkable by rustc/rustfmt/rust-tidy or some sort?


Yes! The statically checkable ones will be checked by rustc or Clippy. Though some of the guidelines are at a higher level than what those would be able to check. For example: rustc wouldn't be able to tell whether a particular one of your traits would be valuable to make accessible as a trait object.


(Also, if this isn't clear, clippy is more than happy to accept PRs for criteria like these.)


Beginner question: what would make a crate non-statically-checkable?


A guideline like "your tests should test all major functionalities of an API" can't be statically checked for example.

It's not about the crate being non checkable, it's about the check being something that needs a human to look at.


I realize that statically checking for tests is reducible to the halting problem, but would it be possible to have some sort of code-coverage checker that would make sure all externally-facing functions have non-zero coverage? That could be used to construct a checklist for a library developer.


Code coverage in itself isn't a very useful metric. Classic example is big test with no asserts will get you pretty good coverage.

There's a rather computationally intensive technique called mutation testing where you introduce bugs into code (e.g. change + to -, flip conditionals, etc.) and check that tests fail.


Code coverage has been useful to me in practice, so I think blanket statements like "code coverage isn't useful" is very misleading. For example, code coverage has helped me identify specific branches in my code that are untested, which in turn help guide me to write additional unit tests.

You don't need to completely dismiss other testing techniques. Mutation testing can be useful at the same time that code coverage is useful.


Totally agreed. Branch coverage has been far more useful to me than line coverage has.


Yes, there are code coverage tools and stuff :)

This could be used, sure.


One thing I don't see here: It's difficult to integrate libs into e.g. parallel when they don't derive all the various things (Copy). Will a goal be to aim for deriving standards for libs looked at?


It's part of the checklist:

https://github.com/brson/rust-api-guidelines#interoperabilit...

Maybe the checklist should get a more prominent link or notice in the blog post. It took me a while to find it.


Ah cool, thanks for the link.


bstrie is coming by my place tomorrow, we're going to take a stab at overhauling `rand`. We must've been discussing this for two years.


Oh great. Keep us updated.


I realise this is being done with the best of intentions and will probably be a big net positive in practise, but something about the way it has been presented here rubs me up the wrong way.

In short, the blog post makes very little mention of the role of the primary author(s) of the libraries in question, beyond "Every two weeks, hold a library team meeting [...] with the author in attendance." While I imagine the reality will be quite different, this sounds an awful lot like "oi, you, code review in my office now!"

One of the attractions of working on open source is that it offers more scope for autonomy and individual recognition than the typical commercial software job. It's slightly alarming that this doesn't seem to be recognised here.

As I say, I'm sure the reality will be fine (and I remain very keen to give Rust a serious try when time permits), but the rather collectivist presentation here is a tiny bit off-putting.


Most of the crates being focused on are either maintained by the libs team already, or are authored by a member of the libs team. It turns out that the people on the libs team have written a lot of widely used and pretty solid libraries ;)

Nobody is forcing stuff on random crate authors; if they didn't want to participate, then that's 100% okay.


Thanks, that puts a slightly different perspective on things!

I still feel there's a slightly-unfortunate ambiguity in the linked announcement post, though.


Totally, I hear you. It can be tough from both sides, that is, to people super heavily involved in Rust, they know these names, so the fact is obvious. So it doesn't get written down, and then when people from _outside_ of living and breathing Rust every day read it, they don't have the context.

Words are hard.


I suspected this was probably the case. I urge you to make it very, very clear.

As a prospective crate author - and every programmer is a prospective crate author - i interpreted this as saying that if i publish a crate which people find useful, i might be getting an unexpected visit from the Rust Police.


I would be thrilled to have my code reviewed! Quite often I see requests for "please critique my code" in subreddits of several programming languages.

Must say that I agree with you that the authors do not seem to take a very central role in the process as described. But I see that more as an easy fix in the announcement text.


I had the exact same reaction—it is very off-putting.

It's not just this, either, a lot of the top-down communication in the Rust ecosystem has a feel like this. I'm not sure what to make of it, except that I think their attitude is probably too authoritative.


Does the "Rust standard of quality" for these crucial crates include "no unsafe code"?

"Vec" currently needs unsafe code, because Rust doesn't have the expressive power to talk about a partially initialized array. Everything else with unsafe code is an optimization. Often a premature one. Maps should be built on "Vec", for example.


I would actually hope for the opposite. If your application needs unsafe code, it would be better for that code to be isolated into a "blessed" (i.e. widely vetted by those with the relevant expertise) library with a safe interface, rather than rolling your own.


Why do you say premature? Maps, for example, being such a crucial and widely-used data type, should be optimized as much as possible. There's nothing premature about using `unsafe` to build a highly-efficient Map.


Let's see the benchmarks justifying the use of "unsafe" for maps. Maybe there's a better way to do it without much of a performance penalty. It's a good way to find out what optimizations the compiler is missing. It may even turn out that unsafe code written early no longer is a performance win, since the Rust compiler is getting better at optimizing out redundant subscript checks.

When you start looking through Rust libraries, "unsafe" turns up way too often.


(The hashmap that is used in the winning rust benchmarksgame entry is 100% safe, fwiw)

> When you start looking through Rust libraries, "unsafe" turns up way too often.

You keep making this claim without substantiation. Yes, there is some level of unnecessary unsafe, but certainly not "way too often". I recall going through all the crates in my .cargo and finding very little unnecessary unsafe, and showing you the audit: https://news.ycombinator.com/item?id=13280347

Please stop throwing around this claim without substantiation.


Do you think there is some room for education here? Would it be useful for Crates.io to show some "coverage" metric? eg. "unsafe lines = 5%" or "unsafe instructions = 0.01%"

I think that would be a pretty easy way to put this convo to bed yea?

Have there been talks about this before? Is it worth me opening an issue/RFC on Cargo?


I don't think it's worth making changes like that to satisfy random misconceptions unless they are widespread.

There are good reasons for having an unsafe-code flag on a crate. This is not one of those.

This has been discussed in the past and the main concern IIRC is that this may make people afraid to use crates that contain unsafe code even if it is well audited. There's a finer balance you want to strike here.


> winning rust benchmarksgame entry

C is winning that benchmark, around 2x faster than Rust.

http://benchmarksgame.alioth.debian.org/u64q/regexredux.html


Wrong benchmark. I'm talking about k-nucleotide, which tests hashmaps.

http://benchmarksgame.alioth.debian.org/u64q/knucleotide.htm...

(And I was only really talking of the comparison of rust v rust, not rust v c, i.e. that ordermap is faster than the built in hashmap despite not using unsafe)


Ah, sorry, you were writing about hashmaps and I was writing regexes in my IDE in background while reading this thread from time to time, probably that's why I've checked regex instead of hashmap benchmark. I even wrote this benchmark in language not listed on benchmarks game lately to compare the speed. Brain can work in interesting ways!


We've seen this unsubstantiated denial before on YC, at "Why I’m dropping Rust"[1] and "Rust sucks if I fail to write X".[2] I once started going through the Rust library packages and listed uses of "unsafe". Try doing that.

The previous discussions made a few things clear:

- The big design-level problems with data structure safety are 1) partially initialized arrays, and 2) backlinks. The first is needed for growing arrays in Vec, and the second is needed for doubly-linked lists and some kinds of trees. It's very hard to handle either of those in safe Rust. A very small number of packages need unsafe code for those functions, and those should be tightly controlled.

- Foreign code remains a problem, and is inherently unsafe when calling unsafe languages. The most downloaded Rust crate is "libc". What could possibly go wrong there? Was it really a good idea to import unsafe "strcpy" into Rust?

    pub fn strcpy(dst: *mut c_char, src: *const c_char) -> *mut c_char;
- There's a lot of use of "unsafe" code that's not strictly necessary. You find unsafe code like "from_utf8_unchecked", which then turns up in the JSON decoder at (https://github.com/rust-lang-deprecated/rustc-serialize/blob...). There are no comments on the safety of that. Is there some way to create bad JSON, get bad UTF-8 into a string, and cause trouble further upstream? I don't know, but somebody "optimized" there, and created a potential problem.

I could give many more examples.

Most of these problems are fixable. They're not inherent in Rust. Fixing them is important to Rust's credibility. If Rust is going to replace C++, which it should, the holes have to be plugged. It only takes one hole to create a security vulnerability.

Denial is not a river in Egypt.

[1] https://news.ycombinator.com/item?id=12474445 [2] https://news.ycombinator.com/item?id=13670366


> which then turns up in the JSON decoder at (https://github.com/rust-lang-deprecated/rustc-serialize/blob...). There are no comments on the safety of that.

Maybe there's a reason it's in a repository labeled "deprecated", namely that it's deprecated. The replacement serialization framework, serde, has exactly one appearance of "unsafe", in a function that is only compiled when you explicitly enable the "unstable" feature flag and that in fact appears to be a reasonably safe use of unsafe.


The thing is, you cannot decide which crate someone will depend on, regardless how it is labeled.


But you can read the dependency list or even the code and decide whether to use it or not?


Not in the presence of binary dependencies.

Also the amount of CVEs in FOSS projects show that even the process of code review for patches isn't enough.


You mean, in projects written in languages that are unsafe from beginning to end rather than in small blocks?


I agree, just stating that plain code review isn't enough.

Those patches also come in small blocks.


> I once started going through the Rust library packages and listed uses of "unsafe". Try doing that.

That's ... exactly what I did? Talk about denial. I went through the libraries in my .cargo folder (which filters for libraries that actually get used, not just random libraries out there). I linked you to that audit in the comment.

Yes, libstd contains a lot more unsafe, but that's kind of the raison d'etre of libstd -- to contain OS-abstractions and very common internally-unsafe abstractions. It's better to have one unsafe implementation of Vec (in a library with a lot of eyeballs on it) than to have ten that the ecosystem relies on.

> which then turns up in the JSON decoder at

rustc-serialize is deprecated. It was deprecated before 1.0, and went into maintenance mode. It was still kinda-maintained because of the difficulty of using serde on stable, but now it's basically going to be full maintenance mode.

Also, that line is trivially safe to execute on untrusted input; `char` is utf8. Yes, a comment would be nice, but like I said, maintenance mode.

It is possible to do what that function does in safe Rust today, but it needs an API that probably didn't exist ~two years ago when that library was actually relevant. Fixing.

> The most downloaded Rust crate is "libc". What could possibly go wrong there? Was it really a good idea to import unsafe "strcpy" into Rust?

libc is just bindings (to everything in libc). You still need to use unsafe to call those functions (everything in `extern "C"` is unsafe to call even if not marked explicitly as such). This is not a valid example.

> A very small number of packages need unsafe code for those functions, and those should be tightly controlled.

which is pretty true already? The "partially initialized arrays" problem is generally handled by just using Vec. Yes, you need unsafe to write vec, but then you can just build things out of it.

Backlinks turn up rarely -- if perf isn't involved folks just use Weak safely, but otherwise people use petgraph or something.

> I could give many more examples.

Go ahead then. You literally never have, and none of these examples are valid "unnecessary unsafe" for reasons I gave above. I did substantiate with an audit. I'm genuinely interested in finding places where we have too much unsafe code, because I'd like to avoid depending on those crates and/or fix them.

> Most of these problems are fixable. They're not inherent in Rust. Fixing them is important to Rust's credibility.

Sure. And folks are always looking to improve this. But this doesn't mean that there's some widespread problem of unsafe being used too much in Rust. I'm not denying that this shouldn't be improved, I'm denying your allegations that `"unsafe" turns up way too often`.


rustc-serialize is deprecated

It's the fourth most downloaded crate.[1] Somebody may have "deprecated" it, but the users aren't paying attention. It's not listed as "deprecated" on its own Cargo page.[2] There's a weak note about deprecation on its Github page, but users don't look there.[3] It has 2,950,353 downloads, probably because other crates are pulling it in.

Will it be in the new set of "approved" packages? Or will all the crates that use it be fixed?

Again, denial. Not seeing many links from the "everything is OK, don't need to look at this" crowd.

[1] https://crates.io/ [2] https://crates.io/crates/rustc-serialize [3] https://github.com/rust-lang-deprecated/rustc-serialize


> It's the fourth most downloaded crate.[1] Somebody may have "deprecated" it, but the users aren't paying attention.

The most downloaded stat appears to be for the most downloaded of all time. Considering the difficulty of using serde on stable until recently, it's not surprising that it has been downloaded so much and would continue to be as projects move away from it.

> It's not listed as "deprecated" on its own Cargo page.

This is true, however as you said the GitHub page and the docs page both mention that. Unless one already knows how to use the crate, they are probably going to check the docs and see the deprecation. Although having a deprecation notice on crates.io does seem prudent.

> Again, denial. Not seeing many links from the "everything is OK, don't need to look at this" crowd.

You kind of skipped over the part where the parent poster explained why it wasn't even a big deal in the first place. It really feels like you're grasping at straws to make a point and I don't see why. No one seems to be denying that a lot of unsafe code is potentially bad, but there is really no solid proof that there is an unreasonable amount of unsafe usage. In fact, the parent has mentioned and linked to a list of such usages. If you're going to make such an extraordinary claim I think it behooves you to provide more than just a handful of examples.


Whether or not rustc-serialize is deprecated is kind of a red herring. It was already pointed out that the usage of there is trivially provably safe. The deprecation of rustc-serialize is really only relevant here because it explains why the crate is in maintenance mode and therefore why that particular unsafe block hasn't been replaced with a newer safe API.


Number of downloads is pretty terrible metric because nearly all downloads from a package repository are from automated/CI builds, not people. Depending on what's consuming a package, something which runs tests more frequently could easily inflate the number of downloads for any of it's dependencies.


> Foreign code remains a problem, and is inherently unsafe when calling unsafe languages. The most downloaded Rust crate is "libc". What could possibly go wrong there? Was it really a good idea to import unsafe "strcpy" into Rust?

And what would you prefer a tool like Corrode use? Or programmers manually doing an initial "no refactoring" conversion pass from C or C++ to Rust? Should they all use their own ad-hoc libc bindings when converting to Rust? That sounds harder to audit busywork with no upside. Or perhaps it should be impossible to incrementally convert a C codebase, instead being forced to do an up front monolithic rewrite? Although my experience has been that leads to more debugging, bugs, pain, and abandoned rewrites.

This isn't to say that improvements cannot or should not be made, but a reminder that perfect is the enemy of the good. Yes, you should really really really not use strcpy in new code if you can possibly avoid it - and you can. And you should refactor it away in existing code ASAP. But that argument probably extends to libc in Rust code in general. Being able to test-compile if your codebase builds yet when you remove libc sounds way better than trying to hunt down and remove all your ad-hoc defines.

And as a stepping stone, I'd suspect the good outweighs the bad here. Not because there will never be a security vulnerability in Rust code because of an unrefactored strcpy in Rust (there very well may be one if there hasn't already!) - but because it will encourage conversion, and thus subsequent cleanup and refactoring, that might not have otherwise happened.

> There are no comments on the safety of that. Is there some way to create bad JSON, get bad UTF-8 into a string, and cause trouble further upstream? I don't know, but somebody "optimized" there, and created a potential problem.

A quick look at blame shows this originally was applied to a buffer that was encode_utf8(...)ed. The refactoring (to remove a libunicode dependency) has made the safety less clear, yes. I could see the use for a method:

  let buf = v.to_str_with(&mut buf);
Ultimately, this is just shifting the "problem" from json.rs to str/mod.rs, but the latter was going to have to deal with unsafe code anyways, as we're getting into the details of "just how is str implemented anyways": https://doc.rust-lang.org/1.4.0/src/core/str/mod.rs.html#125 . Although perhaps there's something to be said for keeping that to the bare essentials as well...

> Most of these problems are fixable. They're not inherent in Rust. Fixing them is important to Rust's credibility. If Rust is going to replace C++, which it should, the holes have to be plugged. It only takes one hole to create a security vulnerability.

I encourage you to submit pull requests (at least for non-deprecated projects) if you haven't already.

I feel the need to restate the obvious, although I'm sure you're aware already: There will be one hole. In fact, there will be two holes. I dare say there will be no less than three security vulnerabilities, even - for mistakes are inherent to programming. Perfect safety is not an achievable goal - only better safety. Worse - some of the people using Rust are going to have a higher tolerance for unsafe than you. But at least they're no longer using C though, right?

(I'm afraid I'm still using C and C++.)


(I will mention that shifting unsafe to libstd is a good thing, this way you have more eyeballs on it and it's reusable, with less chance of people messing it up elsewhere. This is why a large fraction of what's in libstd is unsafe stuff so that people don't end up implementing their own.)


Unsafe rarely turns up in any of the libraries I've used. Where are you getting this from?


> Everything else with unsafe code is an optimization.

You need unsafe code to invoke a syscall. Doing I/O is not an "optimization".


I suspect removing unsafe is outside of the scope of Rust.

Rust tries to help you as much as possible but recognizes that it sometimes gets in the way and provides an escape hatch. The idea is to minimize and abstract the use of the escape hatch so its easily auditable.

I wish they hadn't named the escape hatch "unsafe". I think I heard the name came from PL theory but it makes it sound scarier than it is.

I remember hearing something about wanting to improve documentation around unsafe for how to best use it without running into problems.


IMO it's good that it's scarier than it actually is :)

http://doc.rust-lang.org/doc/stable/nomicon/ is the documentation for unsafe. I do plan on improving it heavily, but I'm very busy right now, and I'd like to wait for some of the unsafe semantics stuff to be pinned down so that I can go in full depth when I write this.


Thanks for your work

> I'd like to wait for some of the unsafe semantics stuff to be pinned down so that I can go in full depth when I write this.

Thats the work I was thinking of.


https://github.com/bluss/ordermap is 100% safe rust and pretty fast

I believe the stdlib one could be refactored, but I'm not sure. There are a lot of optimizations in that one.

IMO "no unsafe code" is a stretch. "no unnecessary unsafe code" is what it should be.


That "ordermap" is nice. Could that replace std::collections::hash_map as the main supported map crate?

It imports std::collections::hash_map, which has unsafe code, but only for RandomState, which does not. If RandomState were pulled out of hash_map and moved to a crate with no unsafe code, that dependency could be removed and it would be safe crates all the way down.

IMO "no unsafe code" is a stretch. "no unnecessary unsafe code" is what it should be.

The author of the code doesn't get to determine "unnecessary". That should require extensive and hard to get justification.


> Could that replace std::collections::hash_map as the main supported map crate?

No, it is more specialized for certain kinds of loads.

> It imports std::collections::hash_map, which has unsafe code, but only for RandomState, which does not. If RandomState were pulled out of hash_map and moved to a crate with no unsafe code, that dependency could be removed and it would be safe crates all the way down.

That's a very vacuous distinction.

This is all in the same crate (std) anyway, it's just a different module.

> The author of the code doesn't get to determine "unnecessary".

The auditors (proposed in this blog post) do.

I've audited unsafe in the past to ensure our dependencies are fine, there are some pretty reasonable ways to define "unnecessary" here.


Sure, it could replace the stdlib hash map. It would regress performance for some usage and improve it for others. (And this makes it an unlikely candidate; pure improvements are a lot easier to get in.)


I think the presence of unsafe code makes it more important that the lib goes through this process, rather than less. Provided the unsafe code is necessary, of course.

That unsafe code is filling a want or need, so otherwise people are going to use crates that haven't gone through the process, or even worse, roll it themselves.


> Does the "Rust standard of quality" for these crucial crates include "no unsafe code"?

> Everything else with unsafe code is an optimization. Often a premature one.

Rc, Arc and Box require unsafe code for the obvious reasons that are not premature optimisations. Any attempt to use a syscall (look at the nix crate which wraps libc with safe APIs) requires unsafe code.

Unsafe doesn't mean "this code is bad". It is true that unsafe code should be treated very carefully, but it's purpose is so that a sufficiently clever human can implement safe code that the insufficiently clever compiler cannot verify.

Really it should've been called trustme rather than unsafe. ;)


The roughly equivalent annotation in D is called "@trusted", but IMHO "unsafe" is better because it sounds scarier :-)

https://dlang.org/blog/2016/09/28/how-to-write-trusted-code-...


In Rust, 'unsafe code' is really 'partially compiler check exempt'. It does not mean the code is unsafe. You can manually verify the 'unsafe' parts.

It is very unfortunately named.


It's unsafe in roughly the same way that Rust developers call C and C++ unsafe, right? The advantage is that it's a special, relatively uncommon mode, and it's explicitly marked. But I don't think you can minimize the reality that it's unsafe without also minimizing one of the major selling points of the language.


Because only 10% of your code isn't verified by the compiler but instead needs extra review and tests doesn't eliminate the benefit that 90% of your code is verified by the compiler.


Unsafe blocks are typically as small as necessary, so the burden is small to perform extensive manual verification


Yes, but the few exemptions means that you can effectively do basically anything. If you needed unsafe {} for something that block definitely needs more scrutiny and if possible should be avoided.


Right. That's the road to buffer overflow exploits.

The most recent CERT advisory reporting a buffer overflow exploit was April 17th, 2017.[1] About one per week is reported, year after year. Others not reported are probably being exploited. Rust can stop that, but one "unsafe" declaration can break Rust's safety.

Hash maps need to be 100% safe code. They're complicated, and involve elaborate calculations that output subscripts.

[1] http://www.kb.cert.org/vuls/id/676632


It's pretty easy to audit a couple lines of unsafe in a library.

It is not easy to do the same with a C library where it's basically prone to overflow issues everywhere.

These are vastly different issues. Yes, we should totally be strict on unsafe code. No, it is not the end of the world when the stdlib hashmap uses unsafe code. Unsafe is designed exactly for this purpose, dealing with the innards of safe abstractions. It does it well, and the hashmap code is pretty ok here.

(Sure, if it has a design which could be done in safe Rust, it should, but I suspect with the way the robin hood stuff works it might not. this has been on my list of things to do when I get more time, primarily to just learn about the hashmap impl, but also to improve it if possible)


The main reason it requires unsafe is for memory layout optimizations. Logically, a hash table is an array of buckets, each of which is either a (hash, key, value) triplet or empty. The naive representation would be Vec<Option<(usize, K, V)>>, but using Option would significantly increase the size of each bucket: it's only a one-byte tag to differentiate between Some and None, but alignment constraints would round the overhead up to at least the alignment of usize, typically 8. Instead, HashMap stores just (usize, K, V), and considers a bucket empty whenever 0 is stored in the hash slot; if 0 ever comes up as an actual hash value, it's changed to a different value before being stored.

To the extent I've described it, that might be possible to accomplish in Rust without unsafe code, using a safe abstraction around NonZero (an unsafe type which the compiler assumes will never be zero, allowing it to automatically use a zero value as a marker if the type is found in an enum, such as Option). However, HashMap goes a step further: it actually stores all the hashes contiguously in one array, and the (K, V) pairs in another. The nth index in the hashes array and in the (K, V) array together represent a single bucket, but storing them separately makes HashMap's typical access patterns somewhat nicer on the CPU's cache. However, this means that slots in the (K, V) array can contain either valid data (of arbitrary types, which might contain pointers etc.) or garbage, depending on a value stored somewhere completely different. There's no good 'automatic' way to make that safe.

I think the use of unsafe code here is fine - it's not that hard to audit, and if you don't audit you're in trouble anyway (hash table DoS is also a security risk) - but in the long term, it would be very interesting if Rust could integrate an optional theorem prover, with the ability to prove arbitrarily complex code safe, given sufficient annotations...


> It's pretty easy to audit a couple lines of unsafe in a library.

You have to audit more than just the lines within the unsafe block. For example, I discovered a buffer overflow in a Rust library this week that was caused by an integer overflow outside the unsafe block[1]. The unsafe code itself was written correctly.

[1] https://github.com/RustSec/advisory-db/blob/master/crates/ba...


Sounds to me like the unsafe block had the wrong scope. If safe Rust called a function with a buffer, and that buffer was too small, and that function internally used unsafe code to write to the buffer, then that function is leaking the unsafety past the `unsafe {}` scope, which is incorrect. So that function should be marked as `unsafe`, and the calling code (which calculates the buffer) is then responsible for ensuring that it's invoking the unsafe function with an appropriately-sized buffer.


Not necessarily. This isn't a question with a single answer within the community; there are reasons to tightly scope unsafe, and there are reasons to scope it to the full extent of unsafe-affecting invariants.

Scope your unsafe however you want, but check the invariants when auditing, and make sure they don't escape the module.


I'm aware. When I say "audit a line of unsafe" that includes all the invariants that it relies on, which could be outside the block.

This is still not that hard.


> Often a premature one.

Can you share the ecosystem wide analysis that you did that led to this conclusion?


My biggest gripe with the crate situation is that some of them require nightly. E.g. everything coroutines AFAIK.


The coroutine libraries are the codegen ones, yes?

Coroutines are kinda more like a language feature that people have hacked libraries to do codegen for instead, so IMO it's pretty understandable that it needs nightly.

Many of these libraries are prototyping designs that will eventually be proposed as part of the language.


Since Rust 1.15, most libraries that require nightly seem to be this kind of prototyping, at least to my eye.


Rust sorta has a de-facto code style. It'd be interesting to add tooling to cargo to make it obvious how to comply with the evolved standard style for Rust.


There is definitely a changing Rust definition of good code hygiene in terms of how to write code in the most "Rustic" way, but in terms of code formatting, there is already rustfmt[0]

[0] https://github.com/rust-lang-nursery/rustfmt


Clippy has some checks for naming convention beyond what rustc and rustfmt does. Not nearly as much as https://github.com/brson/rust-api-guidelines, but the wrong_self_convention check helped me make sense of the as/to/into conventions.

https://github.com/Manishearth/rust-clippy


Some of the format rfcs basically went against the most dominant style (for example where clauses), so things are changing around.


Unfortunate name. I thought this was about libz aka zlib.


Ok, we'll cut the title's z supply.


That actually confused me as well :)


    andy@xps ~> python3
    >>> "Libz" == "libz"
    False


    >>> title_case("Libz") == title_case("libz")
    True


touché


Also works in python2 for those of us stuck in the 1950s.


It is an unfortunate name. I don't know why, given how inclusive the Rust community usually is and how eager they were to remove master/slave terminology, that they would choose to announce this effort under the banner of Nazi war tactics.


Please don't.

We detached this subthread from https://news.ycombinator.com/item?id=14275990 and marked it off-topic.


Yes, it is a well known fact that Germans have no word for 'lightning' anymore. Such a tragedy that 75 years later, we're still supposed to be fighting the Nazis on their terms.


I unironically agree with this. That's not a use of the generic German word for lightning, it's a specific reference.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: