Rust's high-level I/O definitely needs some optimization work. For example, the APIs that allocate a String for each line tend to top out at about 25MB/s on my laptop. But if I use the low-level fill_buf and write APIs, I get about 500MB/s of throughput: http://codereview.stackexchange.com/a/73770/61214
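For anyone who hasn't seen the fill_buf style, here is a rough sketch of the idea. It is not the code from the linked answer, and it uses today's stable std::io API, which differs from the pre-1.0 one. Instead of calling lines(), which hands back a freshly allocated String per line, it borrows the reader's internal buffer with fill_buf, writes those bytes straight out, and then calls consume. The function name copy_via_fill_buf is just made up for illustration.

```rust
use std::io::{self, BufRead, Write};

/// Copies all bytes from `input` to `output` by working directly on the
/// reader's internal buffer, so no per-line String is ever allocated.
/// Returns the number of bytes copied.
fn copy_via_fill_buf<R: BufRead, W: Write>(mut input: R, mut output: W) -> io::Result<u64> {
    let mut total = 0u64;
    loop {
        let n = {
            // Borrow whatever is currently buffered (refilling if empty).
            let buf = input.fill_buf()?;
            if buf.is_empty() {
                break; // EOF
            }
            output.write_all(buf)?;
            buf.len()
        };
        // Mark those bytes as used so the next fill_buf returns new data.
        input.consume(n);
        total += n as u64;
    }
    output.flush()?;
    Ok(total)
}
```

To run it as a filter, you'd call something like copy_via_fill_buf(io::stdin().lock(), io::stdout().lock()). If you need per-line processing you can still search each buffer for b'\n' yourself; the win comes from not allocating and copying a String for every line.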
There are even lower-level APIs that would probably go faster, but above half a GB per second, I/O is no longer a bottleneck for my code.
So on OS X, the Rust version (and a Go version using buffered I/O) is about 4x faster than the C version (although my system can't find fputs_unlocked). I like watching performance, but for this kind of code, in a language like Rust, this is a silly benchmark.
It seems like this is mostly benchmarking syscall overhead (and in the case of libc, mutex overhead).
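On the Rust side, the same two costs show up with stdout: the handle is guarded by a lock and is line-buffered, so a loop of println! calls takes the lock on every call and can end up issuing a write syscall per line. A minimal sketch of the usual workaround, assuming the caller passes in an already-locked handle (e.g. io::stdout().lock()), is to take the lock once and put a BufWriter in front of it so many small writes become a few large ones. The write_lines helper is hypothetical, just for illustration.

```rust
use std::io::{self, BufWriter, Write};

/// Writes each line followed by '\n'. The caller is expected to pass a
/// handle that is locked once for the whole batch (e.g. a StdoutLock);
/// BufWriter collects the small writes so the underlying writer sees
/// only a few large ones.
fn write_lines<W: Write>(out: W, lines: &[&str]) -> io::Result<()> {
    let mut w = BufWriter::new(out);
    for line in lines {
        w.write_all(line.as_bytes())?;
        w.write_all(b"\n")?;
    }
    // Flush explicitly so errors surface here instead of being
    // silently dropped when the BufWriter goes out of scope.
    w.flush()
}
```

Called as write_lines(io::stdout().lock(), &lines), this pays for the lock once per batch rather than once per line, which is roughly what fputs_unlocked buys you in C.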
I am honestly not terribly surprised. :-D As the language stabilizes, it should see a lot more performance optimization. I'd be really interested in seeing how Rust 1.0 final performs in a test like this.