Rkyv is faster than {bincode, capnp, cbor, flatbuffers, postcard, prost, } (davidkoloski.me)
28 points by taintegral on March 11, 2021 | 11 comments


How does schema evolution (versioning) work? Can you add fields to a struct and still be able to read data written by older versions?

Oh here we go, tucked away under "Tradeoffs" in the docs:

> rkyv is designed primarily for loading bulk game data as efficiently as possible. While rkyv is a great format for final data, it lacks a full schema system and isn’t well equipped for data migration.

This is a pretty important thing to call out upfront, especially when comparing against other serialization frameworks where this is considered a table-stakes feature. Yes, there's a fine argument to be made that some use cases may prefer speed and simplicity over the ability to ever modify the schema, but users really need to understand that this is the trade-off they are making.
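To make the trade-off concrete, here's a minimal sketch of why adding a field breaks old data. It's written against rkyv 0.7's derive API from memory, and the `Player` type is made up, so treat the details as approximate:

    use rkyv::{Archive, Deserialize, Serialize};

    // Version 1 of the type, as originally shipped.
    #[derive(Archive, Serialize, Deserialize)]
    struct Player {
        name: String,
        score: u32,
    }

    // Version 2 would add a field, e.g. `level: u8`. That changes the
    // layout of the generated ArchivedPlayer, and because the bytes
    // carry no schema or field tags, data written by version 1 can no
    // longer be read correctly by version 2.

    fn main() {
        let player = Player { name: "ferris".into(), score: 42 };
        let bytes = rkyv::to_bytes::<_, 256>(&player).unwrap();

        // Zero-copy access: the buffer is reinterpreted in place as
        // ArchivedPlayer. This is why it's fast, and also why the
        // layout (and therefore the schema) is frozen.
        let archived = unsafe { rkyv::archived_root::<Player>(&bytes[..]) };
        assert_eq!(archived.score, 42);
    }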


> Oh here we go, tucked away under "Tradeoffs" in the docs:

This is a pretty unfair way to put it. It's on the main page of the docs, which every developer will see. If you're not familiar with the Rust community, you may not be aware of how standard this practice is.

> This is a pretty important thing to call out upfront, especially when comparing against other serialization frameworks where this is considered a table-stakes feature

It's a benchmark to measure performance, not an analysis of each of the serialization frameworks. A lot of the benchmarked frameworks don't support schema evolution; it's really a small minority that do.

> users really need to understand that this is the trade-off they are making

If a developer reads one article about serialization framework performance and proceeds to use the fastest framework while ignoring every other property of it, then that's on them. It's always good to have that sort of information available, but I can't hold everyone's hand while they choose their serialization frameworks.

All that being said, work has already started on schema evolution capabilities. The plan is for it to be separate from the serialization library for better composability and broader applicability.


Sorry, my comment came off more negative than I intended.

However, the article leads by saying "rkyv is similar to Cap'n Proto and FlatBuffers" and then lists some "different design choices that make it stand out", but doesn't mention the lack of schema evolution there. Since schema evolution is usually considered a critical feature in any of these systems, it really ought to have been mentioned. It also explains why rkyv is faster -- supporting schema evolution has overhead.
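To illustrate the overhead with a toy model (this is not Cap'n Proto's actual code, just the shape of the cost): an evolvable format has to tolerate messages written against an older schema, which typically means a bounds check and a default fallback on every field access.

    // Toy evolvable format: fields live at fixed offsets, and a reader
    // built against a newer schema must tolerate shorter (older)
    // messages by substituting defaults.
    fn read_field_u32(message: &[u8], offset: usize) -> u32 {
        match message.get(offset..offset + 4) {
            Some(bytes) => u32::from_le_bytes(bytes.try_into().unwrap()),
            None => 0, // field postdates this message; use the default
        }
    }

    fn main() {
        let old_message = [7u8, 0, 0, 0]; // written before offset 4 existed
        assert_eq!(read_field_u32(&old_message, 0), 7); // original field
        assert_eq!(read_field_u32(&old_message, 4), 0); // newer field defaults
        // A layout-locked format like rkyv skips this branch entirely.
    }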

I think this could have been a really cool article if it were phrased as: "Here are the performance benefits that can be achieved in use cases where you can trade off schema evolution." That's legitimately an interesting area to explore!

(Disclosure: I'm the author of Cap'n Proto.)


I think that's a really good point, and I definitely agree that rkyv, Cap'n Proto, and FlatBuffers all have different goals and design decisions. rkyv was primarily made with the intention of handling bulk structured data for game development, just because that's my background. Schema evolution and validation weren't important to me, but they are features that are important to a lot of other people. In the interest of getting more people to try it out, I've slowly been adding those features, and I think rkyv has been made better for it. It's the only one of the three ZCD (zero-copy deserialization) libraries that doesn't support schema evolution, and I think that's going to cause people to choose not to use it. So I'll probably figure out how to get schema evolution in there eventually.

As an aside, I updated the benchmarks yesterday and addressed some other responses that pointed out that the three libraries all have different behavior around validation. That's another place where the three libraries differ in their approaches, and it does materially affect the benchmarks. I found out that Cap'n Proto does validate-on-read, which is a really cool idea that gives it a big edge when accessing complex structured data that rkyv takes a long time to validate. It's got me thinking about how to get the same kind of validate-on-demand functionality!
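For context, the two access paths rkyv offers today look roughly like this (sketched from memory against 0.7 with the `validation` feature and the bytecheck crate, so the attribute names may be slightly off):

    use bytecheck::CheckBytes;
    use rkyv::{Archive, Serialize};

    #[derive(Archive, Serialize)]
    // Generates CheckBytes for ArchivedPlayer so buffers can be validated.
    #[archive_attr(derive(CheckBytes))]
    struct Player {
        name: String,
        score: u32,
    }

    fn main() {
        let bytes = rkyv::to_bytes::<_, 256>(&Player {
            name: "ferris".into(),
            score: 42,
        })
        .unwrap();

        // Upfront validation: walks the entire buffer before handing out
        // a reference. Safe for untrusted input, but costs time
        // proportional to the complexity of the data -- this is what the
        // validation benchmarks measure.
        let checked = rkyv::check_archived_root::<Player>(&bytes[..]).unwrap();
        assert_eq!(checked.score, 42);

        // Unchecked access: effectively free, but only sound for trusted
        // bytes. Validate-on-read sits between these two extremes.
        let unchecked = unsafe { rkyv::archived_root::<Player>(&bytes[..]) };
        assert_eq!(unchecked.score, 42);
    }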

(Disclosure: I'm the author of rkyv - thanks for Cap'n Proto, it's been a big influence)


To be fair, I think FlatBuffers was also designed with game data in mind. But I agree it seems like a great use case to skip schema evolution -- since you can always rebuild all the data files when you change something. You could probably get away with skipping validation, too.


I definitely think more work on serialization is welcome. For me, memcpy is fast, but handing off via shared memory and then dereferencing is even faster; zoned pointers are always the secret sauce for the fastest ser-de.

I think the real issue one actually needs to solve is support for various languages, plus wire compatibility between little-endian, big-endian, and other messy byte orders. Add those things and the benchmark numbers start to slip. The point of serialization is interop, not just rust-op. :)
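As a tiny illustration of the byte-order point (nothing to do with rkyv's internals): a portable format commits to one byte order and converts on access, whereas a raw native-endian read only agrees across hosts by luck.

    // A portable format fixes the wire byte order and converts on access.
    fn read_u32_le(bytes: &[u8]) -> u32 {
        u32::from_le_bytes(bytes[..4].try_into().unwrap())
    }

    fn main() {
        let wire = 0xDEADBEEFu32.to_le_bytes(); // little-endian on the wire
        // Same answer on every host; reinterpreting the bytes in place
        // would only match on little-endian machines.
        assert_eq!(read_u32_le(&wire), 0xDEADBEEF);
    }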

Don't get me wrong -- I was working with the colfer project on GitHub, and I definitely love all good work on serialization, especially compression during serialization. Do start creating a spec and invite more language implementations. Thanks!


I actually agree with you; these are serialization issues that are very important to solve. I wrote a little about endianness and how it could be approached in the book FAQ [1]. rkyv's open type system enables a lot of really powerful extensions to the core library.

> The point of serialization is interop, not just rust-op. :)

I'm not personally interested in adding support for other languages, but I have been careful not to do anything that would make it especially difficult for someone else to do so. And being language-specific isn't necessarily so bad, especially if it means the library can take advantage of unique language features to be more expressive and performant.

[1] https://davidkoloski.me/rkyv/faq.html


That's cool. If you're just trying to save games or marshal data to and from worker nodes in a BEAM- or Hadoop-like system, single-language serialization can work. A Python / Node / Ruby client would ultimately bind to the Rust library for that.
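A rough sketch of what that binding boundary could look like (the names are made up, and a real binding should validate the buffer instead of trusting it):

    use rkyv::{Archive, Serialize};

    #[derive(Archive, Serialize)]
    struct Score {
        value: u32,
    }

    // Hypothetical C-ABI entry point that a Python/Node/Ruby wrapper
    // could call through FFI. Not rkyv's API, just the general shape.
    #[no_mangle]
    pub unsafe extern "C" fn score_value(bytes: *const u8, len: usize) -> u32 {
        let slice = std::slice::from_raw_parts(bytes, len);
        // A real binding would use validated access here rather than
        // trusting bytes that crossed a language boundary.
        rkyv::archived_root::<Score>(slice).value
    }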


> Rust-only: it doesn't sacrifice simplicity and performance for cross-language compatibility

A serialization format for one language only? That's pretty rich.


Python's pickle, java.io.Serializable, Ruby's Marshal: there are plenty of examples of this, though pretty much all of them come with the caveat that they are _not_ for untrusted data sources, or really for any data source other than the application that produced them in the first place.


You can also turn on the `strict` feature for C compatibility, but without any way to get the type definitions into a C header it's kind of pointless.
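For what it's worth, what `strict` pins down is roughly this (my paraphrase of the feature, so treat it as an assumption): archived types get a deterministic C layout that a generated header could mirror field-for-field.

    // Hand-written illustration: with `strict`, an archived type is laid
    // out as if declared #[repr(C)], so a C struct could match it.
    #[allow(dead_code)]
    #[repr(C)]
    struct ArchivedScore {
        value: u32,
    }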

I'm not interested in adding C compatibility because I don't want to write C, but I also won't go out of my way to impede others who might want to.



