Hey olvyo, thanks for your nice comments! We should definitely connect!
> By the way, in your dissertation you don't mention RapidJson, nor json.net, nor simdjson. Is there a reason why you didn't compare them?
My research was focused on space-efficiency more than parsing performance, so I didn't cover that problem. However, the long-term goal is that JSON BinPack's binary encoding should also be faster to parse, as you don't have to deal with the JSON grammar.
The json storage we currently use is kind of "secondary" to the database, so for now we don't validate it. The relational database we use does have a schema, of course.
Schema validation for the json files is sorely missing, since the files get corrupted from time to time and this is discovered only during deserialization. Unfortunately I don't have enough time to add it, as we're swamped with other work (our app is huge and we are a very small team). We just tell our internal users to re-serialize the database to fix these corruptions, which is unfortunate and costly, but the best we can do at the moment.
I wrote more about that aspect of my internal app here.
As I wrote, when a future version drops the database completely and works only off those json files, I'll also introduce schema validation. That won't happen for some time though.
Thanks for your answer, I understand now why you didn't take the performance of simdjson etc. into account.
I have thought in the past about using BinPack for my json documents, but I want them to remain as human-readable as possible: the reason to move from a DB to json was to turn database diffs into readable diffs, and BinPack isn't readable.
I also want users to be able to use existing tools on the json files (e.g. the jq tool), but existing tools don't understand BinPack (yet?).
I think BinPack would shine in an RPC/IPC setting. Just recently there was a big discussion here about systemd replacing D-Bus with a json-based IPC, much of it about the waste of using plain json for that.
Do you do JSON Schema validation too?