We haven't tested the Java binding for quite a long time, so I am not sure it still compiles well. Please file an issue on GitHub if you find it no longer works, thanks!
I remember we tried it on Flink in early versions and the results were pretty good.
I remember reading about this a few years ago here. If I remember correctly, back then the main selling point was that it used succinct data structures, and it was only the compression algorithm that was not open source - everything else was.
But now when I look at the new repo and the online docs there is no mention of succinct data structures anywhere.
Also, the benchmarks back then claimed 10x or more faster than RocksDB. Now the performance claim is much more modest.
Does that mean TerarkDB no longer uses succinct data structures? Or are you just open sourcing a lower-end version of the software without the secret sauce?
Can you talk about what makes TerarkDB faster than RocksDB?
Thanks for your attention; glad someone here still remembers our history. TerarkDB is now FULLY open source with `succinct data structures`.
Here are the reasons:
1. Our `all-in-one` docs are still being written; we will cover that part later.
2. For the performance part, we are now showing real-world cases, not a carefully designed benchmark. (We picked the best results to show off our work a few years ago; we don't want to do that anymore.)
3. Why TerarkDB is faster than RocksDB will be explained in our `all-in-one` docs within a week; most of the reasons are not magic, just engineering effort.
I think the performance images would be a lot clearer if they were on the same scale; as it stands, it was unclear what was happening with, e.g., the disk write image until I zoomed into the axes.
Your all-in-one docs[1] refuse to render at all in Firefox? Seems like a strange restriction, and disappointing that it doesn't even let you read the document in non-WebKit browsers. Feels like the IE days all over again.
"An error occurred. This browser is not supported, click here to learn more."
Works perfectly for me in Firefox (macOS). Might be a misleadingly worded "something went wrong, we don't know what" message. Probably want to check your console, could be a network problem.
Yeah, there are some unreadable minified error stack traces in the console. Weirdly, it seems to load fine and I can scroll the content for a second or two while it's still loading in. Then it puts up an error dialog and I can't access the content any more. Once that's happened, if I reload the page the error dialog comes up immediately and the page doesn't bother loading behind it.
I'm on Firefox on macOS too, and it happens with my ad blocker on or off. Chrome works fine. The support document linked in the error explicitly says only Chrome and Safari are supported on macOS. I'm confused why my grandparent comment has been downvoted - this is a real bug report stopping me (and maybe others) from reading documentation that looks to have a lot of thought put into it. And given the content seemed to be loading fine before the error message came up, well, it feels forced.
1. We changed the source code so much that we are not able to merge it back into RocksDB easily (this project started in 2016 as a closed-source project)
2. We are on a different road map than RocksDB (e.g. in the future we will remove a lot of unused code to make TerarkDB much more lightweight than the current version)
3. We have lots of third-party partners (e.g. Intel, on Optane SSD/memory, and others on ZNS...) who may participate in this project,
so we want to handle all commits ourselves to make sure everything is under control.
It's open source now, right? Outside of 2 and 3, could someone incorporate (some) of the improvements from TerarkDB into RocksDB? Or does it truly require some major rewrite to achieve the tail-latency benefits?
The comparison figures presented looked really impressive, thanks for sharing them.
First, it’s reeaallllyyyy expensive to invest enough in an open source project that you have a reasonable chance of steering it.
Second, even if you do the first, the whole thing gets screwed up again when you start trying to introduce vendor code into the mix. Generally, no one upstream gives a crap that you have super compelling business reasons to compromise on code quality (or even trivial things like how code is committed: tarballs vs good git hygiene), and vendors sometimes compromise a lot.
So it’s not surprising that sometimes groups choose to do the expedient thing to get something to market instead of doing things “the right way.” In a lot of respects, the original Android did this with Linux.
Imagine if there were multiple incompatible and competing Linux kernels. What we have now is AMD/MS/Apple etc. contributing to the kernel through "vendor code". Imagine if AMD released an AMDLinux and Nvidia had NvidiaLinux.
This already happens, because most (?) people aren’t running vanilla kernels. Many (most?) distros compile their kernels with config options and patches that “make sense to them.” In the most egregious cases, you end up with things like bpf being intentionally broken by default.
It is perfectly in line with open source philosophy to be able to fork a project and have control over my fork. Especially given (2), where they have different goals from upstream.
Amazon did not fork MongoDB; they won't touch AGPL code. They reimplemented the server-side protocol and a backend implementation on top of PostgreSQL, AFAICS.
It feels like this is healthy, organic, and very much in line with the ethos of open source to see a project take this path and arrive back in open source. If the RocksDB team wanted to cherry-pick some compatible advancements from this project, they are now free to do so.
There are much more egregious and fundamentally different violations of open source, namely those you mention in your comment.
Wasn't the driver for nvim specifically disagreements with the direction/priorities/steering of the project? Is progress in a different direction necessarily a bad thing, especially if that effort couldn't be directly applied to the original anyway?
Please someone feel free to correct me, but if I recall correctly a lot of the improvements in Vim 8 were a result of the popularity of functionality in NeoVim?
You're correct -- which is why I've used it as an example of forking for project-control reasons to be perfectly in line with an open-source philosophy.
I didn't know this. How do I contribute to Oracle's Unbreakable Linux or Red Hat's RHEL? I know I can fork them, but I'm not sure how I can push my commits into their code, and didn't realize that was required!
Leadership or a steering committee is a key factor for open source projects operated by companies. A pull request closed with the comment "We won't accept this pull request because..." should not be on the trajectory of an infrastructure project that is, or will be, widely used by any giant vendor.
So RocksDB came from LevelDB and here we go again.
We are working on our `all-in-one docs` which will explain everything.
I want to clarify that we are not trying to "get rid of" RocksDB (as lots of KV engines have claimed). What we want to do is provide another option for storage engine users, with a different road map (focusing on new hardware and heavy-write workloads).
For simple use cases, there will be no difference no matter what engine you use.
And in most cases, upgrading your hardware (e.g. SATA SSD to NVMe SSD) or tuning your RocksDB parameters will save you a lot of time; just make sure you understand what you are doing.
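For what it's worth, here's a minimal sketch of what that kind of RocksDB tuning can look like with the stock C++ API. The specific values below are made-up starting points, not recommendations; adjust them to your hardware and working set.

```cpp
#include <rocksdb/cache.h>
#include <rocksdb/db.h>
#include <rocksdb/options.h>
#include <rocksdb/table.h>

int main() {
  rocksdb::Options options;
  options.create_if_missing = true;

  // Larger memtables and more background threads help absorb write bursts.
  options.write_buffer_size = 256 << 20;   // 256 MB per memtable
  options.max_write_buffer_number = 4;
  options.max_background_jobs = 8;         // shared by flushes and compactions

  // Direct I/O for flush/compaction keeps the OS page cache for reads,
  // and periodic syncing smooths out background write spikes.
  options.use_direct_io_for_flush_and_compaction = true;
  options.bytes_per_sync = 1 << 20;

  // A block cache sized for the working set usually matters more for
  // read latency than most other knobs.
  rocksdb::BlockBasedTableOptions table_options;
  table_options.block_cache = rocksdb::NewLRUCache(8ull << 30);  // 8 GB
  options.table_factory.reset(
      rocksdb::NewBlockBasedTableFactory(table_options));

  rocksdb::DB* db = nullptr;
  rocksdb::Status s = rocksdb::DB::Open(options, "/tmp/testdb", &db);
  if (!s.ok()) return 1;
  delete db;
  return 0;
}
```

Before switching engines, it's usually worth checking the compaction and stall statistics RocksDB periodically dumps to its LOG file to see which knob is actually the bottleneck.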
There's no cure for every workload; try TerarkDB if RocksDB doesn't fit your scenario.
The reasons we did a better job (from our own perspective) than RocksDB are:
1. We moved lots of code outside the db_mutex (the db mutex is convenient but costs too much).
2. We introduced a new KV separation implementation that we believe is better than RocksDB's (we haven't heard of any production users using RocksDB's KV separation yet); see the sketch after this list for the general idea.
3. We introduced a lazy compaction strategy that can delay compaction tasks while online services are handling short bursts of heavy writes.
4. Other optimizations, like time-histogram-based TTL and pipelined WAL sync.
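To make item 2 concrete, here is a minimal, hypothetical sketch of the general KV separation idea (the WiscKey/BlobDB family). This is not TerarkDB's actual implementation, and every name below is made up for illustration: large values are appended to a value log, and the LSM tree stores only the key plus a small pointer, so compactions rewrite tiny entries instead of full values.

```cpp
#include <cstdint>
#include <fstream>
#include <string>

// Hypothetical sketch of KV separation, not TerarkDB's real code.
// A pointer stored in the LSM tree in place of the actual value.
struct ValuePointer {
  uint64_t file_no;  // which value-log file
  uint64_t offset;   // where the value starts in that file
  uint32_t size;     // value length in bytes
};

// Append-only log that holds the (potentially large) values.
class ValueLog {
 public:
  explicit ValueLog(const std::string& path)
      : out_(path, std::ios::binary | std::ios::app), offset_(0) {}

  // Append a value and return the pointer the LSM tree will store
  // next to the key instead of the value itself.
  ValuePointer Append(const std::string& value) {
    ValuePointer ptr{/*file_no=*/0, offset_,
                     static_cast<uint32_t>(value.size())};
    out_.write(value.data(), static_cast<std::streamsize>(value.size()));
    offset_ += value.size();
    return ptr;
  }

 private:
  std::ofstream out_;
  uint64_t offset_;
};

// A write path would then do roughly:
//   ValuePointer p = vlog.Append(value);
//   lsm.Put(key, Encode(p));   // tiny LSM entry, so compaction stays cheap
// and a read resolves the pointer with one extra random read.
```

The trade-off is that extra read on lookups and the need to garbage-collect the value log, which is where most of the engineering effort in a real implementation goes.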
I see "#include <terark/fsa/cspptrie.inl>" in the "memtable/terark_zip_entry_index.cc" but I can't find "cspptrie.inl" in the repo.
Is the code auto-generated or not open source now?
Sorry for the unclear response. 1) We use TerarkDB under a distributed SQL database, where TerarkDB stores its pages (16KB pages); it's one of the most widely used SQL databases inside Bytedance. 2) We use TerarkDB under a Redis-compatible distributed cache system to store raw key-value pairs.
Almost all kinds of workloads are represented here, since TerarkDB runs under so many database clusters (each cluster only serves a single application).