I'm working on a versioned, temporal DBMS[1] called SirixDB in my spare time, which is the most exciting thing :-)
It's based on a university project I've been working on basically since day one in 2006.
I know it's crazy to work on such a large project alone at first. Lately, however, I'm getting the first contributions, and maybe I should start collaborating with the university or with the company of my former supervisor (who began the project for his Ph.D.).
I'm now more than convinced that the ideas are worth working on, especially with the advent of modern hardware such as byte-addressable NVM :-)
Currently, I'm working on the storage engine itself, to further reduce storage space consumption and to make the system stable. I'm experimenting with importing larger data sets (JSON and XML, currently up to 5 GB) with and without auto-commits, and with different features enabled or disabled, for instance storing a rolling Merkle hash for each node, the number of descendants, a path summary and so on.
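For context, an import roughly looks like this with SirixDB's Java API. I'm writing this from memory, so treat the exact method names (especially the auto-commit overload and the resource-manager call, which have changed between versions) as assumptions rather than gospel:

```java
import java.nio.file.Path;
import java.nio.file.Paths;

import org.sirix.access.DatabaseConfiguration;
import org.sirix.access.Databases;
import org.sirix.access.ResourceConfiguration;
import org.sirix.service.json.shredder.JsonShredder;

public final class JsonImport {
  public static void main(String[] args) {
    final Path dbPath = Paths.get("json-database");
    Databases.createJsonDatabase(new DatabaseConfiguration(dbPath));

    try (var database = Databases.openJsonDatabase(dbPath)) {
      // Features such as rolling hashes and the path summary are toggled
      // via the resource configuration builder (option names vary by version).
      database.createResource(ResourceConfiguration.newBuilder("resource").build());

      try (var manager = database.openResourceManager("resource");
           // Assumed overload: auto-commit after every N modifications.
           var wtx = manager.beginNodeTrx(262_144)) {
        wtx.insertSubtreeAsFirstChild(
            JsonShredder.createFileReader(Paths.get("huge-file.json")));
        wtx.commit();
      }
    }
  }
}
```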
Some of the features:
- the storage engine is written from scratch
- completely isolated read-only transactions plus a single read/write transaction running concurrently; one lock guards the writer, and readers are never blocked by it and execute without any latches/locks (first sketch below)
- variable-sized pages
- lightweight buffer management with a "kind of" pointer swizzling
- dropping the need for a write-ahead log thanks to atomic switching of an UberPage (sketched below)
- a rolling Merkle hash tree of all nodes, optionally built during updates (sketched below)
- an ID-based diff algorithm to determine differences between revisions, optionally taking the (secure) hashes into account (sketched below)
- a non-blocking REST API, which also takes the hashes into account to raise an error if a subtree has been concurrently modified during updates
- versioning through a huge persistent, durable, variable-sized page tree using copy-on-write
- storing delta page-fragments using a patented sliding snapshot algorithm
- a special trie, which is especially good for storing records with numerically dense, monotonically increasing 64-bit integer IDs; we make heavy use of bit shifting to calculate the path to fetch a record (sketched below)
- time- or modification-counter-based auto-commit
- versioned, user-defined secondary index structures
- a versioned path summary
- indexing every revision, such that a timestamp is stored only once, in a RevisionRootPage; the resources stored in SirixDB are based on a huge, persistent (functional) and durable tree (sketched below)
- sophisticated time travel queries
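To make a few of these points concrete, here are some toy sketches. None of this is SirixDB's actual code; all names are invented for illustration. First, the concurrency scheme: readers pin an immutable snapshot through an atomic reference and never take a lock, while a single lock serializes the one writer, whose commit atomically publishes the next snapshot:

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.locks.ReentrantLock;

final class VersionedStore {
  // Immutable snapshot of the latest committed revision.
  record Snapshot(int revision, java.util.Map<Long, String> records) {}

  private final AtomicReference<Snapshot> committed =
      new AtomicReference<>(new Snapshot(0, java.util.Map.of()));
  private final ReentrantLock writerLock = new ReentrantLock(); // guards the single writer

  // Readers never block: they just pin the current snapshot.
  Snapshot beginReadOnlyTrx() {
    return committed.get();
  }

  // Only one read/write transaction runs at a time.
  void update(long key, String value) {
    writerLock.lock();
    try {
      Snapshot old = committed.get();
      var copy = new java.util.HashMap<>(old.records()); // copy-on-write
      copy.put(key, value);
      // Atomically switch to the new revision; readers on the old one are unaffected.
      committed.set(new Snapshot(old.revision() + 1, java.util.Map.copyOf(copy)));
    } finally {
      writerLock.unlock();
    }
  }
}
```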
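The WAL-free commit is essentially shadow paging. A hedged sketch (file layout invented, and the single-sector-write atomicity at the end is an assumption about the device): new page fragments are appended and fsynced first, and only then is a tiny header slot pointing at the new UberPage overwritten and fsynced. A crash before the second fsync simply leaves the old revision in place:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class UberPageSwitch {
  // Commit by appending the new page tree, then atomically redirecting the header.
  static void commit(Path file, byte[] newPageTree) throws IOException {
    try (FileChannel ch = FileChannel.open(file,
        StandardOpenOption.CREATE, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
      // 1. Append the copy-on-write page fragments after the existing data.
      long uberPageOffset = Math.max(ch.size(), Long.BYTES);
      ch.write(ByteBuffer.wrap(newPageTree), uberPageOffset);
      ch.force(true); // make the new revision durable before publishing it

      // 2. Publish: the first 8 bytes of the file point at the current UberPage.
      //    Assumes a single small header write is effectively atomic on the device.
      ByteBuffer header = ByteBuffer.allocate(Long.BYTES).putLong(uberPageOffset).flip();
      ch.write(header, 0);
      ch.force(true); // crash before this point -> the old UberPage still wins
    }
  }
}
```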
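The rolling Merkle hashes exploit that an update only invalidates the hashes on the path from the changed node up to the root, so a single change costs O(depth) rehashing, not a full rebuild. A toy version (not SirixDB's actual hashing):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;

final class MerkleNode {
  final MerkleNode parent;
  final List<MerkleNode> children = new ArrayList<>();
  String value;
  byte[] hash;

  MerkleNode(MerkleNode parent, String value) {
    this.parent = parent;
    this.value = value;
    if (parent != null) parent.children.add(this);
    rehashUpwards();
  }

  // Recompute this node's hash from its value and child hashes,
  // then bubble the change up to the root: O(depth) per update.
  void rehashUpwards() {
    for (MerkleNode n = this; n != null; n = n.parent) {
      try {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        md.update(n.value.getBytes(StandardCharsets.UTF_8));
        for (MerkleNode child : n.children) md.update(child.hash);
        n.hash = md.digest();
      } catch (NoSuchAlgorithmException e) {
        throw new AssertionError(e);
      }
    }
  }

  void setValue(String newValue) {
    value = newValue;
    rehashUpwards();
  }
}
```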
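Those hashes are also what lets the diff algorithm prune: when two subtree hashes match, the whole subtree can be skipped without descending into it. A heavily simplified sketch (node shape invented; the real algorithm also matches by stable node IDs and handles inserts and deletes):

```java
import java.util.List;
import java.util.function.BiConsumer;

record Node(long nodeKey, byte[] hash, List<Node> children) {}

final class HashAwareDiff {
  // Emit (old, new) pairs for changed nodes; identical hashes let us
  // skip entire subtrees.
  static void diff(Node oldNode, Node newNode, BiConsumer<Node, Node> onChange) {
    if (java.util.Arrays.equals(oldNode.hash(), newNode.hash()))
      return; // subtree unchanged -- prune
    onChange.accept(oldNode, newNode);
    int n = Math.min(oldNode.children().size(), newNode.children().size());
    for (int i = 0; i < n; i++)
      diff(oldNode.children().get(i), newNode.children().get(i), onChange);
  }
}
```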
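The trie lookup works because the record IDs are dense and monotonically increasing: the ID itself encodes the path through the indirect pages, so resolving a record is pure bit arithmetic with no comparisons or search. The fan-out and level count here are made up; the real layout differs:

```java
final class RecordTrie {
  // Fan-out of 1024 references per indirect page => 10 bits per level (illustrative).
  static final int BITS_PER_LEVEL = 10;
  static final int MASK = (1 << BITS_PER_LEVEL) - 1;
  static final int LEVELS = 5; // covers 2^50 records in this toy layout

  // Compute the slot to follow on each level of the trie for a given record ID.
  static int[] pathFor(long recordId) {
    int[] slots = new int[LEVELS];
    for (int level = 0; level < LEVELS; level++) {
      int shift = (LEVELS - 1 - level) * BITS_PER_LEVEL;
      slots[level] = (int) ((recordId >>> shift) & MASK);
    }
    return slots;
  }

  public static void main(String[] args) {
    // Record 1_234_567 lives at these slots, from the root down to the leaf page:
    System.out.println(java.util.Arrays.toString(pathFor(1_234_567L)));
  }
}
```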
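Finally, because every commit writes exactly one RevisionRootPage carrying its timestamp, opening a read-only transaction "as of" some point in time can be a binary search over the revisions. A sketch with invented types:

```java
import java.time.Instant;

final class RevisionIndex {
  record RevisionRootPage(int revision, Instant commitTimestamp) {}

  // Find the newest revision committed at or before the given instant.
  static RevisionRootPage asOf(RevisionRootPage[] revisions, Instant point) {
    int lo = 0, hi = revisions.length - 1, best = -1;
    while (lo <= hi) {
      int mid = (lo + hi) >>> 1;
      if (!revisions[mid].commitTimestamp().isAfter(point)) {
        best = mid;   // candidate: committed at or before `point`
        lo = mid + 1; // but keep looking for a later one
      } else {
        hi = mid - 1;
      }
    }
    if (best < 0) throw new IllegalArgumentException("no revision at or before " + point);
    return revisions[best];
  }
}
```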
Besides the storage engine challenges, the project has so many possibilities for further research and work:
- How to shard databases
- Query compiler rewrite rules and cost-based optimization
- A brand new front-end
- Other secondary index structures besides the AVL trees stored in data nodes
- Storing graphs and other data types
- How to best make use of modern hardware such as byte-addressable NVM
[1] https://sirix.io or https://github.com/sirixdb/sirix