Implementing a B+ tree was the final assignment in a grad-level Algorithms & Dat...

jaytaylor · on Aug 19, 2021

As another data point, I implemented a B+ tree with rebalancing deletion functionality back in CS3090 at UVU circa 2008 from Dr. Curtis Wellborn.

It was an undergraduate course, but the professor emphasized that everywhere else it would be a graduate-level of course.

My B+ tree implementation served as the foundation for my own (shitty) DBMS ;D

As a funny sidenote, the professor claimed that even Stanford DB courses had a buggy / broken algorithm for balanced deletes. While preparing the course, he'd spent a chunk of the summer doing researching it, and had developed his alogrithm in Prolog. He shared the concepts and our implementations worked fine. However, it was ultimately mostly an academic exercise, as balanced deletions come with significant tradeoffs which make it not as useful as I initially thought.

Even full-fledged database systems don't typically implement B+ tree indexes with deletion+rebalancing support (the usual technique is marking nodes deleted and cleaning them up when the index is rebuilt).

psykotic · on Aug 19, 2021

> Even full-fledged database systems don't typically implement B+ tree indexes with deletion+rebalancing support (the usual technique is marking nodes deleted and cleaning them up when the index is rebuilt).

Yes, and it took a surprisingly long time before anyone bothered to do the theoretical analysis. Some good papers to check out if anyone is interested are Deletion Without Rebalancing in Multiway Search Trees (http://sidsen.azurewebsites.net/papers/relaxed-b-tree-tods.p...) and Deletion Without Rebalancing in Binary Search Trees (http://sidsen.azurewebsites.net//papers/ravl-trees-journal.p...). Jeff Erickson also has some good notes on using local and global rebuilds for search trees: https://jeffe.cs.illinois.edu/teaching/algorithms/notes/10-s...

It's intuitively clear that worst-case performance [1] can degrade to a function of the number of insertions since a rebuild (as opposed to the number of currently live items) when you use relaxed deletion. If you can afford to globally rebuild the tree you get to play the usual amortization tricks, but global rebuilds can be problematic in a large-scale live database; one solution is to rebuild from a snapshot in the background (on another thread or machine) and then replay late arriving operations before switching over to the rebuilt tree. But if someone is okay with the amortization spikes from rehashing a hash table, they should be okay with non-incremental global rebuilds for search trees; a hash table where deletions leave behind tombstones and the table is rehashed when the ratio of tombstones gets too high is basically the same idea.

Tostino · on Aug 18, 2021

There are actually some pretty good database courses at CMU for example, that go into what is required to build databases from the ground up.

So don't think all DB classes are the same, there are some vast differences in that type of class compared to a class teaching you how to use SQL.

abhishekjha · on Aug 20, 2021

>There are actually some pretty good database courses at CMU for example, that go into what is required to build databases from the ground up.

Can you name or link a few?

Tostino · on Aug 20, 2021

Here is the YouTube channel for them, tons of content on there. Lectures of previous years classes, etc: https://www.youtube.com/channel/UCHnBsf2rH-K7pn09rb3qvkA

Also have some awesome guest speakers, and the conversations they have are often quite insightful.

ddlutz · on Aug 18, 2021

I've seen quite a few database courses have you implement them, as they are the backbone for many database systems.