Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Implementing a B+ tree was the final assignment in a grad-level Algorithms & Data Structures call that I took. I thought it was a reasonable assignment in that context. It felt like a good capstone to the class, building on what we had learned and finally crossing the line from toy examples to something that approached real-world usefulness.

It seems very out-of-place for a databases course. Yes, B-Trees are important for a database, but I don't see how implementing one teaches you about how a database uses one or why it's important.



As another data point, I implemented a B+ tree with rebalancing deletion functionality back in CS3090 at UVU circa 2008 from Dr. Curtis Wellborn.

It was an undergraduate course, but the professor emphasized that everywhere else it would be a graduate-level of course.

My B+ tree implementation served as the foundation for my own (shitty) DBMS ;D

As a funny sidenote, the professor claimed that even Stanford DB courses had a buggy / broken algorithm for balanced deletes. While preparing the course, he'd spent a chunk of the summer doing researching it, and had developed his alogrithm in Prolog. He shared the concepts and our implementations worked fine. However, it was ultimately mostly an academic exercise, as balanced deletions come with significant tradeoffs which make it not as useful as I initially thought.

Even full-fledged database systems don't typically implement B+ tree indexes with deletion+rebalancing support (the usual technique is marking nodes deleted and cleaning them up when the index is rebuilt).


> Even full-fledged database systems don't typically implement B+ tree indexes with deletion+rebalancing support (the usual technique is marking nodes deleted and cleaning them up when the index is rebuilt).

Yes, and it took a surprisingly long time before anyone bothered to do the theoretical analysis. Some good papers to check out if anyone is interested are Deletion Without Rebalancing in Multiway Search Trees (http://sidsen.azurewebsites.net/papers/relaxed-b-tree-tods.p...) and Deletion Without Rebalancing in Binary Search Trees (http://sidsen.azurewebsites.net//papers/ravl-trees-journal.p...). Jeff Erickson also has some good notes on using local and global rebuilds for search trees: https://jeffe.cs.illinois.edu/teaching/algorithms/notes/10-s...

It's intuitively clear that worst-case performance [1] can degrade to a function of the number of insertions since a rebuild (as opposed to the number of currently live items) when you use relaxed deletion. If you can afford to globally rebuild the tree you get to play the usual amortization tricks, but global rebuilds can be problematic in a large-scale live database; one solution is to rebuild from a snapshot in the background (on another thread or machine) and then replay late arriving operations before switching over to the rebuilt tree. But if someone is okay with the amortization spikes from rehashing a hash table, they should be okay with non-incremental global rebuilds for search trees; a hash table where deletions leave behind tombstones and the table is rehashed when the ratio of tombstones gets too high is basically the same idea.


There are actually some pretty good database courses at CMU for example, that go into what is required to build databases from the ground up.

So don't think all DB classes are the same, there are some vast differences in that type of class compared to a class teaching you how to use SQL.


>There are actually some pretty good database courses at CMU for example, that go into what is required to build databases from the ground up.

Can you name or link a few?


Here is the YouTube channel for them, tons of content on there. Lectures of previous years classes, etc: https://www.youtube.com/channel/UCHnBsf2rH-K7pn09rb3qvkA

Also have some awesome guest speakers, and the conversations they have are often quite insightful.


I've seen quite a few database courses have you implement them, as they are the backbone for many database systems.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: