Nice! I'd be interested in the method they use to put everything together. My best guess is some basic structure from motion weighted by the depth sensor... or maybe it's simpler than that.
Author here. It uses color info as well as depth for tracking. Otherwise, it'd fail if you pointed the camera at featureless geometry, e.g. walls, floors.
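To make the featureless-geometry point concrete, here is a toy 1D sketch. It is not the actual tracker, just an illustration under simplifying assumptions: the camera slides sideways along a perfectly flat wall, the motion is a whole-pixel circular shift, and the names `sample`, `tracking_cost`, and `color_weight` are made up for this example. The depth-only error is zero for every candidate shift, so depth alone cannot tell how far the camera moved; adding a color term gives the cost a clear minimum.

```python
import numpy as np

def sample(signal, shift):
    # Hypothetical helper: the scanline as seen after sliding the camera
    # by `shift` pixels (circular shift, purely for simplicity).
    return np.roll(signal, -shift)

def tracking_cost(ref_depth, ref_color, cur_depth, cur_color, shift, color_weight=0.1):
    # Geometric term: mismatch between warped reference depth and current depth.
    geo = np.mean((sample(ref_depth, shift) - cur_depth) ** 2)
    # Photometric term: mismatch between warped reference color and current color.
    photo = np.mean((sample(ref_color, shift) - cur_color) ** 2)
    # color_weight is an arbitrary illustrative value, not a tuned parameter.
    return geo, photo, geo + color_weight * photo

rng = np.random.default_rng(0)
n = 200
wall_depth = np.full(n, 2.0)   # flat wall: same depth everywhere
wall_color = rng.random(n)     # but the wall has visible texture

true_shift = 7
cur_depth = sample(wall_depth, true_shift)
cur_color = sample(wall_color, true_shift)

candidates = list(range(-20, 21))
costs = [tracking_cost(wall_depth, wall_color, cur_depth, cur_color, s)
         for s in candidates]
geo_costs = [c[0] for c in costs]
total_costs = [c[2] for c in costs]

# Depth alone is flat across all candidates; the combined cost is not.
best_shift = candidates[int(np.argmin(total_costs))]
print("depth-only cost range:", min(geo_costs), max(geo_costs))
print("best shift with color:", best_shift)
```

A real system would estimate a full 6-DoF pose and handle occlusion, noise, and lighting changes; the sketch only shows why a color term removes the ambiguity that flat geometry leaves.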