The first few are simple: a photograph draped over a DEM (digital elevation model) or DTM (digital terrain model), i.e. a height map, gives you a 3D landscape with photo-accurate texture. The later images add manual CAD modeling, which supplies the extra detail needed for 3D buildings and structures.
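To make the height map + photo idea concrete, here is a minimal sketch of how a grid of elevations can be turned into a textured mesh. It is not taken from their pipeline; `heightmap_to_mesh` and `cell_size` are names made up for illustration, and it assumes the photo is already aligned with the elevation grid so that texture coordinates can simply run from 0 to 1 across it.

```python
def heightmap_to_mesh(heights, cell_size=1.0):
    """Turn a 2D grid of elevations into vertices, UVs, and triangles.

    heights   -- list of rows, each a list of elevation values (hypothetical input)
    cell_size -- ground distance between neighbouring samples (assumed uniform)
    """
    rows = len(heights)
    cols = len(heights[0])

    vertices = []  # (x, y, z) positions, z is the elevation
    uvs = []       # (u, v) texture coordinates into the aligned photograph
    for r in range(rows):
        for c in range(cols):
            vertices.append((c * cell_size, r * cell_size, heights[r][c]))
            u = c / (cols - 1) if cols > 1 else 0.0
            v = r / (rows - 1) if rows > 1 else 0.0
            uvs.append((u, v))

    triangles = []  # each grid cell is split into two triangles
    for r in range(rows - 1):
        for c in range(cols - 1):
            i = r * cols + c  # index of the cell's top-left vertex
            triangles.append((i, i + 1, i + cols))
            triangles.append((i + 1, i + cols + 1, i + cols))

    return vertices, uvs, triangles


if __name__ == "__main__":
    verts, uvs, tris = heightmap_to_mesh([[0, 1, 2], [3, 4, 5]], cell_size=10.0)
    print(len(verts), "vertices,", len(tris), "triangles")
```

A renderer then samples the photograph at each vertex's UV coordinate, which is what gives the terrain its photo-accurate look.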
The interior is a lot more impressive. They're doing photogrammetry or a SLAM equivalent: reconstructing depth from stereo photographs and stitching multiple pairs together to generate a scene. They're also adding CAD in that case, as you can see from the wireframe for the generator.
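The depth-from-stereo step can be sketched roughly as follows. This is only the textbook relation for a rectified pinhole stereo pair (depth = focal length × baseline / disparity), not a claim about what their software actually does; the function names and parameters are hypothetical, and a real system would also need feature matching and registration to stitch the pairs together.

```python
def disparity_to_depth(disparity_px, focal_px, baseline_m):
    """Depth in metres for one pixel of a rectified stereo pair.

    disparity_px -- horizontal shift of the same point between left and right image
    focal_px     -- focal length in pixels (assumed identical for both cameras)
    baseline_m   -- distance between the two camera centres in metres
    """
    if disparity_px <= 0:
        return None  # no valid match, or point effectively at infinity
    return focal_px * baseline_m / disparity_px


def backproject(u, v, depth, focal_px, cx, cy):
    """Lift pixel (u, v) with known depth to a 3D point in the camera frame.

    cx, cy -- principal point (image centre) in pixels, an assumed calibration value
    """
    x = (u - cx) * depth / focal_px
    y = (v - cy) * depth / focal_px
    return (x, y, depth)


if __name__ == "__main__":
    z = disparity_to_depth(20.0, focal_px=800.0, baseline_m=0.5)
    print(backproject(420.0, 240.0, z, focal_px=800.0, cx=320.0, cy=240.0))
```

Doing this for every matched pixel gives a point cloud per stereo pair; aligning those clouds against each other is the stitching part.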