A raw point cloud is run through a series of processing steps to label each point with a class, e.g. "Ground", "Low/Medium/High Vegetation", "Building", "Transmission Tower", etc.
There will be a different algorithm for each feature class. For example, points that are part of a building might be identified by finding groups of points that form a large planar surface (a roof). ML models trained on labelled point clouds can also do this.
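As a toy illustration of that planar-surface idea, here's a minimal RANSAC plane fit in plain NumPy (the function name and thresholds are made up; production classifiers are far more involved):

```python
import numpy as np

def ransac_plane(points, n_iters=200, dist_thresh=0.15, seed=0):
    """Toy RANSAC: return an inlier mask for the largest plane found.

    points: (N, 3) XYZ array; dist_thresh: max point-to-plane distance
    (metres) to count as an inlier. Both parameters are illustrative.
    """
    rng = np.random.default_rng(seed)
    best_mask = np.zeros(len(points), dtype=bool)
    for _ in range(n_iters):
        # Fit a candidate plane through 3 randomly chosen points.
        p0, p1, p2 = points[rng.choice(len(points), 3, replace=False)]
        normal = np.cross(p1 - p0, p2 - p0)
        length = np.linalg.norm(normal)
        if length < 1e-9:                # degenerate (collinear) sample
            continue
        normal /= length
        # Count points within dist_thresh of the candidate plane.
        mask = np.abs((points - p0) @ normal) < dist_thresh
        if mask.sum() > best_mask.sum():
            best_mask = mask
    return best_mask
```

A large, dense inlier set sitting well above the local ground makes a decent roof candidate; real pipelines typically combine this with region growing, height-above-ground, and other cues.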
The final digital elevation model (DEM) is then produced by taking just the points in the "Ground" class from the classified point cloud and triangulating a surface from them. This differs from a digital surface model (DSM), which triangulates a surface from the ground + building + vegetation points.
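As a rough sketch of that DEM/DSM split, assuming a LAS file using the standard ASPRS class codes (the filename and grid resolution here are made up):

```python
import laspy
import numpy as np
from scipy.interpolate import LinearNDInterpolator

las = laspy.read("survey.las")  # hypothetical input file
pts = np.column_stack([las.x, las.y, las.z])  # scaled real-world coords

# Standard ASPRS codes: 2 = Ground, 3/4/5 = Low/Med/High Vegetation, 6 = Building
ground = las.classification == 2
surface = np.isin(las.classification, [2, 3, 4, 5, 6])

def rasterize(mask, res=1.0):
    """Triangulate the selected points (Delaunay TIN) and sample on a grid."""
    x, y, z = pts[mask].T
    interp = LinearNDInterpolator(np.c_[x, y], z)
    gx, gy = np.meshgrid(np.arange(x.min(), x.max(), res),
                         np.arange(y.min(), y.max(), res))
    return interp(gx, gy)

dem = rasterize(ground)   # bare earth only
dsm = rasterize(surface)  # ground + buildings + vegetation
```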
Removing vegetation seems like a harder problem than removing buildings. Buildings are generally cuboids and other regular shapes, but how do you tell the difference between small trees, big trees, bracken, etc.?
In Scotland we have heather that can coat hills, and I'm not sure you'd be able to tell the difference between that and a forest canopy in order to assume a height and then subtract it. Maybe there's more than the point cloud to work with.
Aerial survey LiDAR can process multiple returns from a single laser pulse. So, some energy might be reflected back from a leaf, but some energy will pass through (or around) the leaf, hit the ground, then reflect back to the sensor. Some systems can record 5+ points from a single laser pulse.
With this information, you can filter the point cloud to only include points from the final return, which is likely to be the ground/a solid surface unless the vegetation is very dense.
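With laspy that filter is only a few lines (the filenames are hypothetical; return_number and number_of_returns are standard LAS point fields):

```python
import laspy

las = laspy.read("survey.las")  # hypothetical input file

# A pulse's last return is the energy that penetrated deepest --
# most likely the ground or another solid surface.
out = laspy.LasData(las.header)
out.points = las.points[las.return_number == las.number_of_returns]
out.write("last_returns.las")
```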
You don't even need multi-return: typically your point cloud will have points from the tree (or whatever) plus some that returned from the ground or structure behind it.
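One crude way to exploit that, assuming nothing but raw XYZ points: grid the cloud and keep the lowest point in each cell (the cell size is a made-up parameter; real ground filters such as progressive morphological or cloth-simulation filters are considerably smarter):

```python
import numpy as np

def lowest_per_cell(xyz, cell=2.0):
    """Keep the lowest point in each XY grid cell as a rough ground estimate.

    xyz: (N, 3) array. Works on single-return data too: any pulse that
    slipped between the leaves contributes a low point that wins its cell.
    """
    ij = np.floor(xyz[:, :2] / cell).astype(np.int64)
    ij -= ij.min(axis=0)                         # shift cell indices to >= 0
    keys = ij[:, 0] * (ij[:, 1].max() + 1) + ij[:, 1]  # one key per cell
    order = np.lexsort((xyz[:, 2], keys))        # sort by cell, then height
    ks = keys[order]
    first = np.r_[True, ks[1:] != ks[:-1]]       # first = lowest point per cell
    return xyz[order[first]]
```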
I understand how vegetation could be removed, but buildings? How is that accomplished?