The key is that it works both ways. By respecting the live robots.txt, and only the live one, data hiding becomes an active process that must be requested on an ongoing basis by a live entity. As soon as that entity goes defunct, any previously scraped data is automatically republished. Thus archive.org is protected from lawsuits by any extant organisation, yet in the long run still archives everything it reasonably can.
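
Concretely, the decision rule looks something like this. This is a minimal sketch in Python, not archive.org's actual implementation: the `snapshots_visible` function and the treat-unreachable-as-defunct heuristic are illustrative assumptions; the stdlib `robotparser` and the `ia_archiver` user agent are real.

```python
from urllib import robotparser
from urllib.error import URLError
from urllib.parse import urlsplit

ARCHIVER_UA = "ia_archiver"  # the Internet Archive's crawler user agent

def snapshots_visible(url: str) -> bool:
    """Decide whether archived snapshots of `url` should be served.

    Hiding is an active, ongoing request: it applies only while a live
    site's *current* robots.txt disallows the archiver. If the site is
    gone (robots.txt unreachable), no live entity is asking to hide
    anything, so the snapshots are shown again.
    """
    parts = urlsplit(url)
    rp = robotparser.RobotFileParser(f"{parts.scheme}://{parts.netloc}/robots.txt")
    try:
        rp.read()  # fetches the live robots.txt, never a cached copy
    except (URLError, OSError):
        return True  # domain defunct: republish automatically
    return rp.can_fetch(ARCHIVER_UA, url)
```

Note the asymmetry this encodes: the only path to hiding is a successful fetch of a robots.txt that currently says no, so the default outcome for a dead domain is publication, not suppression.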