That web domain dataset they get from the Internet Archive is interesting in light of the current discussion: I'd guess it contains .uk content that has since been removed from the public Wayback Machine by robots.txt changes.
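For illustration, here's a minimal Python sketch of how one might check whether a URL still has a publicly retrievable snapshot, using the Wayback Machine availability API (archive.org/wayback/available). The example URL is hypothetical, and a missing snapshot only suggests an exclusion, it doesn't prove one:

    import json
    from urllib.parse import quote
    from urllib.request import urlopen

    def wayback_snapshot(url):
        # Query the Wayback Machine availability API for the most
        # recent publicly retrievable snapshot of `url`.
        api = "https://archive.org/wayback/available?url=" + quote(url, safe="")
        with urlopen(api) as resp:
            data = json.load(resp)
        # Returns None when no snapshot is retrievable, e.g. when a
        # site's robots.txt exclusion has been applied retroactively.
        return data.get("archived_snapshots", {}).get("closest")

    snap = wayback_snapshot("example.co.uk/some-page")  # hypothetical URL
    print(snap["url"] if snap else "no public snapshot")

Comparing the output of something like this against a dataset of known-crawled .uk URLs would be one way to estimate how much has quietly disappeared from public view.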
I think if I were running a national or internationally mandated archiving initiative, I'd basically want to ingest content from the Internet Archive and never remove anything; that would probably also be cheaper than operating my own crawler.