So what is the strategy to deal with legitimate 1st party subdomains and tracking/ads subdomains if they use random strings as identifiers? (I am guessing this is where we will need a combination of crawlers and machine learning algorithms)
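(A crude non-ML starting point might just be to flag subdomain labels that look machine-generated. Purely my own sketch, not something any blocker actually ships; the thresholds are illustrative guesses:)

```python
import math
from collections import Counter

def label_entropy(label: str) -> float:
    """Shannon entropy (bits/char) of a DNS label."""
    counts = Counter(label)
    n = len(label)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def looks_random(label: str, entropy_cutoff: float = 3.4, digit_cutoff: float = 0.3) -> bool:
    """Heuristic: long, high-entropy, digit-heavy labels are suspect.
    Cutoff values are guesses for illustration, not tuned numbers."""
    if len(label) < 10:
        return False
    digit_ratio = sum(ch.isdigit() for ch in label) / len(label)
    return label_entropy(label) > entropy_cutoff or digit_ratio > digit_cutoff

print(looks_random("www"))               # → False
print(looks_random("f7e281a2c94d1b06"))  # → True (hex blob)
```

Obviously this misfires on legitimate hash-named subdomains, which is exactly the problem being asked about.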
Couldn't you blacklist all subdomains of the 1st party and whitelist the few that are actually real?
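In filter-list terms that's a wildcard block plus a handful of exceptions. A minimal sketch of the matching logic (my own illustration of the idea; real blockers express this in their filter syntax rather than code like this):

```python
def blocked(hostname: str, root: str, whitelist: set[str]) -> bool:
    """Block every subdomain of `root` except whitelisted ones.
    The bare root domain itself stays reachable."""
    if hostname == root or hostname in whitelist:
        return False
    return hostname.endswith("." + root)

# Hypothetical whitelist for the site in the OP
allow = {"www.liberation.fr", "static.liberation.fr"}
print(blocked("www.liberation.fr", "liberation.fr", allow))     # → False
print(blocked("f7ds1e.liberation.fr", "liberation.fr", allow))  # → True
```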
Or, assuming they have a small list of subdomains that redirect to ad servers, you could generate a list with a script that checks all their subdomains and creates a block list based on that. For example, the site discussed in the OP has all its subdomains listed here: https://crt.sh/?q=%25.liberation.fr
Edit: looking at the OP case, it seems like they only have one ad domain. I don't see this as a serious issue until multiple sites start rolling out thousands of subdomains, some pointing back to the real server and others to the ad server. Maybe that will happen, but it's a pretty big barrier to entry, and just short of proxying everything through the 1st party.
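Rough sketch of the crt.sh idea (assuming crt.sh's `output=json` parameter; you'd still prune known-good names by hand before publishing a block list):

```python
import json
import urllib.request

def parse_crtsh(entries: list[dict], domain: str) -> set[str]:
    """Extract subdomains of `domain` from crt.sh JSON entries."""
    names = set()
    for entry in entries:
        # name_value can hold several newline-separated names
        for name in entry["name_value"].splitlines():
            name = name.removeprefix("*.")  # drop wildcard prefix
            if name.endswith("." + domain):
                names.add(name)
    return names

def crtsh_subdomains(domain: str) -> set[str]:
    """Fetch certificate-transparency names for *.domain from crt.sh."""
    url = f"https://crt.sh/?q=%25.{domain}&output=json"
    with urllib.request.urlopen(url) as resp:
        return parse_crtsh(json.load(resp), domain)

if __name__ == "__main__":
    for name in sorted(crtsh_subdomains("liberation.fr")):
        print(f"||{name}^")  # emit a uBO/ABP-style block rule
```

This only sees subdomains that appear in CT logs, so anything served without its own certificate would be missed.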
I suspect the balance tips the other way. Last night I was looking at a file on GitHub that redirected to what looked like an S3 bucket subdomain named with a pattern like "github-production-f7e281a2", which I presumed to be cache-busting via subdomain instead of appending the hash to the filename. If my assumption is correct, every time GitHub deploys a new build you would have to whitelist a new subdomain.
You would have to block entire wordlists to combat subdomains like that. It would make more sense to whitelist subdomains instead, but that would take much more effort to determine which subdomains a website actually needs to function. And if the site in question ever changed anything around, someone would have to catch the breaking change and correct the whitelist before the site worked again.
Machine learning could help here: analyze what actually renders on the page as different domains are blocked. Bots could do that continuously and update a decentralized database with the results.