It's more than likely anonymized, aggregated data that mostly just counts how many times a given picture has been visited globally. If it's been visited over X times, by at least Y unique devices, and can be accessed publicly (i.e., a random Microsoft server can fetch it), then they probably upscale it via AI and redirect future navigations to their version of the picture.
Such a simple heuristic also sidesteps most (or all) of the concerns about medical images.
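To make the speculation concrete, here's a minimal sketch of what that kind of popularity gate could look like. Everything here is invented for illustration: the thresholds, the function names, and the "publicly fetchable" stand-in have nothing to do with any actual Microsoft implementation.

```python
# Hypothetical sketch of the aggregation heuristic described above.
# Thresholds X and Y and all names are made up for illustration.
from collections import defaultdict

MIN_VISITS = 1000        # X: global visit-count threshold (invented)
MIN_UNIQUE_DEVICES = 50  # Y: distinct-device threshold (invented)

class ImageStats:
    """Per-image aggregate: total visits plus the set of devices seen."""
    def __init__(self):
        self.visits = 0
        self.devices = set()

stats = defaultdict(ImageStats)

def record_visit(image_url: str, device_id: str) -> None:
    """Count one visit to image_url from device_id."""
    s = stats[image_url]
    s.visits += 1
    s.devices.add(device_id)

def is_publicly_fetchable(image_url: str) -> bool:
    # Stand-in for "a random Microsoft server can access it";
    # a real check would attempt an unauthenticated HTTP GET.
    return image_url.startswith("https://")

def eligible_for_upscaling(image_url: str) -> bool:
    """True once the image is popular, widely seen, and public."""
    s = stats[image_url]
    return (s.visits >= MIN_VISITS
            and len(s.devices) >= MIN_UNIQUE_DEVICES
            and is_publicly_fetchable(image_url))
```

The key property is that an image viewed by only one or a few devices (a patient's scan, a private photo) never crosses the uniqueness threshold, which is what would keep most sensitive material out of the pipeline.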
IANAL, but it seems like this potentially exposes MS to enormous liability. For example, doctors routinely use web-based tools. When they look at images related to a patient's care and MS uploads them to the mothership, has MS violated HIPAA? What about someone browsing images of child sexual abuse? Is MS liable for storing that illegal content on their servers? When a user uploads illegal content to OneDrive, it's easy to blame the user, but when MS is harvesting all the content the user views, I'm not sure they get off so easily.