Thank you for saying this. Memory suggested that this was the case; I think one problem that happens on this site is that people distrust Google so much that they will trust some completely unknown organization that they've never heard of before over one (Google) that has presumably made themselves legally liable if they use your data to track you.
(I would also note to everyone that you can simply disable sending referrers to third parties, which means that even if Google is using this data to track you, they won't know which sites you are visiting unless those sites use very specific combinations of fonts.)
The Google Fonts API is designed to limit the collection, storage, and use of end-user data to only what is needed to serve fonts efficiently.
There are an awful lot of weasel words in there.
If it was a simple "The Google Fonts API doesn't collect or store any user data" that would be good. But there's so much hidden language in that one sentence.
- "Designed" — Well, it was designed to do that, but it doesn't. After we're caught, we'll put out a press release saying We Can Do Better™.
- "Limit" - It limits the collection. It doesn't prevent the collection. It doesn't mean that no data is collected. It just collects "limited" data. And "limited" is defined by us and can be revised whenever we want.
- "collection, storage, and use of end-user data" has so many ways to be abused.
- "efficiently" — Efficient for who? Google? Google's advertising department? Google's profiling department? What if there's an inefficient way? What if there's a more efficient way, but it gives Google less data?
All this may seem unkind, but Google has earned the planet's distrust. In the early years, Google didn't believe that reputation matters. It does. And that's why the legal departments of billion-dollar companies like the one I work for don't allow us to use Google products.
There is no such thing as absolute privacy. By virtue of being a web-hosted service, you will need to interact with the end server, and that already has the potential to expose details like IP, referer, user-agent, etc.
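As a rough illustration of the point above, here is a minimal sketch of the metadata any origin server can read from an ordinary request before it does anything deliberate to track anyone. The function name `visible_request_metadata` and the sample values are hypothetical, made up for this example; it is not how Google Fonts or any particular server is implemented.

```python
def visible_request_metadata(client_ip, headers):
    """Return fields any server can read from a plain HTTP request.

    `client_ip` is the peer address of the TCP connection, and `headers`
    is a mapping of request header names to values. Both are assumed
    inputs for this sketch.
    """
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    return {
        "ip": client_ip,
        "referer": lowered.get("referer"),        # None if the browser withheld it
        "user_agent": lowered.get("user-agent"),
    }


# Hypothetical request for a font stylesheet, with made-up values.
sample = visible_request_metadata(
    "203.0.113.7",
    {"Referer": "https://example.com/article", "User-Agent": "ExampleBrowser/1.0"},
)
```

If the browser is configured not to send a cross-origin referrer, the `referer` entry simply comes back as `None`, while the IP address and user-agent are still visible.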
The wording around designing and limiting collection is acknowledging this inherent problem and letting the user know that they’ve done their best to prevent malice.
It’s not weasel wording except for anons who like hating on the internet.
Are you talking about the x-client-data header (which isn't unique, but is relatively high entropy at <= 13 bits)? [1] It is used for evaluating the effect of experiments that Chrome is running on other Google services, which does include ads. But it is not used for personalization (I wish they would say that publicly).
For example, when I look at a Google Fonts request in Chrome developer tools I see:
Each of those numbers represents an experimental treatment that is currently active for my Chrome instance. (It looks like more entropy because it's multiple values, but they're all derived from a single 13-bit per-instance seed.)
That is true only if we pretend those 13 bits are the only identifying information being sent to Google when requesting a font. The HTTP request is almost certainly being sent to Google wrapped inside an IP packet. For most[1] requests, there are at least 24 additional bits (why 24? see: [3]) of very-identifying data in the IPv4 Source Address field. More fingerprinting can probably be done on other protocol fields, and IPv6 obviously adds an additional 96 bits. Yes, IP addresses are not unique, but ~13 bits is easily sufficient to disambiguate most hosts on a private network behind a typical NAT. Correlating the tuple {IPv4 Src Addr, x-client-data} received on a font request is trivial: it only requires a user to log in to any Google webpage that includes a font request.
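To make the arithmetic in this comment concrete, here is a small sketch. It assumes the seed and the address bits are independent (so their bit counts simply add) and that seeds are uniformly distributed over 2**13 values; both are simplifying assumptions for illustration, and the helper names are made up.

```python
def combined_bits(*independent_bit_counts):
    """Bits of identifying information add when the sources are independent."""
    return sum(independent_bit_counts)


def prob_all_seeds_distinct(hosts, seed_values=2**13):
    """Probability that `hosts` machines behind one NAT all have different seeds.

    This is the standard birthday-problem product, assuming each machine
    drew its seed uniformly and independently from `seed_values` options.
    """
    if hosts > seed_values:
        return 0.0  # pigeonhole: some seed must repeat
    p = 1.0
    for already_taken in range(hosts):
        p *= (seed_values - already_taken) / seed_values
    return p


total = combined_bits(13, 24)               # ~13-bit seed plus 24 address bits
small_network = prob_all_seeds_distinct(5)  # e.g. a household behind one NAT
```

Under these assumptions, a handful of machines sharing one public address will almost always carry different seeds, which is the sense in which the tuple {IPv4 Src Addr, x-client-data} is more identifying than either part alone.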
>> re: your [1]
A given Chrome installation may be participating in a number
of different variations (for different features) at the
same time. These fall into two categories:
Low entropy variations, which are randomized based
on a number from 0 to 7999 (13 bits) that's randomly
generated by each Chrome installation on the first run.
High entropy variations, which are randomized using
the usage statistics token for Chrome installations
that have usage statistics reporting enabled.
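For reference, the "13 bits" in the quoted text is a slight round-up: a number from 0 to 7999 has 8000 possible values. The sketch below checks that figure and shows one way such a per-installation seed could be drawn. The function `new_low_entropy_seed` is hypothetical and is not Chrome's actual code.

```python
import math
import random

SEED_VALUES = 8000  # "a number from 0 to 7999", per the quoted text

# Information content of a uniformly chosen seed, in bits.
seed_bits = math.log2(SEED_VALUES)


def new_low_entropy_seed(rng=random):
    """Hypothetical stand-in for the value generated once on first run."""
    return rng.randrange(SEED_VALUES)  # uniform over 0..7999 inclusive


seed = new_low_entropy_seed()
```

Here `seed_bits` comes out just under 13, which is why the header is described as carrying at most 13 bits.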
How many users have 'usage statistics reporting' enabled, and are therefore in a "High entropy variation"? Is it enabled by default, so that it will only be disabled by the minority of people who know how to opt out?
[1] Google reports[2] they currently see about a 60%/40% ratio of IPv4/IPv6.
leads me to believe that Google has PI when people visit sites using google fonts.
Even if they don't use it for advertising purposes, long-term log keeping is not required to serve fonts.
It doesn't really matter what the service is doing, they didn't ask for consent to log the IP of people downloading fonts.
To be perfectly clear: it wouldn't keep me from sleeping at night, but font permissions should be bundled with cookie consent, or there should be a permission prompt (just like when loading a YouTube video).
"by including Google-Fonts-hosted font on its pages, passed the unidentified plaintiff's IP address to Google without authorization and without a legitimate reason for doing so"
It isn't about whether the IP address was logged, but about whether it was sent. Which is an unavoidable aspect of loading a resource from a server.
Secure from whom? The mob? China? The US government? Google?
I'm more worried about the last two than the first two. It'd be illegal for them to secure it against US law enforcement, and they don't claim to secure the data they log against access by themselves.
Use of Google Fonts API is unauthenticated. The Google Fonts API does not set or log cookies.
In other words, data from font serving does not feed into advertising personalization.
(Disclosure: I used to work on ads at Google)