This. Chatter on the public web gets scraped and a connection is inferred by some algorithm somewhere.
However, what's an inside joke (or just sarcasm) to the in-group might not be a joke to Google.
A link to a news story to the tune of "boy steals car and crashes it into river" is obviously a joke when introduced as "look it's $driver's son" in the context of $driver's performance at a formula off road event. Sarcasm on the internet is hard for people to read and harder for machines. While the overall percentage of straight-up sarcasm and jokes is probably pretty small, there are probably a lot of other noise signals in there as well that reduce accuracy.
When you start introducing data from other sources (IP addresses, geolocation, usage patterns), it gets very easy to spot correlations.
I'd assume Google is very good at making these sorts of connections out to a few degrees of separation.