It had everything to do with Snowden. Source: I was at Google at the time he sta...

lern_too_spel · on March 23, 2024

You were at Google at the time, but your memory of the ordering of events is off. Google used HTTPS everywhere before Snowden.[1][2] HTTPS on just the login form protects the password to prevent a MITM from collecting it and using it on other websites, but it doesn't prevent someone from just taking the logged in cookie and reusing it on the same website. That was a known issue before Snowden, and Google had already addressed it. Many other websites, including Yahoo, didn't start using HTTPS everywhere until after Snowden.[3] I know because this was something I was interested in when using public WiFi points that were popping up at the time. I also remember when Facebook moved their homepage to HTTPS.[4] Previously, only the login form POSTed to an HTTPS endpoint, but that doesn't protect against the login form being modified by a MITM to have a different action for the MITM to get your password, rendering the whole thing useless.

What changed after Snowden was how Google encrypts traffic on its network, according to an article quoting you at the time.[5]

[1]https://gmail.googleblog.com/2010/01/default-https-access-fo...

[2]https://googleblog.blogspot.com/2011/10/making-search-more-s...

[3]https://www.zdnet.com/article/yahoo-finally-enables-https-en...

[4]https://techcrunch.com/2012/11/18/facebook-https/

[5]https://arstechnica.com/information-technology/2013/11/googl...

fl0ki · on March 23, 2024

An important clarification is that the leaks about NSA snooping on Google motivated end-to-end encryption between all pairs of Google internal services. It was a technical marvel, every Stubby connection had mutual TLS without any extra code or configuration required. Non-Stubby traffic needed special security review because it had to reinvent much of the same.

People even got internal schwag shirts made of the iconic "SSL added and removed here" note [1]. It became part of the culture.

Over a decade later I still see most environments incur a lot of dev & ops overhead to get anywhere close to what Google got working completely transparently. The leak might have motivated the work, but the insight that it had to be automatic, foolproof, and universal is what made it so effective.

[1] https://blog.encrypt.me/2013/11/05/ssl-added-and-removed-her...

mike_hearn · on March 23, 2024

A minor quibble; iirc it was only connections that crossed datacenters that were encrypted. RPC connections within a cluster didn't need it, as the fiber taps were all done on the long distance fibers or at telco switching centers.

But otherwise you're totally right. I suspect the NSA got a nasty shock when the internal RPCs started becoming encrypted nearly overnight, just weeks after the "added and removed here" presentation. The fact that Google could roll out a change of that magnitude and at that speed, across the entire organization, would have been quite astonishing to them. And to think... all that work reverse engineering the internal protocols, burned in a matter of weeks.

lern_too_spel · on March 24, 2024

According to the reporting at the time, the NSA has shut down the email metadata collection program, which was the only leaked NSA program that parsed data on those taps, back in 2011; so the reverse engineering work was burned by an interagency review two years prior to Google's cross-datacenter encryption work.

mike_hearn · on March 24, 2024

They were tapping replication traffic on a database that included login IP addresses. I remember it well because it was a database my team had put there.

lern_too_spel · on March 24, 2024

I missed that leak. Any chance you have a link for me to fill in my gap?

mike_hearn · on March 25, 2024

Slide 5 (Serendipity - New protocols) in this presentation:

https://github.com/iamcryptoki/snowden-archive/blob/master/d...

It's heavily redacted but the parts that are visible show they were targeting BigTable replication traffic (BTI_TabletServer RPCs) for "kansas-gaia" (Gaia is their account system), specifically the gaia_permission_whitelist table which was one of the tables used for the login risk analysis. You can see the string "last_logins" in the dump.

Note that the NSA didn't fully understand what they were looking at. They thought it was some sort of authentication or authorization RPC, but it wasn't.

In order to detect suspicious logins, e.g. from a new country or from an IP that's unlikely to be logging in to accounts, the datacenters processing logins needed to have a history of recent logins for every account. Before around 2011 they didn't have this - such data existed but only in logs processing clusters. To do real time analytics required the data to be replicated with low latency between clusters. The NSA were delighted by this because real-time IP address info tied to account names is exactly what they wanted. They didn't have it previously because a login was processed within a cluster, and user-to-cluster traffic was protected by SSL. After the authentication was done inter-cluster traffic related to a user was done using opaque IDs and tokens. I know all about this because I initiated and ran the anti-hijacking project there in about 2010.

The pie chart on slide 6 shows how valuable this traffic was to them. "Google Authorization, Security Question" and "gaia // permission_whitelist" (which are references to the same system) are their top target by far, followed by "no content" (presumably that means failed captures or something). The rest is some junk like indexing traffic that wouldn't have been useful to them.

Fortunately the BT replication traffic was easy to encrypt, as all the infrastructure was there already. It just needed a massive devops and capacity planning effort to get it turned on for everything.

mike_hearn · on March 23, 2024

The first two links are about Gmail and personalized results in web search specifically. Even as late as 2011 SSL being activated for a product was treated as unusual enough to write blog posts about, and it was up to individual projects whether or not to activate it and how to trade off the latency costs.

You're right that I might be mis-remembering the ordering of things, but I'm pretty sure by the time Snowden came around the vast majority of traffic was still unencrypted. Bearing in mind that lot of Google's traffic was stuff you wouldn't necessarily think of, like YouTube Thumbnails, map tiles and Omaha pings (for software update). Web search and Gmail by that point made up a relatively small amount of it, albeit valuable. Look at how the Chrome updater does update checks and you'll discover it uses some weird custom protocol which exists purely because at the time it was designed Google was in a massive LB CPU capacity crunch caused by turning on SSL for as many services as possible. Omaha controlled the client so had the flexibility to do cryptographic offload and was pushed to do so, to free up capacity for other services.

> What changed after Snowden was how Google encrypts traffic on its network, according to an article quoting you at the time.[5]

That also changed and did so at enormous speed, but I'm pretty sure by June 2013 most external traffic still didn't have TLS applied. It looks like Facebook started going all-SSL just 8 months before Snowden.

lern_too_spel · on March 24, 2024

I had completely forgotten about YouTube. I think it switched to https video serving post-Snowden, but I can't find the announcement.

Edit: Here it is. Only 25% of YouTube's traffic was encrypted at the start of 2014. https://web.archive.org/web/20160802000052/https://youtube-e...

pkaeding · on March 23, 2024

Right, I remember (as an outsider to google) the push for https coming after Firesheep [1] and the google research on the actual CPU cost of https [2], both in 2010. Snowden's revelations came in 2013.

[1] https://en.m.wikipedia.org/wiki/Firesheep [2] https://www.imperialviolet.org/2010/06/25/overclocking-ssl.h...

KennyBlanken · on March 23, 2024

That's nice and all, but the "why" is more important than the "what".

Google was driven not out of some panicked rush to protect user privacy, but to protect Google's collection and storage of user data.

Google has 10+ years of my email. It doesn't treat that like Fort Knox because it gives a shit about my privacy; it treats it like Fort Knox because it wants to use that for itself and provide services to others based off it.

You do know that Google was heavily seed-funded by the NSA, right?