As a programmer, I can completely believe I'd write a system for capturing and processing beacon packets - to store the MAC address and transmission power - that would also inadvertently capture and store data packets. In fact, with off-the-shelf open-source tools like Kismet, capturing data packets is the default - you have to manually disable it. That alone makes it easy to mess up.
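To illustrate, here's a minimal sketch of what a beacon-only collector could look like, assuming Python with scapy and a monitor-mode interface named wlan0mon (the library choice and interface name are just illustrative):

    from scapy.all import sniff
    from scapy.layers.dot11 import Dot11, Dot11Beacon, RadioTap

    def handle_frame(pkt):
        # Keep only 802.11 beacon frames; everything else (including
        # data frames that may carry user payloads) is dropped here.
        if not pkt.haslayer(Dot11Beacon):
            return
        bssid = pkt[Dot11].addr2  # the AP's MAC address
        # RadioTap headers typically expose received signal strength.
        rssi = getattr(pkt.getlayer(RadioTap), "dBm_AntSignal", None)
        print(bssid, rssi)

    sniff(iface="wlan0mon", prn=handle_frame, store=False)

The filter is a single `if`; forget it (or use a tool whose default is to keep everything) and you're storing payloads.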
What I can't find is a "motive for the crime". What possible reason would they have to capture small samples of random data packets from random unencrypted APs?
per another comment below, i'm not arguing that they necessarily intended to collect additional information from the outset; just that they must have known that lots of other information would likely be swept up with beacon packets once they started collecting.
thinking about it counterfactually, they could have easily discarded everything that obviously wasn't what they claimed to be capturing, right at the point of collection (interfaces to do so are built into the open-source tools you allude to), but it seems pretty clear that they kept collecting it all for a long time.
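as a sketch of what point-of-collection filtering could look like: a BPF filter discards non-beacon frames in the kernel, so they never even reach the application (again assuming scapy and a monitor-mode interface named wlan0mon, both illustrative):

    from scapy.all import sniff
    from scapy.layers.dot11 import Dot11

    def log_beacon(pkt):
        # only beacons ever reach this callback; the kernel drops the rest
        print(pkt[Dot11].addr2)

    sniff(iface="wlan0mon",
          filter="type mgt subtype beacon",  # 802.11 BPF: management/beacon only
          prn=log_beacon,
          store=False)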
I think it is more likely than not that it was more or less unintended. Meaning I think there is a good chance that none of the people building and deploying the collection system had any interest in looking through packet fragments (which is where I would personally draw the mistake/intention line).
i'm mostly with you on this; i don't think anyone working on this project for google actively intended for lots of additional information to be exploited, even in relation to the project.
however i'm much less sympathetic to the notion that google didn't know that lots of additional information would be collected before they started collecting it.
so i think they may not have intended to use it before they started collecting, but i think they intended to collect all of what they collected.
care to elaborate? are you arguing that saving/processing the full contents of raw monitor-mode captures is somehow easier/cheaper than filtering out obviously extraneous information at the point of collection?
that's not at all obvious to me, and it's one of the reasons i didn't find google's claims credible.
it seems much more reasonable that they would have realized upfront that limiting the volume of captured data could have saved considerable time/money... unless of course they perceived additional value in capturing more information.
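and even with a full-take tool, bounding the volume is cheap: as a sketch, stripping data-frame payloads before anything is written out takes a few lines (scapy again, purely illustrative):

    from scapy.layers.dot11 import Dot11

    def sanitize(pkt):
        # 802.11 frame-control "type" field: 0 = mgmt, 1 = control, 2 = data
        if pkt.haslayer(Dot11) and pkt[Dot11].type == 2:
            pkt[Dot11].remove_payload()  # keep the header, drop user data
        return pkt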
This data was recorded, but it was not processed, not analyzed, and not filed under "lets-take-a-look-at-the-passwords-in-this-log.txt". The engineers thought "Hey, let's capture wifi data in aggregate and do cool geolocation stuff!" and ran with it without considering the fact that such data would probably contain cleartext passwords.
As for time and money, those are two things Google has an effectively unlimited supply of, so they aren't really relevant here.
> ran with it without considering the fact that such data would probably contain cleartext passwords.
This part, I just don't buy. Maybe full-take capture was the most expedient approach at the time, but the notion that Google's engineers didn't know or didn't realize that they would end up doing a lot of "incidental collection" in the process is laughable to me.
And yet it's the one that Hanlon's razor explains very well. Goes to show that being terse in my original comment wasn't the right move, since you just can't wrap your head around the idea that, around the world, people do stuff without considering the consequences.
The people in charge of your country, of your internet, of your health and your whole life, they're all humans and don't have weird superpowers, omniscience or anything like that. They make mistakes, and a lot of them are going to do stupid stuff. Consequences may vary.
Not sure if you read the rest of my previous comment, but Hanlon's razor is predicated on an accusation of malice, so I still disagree.
I never assumed malice here. But it's still preposterous that a company with such skilled engineers wouldn't know, until they were sued, that they had been collecting large volumes of additional information over the span of years.
That claim basically requires that this global mapping endeavour, involving many people over multiple years, was run by people who didn't understand the most basic aspects of wi-fi protocols, yet somehow flipped Wireshark into monitor mode and then never looked at any raw captures to extract information from them, even while setting up collection systems used across the globe.
Doing that once could be a mistake; even collecting all of the data could be a mistake; but not even realizing, for multiple years, that they had collected vastly more than beacon packets, probably over most of the planet?
Hey...I guess anything is possible, but it doesn't pass the smell test for me.