Having seen this at too many companies, we at fair.com decided to adopt stronger policies to prevent this, viz:
- all inbound API requests first go to our API proxy in the secure layer.
- the API proxy encrypts all PII using the encryption service in the secure layer.
- then the API proxy sends the request on to the appropriate service, having swapped all PII for tokens.
- no service in the general layer is able to talk to the encryption service to decrypt data.
- thus all data that would normally be considered very sensitive (e.g. credit reports) can be stored or passed around between services, because values like SSN are tokenized.
- our services with UIs, like our CRM, where the customer service rep needs to see the data unencrypted, work perfectly because the API proxy decrypts the data on the outbound response, but selectively, based on the person's permissions through our abstracted auth layer. (A rough sketch of the flow is below.)
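A minimal sketch of the inbound path, assuming a simple JSON payload; the field list, the encryption-service endpoint, and the HTTP client are all made up for illustration, since the actual interfaces aren't described here:

```python
# Minimal sketch of the proxy-side swap: hypothetical names, not fair.com's actual code.
import requests  # assumed HTTP client; the real proxy could use anything

ENCRYPTION_SERVICE = "https://encryption.secure.internal/tokenize"  # hypothetical endpoint
PII_FIELDS = {"ssn", "email", "phone", "full_name"}                 # hypothetical field list

def tokenize_request(payload: dict) -> dict:
    """Replace PII values with opaque tokens before forwarding to the general layer."""
    out = dict(payload)
    for field in PII_FIELDS & payload.keys():
        resp = requests.post(ENCRYPTION_SERVICE, json={"value": payload[field]})
        resp.raise_for_status()
        out[field] = resp.json()["token"]   # only the token ever leaves the secure layer
    return out

# The proxy then forwards tokenize_request(inbound_json) to the appropriate
# general-layer service; the reverse (detokenize) happens on the outbound response,
# gated by the caller's permissions.
```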
Thus in the Lyft example, the majority of employees could have access to "God view" but with all the PII encrypted (so they couldn't search for their friend's account by email, for example), while still being able to look at rides and transactions; only those who need to see the decrypted PII would be given those permissions.
Of course, this assumes that the encrypted PII is sufficient for anonymization. If you can look for all rides within a block of an address in their God view, then you could quickly figure out which person was your friend by narrowing down to rides originating from his house and his work. But again that comes down to properly limiting certain search capabilities in the UI.
I don't get why more companies don't follow our approach. I've seen way too much personally sensitive data in plaintext in databases over the years.
I sincerely hope that your approach becomes common enough that I will see it implemented in practice. Right now it is rather the opposite: plain text data flying around everywhere, and at rest in test databases and backups, is the norm. And if you are really unlucky you will find it on the laptops of developers in the form of a two-week-old copy of the main database. And if you are even more unlucky that developer does not have his hard drive encrypted, takes the bus to work every day, and has already lost another laptop just like that one - only nothing ever came of it, so the data is most likely still safe.
A database would already provide a layer of security (most people, including many "professionals", would not know what to do with it).
Customer data is often stored in a large, denormalised table in an Excel file called "customers" which contains the entire set of customers past and present, including those who unsubscribed or have not ordered for years, with all attributes (phone, address, etc.).
There is also no oversight on who has access to this file, since "marketing needs it for Facebook" or "customer service needs it for returns" or whatever the excuse du jour is. Even the newly hired intern with a stronger than usual interest in every system's credentials gets a copy.
You get some measure of security when there are more than one million customers and they need to start partitioning.
The most secure setup I've seen was a company whose entire repairs department ran on paper slips. Since nobody had the time or inclination to enter the information somewhere digital, no internal staff knew how to find customer information, even if they walked to the shop floor. I think a couple of old timers knew how to navigate the pile of slips and were the de facto DBMS engine.
Technical solutions are great, but ultimately this is a human problem. Not every firm has the resources or know-how to do all that, but any firm can have a clear policy of termination and lawsuits for anyone found abusing access to customer data.
> Not every firm has the resources or know-how to do all that
The upcoming EU Data Protection law (GDPR) has fines for data protection breaches. They can be up to €20 million or 4% of global revenue. And NGOs can sue you on behalf of affected people. When there's a price tag on non-compliance, suddenly it becomes easier to justify allocating resources to doing it (or paying someone else to tell you how to do it).
Obviously a company should have policies against the abuse of data - and I'm sure almost all of them do. But you need a technical solution to be able to determine when people have accessed information they shouldn't have, otherwise the policy is worthless because you won't know that the abuse has happened until it blows up in your face.
So, since you already need a solution to be able to audit activities, adding controls to help cut down on the number of people who are able to abuse the data reduces your potential liability immensely.
I'm curious about one aspect though: have you put much thought into what happens, and what the side effects are, of doing something like key rotation if your encryption service is potentially compromised / leaked?
The second aspect I'm curious about: you mention services in your general layer are not able to talk to the encryption service to decrypt data, but what about encrypting data? The reason I ask is that the tricky part with anonymization is that I don't necessarily have to decrypt PII to unmask it.
I don't really know what your service does, but say it's tracking location, and one of the pieces of PII is a phone number. If I can go to the encryption service and ask for the encrypted version of a phone number I know, I then have the encrypted phone number that I can use to search the dataset.
The service acts more like a key value store (this is a simplified explanation, but for your questions it will do).
You give it a value, it gives you back a token, which you can later exchange for the original value.
This means the real value is stored in the encryption service, not in the receiving applications database.
This gives us the flexibility to perform key rotation (and even upgrade our ciphers as the crypto landscape evolves) at any time without having to worry about where the encrypted value is being used, as the only data stored outside the service are opaque tokens.
As for de-anonymizing, the service is not designed to take an encrypted value and return its token.
If that were possible, we wouldn't have done a very good job encrypting it ;)
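A toy sketch of that key-value behaviour (purely illustrative; the real service persists to a database and encrypts the stored values with managed keys):

```python
# Toy illustration of the "key-value" behaviour described above, NOT the real service.
import secrets

class TokenVault:
    def __init__(self):
        self._store = {}  # token -> original value (encrypted at rest in a real system)

    def tokenize(self, value: str) -> str:
        token = secrets.token_urlsafe(24)   # opaque, random, no relation to the value
        self._store[token] = value
        return token

    def detokenize(self, token: str) -> str:
        return self._store[token]

vault = TokenVault()
t = vault.tokenize("555-867-5309")
assert vault.detokenize(t) == "555-867-5309"
# Because the token is random rather than a deterministic hash/ciphertext of the value,
# nothing outside the vault can "re-encrypt" a known value to search the dataset, and
# key rotation only ever has to touch the vault's own storage.
```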
For de-anonymizing, the idea is to give the encryption service the plain text and get the matching token. But then that would be more of a hash. If you are encrypting such that all the tokens are different, you can't do a join or analysis. You can't, for instance, count how many unique phone numbers you have. And if a user is using your app, how do they see their own PII?
> If you are encrypting where all the tokens are different, you can't do a join or analysis.
That would hopefully be part of the reason for doing it this way.
I once worked on a system where we encrypted most customer data on registration and took it entirely off line once a day (so new data was in encrypted form online for a day, and then was air-gapped permanently).
The fact that marketing etc. had to request reports to be run manually on the airgapped customer database was an important barrier that made them think about how they could meet their needs without it.
Sometimes, of course, they had genuine needs that needed access to the unencrypted data, but it was rare.
I'm a big fan of making it take extra effort to do these things - time and resources seems to be a far stronger barrier than requiring authorization.
You're correct that it does make certain kinds of analysis more difficult.
However that doesn't mean we can't ever get access to the original data.
Most of our current BI needs can be met using the un-encrypted data, but if we did want to answer your phone number question, for example, we could craft a special-purpose program to perform the analysis without compromising user privacy.
1. Select all phone number tokens
2. Decrypt
3. Produce counts (total unique, etc)
Said program would have to go through normal code review and approvals, and then deployed into the secure zone (so it could access the encryption service).
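A hedged sketch of what such a special-purpose program might look like; fetch_phone_tokens and detokenize are invented stand-ins for whatever interfaces the secure zone actually exposes:

```python
# Sketch of the "unique phone numbers" job described above; both callables are
# illustrative stand-ins, not the actual secure-zone clients.
def count_unique_phone_numbers(fetch_phone_tokens, detokenize):
    tokens = fetch_phone_tokens()                 # 1. select all phone number tokens
    numbers = {detokenize(t) for t in tokens}     # 2. decrypt, inside the secure zone only
    return len(numbers)                           # 3. produce the count; raw values never leave

# Only the aggregate (the count) is returned to the requester; the decrypted values
# exist only in the secure zone for the lifetime of the job.
```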
For an API call that requires decrypting data it can add about 5-10ms, most of that comes from reading the values from the token database.
Encrypting will incur a similar performance penalty (again from the database).
The core pieces (API Proxy, Encryption Service, secure/general infrastructure divide) were done by me in the first 2-4 months.
We now have a team of three (myself included) who maintain those systems (among many other responsibilities).
Productize this and sell it at a low cost if it’s that simple. I’d love to use something like this, but building it from scratch in house is cost prohibitive.
While I like that, and it is a cool approach, what is really the difference between this and just storing PII encrypted in the same database, then keeping your keys locked down based on permission levels?
The API proxy would be great if you have a trusted 3rd party in charge of it or something. But abstracting it out to a separate layer doesn't seem necessary since it's all within the same company, and developers will need access anyway.
- Logs: we can log all params within the general layer without worrying about leaking PII to the logs
- Monitoring network traffic: if I need to use Wireshark or something similar in the general layer, all the data there is already encrypted/tokenized, so it is safe to do so.
- If you let each service encrypt the data itself, then all of those services are in scope from a security perspective. Access to those services from engineering's perspective would have to be considerably more locked down, potentially preventing engineers from accessing their ENV variables, memory dumps, sshing into those boxes/containers, etc., for fear you could get the encryption keys (though of course many of those things should be locked down anyway).
- Further, having each service do the encryption itself means you are duplicating that solution over and over again and introducing more opportunities for error. Having a single encryption service within the secure layer allows us to change our approach more cleanly than if it were spread out everywhere.
In addition to the points already raised by ryan_j_naughton, another consideration is how to perform key rotation and cipher upgrades over time. This is a lot more complex if every application is doing it on its own, but is quite easy if you centralize all (or most) crypto in a single service.
Probably due to granularity of permissions, key revocation etc reasons. Parent's method is analogous to a high school hall pass to access the restrooms; your proposal is to tell the janitor a "secret" password.
The key problem with both Lyft and Uber is that employees who had the right to view the data abused the privileges.
While encryption can help to enforce the privileges against technically capable employees, the main problem is the privilege system itself, or lack thereof. 95% of the abuse would be eliminated with no encryption, just proper design of the internal interfaces and queries talking to the plain text database.
And many employees really do need elevated access for their work. It's then imperative to create a privacy focused internal culture where it's clear that abuses result in termination and civil/criminal actions, and put in place strict access logging and enforce these policies.
Agree completely. Encryption or tokenization is great but a much easier first step which covers the first order problem of employees abusing access privileges is simply a secure audit trail and someone to actually look at the logs and discipline or terminate employees who misuse their access.
There are probably innumerable reasons to have access to a particular ride’s exact route, or a particular user's ride history, or feedback history, etc. Just like the police have many perfectly valid reasons to run a plate.
Audit logging and request throttling are the low hanging fruit. If your system can tie each request to the service ticket which prompted it, even better.
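One possible shape for tying lookups to tickets, sketched in Python; the decorator, log sink, and field names are hypothetical, not drawn from the parent comment:

```python
# Refuse PII lookups that aren't attached to a service ticket, and log every call.
import functools, json, logging, time

audit_log = logging.getLogger("audit")

def require_ticket(fn):
    @functools.wraps(fn)
    def wrapper(*args, ticket_id=None, agent_id=None, **kwargs):
        if not ticket_id:
            raise PermissionError("PII lookup must reference an open service ticket")
        audit_log.info(json.dumps({
            "ts": time.time(), "agent": agent_id, "ticket": ticket_id,
            "action": fn.__name__, "args": [str(a) for a in args],
        }))
        return fn(*args, **kwargs)
    return wrapper

@require_ticket
def get_ride_history(user_id):
    ...  # actual lookup elided
```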
To add to this (which is excellent btw), we have taken the following measures:
- Rate-limiting on admin interfaces and APIs, so a "rogue" human with admin rights can't just suck out PII in bulk (a rough sketch of such a limiter is at the end of this comment).
- Access controls on sensitive data and dangerous operations (such as account deletion) that allow these things to be limited to a small number of admin staff.
- Ability to mark specific accounts (e.g. celebrities, senior management) as inaccessible by admins, overridable only by special privileges that require sign off.
- Regular audit of log files and raw database content to verify that no PII is leaking into uncontrolled areas.
There are probably other measures I can't remember but the basic idea is to throw sand in the face of a potential internal attacker while still allowing legitimate activities to be carried out. In a larger organization you might be able to use ML to detect anomalous admin activity.
Update: I remembered another thing - don't collect sensitive data that you don't need. I've had several discussions over the years along the lines of "we found that we can get xxx (a PII field of some sort, e.g. the user's cell phone number even if they didn't know they gave it to us), where should we store that?". Answer: don't.
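A rough sketch of the rate limiter mentioned in the first bullet; thresholds and names are invented, and a real deployment would back this with shared storage (e.g. Redis) rather than in-process state:

```python
# Sliding-window limiter for admin PII lookups; thresholds are arbitrary illustrations.
import time
from collections import defaultdict, deque

WINDOW_SECONDS = 3600
MAX_LOOKUPS_PER_WINDOW = 25   # arbitrary illustrative threshold

_recent = defaultdict(deque)  # admin_id -> timestamps of recent lookups

def allow_pii_lookup(admin_id: str) -> bool:
    now = time.time()
    window = _recent[admin_id]
    while window and now - window[0] > WINDOW_SECONDS:
        window.popleft()              # drop lookups that fell out of the window
    if len(window) >= MAX_LOOKUPS_PER_WINDOW:
        return False                  # looks like bulk extraction: deny and (ideally) alert
    window.append(now)
    return True
```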
We considered doing something like this on a project once but we had concerns about performance. How much latency does your API proxy add in for responses with thousands of tokens?
Additionally, we found that this broke any sort of database indexing. Let's presume that Lyft would need to tokenize the start and end points of journeys (otherwise it would be quite easy to de-anonymise someone); then simple queries like "how many journeys happen in this area" either become incredibly slow, or you have to have anticipated needing this type of query and provided a non-tokenized "area" field which can be indexed (but is sufficiently coarse not to leak data). Were you able to come up with any sort of solution for this issue?
The biggest issue is that consumers don’t read privacy policies or user agreements. If a company gets a user to waive their privacy, their data can and will be traded on the secondary market. It’s just simple economics.
I think most people who reside in the US would be shocked if they knew how much PII is traded about them. Very few people these days don’t have a personal record. Knowing what’s out there, I wish I lived in the EU.
Isn’t it pretty standard practice to confine (overt, obvious) PII to the users table/user CRUD service, so that the rest of the infrastructure only speaks in user IDs?
Not if you have credit reports and other data that could contain PII as well. You don't want the name or address in the credit report to just point to your users table, with an assumption that that was the name in the report. And if you don't know with certainty that the credit report pulled was for that person (there are collisions, mismatches, etc.), then you want to store that data as associated with the report and not the user (the report itself is then associated with the user).
For proper data integrity and data provenance, you want to know what you knew at each point in time. Thus, simply pointing to a user_id and hoping the data on the user's table was the data at some point in time in the past will result in leakage for data science (https://www.kaggle.com/wiki/Leakage).
The tricky part about this sounds like it would be specifying in the proxy service which parts of structured requests and responses are PII - how do you handle this?
For a system like Lyft, I think you'd want all rides to be anonymized by the user's app, so that in case the user contacts support, they can if need be give permission (and send the key) for support to review a single ride. Something like that.
A bit more tricky for drivers though, due to billing and tracking and such.
Would you please consider open sourcing some of this? If not the code then perhaps a more comprehensive design doc, explaining the different components involved, some API samples and flows so people that want to can recreate this?
We did a similar thing for credit card storage. We did it in house because I failed to find any open source solution that I liked that would fit in with the rest of our stuff. Things in this space (free or commercial) seemed to have strong opinions about how the rest of your software should work.
I received at one point permission to release our stuff as open source, but didn't have time to clean it up for outside release. I could probably get that permission renewed and find time to do it, but I'm not sure if it would be worth it, for a couple reasons.
1. Once upon a time I got totally fed up with the idiocy and complexity that is SOAP and the way I kept running into subtle incompatibilities between SOAP servers and SOAP clients when the clients were provided by different vendors than the servers and were in different languages. I yelled "SCREW SOAP!" and hacked out that weekend something I called RADIO (named after the old joke [1]). RADIO lets you write a service in Perl with some annotations in comments that describe the services provided. You then run that through the RADIO generator, and it spits out a Perl CGI that implements the service, and client modules for it in Perl, PHP, and Python. The Perl CGI also provides a human usable interface on the web that gives you a form-based interface that provides documentation for each API call and lets you invoke them from a browser.
I used RADIO to implement the credit card storage service. I'm not sure many people would be interested in running a Perl service implemented using a weird sort of framework sort of code generator.
2. How it uses cryptography (choice of cipher, mode, padding, and such) has not been reviewed by a cryptographer. I know enough to not implement my own cryptographic primitives, of course, and so used well-known implementations from CPAN. However, I still had to make choices about HOW to use those, and those choices have not been checked by an expert.
It's so specific to our architecture that there would be little value in open sourcing it.
That being said, there are some lessons we learned along the way that I do think are worth sharing.
As I mentioned above the API proxy supports REST and gRPC API calls.
For REST based APIs, we developed a YAML syntax for declaratively specifying (per route) the input/output keys that needed to be encrypted/decrypted.
This is accomplished using a JsonPath like syntax, along with attributes for each key that specified the type of data, as well as letting us perform limited validation on the data.
However this approach provided little visibility into what fields are not being encrypted, leaving room for mistakes.
This is where gRPC/Protobufs come into play; it's the newest addition at the API Proxy level, but it is not new to our internal architecture.
Using protocol buffers as our IDL has been a huge help in terms of our data auditing capabilities.
Unlike our YAML based solution before, with protobufs we can see every single field that is expected as an input or output of an API call.
We've utilized this feature by creating wrapper message types that indicate when a field contains data that needs to be encrypted/decrypted, and have helper libraries that can traverse an instantiated protobuf message, perform the encrypt/decrypt operation, and swap out the original value(s) for the one(s) returned from the encryption service.
Once we have the library to perform the traversal and the compiled protobufs, integrating it into the API Proxy (along with grpc-gateway to translate to REST/Json) is actually pretty straight forward.
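To make the traverse-and-swap idea concrete, here is a very rough sketch using plain dicts in place of protobuf messages; the "encrypted_string" wrapper convention and tokenize() are stand-ins, not the actual wrapper message types or helper library described above:

```python
# Rough illustration of traverse-and-swap over a nested structure; plain dicts stand in
# for decoded protobuf messages, and the wrapper convention is invented for this sketch.
def swap_sensitive_fields(message: dict, tokenize) -> dict:
    out = {}
    for key, value in message.items():
        if isinstance(value, dict) and value.get("type") == "encrypted_string":
            # Field was declared sensitive in the schema: replace its value with a token.
            out[key] = {"type": "encrypted_string", "value": tokenize(value["value"])}
        elif isinstance(value, dict):
            out[key] = swap_sensitive_fields(value, tokenize)   # recurse into nested messages
        else:
            out[key] = value                                    # non-sensitive field, pass through
    return out
```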
I suppose it's difficult to answer what's specific to your architecture... but it does seem interesting. I really wonder if there can be an open-source solution that makes it easier to do what you're suggesting, but for anyone.
When I did an internship at a national lab, a lot of the hard rules about security relied on the fact that you had gone through their hiring process and would follow the rules.
There were different access levels, for sure, but only like 2 or 3. You might have "had access" but you shouldn't be anywhere you didn't have a good reason for being.
Lyft should be checking on this, running audits and whatnot, but they also should be setting good policy and culture to not abuse access.
Basically, I think it's reasonable to both allow many people access and expect them to not abuse it.
> Basically, I think it's reasonable to both allow many people access and expect them to not abuse it.
Indeed. The FCRA accounts for bored clerks looking up random people's credit history.
Just because you have access to something doesn't mean you're allowed to touch it without a valid business reason.
I'm no fan of regulation but the wild west of PII is long past needing to be tamed. Companies need to be held responsible for their intelligence and how it gets used.
That might be how we deal with children who can't handle responsibility, but the absence of technical controls for every nuance of life is why ethics and code of law exists for adults.
Access controls are not a substitute for maturity.
> Access controls are not a substitute for maturity.
But maturity is not a substitute for access controls either.
In any organization of some size, no matter how much you hire for "maturity", eventually people will slip past who have all kinds of reasons they'll be able to justify to themselves for deciding it's too tempting to look at things they shouldn't.
Agreed; too bad humanity's level of maturity as a whole is very far from ideal. If everyone did what they should, we would probably be living in a utopia. But that's not the case.
I don't mean that everything should be super-locked down to the point where it's inaccessible, just tweak it enough to not be misused.
The idea of an audit trail is good, since you can go back in history and hold any misbehaving parties accountable. Or design a system where the client authorizes a rep to look into her records - just like banks do when you ask for your balance.
There's a bit of pragmatism involved in the level of control.
If you add too much friction to the process of accessing information, it can actually impede handling user support. For example, having access to someone's ride history when trying to resolve a dispute seems relatively normal.
Of course, in Lyft's case it seems pretty clear that there could be more programmatic locks. And auditable logs are a very good idea in general.
But programmatic locks are tricky. How do you transform an e-mail from a user confirming permission to access their history into an unlock code?
If only we had modern phone infrastructure that could actually transfer useful info to the appropriate person... Instead we have terrible phone support that requires me to repeat the same info 3 times.
That info should be unlocked the millisecond I am connected with a rep. It's not a moonshot.
My feeling is that stuff is doable, but hard-ish. For example, for this case now you're writing something to interface with the phones? How do you know the phone number is for a certain client?
Though I can definitely see someone writing a thing where your ticketing/support system grants partial data access; you end up either making the support system pull in information from the DB... or having your DB access controls be managed through the support system.
The latter can potentially introduce security issues. The former is easier, but you can easily run into "oh, this information's not gettable through the ticketing system".
Too much reliance on programmatic access controls causes people to think “if it’s allowed by the controls, it’s allowed by common sense” which is rarely the case.
On the flipside, systems that are too cumbersome to use because of access controls lead people to do things like maintain shadow systems in Excel spreadsheets just to get their work done. Of course with no security at all.
Couldn't this count as a violation of the terms of service on Lyft's end and open them up to a class action, so there's no need for new regulation, at least for this current issue?
This was how it worked when I worked in admissions during college. You had access to every applicants' information, grades, essays, etc., as well as counselor feedback. But you were told that if you looked up yourself, someone you knew, or any celebrities, then you could be fired.
I don't know if there were automated checks for that kind of thing, but everyone knew there was a line you didn't cross.
At Lyft people did think there were automated checks, did know there was a line that shouldn't be crossed, and yet there was rampant abuse. Don't you suspect that many of the students in your position abused their access?
I think companies should be responsible for implementing effective security, whether that means preventing improper access or at least detecting it and punishing it after the fact, not just establishing a "culture." The most dangerous people, the ones who commit violent crimes, aren't limited by culture anyway, because they despise norms and have very different perceptions of risk compared to most people.
In your case, your fellow student workers might simply have not felt safe sharing their crimes with you. "Naughty" behavior can be taboo yet widespread.
Whereas folks I worked with at a support vendor for AT&T HomeZone, when that was a thing, regularly looked up celebrities’ private phone numbers in the unified customer systems with no repercussion, despite the same onboarding spiel. HomeZone was unique in that support reps by necessity had access to AT&T, Dish, and Yahoo! (email) CRM, which covers a very broad range of people and activities; I didn’t do the looking, but I overheard a sampling of notables and legislators who subscribed to the “500s” (Dish lingo for porn at the time, don’t know if they’ve moved the channels since).
Yahoo! was the only one that limited access reasonably. It always struck me as odd that auditing didn’t pick up that query behavior. For your main job, PeopleSoft would take care of everything and limit you to who you needed to see, but there were a plethora of other systems and places to look.
This type of thing certainly isn’t limited to Lyft.
My "elite" college had crisis counselors, student volunteers. Guess what happened if you dated one of them. They searched the counseling files to look for dirt on you.
In some hospitals in the U.S., the way it works is that nurses can view a lot of the patients' charts (including VIPs'). And then someone is supposed to audit who viewed those VIP patients (celebrity or what not), but every hospital is different and it's a mess.
In Stockholm, there is one journal system everyone is mandated to use, written in APL (it was called Take Care, but the name could have changed), and completely without access control.
I believe each institution gets printed access logs sent to them, which are never looked at.
Here in Canada, if you have access to the central medical records you can look up anyone, but if (a) you are not a doc assigned to the case, or (b) you are looking up yourself or a family member, you immediately get a call and get fired on the spot.
(Source: Wife works at the hospital and has seen some people get fired shortly after unauthorized access.)
My hospital system has similar policies in place. I like to tell people, "If I look at your record, I'll probably get fired. If I look at my own record, I will absolutely get fired."
Of course, both of them are prohibited (without a valid business reason), but the latter is easier to detect in an automated fashion.
Not infrequently, there are notes in patients' charts that only care providers can see. This practice is becoming frowned upon in some circles, however, in favour of more transparency. I imagine it being really useful for certain scenarios (e.g. mental illness, or documentation of a sensitive domestic relationship, where you wouldn't want a potentially abusive family member to see comments about their behavior in a relative's chart).
You don't really own your medical record, also doctors add notes to your medical record that you may not like. For example your medical record may say that you don't follow up with medication, abuse drugs, have psychiatric/personality problems, etc. which patients could be sensitive about.
And trying to exercise that right by using tools you have access to as an employee of your own health care provider rather than going through the proper channels that any other patient goes through to request such access (which involve the employer having documentation of the request, and auditability as to the purpose of any access involved in serving that request), is likely to violate internal procedures designed both to protect PHI and assure that all access is within job function.
You would think that if they have the ability to audit at that level and with such prompt responses they would have the resources to lock down the systems properly and to implement a consent policy that works. Allowing everybody access is a bit like binding the cat to the bacon and then getting upset because the cat can not be trusted with bacon.
Better to keep the cat and the bacon separate, the temptation to peek is large and if there is one thing I know about people then it is that curiosity is a pretty common affliction.
And that is assuming that those accesses are on purpose, people can make honest mistakes as well and they will also look like unauthorized access.
Except putting barriers can cause even worse problems. If I can’t look up someone’s allergies because the system doesn’t think I should, fuck them right?
Exactly, barriers to looking up a patient's information can be fatal. I need to give someone medication now to stabilize them. What medications are they on now? Can't look it up? Better give it to them and hope there's no adverse reaction.
Better to avoid that situation and implement auditing while making sure people know the rules are enforced.
"override authentication" -> You have chosen to override the authentication protocol, you are logged in as John Doe, all your actions will be subject to internal affairs review, continue yes / no?
That tends to happen when the user logs in, so they'll probably see it multiple times per day. I think that's pretty standard. The system doesn't have to be the wild west. But if someone can click through an authentication override then it's not really doing anything.
This system broke down at UCLA a few years ago. They ended up paying out quite a bit of money when it was revealed that hospital staff had viewed the medical records of celebrity patients.
My wife is a music therapist, and her previous employer had a contract with a local hospital system. During one visit, she (or a colleague—I don't recall) was working with someone with the same last name, but no relation; this set off an automatic alert in the EMR, leading to a follow-up inquiry. VIP records have similar alerts configured to verify that the accessors have legitimate business reason.
I’m at a financial services firm, and we have an entire internal risk department to ensure employees aren’t exceeding their authority. Surfing the wrong websites? Badging in and out at abnormal hours? Accessing internal apps in ways you shouldn’t? Access immediately flagged for human intervention and you’re locked out. Our data scientist team improves on the heuristics constantly.
At some point, organizations with data have to learn how to manage IAM [1] properly.
The FFIEC considers your heuristic system “Innovative” according to the Cybersecurity risk assessment methodology. Certainly not typical for a financial institution. Pretty cool stuff though!
I’m on our cloud security team; our leadership chain gives us wide latitude (Hacker News A-OK) since our entire job is to be subject matter experts. Literally “Know All The Things” was how my job req was described to me.
> Basically, I think it's reasonable to both allow many people access and expect them to not abuse it.
I couldn't disagree more. Eventually, you're going to hire an idiot (and/or budding rapist). When you have PII of this nature, if you're going to allow lots of people access, you need individual access controls, logging, and most importantly, auditing of the aforementioned data. And auditing may not be enough; you probably need individual inspection and approval to, e.g., look up user info not tied to a ticket you're processing.
Particularly after seeing the Uber god view scandal, there's just no excuse not to have this basic stuff in place.
I worked for a much, much smaller startup handling data that was significantly harder to tie to an actual person, and we did the above.
I remember a briefing from when I worked for BT in the UK: someone looked up, for a mate, his ex's new address - and she then got murdered.
There was also a case involving hit men who found the address of a target's mom and dad - who were also killed - by bribing someone.
BT took security very seriously, and you had better hope, if you did something bad, that the cops or even the secret service (MI5) got to you before the internal security team did.
> BT took security very seriously, and you had better hope, if you did something bad, that the cops or even the secret service (MI5) got to you before the internal security team did.
I would rather be "dealt with" by BT than with the cops or secret service. BT can fire you, the cops and SS can take away your rights (with due process)
As a former PTT, BT's IB (or SD) was descended from the unit in the GPO that dealt with theft from the post, so it had some odd quasi-legal standing - it's also where the secret squirrels worked.
They have a bad reputation, as in the bad old days some of the confessions involved falling down stairs, which is what I was hinting at :-)
> Lyft tells TechCrunch that staffers in several departments that might need access to this data for their job have the ability to look up this information
See, that's a complete lie, and that's the attitude that needs to stop.
No-one needed access.
Analytics definitely didn't. Engineers never did. Customer services should have to request permission from the customer before accessing sensitive data, with a valid reason. Insurance definitely did not. And "trust and safety", just like customer service, should have had to get customer permission first.
What's the need?
These are the laws that need to come into place: if your company is bigger than X, safeguards on personal data must be in place to stop anyone from accessing a customer's personal data without explicit permission from the customer or a senior (legally culpable) manager, or as needed to fulfil an order.
https://danluu.com/wat/ - apparently this is normal:
> Facebook famously let all employees access everyone’s profile for a long time, and you can even find HN comments indicating that some recruiters would explicitly mention that as a perk of working for Facebook. And I can think of more than one well-regarded unicorn where everyone still has access to basically everything, even after their first or second bad security breach. It’s hard to get the political capital to restrict people’s access to what they believe they need, or are entitled, to know. A lot of trendy startups have core values like “trust” and “transparency” which make it difficult to argue against universal access.
It doesn't need to be normal. There's no reason companies couldn't build a system that required approval from your manager before being able to access customer data. Any time a manager granted access, that could be audited by some second tier.
"Okay, stay on the line. My manager went to the bathroom 15 minutes ago, he should be back any minute, and then we can proceed with..."
After-the-fact accounting for all 'sensitive' actions would probably be more practical for most business needs.
I'd put a wizard in front of the thing that grants the access token to figure out the purpose and scope of the token needed.
Information request: "Rider History"
User: current caller
Scope: Between 9 AM and 11AM today
Reason: Lost an item this morning, need to lookup driver
If you were fancy you might even be able to convert the wizard's contained information into a request against the backend. Select trip.driver, trip.time from trips where user_id={caller_user_id} and time={9:00-11:00 today}
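A sketch of that conversion, assuming the table and columns from the pseudo-SQL above; everything here is illustrative, not an actual Lyft schema:

```python
# Turn the wizard's answers into a narrowly scoped, parameterized query.
import sqlite3
from datetime import datetime

def scoped_ride_lookup(conn: sqlite3.Connection, caller_user_id: int,
                       start: datetime, end: datetime):
    # The granted access covers only this user, this time window, and these two columns.
    return conn.execute(
        "SELECT driver, time FROM trips WHERE user_id = ? AND time BETWEEN ? AND ?",
        (caller_user_id, start.isoformat(), end.isoformat()),
    ).fetchall()
```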
It actually seems easy to argue against: the company might trust me, but surely not all the customers trust every employee. You're simply putting the customers first.
> I can think of more than one well-regarded unicorn where everyone still has access to basically everything
Hard to be sure, but GitHub support signed up for a friends-only beta of one of my products before I had told anyone outside my immediate circle about it.
Have you tried to debug the same two problems when you do not have access to the infrastructure, versus when you do have infrastructure access? It's a radically different experience.
In addition, even if no one needs access, a bad actor who writes the code can always put a backdoor in place and read whatever they want.
I agree with you. I worked at a BlueCross, and when debugging front-end apps I'd often dig through the database to check what is in each table and what could be throwing an error. I'd often bring up a debugging tool called TeaLeaf that let me watch the entire web session (this was back in 2009, so the full DOM tracking we complain about now has been available for quite some time).
Technically, we couldn't see any Group44 information (BlueCross employees) without special permissions. But we had to build that logic into all the debugging tools, and if I wanted to, I could query the database directly.
We were all covered by HIPAA, so we could face serious fines over leaked personal medical information. And honestly, the security and audit team were actually pretty good, often identifying Group44 issues, and they'd have SSH and su log scanners to constantly check for unauthorized access on servers.
It's all about trade-offs. Lyft is still a growing company and they probably just need to implement some more automated auditing. And to be fair, these aren't medical records. It's going to become an issue if a customer service rep gives anything to the police without a warrant though.
That you casually talk about breaking the law with incredibly sensitive data is shocking. And worse still, you built in a system that circumvented the official policy, simply because you weren't capable of doing your job.
You basically built in an illegal back door. If you were my employee I would fire you on the spot for gross negligence when I found that. Or maybe if you were a junior you'd be given a massive dressing down, put on notice, re-sent on all the mandatory company tutorials about sensitive security of data and an urgent fix put in to remove the backdoor.
This is just bad engineering, if you can't recreate bugs without having to constantly resort to live data, get someone in who can. It's not particularly hard, it's a learn-able skill of our profession.
And if you're really having a problem, and it does happen once in a blue moon, you should have the ability to create an anonymized version of the live db that strips out all the personally identifiable info. That you can still only get with a signed, approved request and a full justification of why you needed it.
For the record I have worked somewhere that had highly sensitive data. And they took it seriously, unlike you two. None of the engineering team had access to the live db, the live site or the passwords, apart from some senior team members.
> This is just bad engineering, if you can't recreate bugs without having to constantly resort to live data, get someone in who can. It's not particularly hard, it's a learn-able skill of our profession.
In any large enterprise, you're probably working with data spanning multiple in-house databases, vendor applications, message buses, responsible teams... I can develop a greenfield system that makes identifying bugs trivial without accessing production data. Unfortunately, the real world is rarely that simple—especially for any enterprise that has grown organically over time (i.e., all of them).
My employer recently retired some systems that were older than me (I'm in my mid 30's), and the organization is simply too large to be understood by any one individual.
Wat? No, we built the system to block Group44 access to all our own diagnostic tools for downstream users. I'm just saying we could also get around it because we built it and had direct db access. We didn't though, or at least I didn't.
I did not violate the law at all. We might have had to look at peoples' health claims information to debug issues, and that's fine, so long as we never disclose that data.
That was a decade ago, so they might have tightened things up. Still, at some point, some people are going to have access to all the data (A DBA is not going to be very effective at his/her job if they don't have admin access on production).
The best approach is to make it the exception, not the rule. This ultimately protects everyone and provides a nice paper trail in the event something does happen and needs to be audited. You can partially mitigate the bad-actor problem by putting safeguards in place to make sure all changes have to be peer reviewed before shipping.
What's the need for anything? Your argument seems to be predicated on a very specific definition of what need is which is somewhat at odds with most of the rest of the world.
Can you actually refute the parent comment's argument? Because it seems more than reasonable to me.
Analytics and Engineering definitely don't need this level of data access for any sort of day-to-day work. I work on analytics tools, and at best, anonymized and generalized data is needed, but never specific customer data. We specifically strip out any PII on data that might reach developers and need to request permission to access any sort of customer data (and I haven't had to do that in a long time).
CS/T&S/Insurance might need per-customer data, but they also don't need blanket access to all customers' data. If they need to handle a specific customer's data, there should be a request made that is logged and audited, and ideally with customer approval.
It is probably possible to design systems to avoid access, but they will get more complex. Engineering has to debug bugs. For example, suppose there is a bug where the rate calculations aren't working for certain types of routes. The engineers will want to look up those routes to understand what is causing it. If you are designing an algorithm to detect fraud, you are going to want to look at cases of fraud to understand how to design the algorithm. Further, if you want to do usability testing you are going to need data to test with. You might want to check the different types of names used in the system to make sure they display properly. You may also want to sample the list of customers to user-test with live data or survey customers.
> For example, suppose there is a bug where the rate calculations aren't working for certain types of routes. The engineers will want to look up those routes to understand what is causing it.
Then show routes without names.
> If you are designing an algorithm to detect fraud, you are going to want to look at cases of fraud to understand how to design the algorithm.
Then show names without routes.
> Further, if you want to do usability testing you are going to need data to test with. You might want to check the different types of names used in the system to make sure they display properly. You may also want to sample the list of customers to user-test with live data or survey customers.
Use a library that can generate realistic but fake data. I feel there is no excuse to not compartmentalize. If data security is not important to the business...well it only takes one bad article like this to cast doubt on the whole company.
The trouble is, if you're at this level of engineering, you're probably going to be writing the isolation layer. So you already have access to the raw data and you're just going to either make things harder for yourself or only be designing for downstream security.
You can and should restrict customer service reps (only allow them to access routes/users/drivers who they have active tickets on), but at some point you're going to need to trust your developers since they can usually just query the database directly.
As I said in another comment, there's absolutely no need for an engineer to have access to live data in normal circumstances.
Learning how to recreate bugs without stack traces or live data is a debugging skill you can learn. Often it's as simple as following the steps described in the ticket, something some developers seem to not realize, or their reading comprehension is bad.
For really complex bugs, you might need some sort of access to see the specific conditions, but it should be attempted with an anonymized version of the db that you had to request and get signed off on.
If you then really need to put in tracing, it should be temporary, the data access should be heavily restricted, and it should be deleted/removed once the bug is fixed.
> Often it's as simple as following the steps described in the ticket, something some developers seem to not realize, or their reading comprehension is bad.
That sounds like a pretty simple set of bugs you are dealing with. I don't think anyone is arguing that they need this level of information to solve "When I click this button it crashes".
For debugging, you generally can access things like server logs, which would already have a lot of the data you might need. It'd be behind a session ID or some other type of anonymizer, but the customer can provide that and other debugging information so you can look up their entry. It might be a bit harder for mobile apps, but there's plenty of telemetry products that provide crash dumps (with many of them supporting the stripping of PII). You definitely don't need always-on access to all customer's data to do this role, or even access to their PII data for the purposes of debugging.
Rate calculations 100% do not need specific data access. All you need to do that job is to have a generalized set of data based on the routes in question. You don't need to see that Joe Smith in SF took a route from A to B and what went wrong with it, you just need to see what all routes/rates from A to B were, and then from there look for anomalies.
Fraud is a bit more tricky, but if you already know what you're looking for, then you don't need the specific customer data set, you just need to know what deviations from the norm a general fraud request had.
Usability testing should be done with completely fake data, preferably created by someone with knowledge on how to do that specific job. This one is by far the easiest one to argue against needing access to real customer data since most places already have this type of fake data created specifically for this purpose.
Overall, none of your examples really need data access. Sure, it'd be nice to have access to it for some of these points, but it'd also be nice to have a million dollars. It doesn't mean you can't do your job if you didn't have it.
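As a concrete illustration of the fake-data point, a minimal sketch assuming Python and the third-party faker package (neither is mentioned in the thread):

```python
# Generate realistic-but-fake customer records for usability testing.
from faker import Faker

fake = Faker()
test_users = [
    {"name": fake.name(), "email": fake.email(), "phone": fake.phone_number(),
     "address": fake.address()}
    for _ in range(100)
]
# Using other locales (e.g. Faker("ja_JP")) also covers the "do unusual names display
# properly" concern without touching a single real customer record.
```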
A balance has to be made. The parent remark does not show an understanding of the realities of working on a service on all levels. There are many times when developers need access to specific customer data. Pretty much every company has in their TOS verbiage to note that employees may have to access customer data without their specific consent. That said, companies can and should have multiple layers to help prevent abuse of customer information:
1. Access levels: Different employees need access to different types of information. Access at a specific level should be approved by at least one individual, often the employee's manager, but may require multiple layers of approval, depending on the sensitivity of the data. In some cases, background checks may be required. Access should need to be reviewed any time your job duties change.
2. Separation of networks: Customer data should not be on your normal company network except for specific approved instances (such as when a customer sends a file to support that they can use for testing and pass to developers if needed). It should not be possible to pull information from the network containing customer data to your company network, but it might be necessary to be able to push data over.
3. JIT access: Access to the network and systems containing customer data should require elevation. Access to systems and data on this network should still be subject to your access level (an example would be being able to access unscrubbed logs that may include some private information, but not being able to otherwise access data a customer has entered or uploaded).
4. Auditing: There should be an audit trail of who approved an employee's access levels and when. Access to customer data should always be audited, as should access to unscrubbed log files. Other access may need to be audited based on legal requirements or a company's stated commitments to their customers. --That part isn't an absolute, and while more audit logs are generally better, in some cases it may be too much noise.
5. Encryption at rest: Customer data should be encrypted when it is not in use. This might be file level encryption, database encryption, something else, or some combination.
6. Encryption in transit: Customer data should be encrypted while it is being transported to or from the customer, between networks, and preferably within networks.
This stuff can be difficult to do, but by the time you're as large as Facebook, Uber, Lyft, etc., there's no excuse not to be doing it. You can bet that while the implementation differs, Amazon, Google and Microsoft are also doing this.
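On the encryption-at-rest point specifically, a minimal sketch using the widely used cryptography package's Fernet recipe; key management (KMS/HSM, rotation) is the hard part and is glossed over here:

```python
# Encrypt customer data at rest with a symmetric key.
from cryptography.fernet import Fernet

key = Fernet.generate_key()        # in practice: fetched from a KMS, never stored beside the data
f = Fernet(key)

ciphertext = f.encrypt(b"Jane Doe, 555-867-5309")
plaintext = f.decrypt(ciphertext)  # only code paths with access to the key can do this
```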
But I do have understanding. Because I've actually worked in a company with these safeguards. It handled extremely sensitive data of tens of millions of Britons. Normal engineers had no access to live data, 2 senior managers were the only ones in the engineering team with access to live passwords, etc.
It is practical to put extremely heavy restrictions in place between engineers and the live data, and they can still do their job. Our normal day-to-day was not impeded in any way.
I only worked there 3 months for other reasons, but regardless of my view on other parts of their operations, their dedication and practical solution to protecting customer data impressed me.
It seems to me "need" is very specific almost by definition. One needs access to only that which is fundamentally required to perform work. That need in most cases is actually quite constrained. Often it is far smaller in scope that some might like.
It isn't needed all the time but the one scenario I can think of is for debugging. Sometimes it's hard to dissect or even replicate a bug without the original data.
Discussed elsewhere in the thread, it's definitely not needed day to day.
The number of devs popping up in this thread who think routine access is needed to live data just to be able to debug is quite worrying.
In the rare instances it is, an anonymized version of the live db should be used to recreate the bug. If this doesn't work, then you might put in temporary logging accessible to only approved people, which is removed after the bug is fixed and the logs deleted.
To be a bit more explicit, I think small companies have a justifiable reason to be a bit more relaxed about live db access, but once you get to having larger teams of people, you're under a huge risk of a malicious actor. Which is easily mitigated by restricting access.
From reading all of your comments here, it occurs to me that you might just not be talking about the kind of software that involves actually processing user data as the service. When customer data is incidental to the core engineering work, then yes you should be able to abstract it away. When the service is "upload your data and we will perform analysis on it and give you results", then it really does require that data to figure out why this customer says their results aren't making sense.
Yes, I agree, which is why that was the first sentence of my comment, which was in response to your comment, which stated:
“No-one needed access... Engineers never did.” It’s great that you changed your position, but please don’t make it seem like I was advocating for unfettered, 24/7 access to customer data when I clearly didn’t write that.
How about plain old abuse? People using services to break the law, particularly crimes with victims? Safety risks?
- An uber passenger sees their driver has a gun in the cup holder; they report it to uber.
- A Square merchant is using Square to launder serious money and Square catches it.
- A Dropbox user is uploading child pornography that indicates active child abuse.
In these situations, you think the company should consult the user first before they take a look at PII? Or ask a senior manager? The former is laughable, and the latter is not scalable. Senior manager clearance might work at a small or even midsize company, but for a large tech company abuse happens thousands of times a day. Review must be operationalized.
I would love to see a law that could strike the right balance, but I don't see how it's possible. Accommodating both small and large companies would be very challenging. If you mandate the kind of structure that big companies would need, it could cripple smaller companies' operational budgets. If you mandate the rules suited for small companies, it fetters large companies at scale. And then how would you enforce it? Another regulatory agency with audit authority?
This kind of regulation is better served by the market, imo, and we've seen that with Uber. In regions where Lyft is a viable option, it has seen significant business increase in the wake of Uber's many scandals. The key is sunlight, which is usually cast by 1. journalists 2. whistleblowers 3. EFF and other watchdogs.
Pretty simple: some departments have access but everything gets logged and audited. If you cannot connect a request to a ticket, you'll get questioned. If you abused your access, you'll be fired immediately. Other industries handle it that way (e.g. banks). I know enough people in banking and know that there's no chance they would ever risk looking up my accounts.
Of course there are exceptional circumstances where access would be granted by a manager without customer approval or routinely by extremely specialised abuse teams.
What I'm talking about is general day to day access to customer data. The analytics, customer service, engineering, etc. teams. It's a different discussion you're trying to have.
These laws already exist as part of Sarbanes-Oxley, but they aren't as strict as what you're proposing. Certain public companies are required to implement safeguards that prevent most employees from having access to customer PII (personally identifiable information). Non-public companies don't have to comply with SOX regulations, but maybe some of them should be expanded to include large private companies.
Someone I know was just commenting that from convos w/ people in other companies, it seems many startups have benefitted from not being under the limelight, and thus had the chance to quietly clean up their own messes while Uber was taking all the heat from the media.
Does anyone else find this whole thing more than a bit hypocritical?
Lyft is always telling you they are the better/ethical/woke ridesharing option - now their employees are reporting a culture of abusing customer privacy. I wish they had taken a look at their own practices before running million dollar ad campaigns full of celebrities celebrating their ‘wokeness’.
Much of the criticism against Uber concern top-level decisions, such as the creation and use of "Greyball", or actions and decisions from Kalanick himself and other top executives. Lyft maybe deserves blame for not building better and more automated audit systems and policies after knowing that these lookups could be abused.
I worked there. I was an engineer and definitely needed access to these data. Fraud and abuse is constantly evolving and touches every part of the business. Everything was audited and I never saw or heard of a single abuse of access. Privacy was talked about seriously at onboarding and other trainings. I have no doubt if somebody was caught abusing this they’d be fired.
Why do you need access as an engineer? Databases for testing should have all sensitive information removed (you can still debug errors). I cannot think of many cases where an engineer will need to have read access to the production database.
It's really difficult to audit all this stuff. I was at a health insurance company where someone used su to go to a different user on a box they had root on and it did get picked up by the security team, but only a few weeks after it happened.
I was offered a security job at one shop and turned it down, keeping my development role. They had 3 security people for the company (total IT size was 500), and the job involved a ton of log parsing, DDoS work, and the beginnings of an internal application whitelisting tool. They wanted to bring me on because they desperately needed a developer to add some automation for separating the important from the chaff. (A younger me would probably have done this, back when I wanted to be a pen tester. I only got interviewed/offered the position because I made the mistake of talking about going to Defcon on a company Slack channel and the security guy insisted I interview.)
Whether it's Uber or the NSA, stories of staff spying on people for a variety of reasons always come down to people having access to things they probably shouldn't have been given access to in the first place. Users should be protected by having their data encrypted and anonymized so that no other human being (staffers, governments, or hackers) can connect an ID to the data. That way the data can still be used for whatever work-related purpose, with less risk of these things happening.
This works until you need some kind of ombudsperson. At some level the data needs to be accessible and audit-able, otherwise what am I to do if my driver just drops me off at a different place than where I asked, or doesn't pick me up.
You need to know that I was in their vehicle, otherwise how can they charge me if I ruin their car. You need to know they were my driver.
There absolutely should be data privacy guarantees that are as strong as possible. But "encrypt and anonymize everything" doesn't work. (edit: and note, I think this is an unfortunate truth, but still a truth).
Obviously when you make a support request your record should be displayed for the customer service agent, but this is the other way around where they can seek out people. I don't think that's a valid use case and there's the obvious abuse case.
Right, but the solution to that isn't 'encrypt everything', it's 'define reasonable (and this definition may vary, but it's certainly not "none") access controls for user data and PII'.
You could completely anonymize when certain key variables are met. In your example, when the ride is successfully completed and both parties confirmed this, the data can be anonymized.
Sadly there is money moving on both sides of this transaction. Depending on the state/country, there may be requirements to retain this data in a certain way or to report it to the government.
When you are moving money (especially when you are moving lots of money) you start having to deal with KYC, risk models, and all other kinds of fun.
> At some level the data needs to be accessible and audit-able, otherwise what am I to do if my driver just drops me off at a different place than where I asked, or doesn't pick me up.
You're seriously arguing that cryptography has no technical means for a driver to send an unrepudiatable attestation of intent to drive from A to B at time C to an anonymous passenger? And that an ombudsman armed with driver GPS data cannot compare the attestation to the GPS data and instead needs identification data on the passenger in order to verify whether the driver went to the correct drop-off?
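For concreteness, here is a minimal sketch of the kind of attestation being described, assuming only that the driver holds a signing key registered with the platform and that the passenger is referenced by a pseudonymous ride token; the names and values are illustrative, not anything Lyft or Uber actually does:

```python
# Sketch only: a driver signs an attestation of the agreed trip, bound to a
# pseudonymous ride token rather than the passenger's identity. Key handling,
# names, and values are illustrative.
import json
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

driver_key = Ed25519PrivateKey.generate()   # in practice, provisioned once and registered

attestation = json.dumps({
    "pickup": "A",
    "dropoff": "B",
    "pickup_time": "2018-01-25T18:30:00Z",
    "ride_token": "anon-7f3c",              # pseudonym; not the rider's name, phone, etc.
}, sort_keys=True).encode()

signature = driver_key.sign(attestation)

# An ombudsperson holding the driver's public key and GPS trace can later check
# this commitment against the trace without ever learning who the passenger was.
driver_key.public_key().verify(signature, attestation)   # raises InvalidSignature if tampered
```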
> You need to know that I was in their vehicle, otherwise how can they charge me if I ruin their car.
By using one of the growing number of datasets to figure out the person's identity after the fact. E.g., same thing you do if you get in a fender-bender or get cut off by a cyclist.
You can even keep a multi-hour buffer from an in-vehicle cam, which you consult if somebody ruins your car. But the likelihood of someone ruining the driver's car without the driver noticing is so low that there's just no way it justifies collecting and keeping a company-wide database of identity data on every single passenger.
> You need to know they were my driver.
Who is arguing for driver data to be anonymous? I think nobody.
Same for GNU Taler, where the merchant data isn't private but the customer data can be.
In fact, same for E-cash from the 90s. Look up blinded tokens, they are quite fascinating from a technical perspective.
> You're seriously arguing that cryptography has no technical means for a driver to send an unrepudiatable attestation of intent to drive from A to B at time C to an anonymous passenger? And that an ombudsman armed with driver GPS data cannot compare the attestation to the GPS data and instead needs identification data on the passenger in order to verify whether the driver went to the correct drop-off?
There are at least three things you need here:
(1) the agreed trip,
(2) that the passenger showed up and got in, and
(3) that the passenger did not, en route, make some demand that ended the driver's obligation to complete the agreed trip.
What do you do when you pay with $20 for something but get change for $10? Why are "argue with them", "accept the loss and move on", "karate chop" and infinitely many other things not among the options?
> You need to know that I was in their vehicle, otherwise how can they charge me if I ruin their car. You need to know they were my driver.
How did taxi drivers handle that for the last hundred years or so? People before us managed sticky situations without destroying human civilization, and so can we.
Well, back in the old days if we wanted to complain we'd write it down on a piece of paper, wrap that piece of paper inside another piece of paper, and then put that in an unlocked box out in our yard.
That's similarly how we'd order products, out of magazines. Only in that case on the paper we'd include things like our bank account information that we'd put in the unlocked box in our yard.
I'm not suggesting that things can't or shouldn't be better. I think it's just important to have a realistic perspective on where we are in this continuum of service vs privacy.
I generally agree with you, but I think it should be pointed out that unauthorized access to that unlocked box carries severe punishment. A big part of the problem with things like this is that not only are there no controls preventing access to private info, but there are also few if any consequences.
How much control do you think there was in the businesses that got those wrapped pieces of paper with your banking account information? Sure, there are penalties while it's in the box in your yard, but not really once it was at its destination. I bet it's much more strict today than it was a few decades ago.
> What do you do when you pay with $20 for something but get change for $10? Why are "argue with them", "accept the loss and move on", "karate chop" and infinitely many other things not among the options?
Well, some of those options are illegal for the aggrieved party to employ. And in any case, a party that facilitates and profits from the transaction is not an uninvolved third party; for such a party, not providing better options than that list greatly increases the risk of legal liability and PR damage.
Because otherwise people, on forums such as this one, will argue (loudly) that the information is clearly available in $ABC's systems and that $ABC refuses to act on it. And I'm not sure those people are wrong.
> if my driver just drops me off at a different place than where I asked...
"Hey, this isn't the right place. Take me where you said you would please."
> ...or doesn't pick me up.
"Hey gran, I'm going to be late for lunch, the darn taxi driver hasn't turned up. I'm calling another firm - guess I'll see you when I get there."
> otherwise how can they charge me if I ruin their car
They will prevent you from departing, and demand cash. If you refuse, they'll either call in law enforcement, or in some places, call in several of their colleagues. In either case, they get their money one way or another.
Look, I'm deliberately being a little flippant here, but none of the problems you outline are in any way unsolved. I'm struggling to think of any situation where any of this data needs to be recorded, never mind viewed later.
Human interaction has dealt with all these obstacles for millennia, and new ways of mediating it don't turn them into new obstacles.
I don't completely disagree, but there are some ways to handle at least parts of what you're talking about. For example, persistent or ephemeral pseudonyms (still uniquely identifiable, at least for a time), and threshold decryption if at least 2 of the 3 parties agree to some kind of privilege escalation (most likely either Lyft and rider or Lyft and driver, but rider-driver is interesting).
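For illustration only, here is a toy 2-of-3 split using Shamir secret sharing: a per-ride decryption key is split so that any two of rider, driver, and platform can reconstruct it, while no single party can. This is a sketch of the general idea, not a scheme any of these companies is known to use:

```python
# Toy 2-of-3 secret sharing (Shamir) over a prime field. The per-ride key is the
# "secret"; rider, driver, and platform each hold one share, and any two shares
# recover the key. Illustrative only; not a production scheme.
import secrets

PRIME = 2**127 - 1  # Mersenne prime; the field must be larger than the secret

def split_secret(secret: int, n: int = 3, k: int = 2):
    """Split `secret` into n shares such that any k of them reconstruct it."""
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(k - 1)]
    def poly(x):
        acc = 0
        for c in reversed(coeffs):          # Horner evaluation mod PRIME
            acc = (acc * x + c) % PRIME
        return acc
    return [(x, poly(x)) for x in range(1, n + 1)]

def recover_secret(shares):
    """Lagrange interpolation at x = 0 over the prime field."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret

shares = split_secret(0xC0FFEE)                             # one share each: rider, driver, platform
assert recover_secret([shares[0], shares[2]]) == 0xC0FFEE   # any two parties suffice
```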
> Is your issue with Lyft's business model, or my comments about encryption?
Both, as it goes :)
I think you're using the word need in a much more narrow context than I am. Without Lyft/Uber/whoever convincing their customers that they're needed, there is no need for the data to be recorded in the first place.
And most of the time, you'd still get to Gran's when you said you would.
I know very few people who would tolerate not being able to get a refund. I know many who would end up giving up fighting for it, but would never use the service again.
And let's bring it up to something more relevant: the driver steals from or assaults a passenger. Having that data would be key to actually bringing charges against the driver.
The screenshots from the leaker mention that they are using "Redshift", which is Amazon's data warehouse product. That means this is about people who have access to the database. It's unsurprising that they could access customer data given access to the database. I'm not sure how you prevent this without preventing access to the DB entirely (and there are legitimate reasons people within the company would have access: DBAs, engineers writing direct queries, etc.).
Not defending people accessing PII, but just saying there are legitimate reasons why someone could have access.
Even if all their data is within Redshift, you can still control access in a way that is more fine grained than just "everyone has access to everything"
Eh, I think the only things that work are either killing your velocity by instituting a bunch of processes and bureaucracy, or putting up light gates and hiring people you trust to do the right thing. It sounds like they have some gating, but perhaps they need better auditing and logging?
Anyone who can access that database can _probably_ deploy code too. If you can deploy code, you can sneak in whatever you want.
You can't prevent access, but you can log all access and require a written reason for it. That, followed up by routine audits of the access logs, will reduce and discourage abuse like that described in the article.
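As a rough sketch of what "log all access and require a written reason" can look like in code (the table and function names are made up for illustration, with sqlite3 standing in for the real warehouse client):

```python
# Sketch: every production read must cite a ticket and a reason, and the access
# is written to an append-only log before any results come back. Table and
# function names are hypothetical; sqlite3 stands in for the real warehouse client.
import datetime, getpass, sqlite3

def audited_query(conn, sql, params=(), ticket=None, reason=None):
    if not ticket or not reason:
        raise PermissionError("production reads require a ticket ID and a written reason")
    conn.execute(
        "INSERT INTO access_log (actor, ticket, reason, query, at) VALUES (?, ?, ?, ?, ?)",
        (getpass.getuser(), ticket, reason, sql, datetime.datetime.utcnow().isoformat()),
    )
    conn.commit()
    return conn.execute(sql, params).fetchall()

# Demo with an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE access_log (actor, ticket, reason, query, at)")
conn.execute("CREATE TABLE rides (ride_id, pickup, dropoff)")
conn.execute("INSERT INTO rides VALUES ('r_123', 'A', 'B')")

rows = audited_query(conn, "SELECT * FROM rides WHERE ride_id = ?", ("r_123",),
                     ticket="SUP-4821", reason="customer disputes drop-off location")
# The routine audit then just diffs access_log against the ticket system.
```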
This is about Redshift, Amazon's cloud-based data warehousing tool. Auditing and logging every individual access made by data analysts, engineers, and others using Redshift would make their jobs impossible. One day of queries would take weeks to audit if you had to validate a legitimate use case for every individual piece of data that got touched, and if you're just going to say "oh, they needed everyone's PII because it was a big analytical query like they run all day", you're back at square one.
The reality is that some people are going to need wide-reaching access. You could monitor for certain problematic access patterns, like someone who is supposed to be doing primarily aggregate queries doing a lot of specific ones, and I'm sure that'd be a good thing to do, but to be honest there are probably much higher priorities since employees who need sensitive access are probably going to be able to avoid that type of detection.
Redshift doesn’t have column level access permissions but it does have schema/table level access permissions. That is more than sufficient to prevent this kind of issue.
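For example, something along these lines would already keep raw PII away from the general analyst population. The schema and group names are invented, and this is a sketch of standard Redshift-style GRANTs rather than anything Lyft is known to run:

```python
# Sketch of schema/table-level separation in Redshift; the schema and group
# names are invented. psycopg2 talks to Redshift's Postgres-compatible endpoint.
import psycopg2

GRANTS = [
    # Analysts see only the de-identified analytics schema...
    "GRANT USAGE ON SCHEMA analytics TO GROUP analysts",
    "GRANT SELECT ON ALL TABLES IN SCHEMA analytics TO GROUP analysts",
    # ...while raw PII lives in its own schema that most groups cannot touch.
    "REVOKE ALL ON SCHEMA pii FROM GROUP analysts",
    "GRANT USAGE ON SCHEMA pii TO GROUP support_escalations",
    "GRANT SELECT ON pii.riders TO GROUP support_escalations",
]

conn = psycopg2.connect(host="example.invalid", dbname="warehouse", user="admin")  # placeholder DSN
with conn, conn.cursor() as cur:
    for statement in GRANTS:
        cur.execute(statement)
```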
You can do all development, testing and debugging on databases that have sensitive information removed. Just replacing names and email addresses goes a long way and shouldn't make a difference for the engineer. If there's a bug with a certain ride, the ID for that is enough to test.
Read access to production data is only necessary in very few cases and can be heavily audited.
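One simple way to scrub a copy for development (a sketch; the schema here is hypothetical, and sqlite3 stands in for whatever engine actually holds the copy):

```python
# Sketch: deterministic pseudonymization of a dev copy. Real names and emails are
# replaced with stable fakes derived from the row ID, so joins and bug repros still
# work, but no real PII leaves production. The schema is made up.
import hashlib, sqlite3

def pseudonym(user_id: str) -> str:
    return "user_" + hashlib.sha256(user_id.encode()).hexdigest()[:10]

conn = sqlite3.connect("dev_copy.db")   # an already-restored copy, never production itself
conn.execute("CREATE TABLE IF NOT EXISTS users (id TEXT PRIMARY KEY, name TEXT, email TEXT)")

for (user_id,) in conn.execute("SELECT id FROM users").fetchall():
    fake = pseudonym(user_id)
    conn.execute("UPDATE users SET name = ?, email = ? WHERE id = ?",
                 (fake, fake + "@example.com", user_id))
conn.commit()
```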
I can't even imagine how much PII is sitting in AWS Redshift from various startups... It's brutally slow if you don't tune the encoding types _extremely carefully_, which no one ever does. One has to wonder, if this is such a big deal, what the access rights look like from AWS's perspective xD
Imagine the data team convincing the rest of the company that they're doing anything to make their redshift _slower_. Encryption? People want their graphs, damnit!
What circumstance would require a view where an employee can find riders by name and look at their whole history? If there is a complaint, it should allow the customer service agent to see the ride and perhaps some history (ratings make sense, but including full locations/times seems unwise), but I can't think of a reason why this would ever need to be a process started by a Lyft agent rather than by the customer or driver.
This kind of system usually comes into place because an organization combines a number of different pieces of software to make a cohesive whole, rather than designing a single all-in-one solution with perfect foresight.
While the customer can open a support ticket in the ticketing system, that system is probably not directly connected to the service that allows support agents to view ride data. This may be because either of these systems weren't (or at least weren't initially) built in-house, or just because this kind of thing is harder to implement and wasn't a design priority back in the day.
So a support agent sees a customer issue, and then goes looking for relevant data in another system. If the issue ends up being a bug, and that bug seems to depend on customer data, engineering might need to look at it in order to reproduce and resolve it. It's great if the support person or engineer can look only at the very narrow slice of data they actually need to address the problem they're working on, but often that's not possible (for example, the system could be set up to tightly control ride history access, but once you've been granted that privilege, you can access the _entire_ ride history for that customer).
The company ends up creating policies for data safety, and teams to enforce those policies, but there's always a balance between being able to quickly triage and address a problem (be it a customer reporting that they feel unsafe and need help quickly, a hard to find bug that only seems to happen for people with last names having certain accented characters, or a customer complaining that certain parts of their ride history have mixed-up addresses) and having to go through established channels to justify your need for access. And it's hard to perfect this process.
I'm not claiming any of this is ideal, and all in all it sounds like some of Lyft's policies were too lax or too loosely enforced. But I definitely understand _how_/_why_ it could happen.
I used to be in the habit of taking my cat to the vet in a regular taxi, with a local company. I always got the same guy and the same car as they always 'knew' where I was going. The guy I got liked cats, didn't mind waiting around and made sure everything was looked after. Others were allergic to cats or only doing airport trips, so I had no problem with them looking at my history and doing their best for me.
That’s great for you, not great for say someone who’s being stalked by an abusive ex. It’s such an obvious and foreseeable problem, they need to own it.
Not really. Uber created the Hell Map, Greyball, and had the two harassment memos. This just seems to be a failure to actively monitor their audits and do permissions correctly... something that Uber got hit with and fixed too.
Lyft just needs to deal with this issue now that it's public, but it's not specific to Lyft. It's really difficult to effectively scan for abuse, even when you audit everything. I've seen similar issues at many companies.
This Lyft situation might be more a matter of oversight, whereas Uber's seemed to be more a matter of malice.
Disagree. This press report, rather, tries to piggyback on the Uber narrative. Hey, it got me to click!
It's clear from the stuff in the report that Lyft prohibits misuse of PII. The substance of the article is "sometimes people violate our prohibition." The substance of Uber's problems is "our executives and everybody else have free rein on the PII we collect." It's different.
I was talking about this with my girlfriend at dinner tonight.
While we were eating, I noticed a few cameras with a view of the whole restaurant, and wondered: of course, filming the restaurant might be useful in case of a robbery (?) or for insurance, etc., but what are the chances the minimum-wage employees who check those DON'T use them to check out hot women or the embarrassing stuff that happens from time to time?
With us being tracked by data and video all the time nowadays, we are safer (or the companies that have the footage are safer), but of course there is a lot of room for abuse and improper usage.
Perhaps not actively, but I bet in case of a hot woman or something weird going on they'll know where the monitors are.
I also have two separate family members who have businesses with cameras, and they both look at what's going on from an app on their iPads while they watch TV in the evening (actually, having seen them do it is probably why I know video at businesses is not handled properly).
That's the whole deal with the "creepshots" communities on Reddit/Tumblr/etc. The subjects are in a public place. If someone takes candid shots of lots of different types of people in a city and puts them in an art gallery, they're often considered art (even if 10-15% of the photos are of really pretty men or women and might be taken as sexual). However, if you have a site that's only attractive men/women in public settings... then it's not art... or is it?
It all comes down to context. Legally it's fine to take public photos of people in many countries (and probably should be, because we don't want to go down that slope). Is it immoral? Well, that's another issue, and that depends on the context the photos were taken in. But you have to set a line somewhere, at least with the legal concerns. You might not like it, but it's just a part of being in a free society you have to deal with.
Now if someone takes a photo in a private area, like a bathroom or spying into a home, that's a different legal issue (unless it's looking up a skirt in Texas[1])
Sure, but it was just an example. What about if I was with my lover, and I didn't want my wife to know? If someone from the staff knew my wife, how do I know they won't send the video to her?
What if I was drunk and tripped or made a fool of myself in some other way and I didn't want that to end up on YouTube or /r/PublicFreakout on Reddit?
Etc., etc.
We just don't know. Of course I could sue the person if what's been done is illegal, but it's a video and the damage would be done and hard to repair.
Well that person who knows your wife probably only needs their voice to tell your wife...
From what I’ve seen, there may be a display in view of the staff and/or public. But the DVR and controls are usually locked up for the simple reason the owner doesn’t want to have to pay to call in a tech every time an employee monkeys with the system.
Protection of PII has, for a long time, been a central tenet of USA-based health care IT (due to the HIPAA / ARRA-2009 regulations).
It's possible to do that fairly well and still leave need-to-know exceptions. (The substitute nurse on the intensive care unit needs to know if a particular patient has Crohn's disease, for example.)
My point is, PII CAN be protected reasonably well. It takes executive will to do so, and training, and monitoring.
I worked in a hospital for a while. They had good training on how to avoid misusing PII. It starts with "don't look up your ex or your senator" and goes into ways to keep patient data safe.
When there IS a leak, HIPAA-covered operations are obliged to disclose it; there is a public catalog of recent disclosures.
Doing privacy right is systemically possible. But it's a systemwide task, not just a one-off training or audit.
(Now, we can talk about whether HIPAA's main point--preventing insurers from abusing patient data--is working or not. Your doc makes you sign a permission slip letting insurers see your information, so you waive that protection in return for reimbursement. But that's a different issue.)
> PII CAN be protected reasonably well. It takes executive will to do so
This is where the EU's new data protection law (GDPR) could help. It has large fines and the ability for NGOs to sue on behalf of users. When the alternative is a big fine, it's easier to find the will.
I happen to know that the former country manager of Uber in Malaysia used driver personal information to bully and then swindle a gig worker at another (but not competing) gig-economy startup.
Long story short, he pressured the worker into doing a gig for free if he could guess his birthday. Not quite a fair gamble considering he knew the guy's birthday from when he'd signed up as a driver for Uber.
This would still be a terrible misuse of confidential information if he'd stopped at the poorly executed joke, but the highest ranked Uber employee in Malaysia instead insisted the marginally employed freelancer hold to the agreement.
Can unfortunately confirm that friends at both Lyft and Uber have in the past known my ride history. I admittedly had to push a bit jokingly for either to look it up, but the fact that it is even possible for insiders to access internal production databases makes me suspect this problem is far more widespread than just at ridesharing companies.
I wonder who at Fastmail can read user emails? Who at Heroku can access my code or ENV secrets? Can bank employees see recent transactions, bypassing ACH verification deposits’ “Security”?
Sad how rare end to end encryption is as a feature in 2018.
Many companies have user-data access heavily restricted and audit every access. It's not that hard to log all queries to production that don't come from the system itself. And if a developer runs queries against the production database, that should pop up somewhere immediately.
Not different in banks, any lookup of customer accounts is monitored and checked. Looking up a friend's account will get you fired immediately (or worse).
I thought 'staffer' was only used when talking about people working in government or for a political party. Aren't 'staff' or 'employees' just as good for the headline?
> I thought 'staffer' was only used when talking about people working in government or for a political party.
Nope.
> Aren't 'staff' or 'employees' just as good for the headline?
No. "Staff" has a different connotation; as a mass noun, it implies things that are true generally of the staff. "Employees" would be about as good as "staffers"; the latter is shorter, though, which often is preferable in headlines.
Very very few people I hope. There is no economic incentive for companies to invest in secure design/architecture. The costs of building it right almost always lose to other product owner features and timelines.
Until we can actually get incentives aligned, companies will continue to build leaky, poorly designed apps with no layered controls.
Yes, someone will jump on me and say "but my company does it right!". Great for you! Share your knowledge! You are a snowflake, and probably fortunate to have ethical leadership! The rest of us are doomed.
None of the data that was available sounds like sensitive PII so I'm not sure why anyone would be surprised by this. I would probably think that rider/driver feedback isn't PII at all.
I suppose it might be a bit questionable if Lyft was creating and providing tools to make it easy to look this stuff up and promoting it within the company but that doesn't sound like the case either.
Not all PII is inherently "sensitive", though. Meaning, not everything that can be used to identify you needs to be encrypted and protected. I don't know for sure, but I don't think names or addresses qualify as sensitive.
I absolutely would say they do, especially when there's a very good chance that home and work addresses are part of that data, and when someone using a database like that to spy on, harass, and/or assault an ex is an actual thing that happens.
“This was said to be used to look up ex-lovers, check where their significant others were riding and to stalk people they found attractive who shared a Lyft Line with them... One staffer apparently bragged about obtaining Facebook CEO Mark Zuckerberg’s phone number.”
This is a shame for obvious reasons, but I don't like the comparison to "God View". Uber designed a UI around accessing PII for their employees to use wantonly; Lyft sounds like they need to get their data access under control, stop making excuses for why anyone "needs" real user data, and actually run audits.
Requiring a second person to verify/supervise access to confidential data solves a lot of this problem (at a cost, though). It doesn't prevent a strong adversary from getting access to data they shouldn't, but it prevents normal people from abusing power they have stumbled upon. Require two logins plus an audit trail and a lot of the abuse goes away; people are less inclined to lose their job because a colleague wants to check up on their girlfriend.
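A toy sketch of what that two-person rule could look like; the helper names and the in-memory approval store are invented for illustration:

```python
# Toy two-person rule: a lookup proceeds only after a second, distinct employee
# has approved that specific request, and every lookup is audit-logged.
# All names and the in-memory approval store are invented for illustration.
approvals: dict[str, set[str]] = {}    # request_id -> employees who approved it

def approve(request_id: str, approver: str) -> None:
    approvals.setdefault(request_id, set()).add(approver)

def lookup_customer(request_id: str, requester: str, customer_id: str):
    second_pair_of_eyes = approvals.get(request_id, set()) - {requester}
    if not second_pair_of_eyes:
        raise PermissionError("needs sign-off from someone other than the requester")
    audit_log(request_id, requester, sorted(second_pair_of_eyes), customer_id)
    return fetch_record(customer_id)     # hypothetical call into the real data store

def audit_log(*event) -> None:
    print("AUDIT", event)                # stand-in for an append-only audit trail

def fetch_record(customer_id: str):
    return {"customer_id": customer_id}  # stand-in for the actual lookup

approve("REQ-1", "colleague_b")
lookup_customer("REQ-1", "employee_a", "cust_42")   # OK: approved by someone else
```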
I don't know if I blame the company for this. Anecdotal queries are useful debugging tools. Or maybe it just makes more sense to expect employees to have some sense of decency.
This is the biggest point I've been making to all my friends. Whatever people believe Uber is so evilly doing, it will probably turn out that companies like Lyft and Didi and Grab are engaged in the same sorts of actions. Uber is the whipping boy for ridesharing, but I'm willing to bet that they all do basically the same things.
My question for Steve Yegge is:
He seems to somehow believe that Grab has the moral high ground over Uber and Lyft. What is he going to do when he finds out that Grab behaves in exactly the same manner as Uber?