What's the need for anything? Your argument seems to be predicated on a very spe...

mynameisvlad · on Jan 25, 2018

Can you actually refute the parent comment's argument? Because it seems more than reasonable to me.

Analytics and Engineering definitely don't need this level of data access for any sort of day-to-day work. I work on analytics tools, and at best, anonymized and generalized data is needed, but never specific customer data. We specifically strip out any PII on data that might reach developers and need to request permission to access any sort of customer data (and I haven't had to do that in a long time).
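To make "strip out any PII" concrete, here is a minimal sketch of the idea, not anyone's actual pipeline. The field names in `PII_FIELDS`, the record layout, and the `pseudonymize` helper are all hypothetical; the only assumption is that records are flat dictionaries with a `user_id`. PII fields are dropped and the id is replaced with a keyed hash, so rows from the same user can still be joined without revealing who the user is.

```python
import hashlib
import hmac

# Hypothetical field names; a real schema would differ.
PII_FIELDS = {"name", "email", "phone"}


def pseudonymize(record: dict, secret: bytes) -> dict:
    """Return a copy of record with PII fields dropped and user_id replaced by a keyed hash."""
    cleaned = {k: v for k, v in record.items() if k not in PII_FIELDS}
    digest = hmac.new(secret, str(record["user_id"]).encode(), hashlib.sha256).hexdigest()
    # A truncated keyed hash: stable for a given secret, not reversible without it.
    cleaned["user_id"] = digest[:16]
    return cleaned


record = {
    "user_id": 42,
    "name": "Joe Smith",
    "email": "joe@example.com",
    "route": "A->B",
    "fare": 12.5,
}
safe = pseudonymize(record, secret=b"example-secret")
print(safe)
```

The secret would have to live somewhere developers cannot read it, otherwise the mapping can be recomputed.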

CS/T&S/Insurance might need per-customer data, but they also don't need blanket access to all customers' data. If they need to handle a specific customer's data, there should be a request made that is logged and audited, and ideally with customer approval.

gersh · on Jan 26, 2018

It is probably possible to design systems to avoid access, but they will get more complex. Engineering has to debug problems. For example, suppose there is a bug where the rate calculations aren't working for certain types of routes. The engineers will want to look up those routes to understand what is causing it. If you are designing an algorithm to detect fraud, you are going to want to look at cases of fraud to understand how to design the algorithm. Further, if you want to do usability testing, you are going to need data to test with. You might want to check the different types of names used in the system to make sure they display properly. You may also want to sample the list of customers to user-test with live data or survey customers.

vageli · on Jan 26, 2018

> For example, suppose there is a bug where the rate calculations aren't working for certain types of routes. The engineers will want to look up those routes to understand what is causing it.

Then show routes without names.

> If you are designing an algorithm to detect fraud, you are going to want to look at cases of fraud to understand how to design the algorithm.

Then show names without routes.

> Further, if you want to do usability testing, you are going to need data to test with. You might want to check the different types of names used in the system to make sure they display properly. You may also want to sample the list of customers to user-test with live data or survey customers.

Use a library that can generate realistic but fake data. I feel there is no excuse to not compartmentalize. If data security is not important to the business...well it only takes one bad article like this to cast doubt on the whole company.
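As a rough illustration of what "realistic but fake" can look like, here is a stdlib-only sketch. Dedicated libraries do this far better; the name pools, the `fake_riders` helper, and the record fields are all made up for the example. The names deliberately include accents, apostrophes, hyphens, and multi-word surnames, since those are the cases that tend to break display code.

```python
import random

# Hypothetical name pools chosen to exercise awkward display cases.
FIRST_NAMES = ["Ana", "Björn", "Chidi", "Mei", "Zoë"]
LAST_NAMES = ["García", "Nguyen", "O'Neil", "Smith-Jones", "van der Berg"]


def fake_riders(n: int, seed: int = 0) -> list:
    """Generate n fake rider records; the same seed always yields the same data."""
    rng = random.Random(seed)
    riders = []
    for i in range(n):
        riders.append({
            "rider_id": f"test-{i:04d}",
            "name": f"{rng.choice(FIRST_NAMES)} {rng.choice(LAST_NAMES)}",
            "fare": round(rng.uniform(5, 60), 2),
        })
    return riders


for rider in fake_riders(3, seed=1):
    print(rider)
```

Seeding the generator keeps test runs reproducible, which matters when a display bug only shows up for one particular name.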

djsumdog · on Jan 26, 2018

The trouble is, if you're at this level of engineering, you're probably going to be writing the isolation layer. So you already have access to the raw data and you're just going to either make things harder for yourself or only be designing for downstream security.

You can and should restrict customer service reps (only allow them to access routes/users/drivers who they have active tickets on), but at some point you're going to need to trust your developers since they can usually just query the database directly.
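The customer service restriction could look something like this sketch. Everything here is hypothetical (the `SupportRep` model, the `fetch_customer` helper, the in-memory `customers` dict); the only point is that the lookup is gated on the rep having an active ticket for that customer.

```python
from dataclasses import dataclass, field


class AccessDenied(Exception):
    """Raised when a rep asks for a customer they have no active ticket on."""


@dataclass
class SupportRep:
    rep_id: str
    # Customer ids on tickets currently assigned to this rep (hypothetical model).
    active_ticket_customers: set = field(default_factory=set)


def fetch_customer(rep: SupportRep, customer_id: str, customers: dict) -> dict:
    """Return the customer record only if the rep has an active ticket for it."""
    if customer_id not in rep.active_ticket_customers:
        raise AccessDenied(f"{rep.rep_id} has no active ticket for {customer_id}")
    return customers[customer_id]


customers = {"c1": {"name": "Test Rider"}, "c2": {"name": "Other Rider"}}
rep = SupportRep("rep-7", active_ticket_customers={"c1"})
print(fetch_customer(rep, "c1", customers))
```

This only helps if the check sits in a service the rep's tools must go through; it does nothing for someone who can query the database directly, which is the point about developers above.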

mattmanser · on Jan 26, 2018

As I said in another comment, there's absolutely no need for an engineer to have access to live data in normal circumstances.

Learning how to recreate bugs without stack traces or live data is a debugging skill you can learn. Often it's as simple as following the steps described in the ticket, something some developers seem to not realize, or their reading comprehension is bad.

For really complex bugs, you might need some sort of access to see the specific conditions, but it should be attempted with an anonymized version of the db that you had to request and get signed off on.

If you then really need to put in tracing, it should be temporary, the data access should be heavily restricted and it should be deleted/removed once the bug is fixed.

newfoundglory · on Jan 26, 2018

> Often it's as simple as following the steps described in the ticket, something some developers seem to not realize, or their reading comprehension is bad.

That sounds like a pretty simple set of bugs you are dealing with. I don't think anyone is arguing that they need this level of information to solve "When I click this button it crashes".

mynameisvlad · on Jan 26, 2018

For debugging, you generally can access things like server logs, which would already have a lot of the data you might need. It'd be behind a session ID or some other type of anonymizer, but the customer can provide that and other debugging information so you can look up their entry. It might be a bit harder for mobile apps, but there are plenty of telemetry products that provide crash dumps (with many of them supporting the stripping of PII). You definitely don't need always-on access to all customers' data to do this role, or even access to their PII data for the purposes of debugging.
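A very rough sketch of log scrubbing along these lines, assuming plain-text log lines. The two patterns and the `scrub` helper are illustrative only and would miss plenty of real-world PII; the session ID is left intact so a customer-provided ID can still be used to find the entry.

```python
import re

# Deliberately simple, illustrative patterns; real scrubbers need far more coverage.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def scrub(line: str) -> str:
    """Replace email addresses and phone-like numbers with placeholders."""
    line = EMAIL.sub("[email]", line)
    return PHONE.sub("[phone]", line)


raw = "session=9f3c user=joe@example.com phone=+1 415-555-0100 fare calc failed"
print(scrub(raw))
```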

Rate calculations 100% do not need specific data access. All you need to do that job is to have a generalized set of data based on the routes in question. You don't need to see that Joe Smith in SF took a route from A to B and what went wrong with it, you just need to see what all routes/rates from A to B were, and then from there look for anomalies.
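Roughly what that could look like, as a sketch: the input is just (origin, destination, fare) tuples with no rider attached, and the `fare_anomalies` helper and its `tolerance` threshold are invented for the example. Fares are grouped per route and anything far from that route's median is flagged.

```python
from collections import defaultdict
from statistics import median


def fare_anomalies(trips, tolerance=0.5):
    """Group fares by route and flag fares deviating from the route median by more than tolerance * median.

    trips: iterable of (origin, destination, fare) tuples with no rider identity attached.
    """
    by_route = defaultdict(list)
    for origin, destination, fare in trips:
        by_route[(origin, destination)].append(fare)

    flagged = {}
    for route, fares in by_route.items():
        mid = median(fares)
        outliers = [f for f in fares if abs(f - mid) > tolerance * mid]
        if outliers:
            flagged[route] = outliers
    return flagged


trips = [
    ("A", "B", 10.0), ("A", "B", 11.0), ("A", "B", 10.5), ("A", "B", 42.0),
    ("C", "D", 7.0), ("C", "D", 7.5),
]
print(fare_anomalies(trips))  # {('A', 'B'): [42.0]}
```

The median is used instead of the mean so that a single broken fare does not drag the baseline toward itself.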

Fraud is a bit more tricky, but if you already know what you're looking for, then you don't need the specific customer data set, you just need to know what deviations from the norm a general fraud request had.

Usability testing should be done with completely fake data, preferably created by someone with knowledge on how to do that specific job. This one is by far the easiest one to argue against needing access to real customer data since most places already have this type of fake data created specifically for this purpose.

Overall, none of your examples really need data access. Sure, it'd be nice to have access to it for some of these points, but it'd also be nice to have a million dollars. It doesn't mean you can't do your job if you didn't have it.

tallanvor · on Jan 26, 2018

A balance has to be made. The parent remark does not show an understanding of the realities of working on a service on all levels. There are many times when developers need access to specific customer data. Pretty much every company has in their TOS verbiage to note that employees may have to access customer data without their specific consent. That said, companies can and should have multiple layers to help prevent abuse of customer information:

1. Access levels: Different employees need access to different types of information. Access at a specific level should be approved by at least one individual, often the employee's manager, but may require multiple layers of approval, depending on the sensitivity of the data. In some cases, background checks may be required. Access should be reviewed any time your job duties change.

2. Separation of networks: Customer data should not be on your normal company network except for specific approved instances (such as when a customer sends a file to support that they can use for testing and pass to developers if needed). It should not be possible to pull information from the network containing customer data to your company network, but it might be necessary to be able to push data over.

3. JIT access: Access to the network and systems containing customer data should require elevation. Access to systems and data on this network should still be subject to your access level (an example would be being able to access unscrubbed logs that may include some private information, but not be able to otherwise access data a customer has entered or uploaded).

4. Auditing: There should be an audit trail of who approved an employee's access levels and when. Access to customer data should always be audited, as should access to unscrubbed log files. Other access may need to be audited based on legal requirements or a company's stated commitments to their customers. That last part isn't an absolute, and while more audit logs are generally better, in some cases it may be too much noise.

5. Encryption at rest: Customer data should be encrypted when it is not in use. This might be file level encryption, database encryption, something else, or some combination.

6. Encryption in transit: Customer data should be encrypted while it is being transported to or from the customer, between networks, and preferably within networks.
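Points 3 and 4 can be sketched together as a toy model. This is an in-memory illustration under my own assumptions, not how any of the companies mentioned do it: the `JitAccess` class, the `Grant` record, and the resource names are all hypothetical. Elevation needs a second person, expires on its own, and every grant, read, and denial lands in the audit log.

```python
import time
from dataclasses import dataclass


@dataclass
class Grant:
    employee: str
    resource: str
    approver: str
    expires_at: float


class JitAccess:
    """Toy just-in-time access model: time-limited grants plus an audit trail."""

    def __init__(self, clock=time.time):
        self._clock = clock  # injectable so expiry can be tested
        self._grants = []
        self.audit_log = []

    def elevate(self, employee, resource, approver, ttl_seconds):
        if approver == employee:
            raise PermissionError("self-approval is not allowed")
        grant = Grant(employee, resource, approver, self._clock() + ttl_seconds)
        self._grants.append(grant)
        self.audit_log.append(("granted", employee, resource, approver))
        return grant

    def read(self, employee, resource):
        now = self._clock()
        allowed = any(
            g.employee == employee and g.resource == resource and g.expires_at > now
            for g in self._grants
        )
        # Denied attempts are logged too, not just successful reads.
        self.audit_log.append(("read" if allowed else "denied", employee, resource, None))
        if not allowed:
            raise PermissionError(f"{employee} has no active grant for {resource}")
        return f"<contents of {resource}>"


jit = JitAccess()
jit.elevate("dev-1", "unscrubbed-logs", approver="manager-1", ttl_seconds=900)
print(jit.read("dev-1", "unscrubbed-logs"))
print(jit.audit_log)
```

In a real deployment the audit log would be written somewhere the person being audited cannot edit.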

This stuff can be difficult to do, but by the time you're as large as Facebook, Uber, Lyft, etc., there's no excuse not to be doing it. You can bet that while the implementation differs, Amazon, Google and Microsoft are also doing this.

mattmanser · on Jan 26, 2018

But I do have understanding. Because I've actually worked in a company with these safeguards. It handled extremely sensitive data of tens of millions of Britons. Normal engineers had no access to live data, 2 senior managers were the only ones in the engineering team with access to live passwords, etc.

It is practical to put extremely heavy restrictions in place between engineers and the live data and they can still do their job. Our normal day-to-day was not impeded in any way.

I only worked there 3 months for other reasons, but regardless of my view on other parts of their operations, their dedication and practical solution to protecting customer data impressed me.

traviscj · on Jan 26, 2018

Why does anyone have "access to live passwords"? That point alone calls into question all of their practices, to me...

sidlls · on Jan 26, 2018

It seems to me "need" is very specific almost by definition. One needs access to only that which is fundamentally required to perform work. That need in most cases is actually quite constrained. Often it is far smaller in scope than some might like.

s73ver_ · on Jan 25, 2018

I think it's more that the absolute "need" has not been demonstrated.