I am always intrigued by a lot of these announcements that clarify that only 'so...

falcolas · on Aug 28, 2016

One very simple option would be to kill any query that takes more than a few hundred ms or returns more than n rows, and send an alert. Queries that dump a whole table are almost always slow, and tend to stand out from normal traffic.

Doing so keeps your db responsive against programmer errors and limits data exfiltration. I've been doing this for the first reason for years.
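A minimal sketch of that idea, assuming SQLite through Python's standard sqlite3 module rather than whatever database is actually in use. The names guarded_query and QueryKilled and the default limits are made up for illustration; on a real server you would more likely use its own statement timeout plus a watchdog.

```python
import sqlite3
import time


class QueryKilled(Exception):
    """Raised when a query exceeds the time budget or the row cap (hypothetical name)."""


def guarded_query(conn, sql, params=(), max_ms=300, max_rows=1000, alert=print):
    """Run sql, aborting it and calling alert() if it is too slow or too large."""
    deadline = time.monotonic() + max_ms / 1000.0

    # SQLite invokes this handler every 1000 virtual-machine instructions;
    # a non-zero return value aborts the statement that is running.
    def over_budget():
        return 1 if time.monotonic() > deadline else 0

    conn.set_progress_handler(over_budget, 1000)
    rows = []
    try:
        for row in conn.execute(sql, params):
            rows.append(row)
            if len(rows) > max_rows:
                alert(f"row cap of {max_rows} exceeded: {sql}")
                raise QueryKilled("row cap exceeded")
        return rows
    except sqlite3.OperationalError:
        # An aborted statement surfaces as OperationalError; only treat it
        # as a kill if the deadline really passed, otherwise re-raise.
        if time.monotonic() > deadline:
            alert(f"time budget of {max_ms} ms exceeded: {sql}")
            raise QueryKilled("time budget exceeded") from None
        raise
    finally:
        # Remove the handler so later statements on this connection are unaffected.
        conn.set_progress_handler(None, 0)
```

Normal application queries go through unchanged; a SELECT * over a large users table trips the row cap, and a long scan trips the time budget. Either way the alert callback fires, which is where you would page someone.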

moonshinefe · on Aug 28, 2016

Not every user has access to every table or database. Suppose a company separates its data by category: an attacker who exploits the brandnamecars.com account might get all the car users' credentials, but not those of brandnametrucks.com, which lives in a separate table or database. If the brandnamecars.com account is properly restricted with permissions, it can't read that data, even when the same server hosts both databases or tables.

taspeotis · on Aug 28, 2016

Say there are two tables, users and user_preferences. Someone goes in and takes the contents of users (hashes and salts and all). Only some of the user information was obtained!

cyberferret · on Aug 28, 2016

I get it about normalised data spread across multiple tables - but usually (from how I interpret it), they seem to be talking about number of rows - i.e. "We think only 10,000 users had their information compromised...".

I believe in the case of the LinkedIn breach, they said something like "less than 20% of their user passwords were leaked". I take that to mean that only some rows were exposed, not all. That's why I am intrigued: was the query shut off midstream, or was the bulk download of exported data detected and cut off, or something similar?

arnaudlaudwein · on Aug 28, 2016

This is the case when attackers don't get access to the database itself - imagine they were able to listen to connections between users and front-end servers, and extracted authentication information. This would only concern users connecting during a specific timeframe.

In this post, for instance, they indicate that attackers got 'sync users’ passwords' even though they store only 'encrypted/hashed data'.

Other possibilities: they accessed a partial backup (or prod data used in dev), a caching system, a message broker (Kafka)...