I co-manage https://hackerone.com/googleplay and the top contributor there probably makes 5x-10x the average software engineering salary in his home country.
Not a lot of hackers care about Android app security, so there are barely any hackers participating and little competition. Most apps have never had anybody do a security review.
Additionally the scope of the program is so wide that you can look through hundreds of apps from companies that have no security posture at all. Finding bugs is easy and payouts are more than generous.
Great summary! By nature of my job (eng lead of a major mobile malware detection team) I have a lot of startups pitch their ML solutions to me. A couple of thoughts:
- There are no publicly available data sets suitable for training. There are a few small ones and a few old ones, but they don't reflect the reality of 2016. Companies that approach me and pitch solutions to the malware of 2012 are not useful.
- The majority of mobile malware is based on some kind of social engineering. On a code level these are indistinguishable from legitimate applications (the same APIs are used in the same fashion). The only difference is whether app behavior meets user expectations or not. Making this decision automatically seems intractable so far.
- Malware is not really a well-defined term. There is phishing, toll fraud, Trojans, privilege escalation exploits, ... If you generically look for malware, the signals you will look for are going to approach the complete set of APIs made available by your OS. Your results will just be a giant blob where everything is connected. Pick a single malware category and focus on just that at a time. ML signals for priv esc will look very different from those for phishing.
- ML is sexy. Malware analysis is not. Startups seem to hire too many ML people and not enough malware analysis people. I've had startups pitch to me that had literally zero people on staff who knew what mobile malware actually looked like. They just did anomaly detection and then tossed the results over to my team to verify the results. That's not how it works. We're not your QA team. :)
Hey, I just finished a custom malware ML system for one of the largest European corporations, large enough that some malware is targeted at them. The result is 97% accuracy (they retrained and checked on their own held-out dataset). More careful analysis is needed (much malware has high-entropy 'zones' that may help the classifier find the right category), but overall it does work.
See the Microsoft / Kaggle challenge on classifying malware families, winning solution is > 99% accuracy IIRC.
Can you describe a security setting where 97% accuracy is actually useful? Unless the events you're looking at are low volume, or you somehow have much more malicious data than everyone else, that seems like a recipe for your results being primarily FPs.
For context, a company can easily get ~1B security-related events a day, so even reporting say 0.1% of those wrong a day means some poor junior analyst has 1,000,000 tickets to slog through. If you expand that to full packet captures as suggested in the article... ouch.
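To make the base-rate problem above concrete, here's a back-of-the-envelope sketch. All numbers besides the ~1B events/day are illustrative assumptions, not from any real deployment:

```python
# Base-rate calculation: why "97% accuracy" can still mean almost all
# alerts are false positives when truly malicious events are rare.
# Assumed values (hypothetical): base_rate, tpr, fpr.
events_per_day = 1_000_000_000   # ~1B security events/day, per the comment
base_rate = 0.0001               # assume 1 in 10,000 events is truly malicious
tpr = 0.97                       # assumed true positive rate (sensitivity)
fpr = 0.03                       # assumed false positive rate (1 - specificity)

malicious = events_per_day * base_rate       # 100,000 truly bad events
benign = events_per_day - malicious          # 999,900,000 benign events

true_positives = tpr * malicious             # 97,000 caught
false_positives = fpr * benign               # 29,997,000 false alarms

precision = true_positives / (true_positives + false_positives)
print(f"Alerts per day: {true_positives + false_positives:,.0f}")
print(f"Fraction of alerts that are real: {precision:.2%}")
```

Under these assumptions the analysts would see roughly 30 million alerts a day, of which well under 1% are real, which is the scenario the comment is warning about.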
(We do some cool visual analytics work here, including unsupervised learning / classification, and target more of the problem of "given an incident you're already investigating, what else should you now look at from across all your tools?")
The 99% means little when it suffers from the same sort of problem the immune system has with cancer: the adversary isn't stationary, but the model is.
That's what the research under the banners of "security via diversity" and "moving target defense" is tackling. I recall the Hydra firewall from Sentinel did that sort of thing. OpenBSD and grsecurity do it in OSS for parts of their OS. Such methods can be combined with these.
Interesting name. Reminds me of a security scheme, Symbiotes, I briefly evaluated on Schneier's blog. Injected security into legacy, embedded applications with various tradeoffs. Where did you get the name from?
Focusing on your last point: this is true in a lot of fields.
ML is amazingly powerful, but if you don't have sufficient domain knowledge, or you aren't collaborating very closely with actual experts, you can make very dangerous mistakes. Domain knowledge helps a lot - not just in malware, but in biology, image analysis, etc.
I've had the exact same experience with Vint (https://www.joinvint.com/) - Go through personal trainers until you find one you like and then move off the app. The hourly cost will be lower but the personal trainer will still make more money.
I've also had Uber Black drivers give me their personal limo service business cards. The difference is that a cab is a commodity while a personal trainer is something that needs to click on a personal level.
I see no future for Vint even though I loved it when I used it.
It's not just that rides are a commodity; they also have an immediacy effect. I don't care who drives me, but I care that they pick me up in the next 3 or 4 minutes. When an Uber driver gives me a card that's great, but it's kind of worthless because they're unlikely to be nearby. Hell, in the time it takes me to call them I can likely have another car at my front door if I just use the app.
Depends. I like being able to schedule rides to the airport in advance, so I know exactly when I'm going to be picked up. And the rate will be pre-determined.
The only situation where it's nice to have a known driver is when you are scheduling a ride in advance, i.e. to the airport or something. In those cases, I frequently call a day ahead to a driver I know.
Possibly, but not on the free tier. Anyway, our current plan can only handle 20 concurrent connections to Postgres and I am not planning to pay more to satisfy spikes.
Can you clarify why not? If you get interest and a big news hit, isn't it important to you to capitalize on that as much as possible and leave a good impression so people come back to use your service?
If you don't have the funds that's one thing and I get it, but seems like spikes are exactly the kind of thing worth spending on.
The website is a weekend project, not something I particularly care whether people use. On average it has five to ten visitors a day unless it's linked to from somewhere big, which only happens like twice a year.
From my impression, I don't think it's any kind of for-profit service he's running where people are paying for and expecting uptime. He's taken a public data source and made it very easy for people to peruse, and sees that as his reward.
Heroku doesn't automatically scale it for you, though you have the ability to increase the number of dynos. There are services that will do that for you, but it's probably better that you have to explicitly scale so you don't get hit with a sizable bill you weren't expecting.
In any case, there are other constraints than just the number of web servers that could be causing the outage, e.g. the database.
Yep depends on the building. I had a single office at Microsoft, so did many others. Some shared in pods of 4-8 engineers. I don't recall any spaces with an open plan (except in the UK), but I also didn't visit every single building Microsoft had.