This is exactly what is wrong with amateur machine learning today. People think they can open-source some data and expect a deep learning expert to come build them a model for free. Protip: the solutions coming out of Kaggle are over-engineered and depend on massive amounts of hand/human labeling.
The volunteers go out into the field with mobile apps to record all the data, so they have a very large, complete dataset spanning years. Here is a group of people with no programming experience, and they already use this data in unsophisticated ways to predict when an egg will hatch so they can send one of the many volunteers to the site to make sure the turtles make it to the ocean. They don't have any money, and they weren't asking for help. If I had time I'd do this myself as a way to get started with machine learning. I was only suggesting that this might be a useful dataset: build a model on two years of data and there are still two years of data to test it on, on top of people with mobile devices collecting environmental and case data every day.
Seriously, isn't this a good example of where machine learning might be useful? "I made a million dollars saving sea turtles," said no one ever.
It sounds like if the data is in a sane format, simply plugging it into xgboost would give something usable. If the data were available on their site I would have done that right now (assuming clean data, it would probably take <30 minutes). This wouldn't yield the best possible predictions, but it would be much, much better than nothing.
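To make "plug it into xgboost" concrete, here is a minimal sketch. The file name, feature columns, target ("days_to_hatch"), and year values are all hypothetical since the dataset isn't public; the train/test split mirrors the "two years to train, two years to test" idea from the comment above.

  # Hypothetical export of the volunteers' observations -- column names
  # and years are assumptions, not the real schema.
  import pandas as pd
  from xgboost import XGBRegressor
  from sklearn.metrics import mean_absolute_error

  df = pd.read_csv("nest_observations.csv")
  features = ["sand_temp_c", "air_temp_c", "humidity", "clutch_size"]  # assumed columns
  target = "days_to_hatch"  # assumed label: days from laying to hatching

  # Train on the first two years, hold out the last two as a genuine test set.
  train = df[df["year"].isin([2014, 2015])]
  test = df[df["year"].isin([2016, 2017])]

  model = XGBRegressor(n_estimators=300, max_depth=4, learning_rate=0.1)
  model.fit(train[features], train[target])

  preds = model.predict(test[features])
  print("MAE (days):", mean_absolute_error(test[target], preds))

Even an untuned gradient-boosted model like this gives a baseline error in days, which is already enough to decide when to send a volunteer to a nest.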
Protip: Kaggle solutions are hacked-together scripts deliberately tuned to get as close to overfitting as possible without killing your private leaderboard (LB) score. They are rarely hand-labelled (though there are plenty of hand-engineered features). Over-engineering is rarely the problem!
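For anyone unfamiliar with how you walk that line in practice: the standard guard is to hold out a validation set and let the booster stop itself when validation error stalls. A minimal sketch, reusing the hypothetical file and columns from the comment above; passing early_stopping_rounds in the constructor assumes a reasonably recent xgboost (>= 1.6).

  import pandas as pd
  from xgboost import XGBRegressor
  from sklearn.model_selection import train_test_split

  df = pd.read_csv("nest_observations.csv")  # hypothetical, as above
  features = ["sand_temp_c", "air_temp_c", "humidity", "clutch_size"]
  X_train, X_val, y_train, y_val = train_test_split(
      df[features], df["days_to_hatch"], test_size=0.2, random_state=0
  )

  # Over-provision trees, then stop boosting once the held-out error
  # hasn't improved for 25 rounds -- close to overfit, but not past it.
  model = XGBRegressor(n_estimators=2000, learning_rate=0.05,
                       early_stopping_rounds=25)
  model.fit(X_train, y_train, eval_set=[(X_val, y_val)], verbose=False)
  print("best iteration:", model.best_iteration)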