TensorFlow Implementation of Deep Convolutional Generative Adversarial Networks (github.com/carpedm20)
98 points by carpedm20 on March 23, 2016 | 18 comments



I have an idea for anyone who is interested in getting their hands dirty with TensorFlow.

There is a very large conservation group in Fort Lauderdale that works with sea turtles.[1] Thousands of turtles hatch on Fort Lauderdale beach every year, but there is a problem: once hatched, they move toward the brightest light source. So at night they cross the road toward hotels with bright lights and street lamps instead of crawling into the ocean. What the volunteers are doing is collecting massive quantities of data to show how devastating artificial lights are to the baby turtles.

They have a massive quantity of data, and they have a very good idea of when the eggs will hatch. They go out on the beach when the eggs hatch and pretty much make sure all the turtles make it to the ocean. Great program. They use data like air temperature and amount of daylight to try to figure out when the eggs will hatch so they are ready to usher the hatchlings into the ocean.

They have years of data. They know the data is linked to hatching times. But they don't have sophisticated models.

If someone wanted an interesting project to start using TensorFlow with, I would suggest getting in touch with S.T.O.P. and requesting their data sets to see if a prediction model can be developed for when the eggs hatch. Being able to know when the eggs will hatch would help the scores of volunteers who are out on the beach protecting sea turtles.

[1] http://seaturtleop.com/


I understand that it's difficult for people not experienced with machine learning to tell which problems are actually suited to which models, but using deep learning / TensorFlow for this is like using Hadoop to sort a 10 megabyte file of numbers. Or writing a new Ruby on Rails app and deploying it to AWS in order to create a survey for a class project.

Plain ol' statistics, plotting, and some hand-rolled features (this is basically "data science") is probably the best fit for this problem, especially since it doesn't seem to me like this would be a massive dataset.
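
For concreteness, a minimal sketch of the plain-statistics route (the file and column names are hypothetical; I haven't seen their data):

    import pandas as pd
    from sklearn.linear_model import LinearRegression

    # Hypothetical schema: one row per nest, with environmental readings
    # and the observed incubation length in days.
    df = pd.read_csv("nests.csv")  # assumed columns: air_temp, daylight_hours, days_to_hatch

    X = df[["air_temp", "daylight_hours"]]
    y = df["days_to_hatch"]

    model = LinearRegression().fit(X, y)
    print(dict(zip(X.columns, model.coef_)))  # which inputs matter, and by how much
    print(model.score(X, y))                  # in-sample R^2 as a rough sanity check

Even just scatter-plotting each feature against days_to_hatch would tell you most of the story before fitting anything.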


This is the big problem when a technique breaks through to name recognition in the mass media. The last big one was SVM/kernel methods, but it wasn't this bad. Now we have lots of people running around wanting to hit things with the "Deep Learning" hammer, with no real idea where it is appropriate (or worse, offering to do it for your company without much better understanding).


Arguably those things might still be a useful way for someone to make their first forays into Hadoop or Ruby on Rails.


This would also make a great Kaggle competition. My feeling is something like boosted decision trees would do better than deep networks here (as is often the case on Kaggle).


If the data is clean, then auto-sklearn[1] will probably do a pretty good job. (Or just throw it at XGBoost; XGBoost will do better than TensorFlow for something like this.)

[1] http://auto-sklearn.readthedocs.org/en/master/
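
If it were me, a minimal auto-sklearn sketch would look something like this (same hypothetical schema as the comment above; note auto-sklearn only runs on Linux, and the search budget here is arbitrary):

    import pandas as pd
    import autosklearn.regression
    from sklearn.metrics import r2_score
    from sklearn.model_selection import train_test_split

    df = pd.read_csv("nests.csv")  # hypothetical: air_temp, daylight_hours, days_to_hatch
    X = df[["air_temp", "daylight_hours"]]
    y = df["days_to_hatch"]
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    automl = autosklearn.regression.AutoSklearnRegressor(
        time_left_for_this_task=600  # ten-minute model/hyperparameter search
    )
    automl.fit(X_train, y_train)
    print(r2_score(y_test, automl.predict(X_test)))  # held-out R^2

It searches over sklearn estimators and preprocessing for you, which is about as hands-off as it gets.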


How precisely does the timing need to be known? Minutes? Hours? Why wouldn't a webcam be sufficient for this purpose?


I imagine advance knowledge of the timing is desirable if the project involves many volunteers going to a beach to herd turtles at night. Allows for planning, doesn't require anyone to monitor the webcam at night, etc.


Can you send an introduction and a sample of the data? I'm interested in publishing it on Kaggle (b at kaggle . com)


Do you have a link to the data? I checked the webpage but haven't found it.


This is exactly what is wrong with amateur machine learning today. People think they can open source some data and expect a deep learning expert to come give them a model for free. Protip: the solutions coming out of Kaggle are over-engineered, with massive amounts of hand/human labeling.


The volunteers are going out in the field with mobile apps to record all the data, so they have a very large, complete data set spanning years. Here is a group of people with no programming experience, and they already use this data in unsophisticated ways to predict when an egg will hatch so they can send one of the many volunteers to the site to make sure the turtles make it to the ocean. They don't have any money, and they weren't asking for help. If I had time, I'd do this myself as a way to get started with machine learning. I was only suggesting that this might be a useful dataset for machine learning: build a model on two years of data and there are still two years of data to test it against, on top of people with mobile devices collecting environmental and case data every day.

Seriously, isn't this a good example of where machine learning might be useful? "I made a million dollars saving sea turtles," said no one ever.


It sounds like, if the data is in a sane format, simply plugging it into XGBoost would give something usable. If the data were available on their site I would have done that right now (assuming clean data, it would probably take <30 minutes). This wouldn't yield the best possible predictions, but it would be much, much better than nothing.
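
Roughly this, to be concrete (column names are guesses, since the data isn't public):

    import pandas as pd
    import xgboost as xgb
    from sklearn.metrics import mean_absolute_error
    from sklearn.model_selection import train_test_split

    df = pd.read_csv("nests.csv")  # hypothetical export of their field records
    X = df[["air_temp", "daylight_hours"]]
    y = df["days_to_hatch"]
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # Gradient-boosted trees with near-default settings; no tuning needed for a baseline.
    model = xgb.XGBRegressor(n_estimators=200, max_depth=4, learning_rate=0.1)
    model.fit(X_train, y_train)
    print(mean_absolute_error(y_test, model.predict(X_test)))  # mean error in days

Defaults plus a held-out test set already give a defensible baseline; tuning can come later.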


Protip: Kaggle solutions are hacked-together scripts deliberately designed to get as close to overfitting as possible without killing your private LB score. They are rarely hand-labeled (though there are plenty of hand-engineered features). Over-engineering is rarely a problem!

(Source: Kaggler. Also the Kaggle blog)


I can't recommend this video because I haven't watched it yet, but looking at the slides it seems like a good survey talk about generative models and an intro to DCGANs: https://www.youtube.com/watch?v=KeJINHjyzOU


They all look the same to me.

<kidding>

Love to see this with more training data.


> "Deep Convolutional Generative Adversarial Networks"

Only 4 qualifiers? We can do better... 5, 6, even 7 are now on the horizon!


_ "



