RoboSat: feature extraction from aerial and satellite imagery

danieljh · on July 8, 2018

Daniel from Mapbox here. Happy to answer questions or talk through design decisions. Interested to hear your feedback.

We are mostly focusing on making the process accessible to a broader audience in the geo space, building a solid production-ready end-to-end project.

There are more resources and a step-by-step guide for running on openly available drone imagery in Tanzania:

https://www.openstreetmap.org/user/daniel-j-h/diary/44145

https://www.openstreetmap.org/user/daniel-j-h/diary/44321

throwawaymath · on July 8, 2018

Would you be open to chatting over email? I'm working on a noncommercial software project with a lot of geospatial data and I think you (and more generally folks working at Mapbox) could provide useful technical insight. My email is in my profile if you're up for it :)

smallhands · on July 8, 2018

Congratulations to you and the team. I have just one question: in the world of TensorFlow.js, how do I run this project in a browser? I was hoping to use this project to introduce high school students to data science.

danieljh · on July 8, 2018

I have no experience with TensorFlow.js. That said, using the RoboSat ONNX model exporter (rs export) you should be able to go from a trained PyTorch model to a portable ONNX protobuf, then from there to a TensorFlow model, and eventually to TensorFlow.js. At least that's how I would approach it. Keep me posted if you look into it and get it working; it's an interesting use case for sure.

smallhands · on July 8, 2018

Interesting, shoot me an email: tejioford@yahoo.com

nl · on July 8, 2018

Have you released pre-trained models?

It would be pretty useful if you did, even just as a basis for transfer learning.

Also, is there a description of the model that is used? I assume this is the code[1], which references https://arxiv.org/abs/1806.00844, but the code doesn't seem to use WideResnet (although I really know Keras much better than PyTorch, so I'm probably missing something).

[1] https://github.com/mapbox/robosat/blob/master/robosat/unet.p...

danieljh · on July 8, 2018

We haven't released pre-trained models yet. Mostly for two reasons:

1/ The PyTorch checkpoints depend on the specific Python model class. Even if you refactor only e.g. a MaxPool layer into a direct functional.max_pool function call, loading old checkpoints will no longer work. We have an ONNX model exporter now (rs export) which allows for self-contained and portable protobuf model and weight files. This workflow needs some more time and careful evaluation, though.

2/ The models for Tanzania I was working on in my spare time I can open up for sure. If there is community interest maybe we can come up with a publicly available model catalogue hosting ONNX models and metadata where folks can easily upload and download models.

For our internal models and the data we extract we are thinking through a broader strategy since a lot of time and resources are going into creating and cleaning datasets, doing hard-negative mining, running multiple training iterations and so on. They're also bound to the Mapbox aerial imagery on specific zoom levels.
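To make the checkpoint fragility from 1/ concrete, here is a minimal sketch of one way it can show up. The Before and After classes are hypothetical toy models, not the RoboSat architecture, and the mechanism shown (parameter names shifting inside an nn.Sequential once the pooling module is removed) is an assumption about how such a refactor might break loading, not a description of what happened in RoboSat itself.

```python
import torch.nn as nn
import torch.nn.functional as F


class Before(nn.Module):
    """Hypothetical toy model: pooling is a module inside the Sequential."""

    def __init__(self):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(3, 8, 3, padding=1),  # parameters saved as block.0.*
            nn.MaxPool2d(2),                # no parameters, but occupies index 1
            nn.Conv2d(8, 1, 3, padding=1),  # parameters saved as block.2.*
        )

    def forward(self, x):
        return self.block(x)


class After(nn.Module):
    """Same computation, but pooling is now a functional call."""

    def __init__(self):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(3, 8, 3, padding=1),  # still block.0.*
            nn.Conv2d(8, 1, 3, padding=1),  # now block.1.* instead of block.2.*
        )

    def forward(self, x):
        x = self.block[0](x)
        x = F.max_pool2d(x, 2)
        return self.block[1](x)


def try_load(checkpoint, model):
    """Return True if the checkpoint loads with the default strict key matching."""
    try:
        model.load_state_dict(checkpoint)
        return True
    except RuntimeError:
        return False


checkpoint = Before().state_dict()
print(sorted(checkpoint))               # keys written by the old class
print(sorted(After().state_dict()))     # keys expected by the refactored class
print(try_load(checkpoint, Before()))   # same class: loads
print(try_load(checkpoint, After()))    # refactored class: key mismatch
```

A self-contained export format sidesteps this because the graph and the weights travel together instead of relying on the Python class still matching.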

The model architecture is kept simple on purpose. It used to be an encoder-decoder U-Net'ish architecture which we trained from scratch. Recently (https://github.com/mapbox/robosat/pull/46) I switched out the encoder to a pre-trained ResNet, as proposed by Alexander Buslaev. It's a mix of the papers listed in the docstring at the top with a focus on simplicity and maintainability:

https://github.com/mapbox/robosat/blob/1e687552fe9b254a14d55...

Internally we were also exploring a multi-class PSPNet but decided not to move forward with it right now: the RoboSat model is currently a binary model (feature vs. background) which makes a few things easier in practice, such as efficiently storing results which is needed when scaling it up e.g. to all of North America.

nl · on July 9, 2018

Personally I always find pre-trained models very, very useful.

If I'm working in a new domain (which this is to me) then I prefer to get the workflow right (files in the right directories etc) before changing the NN architecture. It's a pretty big time investment to train a NN just to try it.

Atanahel · on July 9, 2018

Interestingly, we've taken the same approach to process historical documents (like 18th-century Venetian manuscripts).

We even use a U-Net architecture with a pretrained resnet50 encoder, and some postprocessing to go from probability maps to polygons, like this project does. Of course, we are much more limited than what you propose, but it is reassuring that our side project took the same course as what bigger entities do.

https://dhlab-epfl.github.io/dhSegment/
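For readers new to the "probability maps to polygons" step mentioned above, here is a deliberately simplified sketch: threshold the per-pixel probabilities, group foreground pixels into 4-connected components, and report one axis-aligned bounding box per component. This is not the post-processing used by dhSegment or RoboSat; real pipelines typically trace and simplify contours instead of taking boxes. The function names and the 0.5 threshold are assumptions chosen for illustration.

```python
from collections import deque


def mask_from_probabilities(probs, threshold=0.5):
    """Turn a 2D grid of probabilities into a boolean foreground mask."""
    return [[p >= threshold for p in row] for row in probs]


def bounding_boxes(mask):
    """Return (min_x, min_y, max_x, max_y) for each 4-connected component."""
    height = len(mask)
    width = len(mask[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y in range(height):
        for x in range(width):
            if not mask[y][x] or seen[y][x]:
                continue
            # Breadth-first flood fill over one connected component.
            queue = deque([(x, y)])
            seen[y][x] = True
            min_x = max_x = x
            min_y = max_y = y
            while queue:
                cx, cy = queue.popleft()
                min_x, max_x = min(min_x, cx), max(max_x, cx)
                min_y, max_y = min(min_y, cy), max(max_y, cy)
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    inside = 0 <= nx < width and 0 <= ny < height
                    if inside and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            boxes.append((min_x, min_y, max_x, max_y))
    return boxes


# Made-up probabilities: one 2x2 blob and one isolated pixel.
probs = [
    [0.9, 0.8, 0.1, 0.0],
    [0.7, 0.6, 0.2, 0.1],
    [0.1, 0.2, 0.1, 0.9],
]
boxes = bounding_boxes(mask_from_probabilities(probs))
print(boxes)
```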

rishabhj_says · on July 8, 2018

Hi Daniel, in your experience, what features is this model most suited for, and with what granularity of imagery? For example, buildings/roads with Landsat (30 m)? Cars with 30 cm resolution imagery?

danieljh · on July 9, 2018

We are running it internally on our aerial imagery from the Mapbox Maps API. The zoom level even there depends on the feature you want to extract, for example z18 seems to work well for parking lots.

There is not a single feature this model is most suited for: you can add arbitrary features (e.g. tennis courts, swimming pools) in pre-processing and train your model. Then the imagery quality depends on your feature, for example it will be hard to impossible to spot swimming pools in Landsat imagery.
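To relate zoom levels to the resolutions asked about above, a rough back-of-the-envelope sketch using the standard Web Mercator ground resolution formula may help. It assumes 256-pixel tiles and the WGS84 equatorial radius; the function names are made up for this example, and actual imagery sharpness depends on the source sensor, not just the tile zoom.

```python
import math

# Equatorial circumference in meters, from the WGS84 equatorial radius.
EARTH_CIRCUMFERENCE_M = 2 * math.pi * 6378137.0


def ground_resolution(zoom, latitude_deg=0.0, tile_size=256):
    """Approximate meters per pixel of a Web Mercator tile at a given zoom."""
    scale = math.cos(math.radians(latitude_deg))
    return EARTH_CIRCUMFERENCE_M * scale / (tile_size * 2 ** zoom)


def closest_zoom(meters_per_pixel, latitude_deg=0.0, tile_size=256):
    """Zoom level whose ground resolution is nearest (in log scale) to the target."""
    scale = math.cos(math.radians(latitude_deg))
    exact = math.log2(EARTH_CIRCUMFERENCE_M * scale / (tile_size * meters_per_pixel))
    return round(exact)


print(round(ground_resolution(18), 3))  # z18 at the equator, roughly 0.6 m per pixel
print(closest_zoom(30))                 # 30 m Landsat pixels sit around z12
```

So a z18 tile is roughly fifty times finer per pixel than 30 m Landsat data, which is consistent with small features such as swimming pools not being recoverable from Landsat.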

alexcnwy · on July 8, 2018

Very cool! Does anyone know any good global free satellite imagery datasets?

danieljh · on July 8, 2018

Sentinel 2 data could be interesting for you. With its high refresh rate it can be used for change detection. The multi-band data is especially useful for adding more input channels to the model e.g. for water detection based on NDWI: https://apps.sentinel-hub.com/eo-browser/#lat=52.51747&lng=1...
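As a small illustration of the NDWI idea, here is a sketch of the McFeeters formula, NDWI = (green - NIR) / (green + NIR), which for Sentinel-2 is usually computed from band B03 (green) and band B08 (near infrared). The reflectance values below are made up, and the threshold of 0 for "water" is only a common starting point that needs tuning per scene.

```python
def ndwi(green, nir):
    """Per-pixel NDWI for two equally sized 2D grids of reflectance values."""
    result = []
    for green_row, nir_row in zip(green, nir):
        row = []
        for g, n in zip(green_row, nir_row):
            total = g + n
            # Avoid division by zero on empty (no-data) pixels.
            row.append((g - n) / total if total else 0.0)
        result.append(row)
    return result


def water_mask(index, threshold=0.0):
    """Mark pixels whose NDWI exceeds the threshold as likely water."""
    return [[value > threshold for value in row] for row in index]


# Made-up reflectances: water is brighter in green than in NIR, vegetation the opposite.
green = [[0.10, 0.08], [0.06, 0.09]]
nir = [[0.02, 0.30], [0.35, 0.03]]

index = ndwi(green, nir)
mask = water_mask(index)
print(mask)
```

The resulting index (or the raw NIR band) could be stacked as an extra input channel next to RGB, which is the use suggested above.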

I can also recommend http://openaerialmap.org for playing around with openly available drone imagery. I did this last week as an evening side project, using drone imagery provided by http://www.zanzibarmapping.com. Here is a step-by-step guide for giving it a try on the Tanzania drone imagery: https://www.openstreetmap.org/user/daniel-j-h/diary/44321

throwawaymath · on July 8, 2018

Sure, the NOAA has extensive and freely available satellite imagery data: https://www.ncdc.noaa.gov/data-access/satellite-data/satelli...

NASA also releases satellite imagery for the public.

alexnewman · on July 8, 2018

Planet.com