Hey Curt, most of my own runs, using the default of two small VMs, resulted in 3 normalized hours of usage, which worked out to around 25 cents per run.
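(A quick back-of-the-envelope check, purely illustrative: the per-normalized-hour rate below is an assumption inferred from the numbers above, not a quoted EMR price.)

    # Rough cost-per-run estimate. RATE_PER_NORMALIZED_HOUR is an
    # assumed figure inferred from "3 normalized hours ~= 25 cents";
    # it is not an official EMR price.
    NORMALIZED_HOURS_PER_RUN = 3
    RATE_PER_NORMALIZED_HOUR = 0.085  # USD, assumed

    cost_per_run = NORMALIZED_HOURS_PER_RUN * RATE_PER_NORMALIZED_HOUR
    print(f"Estimated cost per run: ${cost_per_run:.2f}")  # ~$0.26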


that's for the crawl sample, not the entire 4TB index, right?

how much data was that?


That was just for the crawl sample, yes, and it was approximately 100 MB of data, though you can specify as much as you'd prefer.

The cool thing about running this job inside Elastic MapReduce right now is that you can read the S3 data for free from within AWS; accessing it from outside costs a bit, but both are pretty reasonable. Right now you can analyze the entire dataset for around $150, and if you build a good enough algorithm you'll be able to get a lot of good information back.
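For what it's worth, assuming this is the Common Crawl corpus, the data is hosted as a public dataset on S3, so you can poke at it without any AWS credentials. A minimal sketch with boto3 — the bucket name and prefix here are assumptions about the current public layout, so check the docs before relying on them:

    import boto3
    from botocore import UNSIGNED
    from botocore.config import Config

    # Anonymous S3 client: the corpus is a public dataset, so no
    # credentials are required just to list or download objects.
    s3 = boto3.client("s3", config=Config(signature_version=UNSIGNED))

    # Bucket and prefix are assumptions for illustration only.
    resp = s3.list_objects_v2(Bucket="commoncrawl",
                              Prefix="crawl-data/", MaxKeys=10)
    for obj in resp.get("Contents", []):
        print(obj["Key"], obj["Size"])

Reading it from EC2/EMR in the same region avoids the data-transfer charge; pulling it over the public internet is where the out-of-network cost comes in.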

We're working to index this information so you can process it even more inexpensively, so stay tuned for more updates!


How is the $150 broken down?



