Can anyone give me a quick rundown on how exactly one gains access to all of thi... | Hacker News

GigabyteCoin on Nov 28, 2013 | on: 102TB of New Crawl Data Available

Can anyone give me a quick rundown on how exactly one gains access to all of this data?

I have heard about this project numerous times, and am always put off by the lack of download links, torrents, or other information on their homepage.

Perhaps I just don't know what I'm looking at?

wpietri on Nov 28, 2013

Did you try this?

http://commoncrawl.org/get-started/

I haven't tried that one, but I've poked at others in Amazon's Public Data Sets collection:

http://aws.amazon.com/datasets

If you're already familiar with using Amazon's virtual servers, it's pretty straightforward.

I also note that the Common Crawl project publishes code here:

https://github.com/commoncrawl/commoncrawl
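
For a rough idea of what "straightforward" might look like without any of that tooling, here is a minimal sketch that builds an S3 bucket-listing URL and parses the XML that S3 returns. The bucket name and prefix below are assumptions for illustration only (check the get-started page for the real location), and the listing only works if the bucket permits anonymous listing. `listing_url` and `parse_listing` are hypothetical helper names, not part of any Common Crawl code.

```python
import urllib.parse
import xml.etree.ElementTree as ET

# XML namespace used by S3 bucket-listing responses.
S3_NS = "{http://s3.amazonaws.com/doc/2006-03-01/}"

# Assumptions for illustration only; the real bucket and prefix may differ.
BUCKET = "aws-publicdatasets"
PREFIX = "common-crawl/"


def listing_url(bucket, prefix, delimiter="/"):
    """Build the HTTPS URL that asks S3 to list keys under a prefix."""
    query = urllib.parse.urlencode({"prefix": prefix, "delimiter": delimiter})
    return "https://%s.s3.amazonaws.com/?%s" % (bucket, query)


def parse_listing(xml_text):
    """Return (object keys, sub-prefixes) from an S3 listing response."""
    root = ET.fromstring(xml_text)
    keys = [elem.text for elem in root.iter(S3_NS + "Key")]
    prefixes = [
        cp.findtext(S3_NS + "Prefix")
        for cp in root.iter(S3_NS + "CommonPrefixes")
    ]
    return keys, prefixes


if __name__ == "__main__":
    # Prints the URL; fetch it with urllib.request.urlopen(...) or a browser
    # and pass the response body to parse_listing().
    print(listing_url(BUCKET, PREFIX))
```

The sub-prefixes act like directories, so you can call `listing_url` again with one of them to walk further down. From inside EC2 the same data can be read with the usual S3 tools instead.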
