Hacker News
How to host a full mirror of Wikipedia.org (github.com/pirate)
38 points by nikisweeting on Oct 1, 2019 | 16 comments



For people who are interested in doing this sort of thing, you can also fully mirror and self-host OpenStreetMap, with copies of all global tiles, a tile server, etc. If I recall correctly, the full dataset is about 900GB.


Aren't pre-generated tiles just a form of caching? What is the internal source data format? Something vector-based? How big is that?
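For a sense of why a full set of global tiles gets so large: in the common "slippy map" scheme (Web Mercator, tiles addressed as zoom/x/y), each zoom level z has 2^z × 2^z = 4^z tiles. The sketch below assumes that scheme; `tile_for` and `tiles_up_to_zoom` are hypothetical helper names, not part of any OSM tool.

```python
import math
from typing import Tuple


def tile_for(lat_deg: float, lon_deg: float, zoom: int) -> Tuple[int, int]:
    """Return the (x, y) slippy-map tile containing a WGS84 point at a zoom level."""
    n = 2 ** zoom  # tiles per axis at this zoom level
    x = int((lon_deg + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat_deg)
    # Web Mercator projection of latitude, scaled to the tile grid.
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so lon = 180 or latitudes beyond the Mercator limit stay on the grid.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def tiles_up_to_zoom(max_zoom: int) -> int:
    """Total number of tiles for zoom levels 0..max_zoom (4**z per level)."""
    return sum(4 ** z for z in range(max_zoom + 1))


if __name__ == "__main__":
    print(tile_for(0.0, 0.0, 1))  # (1, 1)
    print(tiles_up_to_zoom(18))   # 91625968981, roughly 91.6 billion tiles
```

Roughly 91.6 billion tiles through zoom 18 is one reason tile servers commonly render and cache the deeper zoom levels on demand instead of storing every tile up front.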


I think there should be a "prepper" kit: a durable laptop loaded with Wikipedia, a spreadsheet, a CAD program, videos on survival, repair, and whatever else. Maybe local copies of Thingiverse, a 3D printer, and a couple of square feet of solar panels plus batteries to power it all. Keep it all inside a metal case that doubles as a Faraday cage.


Not all of it, but there was a device that was basically a "portable Wikipedia" about 10 years ago:

https://en.wikipedia.org/wiki/WikiReader

Eventually we will have the device you describe...

(aka the cornucopia machine from https://en.wikipedia.org/wiki/Singularity_Sky)


How useful would Wikipedia be in a post-apocalyptic situation? I feel like everything else you've mentioned is like, 2 orders of magnitude more important, and any sort of book on survival (even something like "how to survive a zombie apocalypse" or whatever) would beat Wikipedia in terms of usefulness.


I think you could get basic metallurgy, rocks, minerals, factory techniques, descriptions of various machines, motors, engines, pumps and irrigation, stills, solar stills, horticulture, animal husbandry, and other things from Wikipedia at a level that would be useful. Things like "is this plant edible?" are potentially useful.


I see where you're coming from. I just feel like so much of Wikipedia is pop culture and history (I could be wrong, though; I have no idea how to check this) that there would probably be a much more efficient resource for a post-apocalyptic scenario.


79GB for all of the English articles minus the media. That's smaller than I would have guessed. You can fit this large slice of our culture on a $20.99 flash drive with 49GB left over. That seems like a good econo-cultural indicator: storage cost per Wikipedia. I wish I could short that index.
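The proposed "storage cost per Wikipedia" index is simple arithmetic. A minimal sketch, assuming the drive is a 128GB model (inferred from 79GB used plus 49GB left over, not stated in the comment) and ignoring the gap between advertised and formatted capacity; `cost_per_wikipedia` is a hypothetical name.

```python
def cost_per_wikipedia(drive_price_usd: float, drive_gb: float, wiki_gb: float) -> float:
    """Share of the drive's price taken up by one copy of the dump."""
    return drive_price_usd * wiki_gb / drive_gb


# Figures from the comment above; the 128GB capacity is an inference.
DRIVE_PRICE = 20.99
DRIVE_GB = 128
WIKI_GB = 79

if __name__ == "__main__":
    print(f"left over: {DRIVE_GB - WIKI_GB} GB")  # left over: 49 GB
    print(f"cost per Wikipedia: ${cost_per_wikipedia(DRIVE_PRICE, DRIVE_GB, WIKI_GB):.2f}")  # $12.95
```

Tracking this number across drive prices over time is what would turn it into the index the comment describes.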


When thinking about this sort of thing, I always find it fun to think about how we perceive information density. I could hand you a USB drive and it could either contain a significant chunk of the sum of human knowledge, taking you lifetimes to even skim through, or it could contain a 2.5-hour movie you'd think nothing of.

Multiple layers of things at work there of course but that's what makes it fun to think about.


>79GB for all of the English articles minus the media.

I think that's an error on the GitHub page: wikipedia_en_all_novid is all text plus pictures, just no videos. Text alone is ~15GB zipped. My 2014 media dump was ~76GB, so ~80GB for full text plus media checks out.


Does wikipedia_en_all_novid really include pictures? Wouldn't that be many hundreds of GB?


I think just the pictures which are embedded in pages, not all media assets.


Still, that seems way too small to me, considering there are ~6M articles.
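To put the "way too small" intuition in numbers, dividing a dump's size by the article count gives the average per-article budget. A quick sketch using the ~80GB and ~6M figures from this thread, assuming decimal units (1GB = 10^9 bytes, 1KB = 1000 bytes); the function name is hypothetical.

```python
def avg_kb_per_article(dump_gb: float, articles: int) -> float:
    """Average size in kilobytes per article, using decimal units."""
    return dump_gb * 1e9 / articles / 1e3


if __name__ == "__main__":
    print(f"{avg_kb_per_article(80, 6_000_000):.1f} KB per article")  # 13.3 KB per article
```

About 13KB per article has to cover both the text and its embedded pictures, which is why the figure looks surprisingly small.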


Apparently I was wrong! They got it super small.

  wikipedia_en_all_mini_2019-09.zim    2019-09-18 03:16   10G
  wikipedia_en_all_nopic_2018-09.zim   2018-09-26 16:43   35G
  wikipedia_en_all_novid_2018-06.zim   2018-07-18 21:21   77G
  wikipedia_en_all_novid_2018-10.zim   2018-11-06 12:43   78G
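The filenames in that listing follow a wikipedia_en_all_<flavor>_<YYYY-MM>.zim pattern, so choosing which dump to mirror can be scripted. A minimal sketch, assuming listing lines shaped like the ones above (name, date, time, size in G); `parse_listing` and `latest_per_flavor` are hypothetical helpers, not part of any Kiwix tool.

```python
import re
from typing import Dict, List, NamedTuple

LISTING = """\
wikipedia_en_all_mini_2019-09.zim    2019-09-18 03:16   10G
wikipedia_en_all_nopic_2018-09.zim   2018-09-26 16:43   35G
wikipedia_en_all_novid_2018-06.zim   2018-07-18 21:21   77G
wikipedia_en_all_novid_2018-10.zim   2018-11-06 12:43   78G
"""

# Assumed line shape: <name>.zim  <date> <time>  <size>G
PATTERN = re.compile(
    r"^wikipedia_en_all_(?P<flavor>\w+?)_(?P<month>\d{4}-\d{2})\.zim"
    r"\s+\S+\s+\S+\s+(?P<size>\d+)G$"
)


class Dump(NamedTuple):
    flavor: str   # e.g. "mini", "nopic", "novid"
    month: str    # "YYYY-MM", so string comparison orders chronologically
    size_gb: int


def parse_listing(listing: str) -> List[Dump]:
    """Extract one Dump per matching line, skipping anything else."""
    dumps = []
    for line in listing.splitlines():
        m = PATTERN.match(line.strip())
        if m:
            dumps.append(Dump(m["flavor"], m["month"], int(m["size"])))
    return dumps


def latest_per_flavor(dumps: List[Dump]) -> Dict[str, Dump]:
    """Keep only the most recent dump for each flavor."""
    latest: Dict[str, Dump] = {}
    for d in dumps:
        current = latest.get(d.flavor)
        if current is None or d.month > current.month:
            latest[d.flavor] = d
    return latest


if __name__ == "__main__":
    for flavor, dump in sorted(latest_per_flavor(parse_listing(LISTING)).items()):
        print(f"{flavor}: {dump.month}, {dump.size_gb}G")
```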


From what I gathered, the ZIM picture library is recompressed to a lower quality and size.


If "without media" means without images as well, it is larger than I expected.



