
I was commissioned to recover ideawave.ca from archive.org: its owner had lost the database, so pretty much all that was left of it was on archive.org. I think it ran on WordPress, but he asked me to port it to Jekyll.

I scraped its contents (blog posts, pages, etcetera) with Python's BeautifulSoup and redid its styling "by hand", which was nothing otherworldly (the site was from around 2010), and I got the chance to make some improvements.

The thing with the scraping was that the connection would drop after a while and it was reaaaaaaaaaally sloooooooooow, so I had to keep a record in memory of the last successfully scraped post/page/whatever and, if something happened, restart from that point.
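
A minimal sketch of that restart-from-the-last-success idea, not the actual script: `scrape_with_resume`, `fetch`, `max_restarts` and `wait_seconds` are all made-up names, and `fetch` stands in for whatever downloads and parses one archived page. It assumes a dropped connection surfaces as `ConnectionError` or `TimeoutError`.

```python
import time


def scrape_with_resume(urls, fetch, max_restarts=5, wait_seconds=0.0):
    """Fetch every URL in order, resuming after the last success on failure.

    `fetch` is a hypothetical callable that takes a URL and returns the
    scraped content, raising ConnectionError or TimeoutError when the
    connection drops.
    """
    results = {}
    next_index = 0  # in-memory record: first item that has not been scraped yet
    restarts = 0    # total number of restarts so far

    while next_index < len(urls):
        try:
            for i in range(next_index, len(urls)):
                results[urls[i]] = fetch(urls[i])
                next_index = i + 1  # advance only after a successful fetch
        except (ConnectionError, TimeoutError):
            restarts += 1
            if restarts > max_restarts:
                raise  # give up instead of looping forever
            time.sleep(wait_seconds)  # brief pause before resuming

    return results
```

Since the record lives only in memory, it survives a dropped connection but not a crash of the script itself; writing `next_index` to a file would cover that case too.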

Got pennies for it, mostly because I lowballed myself, but got to learn a thing or two.


