
I use Greasemonkey on Firefox. I recently wrote a crawler for a major accommodation listing website in Copenhagen. Guess what? I got a place to live right in the center within 2 weeks. I love SCRAPERS, I love CRAWLERS.


I did almost the same thing.

I spent 1 week selectively going through accommodations manually, then proceeded to complain to a friend of mine.

She's barely human, and she found a literally one-of-a-kind apartment dead center at a good price. The apartment was mine the next day.

Human scrapers, man.


Similarly, I wrote a scraper for a local used item marketplace. Whenever I need to purchase something that isn't urgent and I'm OK with it being second hand, I plug in the relevant stats and load it into cron. When a match is found, I get a near-instant notification in my email with contact information and details.
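For anyone wondering what the matching step of a setup like this might look like, here is a minimal sketch in Python. The `Listing` fields, `find_new_matches`, and `format_notification` are all hypothetical names invented for illustration, not the commenter's actual script. Fetching the listings and sending the email are left out on purpose, since those depend on the site and mail setup.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Listing:
    # Hypothetical shape of one scraped marketplace listing.
    listing_id: str
    title: str
    price: float
    contact: str


def find_new_matches(listings, keywords, max_price, seen_ids):
    """Return listings that match the criteria and were not reported before."""
    matches = []
    for item in listings:
        title = item.title.lower()
        if item.listing_id in seen_ids:
            continue  # already reported on an earlier run
        if item.price > max_price:
            continue
        if not all(word.lower() in title for word in keywords):
            continue
        matches.append(item)
        seen_ids.add(item.listing_id)  # remember it so the next run stays quiet
    return matches


def format_notification(item):
    """Build the plain-text body of the notification email."""
    return f"{item.title} for {item.price:.2f}\nContact: {item.contact}"
```

A cron job would call `find_new_matches` on each run with freshly scraped listings, persist `seen_ids` somewhere between runs, and mail the output of `format_notification` for each match.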


Well the problem is when someone scrapes ALL the good listings then pre-purchases them for resale at double the cost.


How is it different from paying 50+ low-wage remote workers to "scrape" the phonebook for you and then using the information acquired for profit?


One difference is that Feist v. Rural Telephone says that the data in a phonebook can't be copyrighted.

https://en.wikipedia.org/wiki/Feist_Publications,_Inc.,_v._R....


What about using those employees to "crawl" the web for you then?


I suspect it's roughly the same as a crawler -- same issues of fair use, TOS/CFAA, etc -- but likely there's no expectation that humans will read and follow robots.txt.
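For the crawler side of that comparison, checking robots.txt is cheap to do. A rough sketch using Python's standard `urllib.robotparser` follows. The helper name `build_robots_checker`, the user agent string, and the example rules are assumptions made up for illustration.

```python
from urllib.robotparser import RobotFileParser


def build_robots_checker(robots_txt, user_agent):
    """Return a function that says whether user_agent may fetch a given URL.

    robots_txt is the text of a robots.txt file that was already downloaded.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())

    def allowed(url):
        return parser.can_fetch(user_agent, url)

    return allowed


# Example rules (made up): everything under /private/ is off limits.
rules = "User-agent: *\nDisallow: /private/\n"
allowed = build_robots_checker(rules, "example-crawler")
```

A polite crawler would call `allowed(url)` before each request and skip anything it returns `False` for. This covers only the robots.txt convention, not the TOS or legal questions raised above.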


I made a web app that does the same thing for campgrounds: http://reserve.wanderinglabs.com/

Fairly simple, and people seem to find it really useful.


So can you use Greasemonkey to follow links, load a new page, and parse the new page, just like a headless crawler?


I think you can use a 3rd party backend service to store the state of the crawler. That way, when the page reloads, you know which state you are in.
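To make the "store the state" idea concrete, here is a small sketch of what that state could look like, written in Python rather than as a userscript. The `CrawlState` class and its JSON layout are hypothetical. In a real Greasemonkey setup, the serialized string would be sent to and fetched from whatever backend you pick on each page load.

```python
import json
from collections import deque


class CrawlState:
    """Pending queue and visited set that survive a save/restore round trip."""

    def __init__(self, pending=None, visited=None):
        self.pending = deque(pending or [])
        self.visited = set(visited or [])

    def add(self, url):
        """Queue a URL unless it was already visited or is already queued."""
        if url not in self.visited and url not in self.pending:
            self.pending.append(url)

    def next_url(self):
        """Pop the next URL and mark it visited, or return None when done."""
        if not self.pending:
            return None
        url = self.pending.popleft()
        self.visited.add(url)
        return url

    def to_json(self):
        """Serialize the state so it can be stored outside the page."""
        return json.dumps(
            {"pending": list(self.pending), "visited": sorted(self.visited)}
        )

    @classmethod
    def from_json(cls, text):
        """Rebuild the state after a reload from the stored string."""
        data = json.loads(text)
        return cls(data["pending"], data["visited"])
```

On each page load the script would restore the state, add the links found on the current page, save it again, and navigate to `next_url()`.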


Right. So basically a Greasemonkey script is scoped to the current page? Is there any scripting solution that is not scoped to the current page? In Chrome, maybe?


Browser automation via WebDriver (realistically Selenium), or a proxy that inserts scripts (like TestCafe).


Selenium with webdriver(io), or PhantomJS. I like Selenium more, though.



