
You know, or the real-world, reasonable, mature engineering answer: a Java/C#/C++ scalable parallel tool using modern libraries, and MPI if it ever needs to scale.


You usually don't need that, though; that's the point. If you're building an entire service around page crawling, sure. If you're doing a one-off task, don't bother.


That would be my first gut reaction too, but if it is as simple as downloading webpages, this is actually a really great solution. I suspect he used it when he built Milo, a now-defunct startup sold to eBay, where they had to update prices and inventory data regularly. A startup should make different choices than Google does.


Yeah, if you're Google. Most people are not, and wget is plenty. After all, it's written in C.


Or use curl, a slightly better-engineered wget.


I'm pretty proficient in both, and I think that's a mischaracterization of the two tools. wget is more suited to pulling down large files, groups of files, etc. curl is more suited to API calls where you might need to do something complicated at the protocol level. Each has its use.
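A rough sketch of that split (the URLs, endpoint, and $TOKEN below are placeholders, not anything from the thread):

  # wget: grab a big file (resumable) or pull down a tree of pages
  wget -c https://example.com/big-dataset.tar.gz
  wget --recursive --level=2 --wait=1 https://example.com/docs/

  # curl: talk to an API, controlling method, headers, and body
  curl -X POST https://api.example.com/items \
       -H "Authorization: Bearer $TOKEN" \
       -H "Content-Type: application/json" \
       -d '{"sku": "123", "price": 9.99}'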



