
It appears to be the exact opposite to me: `git clone --depth 1 ...` gives you code that you know exactly how to parse, whereas webpages have all sorts of semantic issues.
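
For what it's worth, here is a minimal sketch of driving that shallow clone from Python. The function names, the example URL, and the destination path are all made up for illustration; only `git clone --depth` itself is the real command.

```python
import subprocess


def shallow_clone_cmd(url, dest, depth=1):
    """Build the argv for a shallow clone (hypothetical helper)."""
    if depth < 1:
        raise ValueError("depth must be >= 1")
    # --depth N truncates history to the most recent N commits.
    return ["git", "clone", "--depth", str(depth), url, dest]


def shallow_clone(url, dest, depth=1):
    """Run the clone; raises CalledProcessError if git exits non-zero."""
    subprocess.run(shallow_clone_cmd(url, dest, depth), check=True)
```

Usage would be something like `shallow_clone("https://example.com/some/repo.git", "repo")`, after which the working tree is ordinary files on disk.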



Git clone is a very expensive operation. Git hosts generally try to prohibit mass git cloning for this reason.


What makes it so expensive? I’d always assumed it downloaded the .git directory statically, and the computational bits were done by the local client.


I'd assume this is in relation to how much other operations cost. With 'git clone' you download at least the whole repository. Compare that to 'git fetch', which, when nothing has changed, is essentially just a comparison of the remote's ref hashes against the ones you already have.
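
To make the cost difference concrete, here is a rough sketch of checking whether a remote has moved without downloading any objects, using `git ls-remote`. The helper names are hypothetical; the parsing assumes the usual `<hash><TAB><refname>` line format that `git ls-remote` prints.

```python
import subprocess


def parse_ls_remote(output):
    """Turn `git ls-remote` output into a {refname: hash} dict."""
    refs = {}
    for line in output.splitlines():
        if not line.strip():
            continue
        sha, ref = line.split("\t", 1)
        refs[ref] = sha
    return refs


def remote_refs(url):
    """Ask the remote for its refs; no repository content is transferred."""
    result = subprocess.run(
        ["git", "ls-remote", url],
        check=True, capture_output=True, text=True,
    )
    return parse_ls_remote(result.stdout)


def has_changed(old_refs, new_refs, ref="HEAD"):
    """True if the given ref points somewhere else than last time."""
    return old_refs.get(ref) != new_refs.get(ref)
```

A scraper could store the dict from the previous run and only clone or fetch when `has_changed` returns true.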


Yeah. Git repositories can grow very large very quickly. A single clone here and there isn't too bad, but if you're scraping tens of thousands of projects, you can easily rack up terabytes of disk usage and network traffic.
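
As a back-of-envelope check, with numbers that are purely assumed for illustration (50,000 repositories at an average of 100 MB each, neither figure measured):

```python
def estimate_transfer_tb(num_repos, avg_repo_mb):
    """Total transfer in decimal terabytes (1 TB = 1,000,000 MB)."""
    return num_repos * avg_repo_mb / 1_000_000


# Assumed, not measured: 50,000 repos averaging 100 MB per clone.
total_tb = estimate_transfer_tb(50_000, 100)
print(f"{total_tb} TB")  # prints "5.0 TB"
```

The same amount lands on disk unless the clones are deleted as you go, so the terabyte range is reached quickly even with modest per-repo sizes.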



