I would've said you should download only archives, but really I think commits are also very important data, since they show the actual changes to the code, which would be very useful for training an AI to suggest code changes.



There are also valid, non-evil reasons for git hosts to want to throttle and put up obstacles to scraping, whether via crawlers, 'git clone', or whatever. These are very expensive operations.



