We've done that, but it's tough to rate-limit a botnet because its requests are spread across so many IP addresses. Also, their crappy scraper software doesn't even bother to check whether requests succeed; it spews them just as fast no matter how our site responds.
No. The botnets work through multiple regions on multiple cloud providers - that's how they achieve such high throughput. For any single IP address, the load is reasonable, but for the whole botnet it's absurd.
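To illustrate why a per-IP limit doesn't catch this, here is a rough sketch. The limit of 60 requests per minute, the 5000 addresses, and the 30 requests per address are all made-up numbers for illustration, not measurements from my site, and `simulate` is a hypothetical helper.

```python
from collections import Counter

PER_IP_LIMIT = 60  # hypothetical per-IP cap, in requests per minute


def simulate(num_ips: int, requests_per_ip: int) -> tuple[int, int]:
    """Return (requests blocked by the per-IP limiter, requests served)
    for one minute of traffic from num_ips distinct addresses."""
    counts = Counter()
    blocked = 0
    served = 0
    for ip_index in range(num_ips):
        ip = f"bot-{ip_index}"  # placeholder identifier, not a real address
        for _ in range(requests_per_ip):
            if counts[ip] >= PER_IP_LIMIT:
                blocked += 1  # this address exceeded its own budget
            else:
                counts[ip] += 1
                served += 1
    return blocked, served


# Illustrative numbers only: every address stays well under the cap.
blocked, served = simulate(num_ips=5000, requests_per_ip=30)
print(blocked, served)
```

Each address stays at half the cap, so the limiter never fires, yet the site still has to serve every request from the whole set.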
Currently bot traffic accounts for 2/3 of my load, meaning that the cost of providing my service is 3x what it would be without these persistent bots.
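The 3x figure follows from simple arithmetic, assuming cost scales roughly linearly with load (that's a simplification; fixed costs would lower the multiplier). A quick check, with `cost_multiplier` as a hypothetical helper name:

```python
from fractions import Fraction


def cost_multiplier(bot_fraction: Fraction) -> Fraction:
    """Total load divided by legitimate load, given the share of load from bots.

    Assumes cost is proportional to load, which is a simplification.
    """
    if not 0 <= bot_fraction < 1:
        raise ValueError("bot_fraction must be in [0, 1)")
    # Legitimate traffic is (1 - bot_fraction) of the total, so the total
    # is 1 / (1 - bot_fraction) times the legitimate load.
    return 1 / (1 - bot_fraction)


print(cost_multiplier(Fraction(2, 3)))
```

With bots at 2/3 of the load, real users are the remaining 1/3, so the total is three times what real users alone would generate.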