As someone who's used IP proxying services that provide millions of IPs for scraping purposes: that is a very mature industry. They advertise (and I believe them) "millions" of IPs, even for what you might consider hard-to-supply ones, like mobile IPs, and they let you slice and dice them however you want. Datacenter IPs? Residential IPs? Mobile IPs?[1] What state or city would you like them in? Would you like the site you're hitting to not have been accessed by this IP (through proxying at least), and if so, for how many days? Do you want some mix of that? Make your own configurations and set them up as proxy endpoints, etc.
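To make the "slice and dice" part concrete, here's a rough sketch of the kind of filtering I mean. This is not any provider's actual API; `ProxyIP`, `select_ips`, the field names, and the sample pool (documentation-range addresses) are all made up for illustration.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta

@dataclass
class ProxyIP:
    # Hypothetical record for one IP in a provider's pool.
    address: str
    kind: str    # "datacenter", "residential", or "mobile"
    state: str   # e.g. a US state code
    last_used: dict = field(default_factory=dict)  # site -> date last proxied to it

def select_ips(pool, kind=None, state=None, site=None, fresh_days=0, today=None):
    """Return IPs matching the requested type and location that have not been
    used (through the proxy) against `site` in the last `fresh_days` days."""
    today = today or date.today()
    cutoff = today - timedelta(days=fresh_days)
    chosen = []
    for ip in pool:
        if kind and ip.kind != kind:
            continue
        if state and ip.state != state:
            continue
        seen = ip.last_used.get(site) if site else None
        if seen is not None and seen > cutoff:
            continue  # this IP hit the site too recently
        chosen.append(ip)
    return chosen

# Made-up sample pool.
POOL = [
    ProxyIP("198.51.100.7", "mobile", "TX", {"example.com": date(2024, 1, 10)}),
    ProxyIP("198.51.100.8", "mobile", "TX", {"example.com": date(2023, 6, 1)}),
    ProxyIP("203.0.113.5", "residential", "TX"),
    ProxyIP("203.0.113.9", "mobile", "CA"),
]

fresh_tx_mobile = select_ips(
    POOL, kind="mobile", state="TX",
    site="example.com", fresh_days=30, today=date(2024, 1, 20),
)
```

From the target site's side, every IP that comes out of a filter like this looks like an ordinary Texas mobile user it hasn't seen in a month, which is why I don't think IP attributes alone get you far.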
Fighting against abuse at the level of IP address attributes seems like a losing game to me. Honestly, the best I saw at this (3-5 years ago at least) for traffic was Distil Networks, where they put a proxy device in front of your site, examine the traffic, and captcha or block based on that.
Since you have content being submitted, there's a lot more you can use to classify, such as the ML you used, so that's good. Part of me worries that this is all sort of reminiscent of infections and antibiotics, though. The continual back-and-forth of you finding a block and them finding a workaround feels kind of like you were training the spammers (even if you were training yourself at the same time). At some point maybe we'll find that most of the forum spam is ML-generated, low-information posts that also happen to be astroturfing that is hard to distinguish from real people's opinions.
[1] Fun fact: to my knowledge, anonymous mobile IPs are supplied by a bunch of apps opting into an SDK (like an advertising/metrics SDK). While the app is open (at least I hope that's a requirement), the SDK registers the device with the proxying service so its connection can be handed out for use by paying proxy customers. Think about that next time you play your free "ad-supported" mobile game.