If you’re worried about the security risks, edge cases, maintenance pain and scaling challenges of self-hosting, there are several solid hosted alternatives:
Looking at your urlbox - pretty funny language around the quota system.
>What happens if I go over my quota?
>No need to worry - we won't cut off your service. We automatically upgrade you to the next tier so you benefit from volume discounts. See the pricing page for more details.
So... if I go over the quota, you automatically charge me more? Hmm. I would expect the requests to be rejected in that case.
In my experience our customers are more worried about having the service stop when they hit the limit of a tier than they are about being charged a few more dollars.
Maybe I'm misreading. It sounds like you're stepping the user up a pricing tier - e.g. going from $50 a month to $100 and then charging at the better rate.
I would also worry about a bug on my end that fires off lots of screenshots. I would expect a quota or limit to protect me from that.
That’s right. On our standard self-service plans we automatically charge a better rate as volume increases. You only pay the difference between tiers as you move through them.
It’s rare that anyone makes that kind of mistake. It probably helps that our rate limits are relatively low compared to other APIs and that we email you when you get close to stepping up a tier. If you did make such a mistake we would, like all good dev tools, work with you to resolve it. If it happened a lot we might introduce some additional controls.
We’ve been in this business for over 12 years and currently have over 700 customers so we’re fairly confident we have the balance right.
I'm not a customer, so don't take what I say too seriously, but to me it seems like you are unilaterally making a purchasing decision on my behalf. That is, I agreed to pay you 50 dollars a month and you are deciding I should pay 100 (or more) - to "upgrade" my service. My intuition is that this is probably not legal, and, if I were a customer, I would not pay for a charge that I didn't explicitly agree to - if you tried to charge me I would reject it at the credit card level.
If I sign up for a service to pay X and get Y, then I expect to pay X and get Y - even if my automated tools request more than Y, the excess requests should be rejected with a failure message (e.g. "quota limit exceeded").
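For what it's worth, this is roughly the client-side behaviour I'd expect from a hard quota - a minimal sketch assuming a hypothetical screenshot API that returns HTTP 429 once the monthly quota is exhausted (the endpoint and error handling here are made up for illustration, not any particular vendor's API):

```python
import requests

API_ENDPOINT = "https://api.example-screenshots.com/v1/render"  # hypothetical endpoint
API_KEY = "YOUR_API_KEY"

def take_screenshot(url: str) -> bytes | None:
    """Request a screenshot; return None if the monthly quota is exhausted."""
    resp = requests.get(
        API_ENDPOINT,
        params={"url": url},
        headers={"Authorization": f"Bearer {API_KEY}"},
        timeout=30,
    )
    if resp.status_code == 429:
        # Hard quota: the request is rejected instead of silently billing a higher tier.
        print("quota limit exceeded - no screenshot taken, no extra charge")
        return None
    resp.raise_for_status()
    return resp.content
```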
https://www.scraperapi.com/ is good too. Been using them to scrape via their API on websites that have a lot of CAPTCHAs or anti-scraping tech like DataDome.
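For anyone curious, these proxy-style scraping APIs are usually a one-line change to where you send the request - a rough sketch (parameter names are from memory and purely illustrative, so check the provider's current docs):

```python
import requests

SCRAPERAPI_KEY = "YOUR_API_KEY"

# Route the fetch through the scraping API so it handles proxies,
# CAPTCHAs and anti-bot tech on your behalf (parameters are illustrative).
resp = requests.get(
    "https://api.scraperapi.com/",
    params={
        "api_key": SCRAPERAPI_KEY,
        "url": "https://example.com/protected-page",
        "render": "true",  # ask for JS rendering if the page needs it
    },
    timeout=60,
)
print(resp.status_code, len(resp.text))
```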
What’s the chance you’re affiliated? Almost every one of your comments links to it. And a curiously similar interest in Rust between the official HN account and yours. No need to be sneaky.
robots.txt isn't legally binding. I am interested to know if and how services even interact with it. It's more like a clue as to where the interesting content for scrapers is on your site. This is how I imagine it goes:
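Whatever the imagined conversation looks like, a service that genuinely wants to honour robots.txt only needs a few lines to check it - a minimal sketch using Python's standard urllib.robotparser (purely illustrative, not how any particular service implements it):

```python
from urllib import robotparser

# Fetch and parse the site's robots.txt, then ask whether a given
# user agent is allowed to fetch a given URL.
rp = robotparser.RobotFileParser()
rp.set_url("https://example.com/robots.txt")
rp.read()

user_agent = "MyScraperBot"
url = "https://example.com/interesting-content"
if rp.can_fetch(user_agent, url):
    print("allowed by robots.txt - go ahead and fetch")
else:
    print("disallowed by robots.txt - a polite crawler skips this URL")
```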
There's also our product, Airtop (https://www.airtop.ai/), which falls under the scraping specialist / browser automation category and can generate screenshots too.
Hey, I'm curious what your thoughts are: do you need a full-blown agent that moves the mouse and clicks to extract content from webpages, or is a simpler tool that just scrapes pages + takes screenshots and passes them through an LLM generally pretty effective?
I can see niche cases like videos or animations being better understood by an agent, though.
Airtop is designed to be flexible: you can use it as part of a full-blown agent that interacts with webpages, or as a standalone tool for scraping and screenshots.
One of the key challenges in scraping is dealing with anti-bot measures, CAPTCHAs, and dynamic content loading. Airtop abstracts much of this complexity while keeping it accessible through an API. If you're primarily looking for structured data extraction, passing pages through an LLM can work well, but for interactive workflows (e.g., authentication, multi-step navigation), an agent-based approach might be better. It really depends on the use case.
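As a concrete illustration of the simpler path the question describes - render the page, grab the text and a screenshot, and hand them to an LLM for extraction - here's a rough sketch with Playwright; the extract_with_llm call is a placeholder for whatever model API you use, not anything Airtop-specific:

```python
from playwright.sync_api import sync_playwright

def extract_with_llm(page_text: str, screenshot: bytes) -> str:
    """Placeholder: send the page text (and optionally the screenshot)
    to your LLM of choice with an extraction prompt."""
    raise NotImplementedError

def scrape_and_extract(url: str) -> str:
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto(url, wait_until="networkidle")
        text = page.inner_text("body")            # visible text for the LLM
        screenshot = page.screenshot(full_page=True)
        browser.close()
    return extract_with_llm(text, screenshot)
```

For multi-step flows (logins, pagination, CAPTCHAs) this breaks down quickly, which is where the agent / hosted-browser approach earns its keep.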
- https://browserless.io - low level browser control
- https://scrapingbee.com - scraping specialists
- https://urlbox.com - screenshot specialists*
They’re all profitable and have been around for years, so you can depend on the businesses and the tech.
* Disclosure: I work on this one and was a customer before I joined the team.
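To give a feel for the "low level browser control" end of that list: these services typically expose a remote browser that your existing automation code attaches to - a rough sketch using Playwright's connect_over_cdp, where the endpoint and token are placeholders (each provider documents its own URL and auth scheme):

```python
from playwright.sync_api import sync_playwright

# Placeholder endpoint - each hosted-browser provider documents its own
# WebSocket/CDP URL and auth scheme.
REMOTE_BROWSER = "wss://hosted-browser.example.com?token=YOUR_TOKEN"

with sync_playwright() as p:
    browser = p.chromium.connect_over_cdp(REMOTE_BROWSER)
    page = browser.new_page()
    page.goto("https://example.com")
    page.screenshot(path="example.png")
    browser.close()
```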