If you’re worried about the security risks, edge cases, maintenance pain and scaling challenges of self-hosting, there are several solid hosted alternatives:
Looking at your urlbox - pretty funny language around the quota system.
>What happens if I go over my quota?
>No need to worry - we won't cut off your service. We automatically upgrade you to the next tier so you benefit from volume discounts. See the pricing page for more details.
So... if I go over the quota, you automatically charge me more? Hmm. I would expect the requests to be rejected in that case.
In my experience our customers are more worried about having the service stop when they hit the limit of a tier than they are about being charged a few more dollars.
Maybe I'm misreading. It sounds like you're stepping the user up a pricing tier - e.g. going from $50 a month to $100 and then charging at the better rate.
I would also worry about a bug on my end that fires off lots of screenshots. I would expect a quota or limit to protect me from that.
That’s right. On our standard self-service plans we automatically charge a better rate as volume increases. You only pay the difference between tiers as you move through them.
It’s rare that anyone makes that kind of mistake. It probably helps that our rate limits are relatively low compared to other APIs and that we email you when you get close to stepping up a tier. If you did make such a mistake we would, like all good dev tools, work with you to resolve it. If it happened a lot we might introduce some additional controls.
We’ve been in this business for over 12 years and currently have over 700 customers so we’re fairly confident we have the balance right.
I'm not a customer, so don't take what I say too seriously, but to me it seems like you are unilaterally making a purchasing decision on my behalf. That is, I agreed to pay you 50 dollars a month and you are deciding I should pay 100 (or more) - to "upgrade" my service. My intuition is that this is probably not legal, and, if I were a customer, I would not pay for a charge that I didn't explicitly agree to - if you tried to charge me I would reject it at the credit card level.
If I sign up for a service to pay X and get Y, then I expect to pay X and get Y - even if my automated tools request more than Y, the excess requests should be rejected with a failure message (e.g. "quota limit exceeded").
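For what it's worth, this is roughly the client-side behaviour I'd expect from a hard quota - a minimal sketch assuming a hypothetical screenshot API that returns HTTP 429 once the monthly quota is exhausted (the endpoint and error handling here are made up for illustration, not any particular vendor's API):

```python
import requests

API_ENDPOINT = "https://api.example-screenshots.com/v1/render"  # hypothetical endpoint
API_KEY = "YOUR_API_KEY"

def take_screenshot(url: str) -> bytes | None:
    """Request a screenshot; return None if the monthly quota is exhausted."""
    resp = requests.get(
        API_ENDPOINT,
        params={"url": url},
        headers={"Authorization": f"Bearer {API_KEY}"},
        timeout=30,
    )
    if resp.status_code == 429:
        # Hard quota: the request is rejected instead of silently billing a higher tier.
        print("quota limit exceeded - no screenshot taken, no extra charge")
        return None
    resp.raise_for_status()
    return resp.content
```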
https://www.scraperapi.com/ is good too. Been using them to scrape via their API on websites that have a lot of CAPTCHAs or anti-scraping tech like DataDome.
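For anyone curious, these proxy-style scraping APIs are usually a one-line change to where you send the request - a rough sketch (parameter names are from memory and purely illustrative, so check the provider's current docs):

```python
import requests

SCRAPERAPI_KEY = "YOUR_API_KEY"

# Route the fetch through the scraping API so it handles proxies,
# CAPTCHAs and anti-bot tech on your behalf (parameters are illustrative).
resp = requests.get(
    "https://api.scraperapi.com/",
    params={
        "api_key": SCRAPERAPI_KEY,
        "url": "https://example.com/protected-page",
        "render": "true",  # ask for JS rendering if the page needs it
    },
    timeout=60,
)
print(resp.status_code, len(resp.text))
```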
What’s the chance you’re affiliated? Almost every one of your comments links to it. And a curiously similar interest in Rust between the official HN account and yours. No need to be sneaky.
robots.txt isn't legally binding. I am interested to know if and how services even interact with it. It's more like a clue as to where the interesting content for scrapers is on your site. This is how I imagine it goes:
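Whatever the imagined conversation looks like, a service that genuinely wants to honour robots.txt only needs a few lines to check it - a minimal sketch using Python's standard urllib.robotparser (purely illustrative, not how any particular service implements it):

```python
from urllib import robotparser

# Fetch and parse the site's robots.txt, then ask whether a given
# user agent is allowed to fetch a given URL.
rp = robotparser.RobotFileParser()
rp.set_url("https://example.com/robots.txt")
rp.read()

user_agent = "MyScraperBot"
url = "https://example.com/interesting-content"
if rp.can_fetch(user_agent, url):
    print("allowed by robots.txt - go ahead and fetch")
else:
    print("disallowed by robots.txt - a polite crawler skips this URL")
```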
There's also our product, Airtop (https://www.airtop.ai/), which falls under the scraping specialist / browser automation category and can generate screenshots too.
Hey, I'm curious what your thoughts are: do you need a full-blown agent that moves the mouse and clicks to extract content from webpages, or is a simpler tool that just scrapes pages + takes screenshots and passes them through an LLM generally pretty effective?
I can see niche cases like videos or animations being better understood by an agent, though.
Airtop is designed to be flexible: you can use it as part of a full-blown agent that interacts with webpages, or as a standalone tool for scraping and screenshots.
One of the key challenges in scraping is dealing with anti-bot measures, CAPTCHAs, and dynamic content loading. Airtop abstracts much of this complexity while keeping it accessible through an API. If you're primarily looking for structured data extraction, passing pages through an LLM can work well, but for interactive workflows (e.g., authentication, multi-step navigation), an agent-based approach might be better. It really depends on the use case.
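As a concrete illustration of the simpler path the question describes - render the page, grab the text and a screenshot, and hand them to an LLM for extraction - here's a rough sketch with Playwright; the extract_with_llm call is a placeholder for whatever model API you use, not anything Airtop-specific:

```python
from playwright.sync_api import sync_playwright

def extract_with_llm(page_text: str, screenshot: bytes) -> str:
    """Placeholder: send the page text (and optionally the screenshot)
    to your LLM of choice with an extraction prompt."""
    raise NotImplementedError

def scrape_and_extract(url: str) -> str:
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto(url, wait_until="networkidle")
        text = page.inner_text("body")            # visible text for the LLM
        screenshot = page.screenshot(full_page=True)
        browser.close()
    return extract_with_llm(text, screenshot)
```

For multi-step flows (logins, pagination, CAPTCHAs) this breaks down quickly, which is where the agent / hosted-browser approach earns its keep.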
- https://browserless.io - low level browser control
- https://scrapingbee.com - scraping specialists
- https://urlbox.com - screenshot specialists*
They’re all profitable and have been around for years, so you can depend on the businesses and the tech.
* Disclosure: I work on this one and was a customer before I joined the team.
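To give a feel for the "low level browser control" end of that list: these services typically expose a remote browser that your existing automation code attaches to - a rough sketch using Playwright's connect_over_cdp, where the endpoint and token are placeholders (each provider documents its own URL and auth scheme):

```python
from playwright.sync_api import sync_playwright

# Placeholder endpoint - each hosted-browser provider documents its own
# WebSocket/CDP URL and auth scheme.
REMOTE_BROWSER = "wss://hosted-browser.example.com?token=YOUR_TOKEN"

with sync_playwright() as p:
    browser = p.chromium.connect_over_cdp(REMOTE_BROWSER)
    page = browser.new_page()
    page.goto("https://example.com")
    page.screenshot(path="example.png")
    browser.close()
```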