Show HN: Hyperbrowser – Scalable Browser Infrastructure for AI Apps (hyperbrowser.ai)
57 points by ashekhawat 8 months ago | 45 comments
Hey HN!

Excited to share a project we've been working on called Hyperbrowser. It’s a tool that makes scaling headless browsers ridiculously easy. It allows you to spin up hundreds of browser sessions in secure, isolated environments, with sub-second launch times. We automatically solve captchas, use residential proxies and manage concurrent sessions so that you can focus on your own business.

The idea for Hyperbrowser came from our own struggles building AI apps and agents like sales tools, automations, and AI editors. Every project seemed to hit the same roadblock: interacting with the web. Whether we needed web data as input or web browsing as output, we faced constant challenges—getting blocked, setting up proxy services, solving captchas, and scaling everything in Kubernetes.

On top of that, we had to build custom functions and services to convert websites into LLM-friendly markdown and crawl entire sites for relevant data. Keeping all of this running became a full-time job!

To make this easy for everyone else we built Hyperbrowser. It packages everything we learned and built, with a nice frontend that gets rid of the boilerplate and lets you hit the ground running. Hyperbrowser works seamlessly with tools you already know, like Puppeteer, Playwright, or Selenium, while removing the hassle of infrastructure and scaling.

If this sounds interesting, we’d love for you to give it a spin! You can sign up and start playing around with a free plan. Would love to hear your thoughts, feedback, or ideas! Check it out here at hyperbrowser.ai.

If you have any questions at all feel free to reach out to me at akshay@hyperbrowser.ai! Ideally share the website you'd like to scrape or automate. I can provide a script for it or we can create a custom API endpoint!



Nice one. We offer a similar service at Charity Engine, our "crowdsourced cloud service": https://www.charityengine.com/marketplace (relevant docs here: https://www.charityengine.com/docs/Smart+Proxy+Service )

Of note: as Charity Engine is a general-purpose distributed compute platform, it's possible to run applications that post-process web content "at the edge" of the network, as the data is streamed back. For example, we're now running small LLMs alongside the browsers, so it's possible to do things like run sentiment analysis, generate image descriptions, summarize content, etc., all at scale, enabled by hundreds of thousands of processors.

For more, see this talk: "Distributed Intelligence for Distributed Data" - https://youtu.be/YQe--AZFUuQ?si=_HoqkdooNPSR7dDQ (13mins)


Your pricing is too confusing. Either price it based on number of requests like ScrapingBee does, or base it entirely on bandwidth. Instead, your pricing is based on both bandwidth and “minutes” (which are also hard to predict, since some sites load slower than others).

Second, the value for money also seems way below what ScrapingBee offers.

You charge $100 for 60K credits. So, assuming 18 credits per page (3 MB per page), that lets me crawl 3,333 web pages. $250 would let me crawl under 10k pages.

In comparison, ScrapingBee costs $249 for 3 million credits, which lets me crawl 100,000 webpages (with JS and proxies enabled).

Am I missing something obvious here?
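
To make the comparison above easy to rerun with other numbers, here is a rough sketch of the same arithmetic. The figures are the ones quoted in this comment, not official price lists, and the 30 credits per page for ScrapingBee is only implied by "3 million credits for 100,000 webpages". The function names are made up for illustration.

```javascript
// Back-of-the-envelope comparison using only the figures quoted in the comment.
// Function names are hypothetical; none of this is an official price list.

// How many whole pages a plan's credits cover.
function pagesPerPlan(planCredits, creditsPerPage) {
  return Math.floor(planCredits / creditsPerPage);
}

// Plan price spread over the pages it covers, expressed per 1,000 pages.
function costPerThousandPages(planPriceUsd, planCredits, creditsPerPage) {
  return (planPriceUsd * creditsPerPage * 1000) / planCredits;
}

// Hyperbrowser as quoted: $100 for 60K credits, 18 credits per 3 MB page.
const hyperbrowserPages = pagesPerPlan(60000, 18);
const hyperbrowserCost = costPerThousandPages(100, 60000, 18);

// ScrapingBee as quoted: $249 for 3M credits covering 100,000 pages,
// which implies 30 credits per page (derived, not stated).
const scrapingBeePages = pagesPerPlan(3000000, 30);
const scrapingBeeCost = costPerThousandPages(249, 3000000, 30);

console.log(hyperbrowserPages, hyperbrowserCost); // 3333 30
console.log(scrapingBeePages, scrapingBeeCost);   // 100000 2.49
```

On these assumptions the gap is roughly $30 versus $2.49 per 1,000 pages, which is the difference the comment is pointing at.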


Hey! Completely hear you on that. Right now we only have endpoints for headless browsers: each one is an instance running Chrome with dedicated resources that you have full access to, and it comes with proxy services and captcha solving. This comes at a higher cost, but you will get blocked less and can do much more, like browser automations.

Dedicated endpoints for just scraping, like ScrapingBee's, are coming next week at a discounted price! In the meantime, let me know if there is anything you would like that ScrapingBee is missing :)


Thanks, ScrapingBee has everything I am looking for in terms of functionality. My honest opinion is that most people who are looking for an alternative just want something cheaper. That's 99% of what I am looking for, at least.

Things like captcha solving are nice, no doubt, but for many people the use cases just aren’t compelling enough to justify the cost. I would love captcha solving, but definitely not at your price point.


Other cofounder of Hyperbrowser here - thanks for the feedback, and noted!

We aim to be pretty competitive on price across the board. Today we're just launching our headless browser service, and I think we're significantly less expensive than competitors on that, as dbmikus mentioned below.

Totally hear you on the scraping-specific use case without captcha solving etc. though - we'll probably launch competitive pricing for this specific use case in the next few days. I'll make a note to leave a comment here when we do, or feel free to email me at shri@hyperbrowser.ai and I can follow up there :)


OK, I am waiting to hear what you have in store!


Shame I already rolled this out in-house (and spent way too long trying to achieve it) - this would have been perfect for my use case. If you don't mind me asking, how do you get Chrome to boot so quickly? GCP is unbelievably good at launching containers quickly, but Chrome is slow to launch.


We actually keep a pool of browsers ready, so when a request comes in we just assign it to an already running browser. This way, in most situations there is no actual waiting for Chrome to start up!
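
For readers who want to picture the pooling idea, here is a minimal sketch. It is not Hyperbrowser's implementation, just one way a pre-warmed pool could work. `WarmPool`, `launch`, and `fakeLaunch` are hypothetical names, and the fake launcher stands in for whatever actually starts a browser.

```javascript
// Minimal pre-warmed pool sketch (hypothetical, not Hyperbrowser's code).
// `launch` is a caller-supplied factory that starts one browser.
class WarmPool {
  constructor(launch, targetSize) {
    this.launch = launch;
    this.targetSize = targetSize;
    this.idle = [];
    this.refill(); // pay the startup cost before any request arrives
  }

  // Top the pool back up; a real system would do this in the background.
  refill() {
    while (this.idle.length < this.targetSize) {
      this.idle.push(this.launch());
    }
  }

  // Hand out a running browser if one is idle, else fall back to a cold start.
  // Browsers are never returned to the pool, so each session gets a fresh one.
  acquire() {
    if (this.idle.length > 0) {
      return { browser: this.idle.shift(), warm: true };
    }
    return { browser: this.launch(), warm: false };
  }
}

// Usage with a fake launcher that just counts launches.
let launched = 0;
const fakeLaunch = () => ({ id: ++launched });

const pool = new WarmPool(fakeLaunch, 2);
const first = pool.acquire();  // served from the pool
const second = pool.acquire(); // served from the pool
const third = pool.acquire();  // pool drained, so this one is a cold launch
console.log(first.warm, second.warm, third.warm); // true true false

pool.refill(); // restore the pool to its target size
```

The third request shows why the reply says "in most situations": if sessions arrive faster than the pool refills, someone still waits for a cold start.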


Ah of course, makes total sense. Thanks for answering!


Is each browser process dedicated to a single customer, or do you clear cookies and other state (localStorage, cache, Chrome advertising features...) between customers?


We use a new browser for each session you create! We don't share the browsers between different customers.


What challenges did you face in rolling this out? I did as well, and it didn't take long, but that's only because I didn't aim for 100% perfection and was OK with being able to crawl 90% of the webpages I encountered.


Super cool. Any way to know if the endpoints are 'ethically sourced' and how they are secured wrt confidentiality & integrity?

(We have a 100K+/day scraping workload, and TBD full interactive automation)


Thank you! We do make sure we are using legitimate proxy providers. Also happy to manually set it to the provider of your choice if you are interested in doing that :)


Sweet! trying this out now.

one quick nit on your docs: https://docs.hyperbrowser.ai/guides/scrape-site

```
import Hyperbrowser from "@hyperbrowser/sdk";

const client = new Hyperbrowser({ apiKey: process.env.HYPERBROWSER_API_KEY });

(async () => {
  const job = await client.startScrapeJob({ url: "https://example.com" });
  console.log(job);

  const job = await client.getScrapeJob(job.jobId);
  console.log(job);
})();
```

should be:

```
import Hyperbrowser from "@hyperbrowser/sdk";

const client = new Hyperbrowser({ apiKey: process.env.HYPERBROWSER_API_KEY });

(async () => {
  let job = await client.startScrapeJob({ url: "https://example.com" }); // s/const/let
  console.log(job);

  job = await client.getScrapeJob(job.jobId); // remove const
  console.log(job);
})();
```
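
For anyone wondering why the docs version fails outright rather than just misbehaving: declaring the same `const` name twice in one scope is a SyntaxError, so the script never starts running. A small self-contained check, with no SDK needed (`parses` is a hypothetical helper that compiles a snippet without executing it):

```javascript
// Returns true if `source` compiles as a function body, false on a SyntaxError.
// `new Function` only parses the code here; the function is never called.
function parses(source) {
  try {
    new Function(source);
    return true;
  } catch (err) {
    if (err instanceof SyntaxError) return false;
    throw err; // anything else is unexpected
  }
}

// Same shape as the docs snippet: two `const job` declarations in one scope.
const redeclared = parses("const job = 1; const job = 2;");

// Same shape as the fix: declare once with `let`, then reassign.
const reassigned = parses("let job = 1; job = 2;");

console.log(redeclared, reassigned); // false true
```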


thanks! fixing now


update: fixed now :)


Hi,

This looks very interesting and I’d love to try it.

We have a data extraction and enrichment platform. We process over 12 million automations for our users; half of this is done on Lambda and the rest we try to process in containers.

We also use 10 GB+ in proxies.

I tried to do all the math on how much it would cost on your platform if we do a pilot project, but I’m so confused by the pricing.

Can you please explain what the cost would be for 10,000 hours and 10 GB of proxy usage?


Hey - thanks!

For your specific example, this is how the math works out:

- Browser hours: 10K hours * 60 minutes per hour * 1 credit / minute = 600K credits

- Proxy usage: 10GB proxy data * 1024 MB / GB * 6 credits / 1MB proxy data = 61,440 credits

The scale plan includes 60K credits for $100/mo and overage credits are ~0.2c each. That's 661,440 credits in total, of which 601,440 are overage, so the total would be $100 + (601,440 * $0.002) = $1,302.88

Sorry about the confusing pricing btw! As I wrote this out, clearly we have our work cut out for us to simplify pricing - we'll try to make it simpler over the next few days and add a calculator that lets you see how much this would cost.
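
Until a calculator exists, the arithmetic above can be sketched in a few lines. The rates are the ones quoted in this reply (1 credit per browser minute, 6 credits per MB of proxy data, 60K credits for $100, about 0.2c per overage credit) and may not match current pricing. `estimateMonthlyCost` is a hypothetical name. Money is tracked in mills (tenths of a cent) so the sum stays in whole numbers until the final division.

```javascript
// Rates as quoted in the reply above; treat them as assumptions, not a price list.
const CREDITS_PER_BROWSER_MINUTE = 1;
const CREDITS_PER_PROXY_MB = 6;
const PLAN_CREDITS = 60000;
const PLAN_PRICE_MILLS = 100 * 1000; // $100, in tenths of a cent
const OVERAGE_MILLS_PER_CREDIT = 2;  // ~0.2 cents per overage credit

function estimateMonthlyCost(browserHours, proxyGb) {
  const browserCredits = browserHours * 60 * CREDITS_PER_BROWSER_MINUTE;
  const proxyCredits = proxyGb * 1024 * CREDITS_PER_PROXY_MB;
  const totalCredits = browserCredits + proxyCredits;
  // Only credits beyond the plan's allowance are billed as overage.
  const overageCredits = Math.max(0, totalCredits - PLAN_CREDITS);
  const totalMills = PLAN_PRICE_MILLS + overageCredits * OVERAGE_MILLS_PER_CREDIT;
  return { browserCredits, proxyCredits, overageCredits, usd: totalMills / 1000 };
}

// The example from the question: 10,000 browser hours and 10 GB of proxy data.
const estimate = estimateMonthlyCost(10000, 10);
console.log(estimate);
// { browserCredits: 600000, proxyCredits: 61440, overageCredits: 601440, usd: 1302.88 }
```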

Feel free to email me at shri@hyperbrowser.ai if I can help with anything!


This seems like a good option for us. I will write an email to you to discuss this in more detail. Thank you!


That seems like an incredible value on the surface. Great job.


thank you! :)


Incredible that you managed to achieve such a short start time for Chrome. For scraping data alone, though, I think something like ScrapingBee or Crawlbase gets the job done better, faster and cheaper. But I will consider your solution for browser automation.


Of course, would love for you to take a look! I mentioned this in another comment, but we are working on lower-cost, faster scraping-focused endpoints. They're coming next week.


Just did a pricing check, and you give twice as much concurrency/browsing/data-transfer as Browserbase. Nice!

Do you have any restrictions on sites we can scrape? I am likely going to do some public LinkedIn scraping (logged out, legally compliant).


Hey! So we have to use a different proxy service for some sites (LinkedIn included). Just ask in the app support and we can do that. We are also building custom endpoints for LinkedIn that should be ready soon. I'll shoot you an email once we have that!


Will Hyperbrowser soon offer a live session view like Browserbase does, e.g., https://docs.browserbase.com/features/session-live-view ? It'd be great to let end users view and control the remote browser session. Btw, Hyperbrowser seems great! Thanks for building this.


I'm quite curious, what are people doing with AI-powered browser automation at the moment?

This is such a new capability that I'm having a hard time getting a sense of interesting use cases. I'm quite sure this is more than just a shim for web services which don't expose APIs. But I also wonder whether LLMs are good enough to be trusted with more open-ended tasks at this stage.


Pretty wide variety of use cases tbh - everything from classic scraping on the very simple side to integrating with 3P services that don’t offer APIs, agentic QA testing etc

The companies we’ve seen automate agentic workflows well typically send a bunch of context to the LLM and somewhat constrain the actions that the model can take. Actually works better than you’d expect :)
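
As a rough illustration of "constrain the actions that the model can take", one common pattern is to validate whatever the model proposes against an allow-list before anything touches the browser. The sketch below is an assumption about how that might look, not a description of any particular company's setup. The action names, the `{ name, args }` shape, and `validateAction` are all hypothetical.

```javascript
// Hypothetical allow-list: action name -> required string arguments.
const ALLOWED_ACTIONS = new Map([
  ["click", ["selector"]],
  ["type", ["selector", "text"]],
  ["navigate", ["url"]],
]);

// Check a model-proposed action before executing it.
// Returns { ok: true } or { ok: false, reason } so the caller can re-prompt.
function validateAction(action) {
  if (action === null || typeof action !== "object") {
    return { ok: false, reason: "action must be an object" };
  }
  const requiredArgs = ALLOWED_ACTIONS.get(action.name);
  if (requiredArgs === undefined) {
    return { ok: false, reason: `action "${action.name}" is not allowed` };
  }
  const args = action.args ?? {};
  const missing = requiredArgs.filter((key) => typeof args[key] !== "string");
  if (missing.length > 0) {
    return { ok: false, reason: `missing arguments: ${missing.join(", ")}` };
  }
  return { ok: true };
}

// An in-scope action passes; anything outside the list is rejected.
const accepted = validateAction({ name: "click", args: { selector: "#submit" } });
const rejected = validateAction({ name: "runShell", args: { cmd: "ls" } });
console.log(accepted.ok, rejected.ok); // true false
```

Returning a reason instead of throwing makes it easy to feed the rejection back to the model as extra context for its next attempt.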


Congrats on launching!

> spin up hundreds of browser sessions in secure, isolated environments, with sub-second launch times

What tech do you use for this, and where is this hosted?


Just Docker containers in GCP and AWS. We do all the instance management ourselves. It seems like you are working on infra for this, judging from your bio! Send me an email, would love to chat :)


Wouldn’t Hetzner bare metal allow you to do that an order of magnitude cheaper?


We will definitely look into it! As we are adding more users, scalability is our primary concern right now though.


Is it possible to run the Chrome browser with a Chrome extension installed that uses a content script?


Yup! We need to add it to the docs in a day or two, but I'm happy to help set it up myself. Drop me a line at shri@hyperbrowser.ai!


Is this a cheaper alternative to Browserbase? I am a bit confused, since you mention creating endpoints for scraping many times and I just want to do browser automation. Is that a use case you support?


Sorry for the confusion, yes we do! https://docs.hyperbrowser.ai/guides/connect-with-puppeteer

Happy to help with anything :)


Seems pretty cool for getting some data off web pages. How many browsers can I open concurrently with hyperbrowser?


Hey, 100 max on our paid plan but I can enable 1k concurrently if you want!


Creator of Browser Use here. This looks super cool! How do you spin them up so fast? Firecracker?


We keep "prewarmed" browsers ready for new sessions. Looking into Firecracker though!


Do you guys support handling 2FA workflows/scenarios?


We don't have that yet, but you could definitely do it in your own scripts right now using our browser. We'll try to think of ways to make it easier though :)


Had a use case of keeping track of visa appointment slots and got instantly blocked by Cloudflare :sad:


Sorry - Cloudflare bypass is coming this weekend!

This is also something I need btw, so if you build a product on top of it, I'd be user #2 :)



