I've been using OpenAI Operator for some time, but more and more websites are blocking it, such as LinkedIn and Amazon. That's two key use cases gone (applying to jobs and online shopping).
Operator is pretty low-key, but once Agent starts getting popular, more sites will block it. They'll need to allow a proxy configuration or something like that.
THIS is the main problem. I was listening the whole time for them to announce a way to run it locally, or at least proxy through your local devices. Alas, the DeepSeek R1 distillation experience they went through (a bit like when Steve Jobs was fuming at Google for getting Android to market so quickly) made them wary of showing too many intermediate results, tricks, etc. Even in the very beginning, Operator v1 was unable to access many sites that blocked data-center IPs, and while I went through the effort of patching in a hacky proxy setup to be able to actually test real-world performance, they later locked it down even further without improving performance at all. Even when it's working, it's basically useless, and it's not working now and only getting worse.

Either they make some kinda deal with eastdakota (which he is probably too savvy to agree to) or they can basically forget about doing web browsing directly from their servers. Considering that all non-web applications of "computer use" greatly benefit from local files and software (which you already have the license for!), the whole concept appears to be on the road to failure. Having their remote computer-use agent perform most stuff via the CLI is actually really funny when you remember that computer-use advocates used to claim the whole point was NOT to rely on "outdated" pre-GUI interfaces.
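For what it's worth, the kind of proxy configuration being asked for already exists in ordinary browser automation tooling, so this looks like a product decision rather than a technical blocker. A minimal sketch assuming Playwright for Node, with a placeholder proxy address (nothing like this is exposed by Operator or Agent today):

    import { chromium } from "playwright";

    async function main(): Promise<void> {
      // Route the automated browser's traffic through a proxy of the user's choosing.
      // The address below is a placeholder; point it at whatever local or residential proxy you run.
      const browser = await chromium.launch({
        proxy: { server: "http://127.0.0.1:8080" },
      });
      const page = await browser.newPage();
      await page.goto("https://example.com");
      console.log(await page.title());
      await browser.close();
    }

    main();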
It's the other way around: it's easier to detect because detectors are looking for specific "fingerprints" and may even try to run specific JavaScript that will only work when there is a UI present.
(Source: did a ton of web scraping, ran into a few gnarly issues and sites, and had to write a P/Invoke-based UI automation scraper for some properties.)
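To make that concrete, the snippet below checks a few generic automation signals from inside the page. These are just well-known examples for illustration, not any particular site's actual fingerprinting logic, which tends to be far more elaborate.

    // Generic, well-known signals only; real detectors use much richer fingerprints.
    function looksAutomated(): boolean {
      const signals: boolean[] = [
        // Automation frameworks are supposed to set this flag.
        (navigator as any).webdriver === true,
        // Headless browsers often report a zero-sized outer window.
        window.outerWidth === 0 || window.outerHeight === 0,
        // A user agent string that admits to being headless.
        /HeadlessChrome/.test(navigator.userAgent),
      ];
      return signals.some(Boolean);
    }

    if (looksAutomated()) {
      // A real detector would report this back to the server rather than just log it.
      console.log("automation suspected");
    }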
If people actually start paying for stuff (food, clothing, flights, whatever) through this agent or Operator, I see no reason Amazon etc. would continue to block them.
I was buying plenty of stuff through Amazon before they blocked Operator. Now I sometimes buy through other sites that allow it.
The most useful for me was: "here's a picture of a thing I need a new one of, find the best deal and order it for me. Check coupon websites to make sure any relevant discounts are applied."
To be honest, if Amazon continues to block "Agent Mode" and Walmart or another competitor allows it, I will be canceling Prime and moving to that competitor.
Right, but there were so few people using Operator to buy stuff that it's easier to just block ~all data-center IP addresses. If this becomes a "thing" (remains to be seen, for sure), then that becomes a significant revenue stream you're giving up on. Companies don't block bots because they're speciesist; it's because bots usually cost them money. If that changes, I assume they'll allow known ChatGPT Agent IP addresses.
The AI isn't going to notice the latest and greatest hot new deals that are slathered on every page. It's just going to put the thing you asked for in the shopping cart.
Possibly in part because bots will not fall for the same tricks as humans (recommended items, as well as the other things Amazon does to try to get the most money possible).
In typical SV style, the play is just to throw it out there and let second-order effects build up. At some point I expect OpenAI to simply form a partnership with LinkedIn and Amazon.
In fact, I suspect LinkedIn might even create a new tier that you'd have to use if you want to use LinkedIn via OpenAI.
I do data work in domains that are closely related to LinkedIn (sales and recruitment), and let me tell you, the chances that LinkedIn lets any data get out of the platform are very slim.
They have some of the strongest anti-bot measures in the world, and they even prosecute companies that develop browser extensions for manual extraction. They would prevent people from writing LinkedIn info with pen and paper if they could. Their APIs are super-rudimentary and they haven't innovated in ages. Their CRM integrations for their paid products (ex: Sales Nav) barely allow you to save info into the CRM and instead opt for iframe-style widgets inside your CRM so that the data remains within their moat.
Unless you show me how their incentives radically change (ex: they can make tons of money while not sacrificing any other strategic advantage), I will continue to place a strong bet on them being super defensive about data exfiltration.
Would that income be more than the lost ad revenue (as applicants stop visiting the site) plus the lost subscriptions on the employer side (as AI-authored applications make the site useless to them)? Who knows, but MS is probably betting on no.
Hiring companies certainly don’t want bots to write job applications. They are already busy weeding out the AI-written applications and bots would only accelerate their problem. Hiring companies happen to be paying customers of LinkedIn.
Job applications aren't the only use case for using LinkedIn in this connected way, but even on that topic -- I think we are moving pretty quickly toward no longer needing to "weed out" AI-written applications.
As adoption increases, there's going to be a whole spectrum of AI-enabled work out there, so something that doesn't appear to be AI-written is not necessarily pure and free of AI. Not to mention the models themselves are getting better at not sounding canned. If you want a filter for lazy applications written with a 10-word prompt using 4o, sure, that is actually pretty trivial to do with OpenAI's own models, but is there another reason you think companies "don't want bots to write job applications"?
Agents respecting robots.txt is clearly going to end soon. Users will be installing browser extensions or full browsers that run the actions on their local computer with the user's own cookie jar, IP address, etc.
I hope agents.txt becomes a standard and websites actually start to build agent-specific interfaces (or just list their API docs in their agents.txt). In my mind it's different from robots.txt, which is meant to apply rules to broad web-scraping tools.
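Purely hypothetical sketch of the idea, since no agents.txt standard exists; the well-known path and the "api-docs" key are invented here just to show what an agent-facing entry point could look like.

    // Hypothetical: check whether a site advertises an agent-friendly entry point
    // before falling back to driving the human UI.
    async function findAgentInterface(origin: string): Promise<string | null> {
      const res = await fetch(new URL("/agents.txt", origin).toString());
      if (!res.ok) return null; // nothing published; fall back to the regular UI

      // Assume simple "key: value" lines, e.g. "api-docs: https://example.com/docs/api".
      for (const line of (await res.text()).split("\n")) {
        const [key, ...rest] = line.split(":");
        if (key.trim().toLowerCase() === "api-docs") {
          return rest.join(":").trim();
        }
      }
      return null;
    }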
I hope they don't build agent-specific interfaces. I want my agent to have the same interface I do. And even more importantly, I want to have the same interface my agent does. It would be a bad future if the capabilities of human and agent interfaces drift apart and certain things are only possible to do in the agent interface.
I wonder how many people will think they are being clever by using the Playwright MCP or browser extensions to bypass robots.txt on the sites blocking the direct use of ChatGPT Agent and will end up with their primary Google/LinkedIn/whatever accounts blocked for robotic activity.
I don't know how others are using it, but when I ask Claude to use Playwright, it's for ad-hoc tasks that look nothing like old-school scraping, and I don't see why it should bother anyone.
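For a sense of what those tasks look like, here's a rough sketch assuming Playwright for Node; the profile path, URL, and selector are placeholders. The point is that it drives a visible local browser with the user's own profile (cookies, sessions, home IP), not a data-center scraping farm.

    import { chromium } from "playwright";

    // Reuse an existing local browser profile so the session looks like ordinary use.
    // The profile path, URL, and selector below are placeholders, not anything real.
    async function checkLatestOrder(): Promise<void> {
      const context = await chromium.launchPersistentContext("/path/to/browser-profile", {
        headless: false, // a visible browser window, like a human session
      });
      const page = await context.newPage();

      await page.goto("https://example.com/orders");
      const latest = await page.locator(".order-row").first().textContent();
      console.log("Most recent order:", latest?.trim());

      await context.close();
    }

    checkLatestOrder();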
Expecting AI agents to respect robots.txt is like expecting browser extensions like uBlock Origin to respect "please-dont-adblock.txt".
Of course it's going to be ignored, because it's an unreasonable request, it's hard to detect, and the user agent works for the user, not the webmaster.
Assuming the agent is not requesting pages at an overly fast speed, of course. In that case, feel free to 429.
Q: but what about botnets-
I'm replying in the context of "Users will be installing browser extensions or full browsers that run the actions on their local computer with the user's own cookie jar, IP address, etc."
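On the pacing point: from the agent's side, being a decent citizen mostly means spacing out requests and backing off when the site answers 429. A rough sketch (the retry count and backoff timings are illustrative, not a recommendation):

    // Back off on 429 responses, honoring Retry-After when the server sends one.
    async function politeFetch(url: string, retries = 3): Promise<Response> {
      for (let attempt = 0; attempt <= retries; attempt++) {
        const res = await fetch(url);
        if (res.status !== 429) return res;

        // Fall back to exponential backoff if Retry-After is missing or not numeric.
        const delaySeconds = Number(res.headers.get("retry-after")) || 2 ** attempt;
        await new Promise((resolve) => setTimeout(resolve, delaySeconds * 1000));
      }
      throw new Error(`Still rate-limited after ${retries} retries: ${url}`);
    }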
We have a similar tool that can get around any of this: we built a custom desktop that runs on residential proxies. You can also train the agents to get better at computer tasks: https://www.agenttutor.com/
Finding, comparing, and ordering products -- I'd ask it to find 5 options on Amazon and create a structured table comparing key features I care about along with price. Then ask it to order one of them.
Why would they want an LLM to slurp their website to help some analyst create a report about the cost of widgets? If they value the data, they can pay for it. If not, they don't need to slurp it, right? This goes for training data too.