I've been relatively unimpressed with the ChatGPT browse mode.
My problem with it is that it seems to run really obvious, naive searches. Most of the time I ask it something, then see what it's searching for and think "oh no, that's not going to return anything more useful than what I could have found myself".
It's also pretty slow.
I'm very much looking forward to having a search assistant which can go ahead and wade through eg 20 websites about home battery packs and show me a summary table of my options - ChatGPT Browse isn't quite that yet.
Have you tried perplexity.ai? It has become my go-to for LLM + browsing. It is really fast and does a pretty good job with retrieval and summarization.
I wonder why they don't list that under the Ultimate plan? On the pricing page the selling point for Ultimate is basically: hey, help us test things and support us by giving some money: https://kagi.com/pricing
How is Kagi for niche stuff? I'm frequently searching for niche and hard to find programming information about processors and whatnot and google/DDG have only gotten worse with time.
I use Kagi. Probably not better than Google, but often with less junk. It's much better than DDG. The extra features are nice, as are the customizations (boost specific domains, etc.). I thought I'd be !g-ing to find things on Google, but I rarely do, and I'm constantly researching niche stuff. When I do try !g, I generally find that Google doesn't have the results either. Kagi also seems to do better with really old results that Google has memory-holed.
Just tried it; the free version is really slow (it has been going for 7 minutes now just to generate the first image and still isn't complete) compared with what I'm getting from OpenAI. I'll keep the $20/mo on that; it's worth it.
It’s also pretty useless at understanding any page that’s even remotely complex.
I asked it to tell me when the next baseball home game is so I can avoid traffic when going to the downtown library and it couldn’t answer even that basic question. The search results defaulted to the team’s official calendar but the day numbers were displayed as images so they could style them in the team’s font, making them invisible to ChatGPT.
This example is more likely due to that specific page/site lacking any kind of accessibility information (e.g. the markup that lets people who are blind, deaf, or have other disabilities use the page), rather than it being large or complex.
Relatedly, in my experience, for some reason most internet properties relating to anything sports-related (MLB, NFL, NHL, or even small-town little league) have absolutely terrible, extremely overcomplicated user interfaces.
It takes much longer to make a screencap of an HTML page. Modern slow JS and many subresources coming from a slow server, combined with the fact that there is no reliable signal saying a page is done loading, generally mean you won't be getting a screencap in less than about 10 seconds.
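To make that concrete, here's a minimal sketch of the "when is the page done?" problem using Playwright. The network-idle heuristic, the timeout values, and the extra settle delay are my assumptions about how a browse agent might approach it, not how ChatGPT actually renders pages:

    # Heuristic screenshot of a JS-heavy page using Playwright (assumed
    # installed via `pip install playwright && playwright install chromium`).
    # There is no definitive "page finished loading" signal, so this falls
    # back on heuristics.
    from playwright.sync_api import sync_playwright

    def screenshot(url: str, path: str = "page.png") -> None:
        with sync_playwright() as p:
            browser = p.chromium.launch()
            page = browser.new_page()
            # Wait until the network has been quiet for a bit, capped by a
            # hard timeout so slow or chatty pages don't hang forever.
            page.goto(url, wait_until="networkidle", timeout=15_000)
            page.wait_for_timeout(1_000)  # let late-running JS settle
            page.screenshot(path=path, full_page=True)
            browser.close()

    screenshot("https://example.com")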
Kagi's FastGPT is pretty good in that regard (it uses Kagi search + Claude from Anthropic) and it's really fast. Their other AI search/chat, which is currently in beta (for Ultimate users), is even better IMO.
I tried Browse with Bing a few times, but unlike normal conversations with Bing, the results from Browse with Bing (other than presumably being more up to date) seemed very generic, as if I had simply searched myself, rather than it using its full power as normal plus the latest data. It also took a while to run. So it's a start, and hopefully that feature will improve as it enters this new stage.
1. I don't want to read through all the materials on that topic but want an overview.
2. I can't describe the topic in the precise language of the field, and I'm hoping ChatGPT can translate my naive description into that field's vocabulary.
ChatGPT does badly in both of these cases. For (1) it just randomly browses 3-4 search results; for (2) it always tries to search with my original vocabulary. Why wouldn't I just search myself then, rather than wait for the slow response?
Makes sense. Signs of human life in Bob Ross paintings come down to one person in one painting, and IIRC one instance of smoke coming out of a cabin chimney.
A ninja battle in the style of Bob Ross is going to be some trees and mountains while the ninja battle is assumed to be happening out of view on the other side of it.
The number of sites I regularly use with ChatGPT that block AI agents has increased to the point where this feature is not that useful for me anymore. I can only see that number increasing.
I wonder whether AI-generated web pages will block AI agents from indexing their content. Otherwise you get one engine indexing the other's content in a loop until the amount of digital garbage is so gigantic that it's the end of the information era. How are we ever stopping this? What is our failsafe?
Is there an approximate ratio at which the amount of AI-generated garbage/hallucinations online becomes so large that the web can no longer be used to train AI itself? Are AI companies racing against the clock because, say, in 5 years the internet will be flooded with false information to such an extent that it becomes an invalid training ground, effectively requiring a snapshot of the pre-AI internet? It feels like the clickbait problem times infinity.
It's already too late if you want to just scrape random horseshit off the internet. There will be real money in large, expert-generated data sets. AI is also a potential epistemological nightmare: it can cement bad knowledge and bury newer, more up-to-date knowledge in a sea of bullshit.
If anything, AI work feels like it has accelerated everyone with a dataset of value pulling up their drawbridge, reducing the open interconnectivity of the web in the hope of charging for data access.
This started with scrapers and aggregation sites and has gotten noticeably worse.
This time it's not scraping the internet (like a robot) but actually acting as a direct user agent for the human typing their prompt, so I wouldn't be against them ignoring robots.txt.
My understanding is that the philosophy behind robots.txt is owners not wanting their content automatically included in someone else's product, if not duplicated and recorded wholesale. The important idea seems to be ownership, not the ability to browse. If OpenAI had two agents, one with no memory and one with a memory, that would be better: you could disallow ChatGPT-storage and allow ChatGPT-user, for example. Barring that, I'd be afraid that allowing ChatGPT access to my website means my website is now part of the ChatGPT corpus.
> My understanding is that the philosophy behind robots.txt is owners not wanting their content automatically included in someone else's product
Not really. That use case is covered, of course, but the primary purpose of robots.txt is to help crawlers by indicating which parts of the website are appropriate to crawl and which aren't.
Robots.txt is not intended primarily as a means to defend a site against crawlers. That's why it relies on the goodwill of crawlers to work.
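To illustrate the goodwill point, here's a minimal sketch using Python's standard urllib.robotparser; the "ExampleBot" user agent and URLs are hypothetical. Note that nothing in this mechanism prevents the fetch itself: a crawler only stays out if it chooses to run the check.

    from urllib import robotparser

    # Fetch and parse the site's robots.txt (hypothetical URL).
    rp = robotparser.RobotFileParser()
    rp.set_url("https://example.com/robots.txt")
    rp.read()

    url = "https://example.com/private/page.html"
    if rp.can_fetch("ExampleBot", url):
        print("robots.txt says this path is fine to crawl")
    else:
        # A well-behaved crawler stops here; an ill-behaved one simply
        # skips this check and fetches the page anyway.
        print("robots.txt asks crawlers to stay out of this path")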
It's not a crawler, but it seems likely that OpenAI would take the point of view that as long as it's going to a website, it may as well keep a cached copy to use for training later.
Wait, so a simple web scraper script has to comply with robots.txt,
but if I want to completely ignore robots.txt, I just have to make my script more complicated (ChatGPT)?
I'd like to consider this a difference between script action and user action.
For example if you make a web page a user pulls up that calls another webpage, is that a user action, a script action, a mix of both? I personally would consider it a user action.
I thought the same, but OpenAI may train on user chats by default. Maybe if training is off they could ignore robots.txt, or they could flag the content to be skipped.
You have it backwards: when a human initiates the request, like clicking on a link, your browser (rightfully) ignores robots.txt. Unattended requests like scrapers do respect robots.txt. This is a case where ChatGPT is acting more like a browser with a funny way of displaying the final result. Each request is initiated by a human, so it would likely be reasonable to ignore robots.txt in this case.
Even the Bard UI is worse. Being able to read the ChatGPT output as it's generated, instead of staring at a blank screen, is a massive time saver. And, I prefer ChatGPT trying stuff rather than Bard telling me that 'it can't do that right now' and 'we're trying to get better'. I've tried Bard, it's just not worth it.
I use everything to create newsletter/podcast content, and Bard is consistently the best for me. Then Claude and Bing, with ChatGPT last, by a mile. I have ChatGPT Plus, with chat access to GPT-4 but not API access to GPT-4.
Just got access to the Bard API and I'm hoping it continues delivering what I've been happy with...
Yes, but it leaves conversations when it wants to. Also, one time I got source links that it (transparently) labeled as "ads". AI-chat-based advertising might be the next thing; be careful.
Always has been. Even with all the safeguards, two days ago I was having a conversation with it and it just happily blurted out that one of the reasons Iceland has low crime is because its population is racially homogeneous.
User
do you have access to search the internet yet?
ChatGPT
I do not have the capability to search the internet or access real-time information. My knowledge is based on the text that I was trained on, and my training only includes information up until September 2021. I can provide information and answer questions to the best of my knowledge up to that date, but I cannot browse the web or access current information.
Please don't paste ChatGPT or BARD answers as HN comments in general? In this specific case, no, LLMs don't reliably know about themselves. They're trained on a big corpus of internet text, then trained by rough reinforcement learning to say and not say certain things about themselves, then given a little more information about themselves in system prompts.
Imagine a distant future where there are so many crimes and so few judges that you're cryofrozen awaiting trial.
If you were repeatedly flash-frozen and flash-thawed to be asked questions about yourself, with the same memories upon thawing as when frozen, would you be "remotely conscious" for those moments you weren't frozen?
Imagine your hand was in a cast and then you were cryogenically frozen. While frozen your cast was removed. Then you were unfrozen and asked if you could move your hand.
I think you’d be able to answer the question, don’t you?
Is there a complete list of IP addresses OpenAI uses for scraping? I suppose they don't honour robots.txt, despite claiming to do so, and this may be one way to reliably block them.
I've seen no suggestion of them not honouring robots.txt...
I suspect that any case of them not honoring it is probably due to your site content being available on archive.org or common crawl or some other service.
Apparently OpenAI and the rest of the defendants claim that if it's on the internet it's "fair use", meaning they'll do all they can to steal your work regardless of licensing or robots.txt rules.
It's a lot like saying "if it's on GitHub, I can ignore your LICENSE file" or "if it's on DA, I can ignore your CC-NC license." Both of these seem technically enforceable (if expensive), so I'm not quite understanding what OpenAI is grounding their justification in.
It seems much more likely that OpenAI stands to make a lot of money by arguing that it's fair use and then back filling in an argument rather than making one from first principles.
A"I" depends on data. The more the better. That's why a lot of these people push for a disregard of people's property (digital content, behaviour data, etc.). The problem is that in stealing all of it they will demotivate people from creating quality content and even alienate them from the dead internet.
If your concern is that you don't want the contents of your site to be used to train AI, then OpenAI is not the only entity you need to protect against. Other crawlers may not respect robots.txt, and blocking by IP address is just a game of whack-a-mole that you can't win.
This is why I took down some of my sites, and put a login in front of the rest. Until I have some solid means of defense, I can't think of any other effective approach.
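For what it's worth, a lighter-weight option than chasing IP addresses is filtering on the User-Agent header, since OpenAI and some other vendors document distinct strings for their crawlers. Below is a minimal sketch, assuming the token list has been verified against each vendor's current docs; it only works against clients that identify themselves honestly.

    # Hypothetical WSGI middleware that blocks self-identified AI crawlers.
    # The token list is an assumption; check each vendor's documentation
    # for the user-agent strings they actually use.
    BLOCKED_UA_TOKENS = ("GPTBot", "CCBot")

    def block_ai_crawlers(app):
        def wrapper(environ, start_response):
            ua = environ.get("HTTP_USER_AGENT", "")
            if any(token.lower() in ua.lower() for token in BLOCKED_UA_TOKENS):
                start_response("403 Forbidden", [("Content-Type", "text/plain")])
                return [b"Automated AI crawlers are not permitted.\n"]
            return app(environ, start_response)  # pass everyone else through
        return wrapper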