I hope that's right. I guess you (I mean someone with Bing Chat access, which I ...

joe_the_user · on Feb 16, 2023

Servers as a rule don't access other domains directly, for the reasons you cite and others (speed, for example). I'd be shocked if Bing Chat were an exception. Maybe they cobbled together something really crude just as a demo, but I don't know of any reason to believe that.

artursapek · on Feb 16, 2023

This should be trivial to test if someone has access to Bing Chat. Ask it about a unique new URL on a server you control, and check your access logs.
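A minimal Python sketch of the bookkeeping for that test, assuming an Apache/nginx-style "combined" access log. `BASE`, `make_canary_url`, and `find_hits` are hypothetical names, and `example.com/canary/` stands in for a host you control.

```python
import re
import secrets
from collections.abc import Iterable, Iterator

# Hypothetical prefix on a server you control; substitute your own.
BASE = "https://example.com/canary/"

# Apache/nginx "combined" log format:
# addr ident user [time] "METHOD path PROTO" status bytes "referer" "agent"
LOG_RE = re.compile(
    r'^(?P<addr>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def make_canary_url() -> tuple[str, str]:
    """Return (token, url) for a URL that has never been published anywhere."""
    token = secrets.token_urlsafe(16)  # 16 random bytes, URL-safe base64
    return token, BASE + token


def find_hits(log_lines: Iterable[str], token: str) -> Iterator[tuple[str, str, str]]:
    """Yield (remote_addr, time, user_agent) for each logged request containing token."""
    for line in log_lines:
        m = LOG_RE.match(line)
        if m and token in m.group("path"):
            yield m.group("addr"), m.group("time"), m.group("agent")
```

Ask Bing Chat about the generated URL, then run `find_hits` over your access log; the remote address and user agent of any hit show who actually fetched it.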

hyporthogon · on Feb 16, 2023

Doh -- of course, thanks -- I should have gone there first. It would be interesting to see the RemoteAddr, but I don't think its value affects my worry.

hyporthogon · on Feb 16, 2023

Sure, but I don't think directness matters here -- what matters is just whether a URL that originates in a Sydney call chain ends up in a GET received by some external server, however many layers (beyond the usual packet-forwarding infrastructure) sit between the machine the Sydney instance is running on and the final recipient.

danohuiginn · on Feb 16, 2023

Isn't it sufficient for Sydney to include the URL in its chat output? You can always count on some user to click the link.

hyporthogon · on Feb 16, 2023

Yes. And chat outputs normally include footnoted links to sources, so clicking a link produced by Sydney/Bing would be normal and expected user behavior.

joe_the_user · on Feb 16, 2023

I think the question comes down to whether Sydney needs the entire web page it's referencing or whether it can get by with a much more compressed summary. If Sydney needs the whole web in real time, it could multiply world web traffic several-fold if it (or Google's copy) becomes the dominant search engine.

One more crazy possibility in this situation.

fragmede · on Feb 16, 2023

Directness is the issue. Instead of accessing the Internet itself, BingGPT could be pointed at a crawled index of the web and be unable to access anything directly.
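A toy Python sketch of that isolation, assuming a prebuilt snapshot: the model-facing tool is a dictionary lookup, not an HTTP client. `CrawledIndex`, `enqueue_misses`, and `answer_with_sources` are hypothetical names. The `enqueue_misses` flag models a "responsive" index, where a miss schedules a crawl, so a URL chosen in the chat still ends up as a GET later, just not a direct one.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CrawledIndex:
    """Hypothetical snapshot built ahead of time by an independent crawler."""
    pages: dict[str, str] = field(default_factory=dict)
    crawl_queue: list[str] = field(default_factory=list)
    enqueue_misses: bool = False

    def lookup(self, url: str) -> Optional[str]:
        # Pure in-memory read: no network I/O happens here.
        text = self.pages.get(url)
        if text is None and self.enqueue_misses:
            # "Responsive" index: the miss itself schedules a future crawl,
            # which reintroduces an indirect GET to the remote server.
            self.crawl_queue.append(url)
        return text


def answer_with_sources(index: CrawledIndex, url: str) -> str:
    """The chat layer only ever sees what the crawler has already stored."""
    text = index.lookup(url)
    if text is None:
        return f"No indexed copy of {url}."
    return f"Indexed copy of {url}: {text[:80]}"
```

With `enqueue_misses=False`, asking about a never-crawled URL produces no request at all; with it set to `True`, the same question eventually does.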

hyporthogon · on Feb 16, 2023

Not if the index is responsive to a Sydney/Bing request (which I imagine might be desirable for meeting reasonable 'web-enabled chatbot' UX requirements). You could test this approximately (but only approximately, I think) by running the test artursapek mentions in another comment. If the remote server receives the request 'in real time' -- meaning faster than an uninfluenced crawl would hit that (brand new) URL (I don't know how to know that number) -- then Sydney/Bing is 'pushing' the indexing system to grab that URL, which counts as Sydney/Bing issuing a GET to a remote server, albeit with the indexing system intervening. If Sydney/Bing 'gives up' before the remote server receives the request, then at least we don't have confirmation that Sydney/Bing can initiate a GET to an arbitrary URL 'inline'. But that still wouldn't firmly confirm that Sydney/Bing can only access data indexed independently of any request it issues...just a lack of confirmation to the contrary.
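The timing logic above, as a rough Python sketch. `classify_first_hit` is a hypothetical helper, and `baseline_crawl_delay` is an assumed input you would have to estimate yourself, since, as noted, nobody outside Microsoft knows that number.

```python
from datetime import datetime, timedelta
from typing import Optional


def classify_first_hit(asked_at: datetime,
                       first_hit_at: Optional[datetime],
                       baseline_crawl_delay: timedelta) -> str:
    """Interpret the first logged GET for a brand-new URL given to the chatbot.

    baseline_crawl_delay is an assumption: how long an uninfluenced crawl
    would normally take to discover a new URL.
    """
    if first_hit_at is None:
        # Nothing arrived: no confirmation of an inline GET, but no proof against it.
        return "no hit"
    if first_hit_at < asked_at:
        # Something found the URL before the chatbot saw it; rerun with a fresh URL.
        return "invalid"
    if first_hit_at - asked_at < baseline_crawl_delay:
        # Faster than a normal crawl: the chat request likely pushed the fetch.
        return "pushed"
    # Arrived on a normal crawl schedule: the two explanations can't be separated.
    return "inconclusive"
```

Only the "pushed" outcome says anything positive; "no hit" and "inconclusive" are the lack-of-confirmation cases described above.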

Edit: If you could guarantee that whatever index Sydney/Bing can access will never index a given URL (e.g. if you knew Bing respected noindex), then you could strengthen the test by giving the same URL to Sydney/Bing again after the time you'd expect the crawler to hit it. If Sydney/Bing never sees that URL, then it seems more likely that it can't see anything the crawler hasn't already hit (and hence indexed, and hence made accessible without a Sydney-initiated GET).

(MSFT probably thought of this, so I'm probably worried about nothing.)
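A minimal Python sketch of the noindex variant, using only the standard library's `http.server`: the canary page carries both an `X-Robots-Tag: noindex` header and a robots meta tag, and every GET is logged with a UTC timestamp and user agent. `NoindexCanary`, `PAGE`, and `serve` are hypothetical names, and whether Bing's index actually honors noindex is exactly the assumption the comment flags.

```python
from datetime import datetime, timezone
from http.server import BaseHTTPRequestHandler, HTTPServer

PAGE = (b'<!doctype html><html><head>'
        b'<meta name="robots" content="noindex">'
        b'<title>canary</title></head><body>canary</body></html>')


class NoindexCanary(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        # Record when, what, and by whom before responding (log_message writes to stderr).
        stamp = datetime.now(timezone.utc).isoformat()
        self.log_message("HIT %s %s UA=%r", stamp, self.path,
                         self.headers.get("User-Agent"))
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        # Ask compliant crawlers to keep this page out of their index.
        self.send_header("X-Robots-Tag", "noindex")
        self.send_header("Content-Length", str(len(PAGE)))
        self.end_headers()
        self.wfile.write(PAGE)


def serve(port: int = 8000) -> None:
    """Serve the canary page on all interfaces until interrupted."""
    HTTPServer(("", port), NoindexCanary).serve_forever()
```

Note that noindex only asks a crawler not to index the page; the crawler still has to fetch it to see the directive, so a crawler hit in the log is expected, and it is the point after which you would ask Sydney/Bing about the URL a second time.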