I hope that's right. I guess you (I mean someone with Bing Chat access, which I don't have) could test this by asking Sydney/Bing to respond to (summarize, whatever) a url that you're sure Bing (or more?) has not indexed. If Sydney/Bing reads that url successfully then there's a direct causal chain that involves Sydney and ends in a GET whose url first enters Sydney/Bing's memory via the chat buffer. Maybe some MSFT intermediary transformation tries to strip suspicious url substrings, but that won't be possible w/o massively curtailing outside access.
But I don't know if Bing (or whatever index Sydney/Bing can access) respects noindex and don't know how else to try to guarantee the index Sydney/Bing can access will not have crawled any url.
Servers as a rule don't access other domains directly, for the reasons you cite and others (speed, for example). I'd be shocked if Bing Chat was an exception. Maybe they cobbled together something really crude just as a demo. But I don't know any reason to believe this.
Doh -- of course, thanks -- I should have gone there first. Would be interesting to see RemoteAddr but I think the RemoteAddr value doesn't affect my worry.
Sure, but I think directness doesn't matter here -- what matters is just whether a url that originates in a Sydney call chain ends up in a GET received by some external server, however many layers (beyond the usual packet-forwarding infrastructure) intervene between whatever machine the Sydney instance is running on and the final recipient.
Yes. And chat outputs normally include footnoted links to sources, so clicking a link produced by Sydney/Bing would be normal and expected user behavior.
I think the question comes down to whether Sydney needs the entirety of the web page it's referencing or whether it can get by with some much more compressed summary. If Sydney needs the whole web in real time, it could multiply world web traffic several fold if it (or Google's copy) becomes the dominant search engine.
Directness is the issue. Instead of BingGPT accessing the Internet, it could be pointed at a crawled index of the web, and be unable to directly access anything.
Not if the index is responsive to a Sydney/Bing request (which I imagine might be desirable for meeting reasonable 'web-enabled chatbot' ux requirements). You could test this approximately (but only approximately, I think) by running the test artursapek mentions in another comment. If the request is received by the remote server 'in real time' -- meaning, faster than an uninfluenced crawl would hit that (brand new) url (I don't know how to know that number) -- then Sydney/Bing is 'pushing' the indexing system to grab that url (which counts as Sydney/Bing issuing a GET to a remote server, albeit with the indexing system intervening). If Sydney/Bing 'gives up' before the remote url receives the request then we at least don't have confirmation that Sydney/Bing can initiate a GET to whatever url 'inline'. But this still wouldn't be firm confirmation that Sydney/Bing is only able to access data that was previously indexed independently of any request issued by Sydney/Bing...just lack of confirmation of the contrary.
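To make the decision rule in that test concrete, here's a rough sketch of how I'd read the timestamps. All the names are made up for illustration, and `assumed_crawl_latency` is exactly the number I said I don't know how to get, so treat it as a guess you plug in, not a measurement.

```python
from enum import Enum
from typing import Optional


class Verdict(Enum):
    PUSHED = "request arrived faster than an uninfluenced crawl would"
    CONSISTENT_WITH_CRAWL = "request arrived, but slowly enough to be an ordinary crawl"
    NO_CONFIRMATION = "no request seen before the chat gave up"


def classify(submitted_at: float,
             first_request_at: Optional[float],
             chat_gave_up_at: float,
             assumed_crawl_latency: float) -> Verdict:
    """Interpret one run of the test. All times are seconds on the same clock.

    assumed_crawl_latency is a hypothetical estimate of how long an
    uninfluenced crawler would take to reach a brand new url.
    """
    if first_request_at is not None and first_request_at < submitted_at:
        # The url was fetched before it was ever typed into the chat,
        # so the run tells us nothing about chat-initiated requests.
        raise ValueError("url was requested before it was submitted")
    if first_request_at is None or first_request_at > chat_gave_up_at:
        # Lack of confirmation only, not proof that no GET can be initiated.
        return Verdict.NO_CONFIRMATION
    if first_request_at - submitted_at < assumed_crawl_latency:
        return Verdict.PUSHED
    return Verdict.CONSISTENT_WITH_CRAWL
```

Note that `NO_CONFIRMATION` is deliberately not called "safe": it's only the absence of evidence, same as in the prose above.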
Edit: If you could guarantee that whatever index Sydney/Bing can access will never index a url (e.g. if you knew Bing respected noindex) then you could strengthen the test by inputting the same url to Sydney/Bing after the amount of time you'd expect the crawler to hit your new url. If Sydney/Bing never sees that url then it seems more likely that it can't see anything the crawler hasn't already hit (and hence indexed and hence accessible w/o Sydney-initiated GET).
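For the server side of this, a minimal sketch of a probe endpoint, assuming you have a publicly reachable host to run it on (the localhost default here is only so it runs anywhere). It serves an unguessable path, marks the response noindex via both the `X-Robots-Tag` header and a robots meta tag, and records when each GET arrived and from what address. `start_probe`, `ProbeHandler`, and `hits` are names I made up. One caveat I'm fairly sure of: noindex asks a crawler not to index the page, but the crawler still has to fetch the page to see the directive, so this doesn't stop a crawl-initiated GET from showing up in the log.

```python
import secrets
import threading
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# One tuple per request: (unix_time, client_ip, path, user_agent)
hits = []
hits_lock = threading.Lock()

BODY = (b"<html><head><meta name='robots' content='noindex'></head>"
        b"<body>probe</body></html>")


class ProbeHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Record the request before answering, so the log is complete
        # by the time the client sees a response.
        with hits_lock:
            hits.append((time.time(), self.client_address[0], self.path,
                         self.headers.get("User-Agent", "")))
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("X-Robots-Tag", "noindex")
        self.send_header("Content-Length", str(len(BODY)))
        self.end_headers()
        self.wfile.write(BODY)

    def log_message(self, format, *args):
        pass  # silence the default stderr log; `hits` is the record


def start_probe(host="127.0.0.1", port=0):
    """Start the probe in a background thread.

    Returns the server and a fresh random path to paste into the chat.
    port=0 lets the OS pick a free port.
    """
    server = ThreadingHTTPServer((host, port), ProbeHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    path = "/probe-" + secrets.token_hex(8)
    return server, path
```

You'd note the time you paste the url into the chat, then compare it with the first timestamp in `hits` for that path. The recorded address is whatever machine opened the connection, which could be a proxy or crawler rather than anything "near" Sydney, but as I said elsewhere I don't think that changes the worry.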
(MSFT probably thought of this, so I'm probably worried about nothing.)