
I'm not sure I see how this is meaningfully different from the threat posed by a search engine. It's a very real threat, and as a result I've always done my best to search from a browser context that isn't logged in. But it's not a new threat, or something distinctive to AI.


Because you can't ask a search engine to summarize the views, thoughts, or whatever of the user. You have to scroll through them by the hundreds and see if any obvious nuggets stand out that you might be interested in.

Yes, search engine history is private too and can reveal things you want to remain private. But you also need to see the browser history and the contents of those pages, together with the search history, to learn what the user was actually interested in reading and get close to the same level of data the LLM has about you.


You might be surprised at the number of people who interact with a search engine the same way they do with an LLM, especially now that many engines put an LLM widget at the top of the results for such queries.


Conversational queries are a double-edged sword, though. You will have a lot more text to dig through. With RAG, it's easy to cut through all of that.


One doesn't have to scroll through them and find the nuggets themselves; it's digital data. It can be copied[1].

Once copied, one can then paste it into an LLM and have it find the nuggets.

[1]: And by "copied," I mean... even a long series of hasty cell phone photos of the screen is enough for ChatGPT to ingest the data with surprising accuracy. It's really good at this kind of thing.


It sounds to me like you’re agreeing with the person above who said ChatGPT isn’t a new threat, but your explanation uses ChatGPT. In other words, “ChatGPT isn’t a new threat because even with a search engine you can use ChatGPT to look through the queries”.


ChatGPT is absolutely a new "threat", at the very least because it trivializes the automation of coarse analysis of unstructured information -- including a user's search history.


To add on to this, people tend to search short words and phrases in Google. Searching "Charlie Kirk assassination", for example, doesn't really tell you much about a person's political leanings. People have full-on conversations with ChatGPT, which makes their thoughts much clearer.


You funnel clickstream data into inference engines. Intelligence agencies have had these capabilities for decades.


I mean, now you can take their entire search history and feed it to an LLM.


> I've always done my best to search from a browser context that isn't logged in as a result.

It isn't sufficient to avoid being logged in — you have to ensure that the search strings alone, grouped by IP address or some other signal, aren't enough to identify you. When AOL publicly released an archive of 20 million search strings in 2006, many users were exposed:

https://en.wikipedia.org/wiki/AOL_search_log_release

There's also the issue of a site's Terms of Service when you're not logged in, which may allow an AI to be trained on your interactions, potentially bleeding compromising information into the generative results other people see.


Anonymized data is basically a smokescreen. With hundreds of metadata points, it is trivial to trace almost any information back to its origin.

The only truly anonymized data is data that was never kept at all.


Oh, I know, I'm just adding that detail to say that I'm not dismissive of the threat we're talking about. It's a real threat, I'm just saying it's an old one.


Which search and AI services reliably discard logs?

It's my understanding that if you configure your Google account correctly, logged-in searches will be discarded. However, I'm less certain whether Google retains data for non-logged-in queries that would allow aggregation by IP address, etc.

Then there's DuckDuckGo, which, at least as advertised, discards search strings. Their "duck.ai" service stores prompt history locally, but they claim it's not retained on their machines, nor used for training by the various AI providers that duck.ai connects to[1].

In contrast, ChatGPT by default uses non-logged-in interactions to train their model[2].

[1] https://duckduckgo.com/duckduckgo-help-pages/duckai/ai-chat-...

[2] https://help.openai.com/en/articles/7730893-data-controls-fa...


I think it's related to but different from simply a search engine, since AI:

- Entices you to "confess" (or overshare) things about yourself, in the form of questions/debate, because the chatbot is built for this. The "conversation" aspect is something you didn't get with search engines.

- Makes it easier for someone else to draw conclusions and infer things from the "model" the AI built of you, even if you didn't explicitly tell it these things.

Maybe Google can build a profile of me based on my searches and use of their products, but I bet ChatGPT is at least an order of magnitude more useful for drawing inferences about me, my health status, and my opinions about things.


In theory you could accomplish this by combing through search history.

In practice, the scenario in OP is unlikely to be practical with search history alone. It’s much less convenient for CBP to ask someone to pull up their Google search history. And even if they did, it doesn’t work as well. Officers don’t have infinite time to assess every person.

So I would call it a new threat.


They could also take your traditional search and chat history, feed it into an LLM, and ask it the same questions. Once you start doing that for one person... you could just feed everyone's chat and search history into an LLM, and ask it "who is the most dangerous" or whatever you want to ask.

It's just another version of the classic computing problem: "computers might not make a new thing possible, but they make it possible to do an old thing at a scale that fundamentally changes the way it works."

This is the same as universal surveillance... sure, anyone could have followed you in public and watched where you were going, but if you record everything, you can now do it for everyone at any time. That changes how it works.
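To illustrate the scale point, a hedged sketch; the directory layout, model name, and prompt are all hypothetical, and assess() is just the same kind of LLM call sketched earlier in the thread:

    # Sketch of "an old thing at a new scale": the same per-person
    # analysis, looped over an entire population. Assumes a directory
    # of per-user history dumps and the OpenAI Python SDK; paths,
    # model name, and prompt are hypothetical.
    from pathlib import Path
    from openai import OpenAI

    client = OpenAI()

    def assess(history: str) -> str:
        # One paragraph of inferred views per person (illustrative).
        resp = client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder model name
            messages=[
                {"role": "system",
                 "content": "Summarize this person's likely views "
                            "in one paragraph."},
                {"role": "user", "content": history},
            ],
        )
        return resp.choices[0].message.content

    # What used to take an analyst per person now runs as a batch job.
    reports = {p.name: assess(p.read_text())
               for p in Path("histories").glob("*.txt")}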


I must not have understood the article correctly, because I took ChatGPT to be a stand-in for LLM technology in general. But I think I am wrong.


That's how I meant it.


Right, and so I interpreted your comment

> I'm not sure I see how this is meaningfully different than the threat posed by a search engine.

as being about the world pre-LLMs and post-LLMs, not about Google in 2025 vs ChatGPT in 2025.

For the latter comparison, I agree, and in fact Google probably has an even richer history of people over time.

But like any “X is just Y” explanation, the former comparison fails to address the emergent effects of Y becoming faster/cheaper/better.


1. Scale and automation always matter. It wouldn't be the first time something that was already technically possible went from rarely done to a widespread problem.

2. The whole benefit of using LLMs, especially for search, is their understanding of the logic and intent behind your query. When people use LLMs, they often aren't just sending the half-garbled messes they send to Google search; they are sending queries that make the intent behind them clear (so the model can better answer). This is not information you are guaranteed to obtain by combing through browser history.

3. Today, with roughly 5 billion users, Google Search handles 8.5 billion searches per day. ChatGPT, with some 800M weekly active users, handles some 2.5 billion messages per day. Not only are people more revealing per query, they are clearly sending far more queries per user.


Re: 3 - Do we know how many of those ChatGPT queries are actually people? Because I can't think of a use case for automating things with Google searches, but I can think of a million ways to automate bullshit with ChatGPT. How much of that queries-per-user stat is inflated by enterprise accounts making hundreds of queries a minute? How many of those are bot farms automating fake recipe websites, and how many are actual people having real and revealing conversations?


The number is ChatGPT user messages (not requests via the API), so there are no enterprise accounts making hundreds of queries per minute or bot farms automating fake recipe websites.

Based on OpenAI's Usage Breakdown[0], as of July 2025, ChatGPT processes 1.9B non-work and 716M work messages per day.

[0] https://www.nber.org/system/files/working_papers/w34255/w342...


The level of time and effort being so low increases the likelihood of this happening. It's the same reason there's red teaming to ensure AI doesn't help bad actors with chemical weapons: lowered barriers to bad things are a concern even if the bad things were possible before.


A conventional keyword-based search engine is unlikely to actively and subtly encourage a user to (A) reveal secrets and blackmail material or (B) become entrapped in behavior the Current Authority will punish them for.

A better "some of this isn't new" comparison would be to imagine you're communicating with an idiot-savant human employee, someone who can be tasked with hidden priorities and will do anything to stay employed in their role. What "old" threats could occur there?

That makes for a rather different threat model.


I think the most important difference is that chats are rich in context and, depending on how you use them, closer to journal entries than search queries. I also think it doesn't have to be new to be significant, if it is expanding the frontier of an existing vulnerability.


I don't understand how you don't understand. Trying to recreate someone's internal thoughts and attitudes from their search history is a pale imitation of this. Just the thought experiment of a customs officer asking ChatGPT to summarise your political viewpoints was eye-opening to me.


How so? You'd have a very, very good understanding of my political viewpoints from the log of my Google searches. I'm asking sincerely, not simply to push back on you.


It seems fairly easy to figure this out with a little thought…

When talking to a chatbot you're likely to type more words per query, as a simple measure. But you're also more likely to have to clarify your queries with logic and intent — to prevent it from going off the rails — revealing more about the intentions behind your searches than just stringing together keywords would.

It'd be harder to claim purely informational reasons for searching if your prompts betray motive.


(Not op)

Maybe not you in particular, but I expect people to be more forthcoming in their writing to LLMs than in a raw Google search.

For example, a search of "nice places to live in" vs "I'm considering moving from my current country because I think I'm being politically harassed and I want to find nice places to live that align with my ideology of X, Y, Z".

I do agree that, after collecting enough search datapoints, one could piece together the second sentence from the first, and that this is more akin to a new instance of an already existing issue.

It's just that, by default, I expect more information to be obtainable, more easily, from what people write to an LLM than from what they type into a search box.


Asking Google for details about January 6th is different from telling ChatGPT I think the election was stolen, and then arguing with it for hours about it.

It would be harder to argue in front of a jury that what you typed wasn't an accurate representation of what you were thinking, and that you were being duplicitous with ChatGPT.


I don't think it really is, in the circumstances we contemplate this threat in. In both the search engine case and the ChatGPT case, we're talking about circumstantial evidence (which, to be clear, is real and legally weighty in the US) --- particularly in the CBP setting that keeps coming up here, a border agent doesn't need the additional ChatGPT context you're talking about to draw an adverse conclusion!

I think at this point the fulcrum of the point I'm making is that people might be inadvertently lulling themselves into thinking they're revealing meaningfully less about themselves to Google than to ChatGPT. My claim would be that if there's a difference, it's not clear to me it's a material one.


Ah. Yeah, you're more boned if you confess to ChatGPT that you've killed your wife than if you just googled how to bury a body. But at the edges, where people are using ChatGPT as a therapist and someone disappears, and the person who did it is smart enough to use incognito mode to search how to bury a body so it doesn't show up in court, how everyone feels about the deceased is gonna get looked at, including ChatGPT conversations. That's new.


If a nefarious actor opens your browser, what is the process for them to quickly ascertain your viewpoint on issue X?

Write a script to search and analyse? Versus just asking their specific question.


Grab search history and ask an AI to analyze it.


So a few steps more than just asking the AI, and it still relies on AI.


The point is that the data is there from search engines (and more data, from more people anyway). Whether you automate reading it or do it manually, it is 100% unrelated to the topic of ChatGPT being an informant.


Google completely owns most people's browsers, and the government has made it clear that they do not care.


Users type a lot more into GPT and share a lot more of their personal files and access to their cloud services.


Users type a lot more often into search engines, and the largest one keeps files on all of their egresses and correlates them with full advertising profiles and what they do within other Google properties (which may include their browser itself).


Google has all of that and more, right? They control the browser and devices that you use to access an AI app. They control the content shown to you in leisure and work. ChatGPT doesn't have that much exposure and surface area yet.


Apple has a lot of customers


It's far more detailed and personal, given the drill-down nature of these conversations.

Combined with how personal people believe their phones are, it might not be that big of a stretch.


It's harder to use AI tools without an account or payment tied to your identity.


Seriously, why is this always the first comment on HN? The formula never fails:

1. Criticism of anything related to AI

2. Comment: "I don't see how this is any different than phenomenon X that came before it".

I have seen this by now maybe 400 times.


My personal favorite in this genre was the commenter who said that the heart-rate monitoring features of an Apple Watch were irrelevant because they could always check their own.

Scale & automation matter.


Turns out, we were the stochastic parrots all along!


And suddenly 8 billion people cried out "I am unique and special" in unison.



The difference is friction


I think the underlying assumption is that people say very different things to an anthropomorphized (even if in their own parasocial head) chatbot than in other online spaces.

I can see why, mainly because of the parasocial relationship that many people probably tend to form with these things that talk to us like they are human.



