
> OpenAI DO NOT WANT your private data in their training data

But they do want it. I can see many old chat logs.

Data is a liability. Does "clear conversations" in chat.openai.com actually remove them? Or does it just mark them as "deleted" while they remain in a database? I just did a data export, then a clear conversations, then another data export. The second export was empty, which seems suspiciously fast to me.
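The "mark as deleted but keep in the database" pattern being suspected here is usually called a soft delete. A minimal sketch of how it would explain a near-instant "empty" export (schema and names are purely illustrative, not anything OpenAI actually does):

```python
import sqlite3

# Hypothetical soft-delete schema: rows get a flag instead of being removed.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE conversations "
    "(id INTEGER PRIMARY KEY, body TEXT, deleted INTEGER DEFAULT 0)"
)
db.execute("INSERT INTO conversations (body) VALUES ('old chat log')")

# "Clear conversations" as a fast UPDATE, not a hard DELETE:
db.execute("UPDATE conversations SET deleted = 1")

# The export query filters on the flag, so it comes back empty...
export = db.execute(
    "SELECT body FROM conversations WHERE deleted = 0"
).fetchall()

# ...but the content still sits in the table.
remaining = db.execute("SELECT body FROM conversations").fetchall()
```

Flipping one flag per row is why the operation can feel suspiciously fast; a genuine purge would have to touch (and eventually vacuum) the underlying storage.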



I didn't say they weren't storing the data - obviously they have to store it to provide the (popular) chat log feature.

I said they didn't want it in the actual training data that they use for the pre-training phase of training future language models.


I'm genuinely trying to understand (based on this and another comment above): wouldn't storing data for pre-training vs fine-tuning carry the same risks?


What risks are you talking about?

If you mean the risk that OpenAI will have their own security hole that leaks that stored data then yes.

If you mean the risk that someone will ask a question about your company and ChatGPT will answer with some corporate secrets then no.

This all depends very much on what they are using the ChatGPT data for. My theory is that they treat it very carefully to avoid "facts" from it being absorbed into the model - so even "fine tuning" may be inaccurate terminology here.

I really, really wish they would be more transparent about how they use this data.



