Hmm, it's not that simple, is it? Let's say the AI is trained on the tweet "Ben Adams drove to Mexico yesterday but I still haven't heard from him."
From this training, you can ask the AI "Who has driven to Mexico?" and it might know that Ben Adams did and reply with that.
HOWEVER, that fact is also baked into the model and can't be surgically removed after a complaint. That's the irreversibility part: you can't undo an isolated piece of training. You'd have to provide a new data set and train all over again, and they won't do that because it's too costly.
The problem with the above example is of course that training data like this can also contain sensitive or private user details.
I've easily extracted complete song lyrics, word for word, from GPT-4, even though OpenAI tries to put up guardrails against it because of the copyright issues. AI is really still in the wild-west phase...
The irreversibility is still important to highlight, as it is distinctly different from a similar consent issue with search: "Google indexed my website against my will, but I will just forbid them from including me in search results going forward".
It is irreversible in the same way that a student who reads a textbook from LibGen can remember and profit from that information forever. Kinda crazy how many in this community went from champions of freedom of knowledge to champions of megacorps owning and controlling all of human creation, in the span of like two years, once it became clear other corporations could profit off that freedom too.
I've led myself to believe that long responses are actually beneficial for the quality of the responses, as processing and producing tokens is the only time when LLMs get to "think".
In particular, requesting an analysis of the problem first before jumping to conclusions can be more effective than just asking for the final answer directly.
However, this analysis phase, or something like it, could be done hidden in the background, though I don't think anyone is doing that yet. From the user's point of view that would just be waiting, and from the API's point of view those tokens would still cost money. Might as well entertain the user with the text it produces in the meanwhile.
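The "analysis first" pattern above can be sketched as a two-step prompt sequence. This is only an illustration of the technique, not anyone's actual implementation; `call_model` here is a hypothetical stand-in for whatever chat-completion API you use:

```python
# Sketch of "analysis before answer" prompting.
# call_model is a hypothetical stand-in for a real chat-completion API call.

def build_prompts(question: str) -> list[str]:
    """Return the two prompts used by the analysis-first pattern."""
    analysis_prompt = (
        f"Question: {question}\n"
        "Before answering, analyse the problem step by step. "
        "List the relevant facts and constraints, but do not give a final answer yet."
    )
    answer_prompt = (
        "Based on the analysis above, now state the final answer in a single sentence."
    )
    return [analysis_prompt, answer_prompt]

def ask_with_analysis(question: str, call_model) -> str:
    """Run the analysis step, feed it back into context, then ask for the answer."""
    analysis_prompt, answer_prompt = build_prompts(question)
    # First call: the model "thinks" in visible tokens.
    analysis = call_model(analysis_prompt)
    # Second call: the analysis is part of the context when the answer is produced.
    return call_model(f"{analysis_prompt}\n{analysis}\n{answer_prompt}")
```

The point is that the model only gets to compute while producing (or processing) tokens, so the first call buys it space to reason before the second call commits to an answer.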
My understanding is this used to be the case[1] but isn't really true any longer, due to things like the STaR ("Self-Taught Reasoner") method for model training[2]. Empirically, circa GPT-3, it absolutely used to be the case that prompting with "Explain all your reasoning step by step and then give the answer at the end" (or similar) would get you a better answer to a complex question than "Just give me the answer and nothing else", or asking for the answer first. Then, circa GPT-4, answers started getting much longer even if you asked the model to be concise.
That doesn't seem to be the case any more, and there has been speculation that this is down to the STaR method being used for training newer models. I say speculation because I don't believe anyone has come out and said they are using STaR for training. OpenAI referred to Q* somewhere, but they wouldn't be drawn on whether that * is this "star", and although Google was involved in publishing the STaR paper, they haven't said Gemini uses it (I don't think).
Has it been proven that OpenAI used Twitter for training? I know it knows about popular tweets, but those are reported in many places, so they could have been ingested incidentally along with other content.
(But regardless, many people have raised the issue of OpenAI training on sources it shouldn't be allowed to access, so those are definitely a problem as well.)
As someone from the EU, hearing this argument over and over from Americans is exhausting.
They provide a product in the EU, therefore they must either follow EU law or exit the EU market. Just like an EU company that provides a product in the US has to follow US law.
The line of 'following the law of another country' is a grey area on the internet, given that it goes both ways:
EU online companies providing services to US users fail to provide the free-speech guarantees that US law affords its citizens, because all EU countries have stricter laws limiting free speech. Should EU companies break their own countries' laws to satisfy the US audience?
There are now US states that have passed laws regulating social media censorship. The US Supreme Court has declined to rule on them or strike them down based on the companies' First Amendment rights.
So it seems there are states where a European social media company would have to abide by rules that would most likely contradict European laws, right?
> EU online companies providing services to US users fail to provide the free-speech guarantees that US law affords its citizens, because all EU countries have stricter laws limiting free speech. Should EU companies break their own countries' laws to satisfy the US audience?
Could you sharpen up this claim? Like suppose I run a microblogging site but I delete libellous posts and incitements to violence in accordance with my local European law. Am I violating a US law by allowing Americans to use the site?
My understanding of your post was that you know that it violates US law and so you're asking what should be done. What I am asking is if it really does violate US law, and if so how.
https://noyb.eu/en/twitters-ai-plans-hit-9-more-gdpr-complai...