Hmm, it's not that simple, is it? Let's say the AI is trained on the tweet "Ben Adams drove to Mexico yesterday but I still haven't heard from him."
From this training, you can ask the AI "Who has driven to Mexico?" and it might know that Ben Adams did and reply with that.
HOWEVER, that fact is also baked into the model and can't be surgically removed after a complaint. That's the irreversibility part: you can't undo an isolated piece of training. You'd have to provide a new data set and train all over again, and they won't do that because it's too costly.
The problem with the above example is of course that training data like this can also contain sensitive or private user details.
I've easily extracted complete song lyrics, word for word, from GPT-4, even though OpenAI tries to put up guardrails against it because of the copyright issues. AI is really still in the wild-west phase...
The irreversibility is still important to highlight, as it is distinctly different from a similar consent issue with search: "Google indexed my website against my will, but I will just forbid them from including me in search results going forward".
It is irreversible in the same way that a student who reads a textbook from LibGen can remember and profit from that information forever. Kinda crazy how many in this community went from champions of freedom of knowledge to champions of megacorps owning and controlling all of human creation, in the span of like two years, once it became clear other corporations could profit off that freedom too.
I've led myself to believe that long responses are actually beneficial for the quality of the responses, as processing and producing tokens is the only time when LLMs get to "think".
In particular, requesting an analysis of the problem first before jumping to conclusions can be more effective than just asking for the final answer directly.
However, this analysis phase, or something like it, could be done hidden in the background, though I don't think anyone is doing that yet. From the user's point of view that would just be waiting, and from the API's point of view those tokens would still cost money. Might as well entertain the user with the text it produces in the meanwhile.
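The "analysis first" pattern above can be sketched as a two-step prompt sequence. This is only an illustration of the technique, not anyone's actual implementation; `call_model` here is a hypothetical stand-in for whatever chat-completion API you use:

```python
# Sketch of "analysis before answer" prompting.
# call_model is a hypothetical stand-in for a real chat-completion API call.

def build_prompts(question: str) -> list[str]:
    """Return the two prompts used by the analysis-first pattern."""
    analysis_prompt = (
        f"Question: {question}\n"
        "Before answering, analyse the problem step by step. "
        "List the relevant facts and constraints, but do not give a final answer yet."
    )
    answer_prompt = (
        "Based on the analysis above, now state the final answer in a single sentence."
    )
    return [analysis_prompt, answer_prompt]

def ask_with_analysis(question: str, call_model) -> str:
    """Run the analysis step, feed it back into context, then ask for the answer."""
    analysis_prompt, answer_prompt = build_prompts(question)
    # First call: the model "thinks" in visible tokens.
    analysis = call_model(analysis_prompt)
    # Second call: the analysis is part of the context when the answer is produced.
    return call_model(f"{analysis_prompt}\n{analysis}\n{answer_prompt}")
```

The point is that the model only gets to compute while producing (or processing) tokens, so the first call buys it space to reason before the second call commits to an answer.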
My understanding is this used to be the case[1] but isn't really true any longer, due to things like the STaR ("Self-Taught Reasoner") method for model training[2]. Empirically, circa GPT-3, it absolutely used to be the case that prompting with "Explain all your reasoning step by step and then give the answer at the end" (or similar) would get you a better answer to a complex question than "Just give me the answer and nothing else", or asking for the answer first. Then, circa GPT-4, answers started getting much longer even if you asked the model to be concise.
That doesn't seem to be the case any more, and there has been speculation that this is down to the STaR method being used for training newer models. I say speculation because I don't believe anyone has come out and said they are using STaR for training. OpenAI referred to Q* somewhere, but they wouldn't be drawn on whether that * is this "star", and although Google was involved in publishing the STaR paper, they haven't said Gemini uses it (I don't think).
Has it been proven that OpenAI used Twitter for training? I know it knows about popular tweets, but those are reported in many places, so they could have been ingested incidentally along with other content.
(But regardless, many people have raised the issue of OpenAI training on sources it shouldn't be allowed to access, so those are definitely a problem as well.)
As someone from the EU, hearing this argument over and over from Americans is exhausting.
They provide a product in the EU, therefore they must either follow EU law or exit the EU market. Just like an EU company that provides a product in the US has to follow US law.
The line of 'following the law of another country' is a grey area on the internet, given that it goes both ways:
EU online companies providing services to US users fail to provide the free-speech guarantees that US law affords its citizens, because all EU countries have stricter laws limiting free speech. Should EU companies break their own countries' laws to satisfy the US audience?
There are now US states that have passed laws regulating social media censorship. The US Supreme Court has declined to rule on them or strike them down based on the companies' First Amendment rights.
So it seems there are states where a European social media company would have to abide by rules that would most likely contradict European laws, right?
> EU online companies providing services to US users fail to provide the free-speech guarantees that US law affords its citizens, because all EU countries have stricter laws limiting free speech. Should EU companies break their own countries' laws to satisfy the US audience?
Could you sharpen up this claim? Like suppose I run a microblogging site but I delete libellous posts and incitements to violence in accordance with my local European law. Am I violating a US law by allowing Americans to use the site?
My understanding of your post was that you know that it violates US law and so you're asking what should be done. What I am asking is if it really does violate US law, and if so how.
https://noyb.eu/en/twitters-ai-plans-hit-9-more-gdpr-complai...