
The important takeaway for both GPT4All and Alpaca is that once an expensive proprietary model is released, people can easily train cheaper OSS models on input/output pairs.

⇒ LLMs are not defensible

⇒ LLMs will become commoditized

⇒ Prices will drop

⇒ Great for open source
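
To make "input/output pairs" concrete, here is a minimal sketch of how collected prompts and responses might be wrapped into training records. The example pairs, the `to_record` and `to_jsonl` helpers, and the `instruction`/`output` field names are all illustrative assumptions, not the actual GPT4All or Alpaca pipeline.

```python
import json

# Hypothetical prompt/response pairs collected from a stronger model.
pairs = [
    ("Name the capital of France.", "Paris."),
    ("Translate 'hello' to Spanish.", "Hola."),
]

def to_record(prompt: str, response: str) -> dict:
    """Wrap one pair as a training record (field names are illustrative)."""
    return {"instruction": prompt, "output": response}

def to_jsonl(pairs) -> str:
    """Serialize all pairs as JSON Lines, one record per line."""
    return "\n".join(json.dumps(to_record(p, r)) for p, r in pairs)

jsonl = to_jsonl(pairs)
print(jsonl)
```

A file like this is the kind of dataset a smaller model could then be fine-tuned on; the fine-tuning step itself is out of scope here.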



   "easily train cheaper OSS models"
That's the claim, but I don't see it. All these open models I tested are WAY worse than GPT-3. (2?)


Do you mean ChatGPT (on GPT-3.5 Turbo)?

The foundation models on which they're built are only about as capable as GPT-3, and what most people run locally are the lightest, quantized weights, so their performance is even more degraded.
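
For a rough sense of why quantization costs accuracy, here is a toy sketch of symmetric uniform quantization. The weights and the `quantize_roundtrip` and `max_error` helpers are made up for illustration; real local runtimes use more elaborate block-wise schemes, so treat this only as intuition.

```python
def quantize_roundtrip(weights, bits):
    """Symmetric uniform quantization to `bits` bits, then dequantize."""
    levels = 2 ** (bits - 1) - 1              # 127 for 8-bit, 7 for 4-bit
    scale = max(abs(w) for w in weights) / levels
    if scale == 0:                            # all-zero weights: nothing to quantize
        return list(weights)
    return [round(w / scale) * scale for w in weights]

def max_error(weights, bits):
    """Largest absolute difference between original and restored weights."""
    restored = quantize_roundtrip(weights, bits)
    return max(abs(w - r) for w, r in zip(weights, restored))

# Illustrative weights only, not taken from any real model.
weights = [0.031, -0.742, 0.256, 0.998, -0.113]
err8 = max_error(weights, 8)
err4 = max_error(weights, 4)
print(f"8-bit max error: {err8:.4f}, 4-bit max error: {err4:.4f}")
```

With fewer bits there are fewer representable levels, so each weight lands further from its original value; across billions of weights, that rounding error is one plausible source of the degradation.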

LLaMA-based models are popular because LLaMA beat GPT-3 on benchmarks. But ChatGPT runs on 3.5 Turbo and later, beating them.

There's currently no open model that compares.


Can confirm, at least on GPT4All. Just tried it and it's nothing like ChatGPT.


ChatGPT is GPT3.5(++)


If you pay you can use GPT-4 within the ChatGPT UI.


I am aware. I was pointing out that ChatGPT is not based on GPT-3.0 but on 3.5, and subsequent releases are further along.


The only concern is that this violates the terms of use of the proprietary LLM API. At least that is true for the current OpenAI API.


It was trained using data they didn't have a license for. So will that hold in court? I hope not.


It’s not a licensing issue because OpenAI does not own the copyright to the output of GPT-4. The people who might have a copyright claim are: the authors of the training data, if there is a clear resemblance between the training data input and the output, and rarely the author of the prompt.

OpenAI could argue that because some of the training data was written by them, they have a copyright claim to the output. However, this is a very slippery slope for them, as the entire existence of OpenAI is predicated on their use of training data being fair use.

OpenAI can only control the output of GPT-4 via their terms of service. When you sign up to use ChatGPT or other services you agree to certain conditions.


This can thus be circumvented by an intermediary that happens to release the output somewhere. Since you don't need permission from the copyright owner to train models, you just... take it from their website or something. As long as you don't accept their terms of service, there is nothing they can do.


Additionally, different jurisdictions take different views on browsewrap and clickwrap agreements. Usually terms of service regulate acceptable use of the website: don’t spam, don’t harass other users, don’t use bots, things that the website has a legitimate interest in.

Attempting to control what users do on their own time with public domain information from your website may be a step too far for a clickwrap or browsewrap ToS.


I am not a lawyer, but I don’t think that affects the copyright or license of works created with software used in breach of its terms. For example, wouldn’t you still own the copyright to a brochure you made with pirated Photoshop, or a photo you took with a DJI drone while flying out of sight?

(All of which may be moot if models can’t be copyrighted because they’re machine-generated.)


It's not so much an issue of copyright as of terms of service. You do own the copyright to the images you create with pirated Photoshop, but Adobe can also go after you for the unrelated matter of illegally bypassing their DRM.


But then when it's distributed, are the people who receive it in the clear, whilst the original author gets pursued?

For example, can anyone now use GPT4All, with nothing that OpenAI could do about it?


If you could sue someone for using stuff you copyrighted to train a model, "Open"AI wouldn't be a thing in the first place. Moreover, "Open"AI doesn't have copyright on the outputs; only the terms of service stop the receiver of the output from developing competing products. If you received the subsequent product, there is nothing OpenAI can do to you because you didn't enter into an agreement with them in the first place. None of this involves any copyright.


I’d argue it’s different.

It’s not about using the software itself but, for example, about creating a brochure using an embedded asset library provided with the software (which is often the case, though I'm not sure if that’s true of Photoshop).

In an imaginary scenario, Meta could have obtained the rights to train a model on FB data through a EULA and could (potentially) extend those rights to “legitimate” users, but my using the same data to create derivatives could be a problem (and might lead to financial losses).

Something similar to the “Blurred Lines” case.


This is nothing like ChatGPT though.

This isn't the Stable Diffusion vs DALL-E 2 moment yet. The performance is so far behind that they ain't the same thing.



