An actually open source LLM would be a game changer. We might need a new license that covers both model usage and training, something GPL-like whereby distributing a retrained model requires contributing data back or making it public, but not if you use it privately.
This will definitely accelerate progress in LLM research, productization and safety. Alpaca, Vicuna, GPT4All and others are sporadic examples of this that could become a continuous improvement process were the LLM and its license truly open source.
An interesting possible side effect of a GPL-like license is that AIs become unlikely to be trained on private data, the usual moat, which big tech wouldn't want to, or simply couldn't, make public if it were to use those GPL-like licensed models.
Huh? There are plenty of open source LLMs. Pythia, GPT-NeoX, GPT-J, GPT-2 and BLOOM-176B are ones I can think of off the top of my head. Pythia is the best performing one IIRC.
Not all use cases need GPT-4 level performance. I'd argue that even LLaMA-7B is quite limited. Also, new and improved models are being released all the time.
The solution is simple. We need an updated GPL that states that the code can't be used in training AIs unless the data model is also open source. A coordinated update of all major open source projects and the issue is sorted, as it will force the AI folks to open source their models. Or else they’ll have to stick with generating funny cat pictures.
Neither would the proposed model license. Just like the kernel's GPL stops at the userspace boundary, the proposed license would only cover the model definition and weights.