I can only agree. The number of times we have seen corporations abuse “open source” and “open science” in the context of large language models has been baffling: OPT/LLaMA disallowing commercial usage, BLOOM having an ethical non-open license, GLM having a clause not to “undermine [the People’s Republic of China’s] national security and national unity”, etc. Every single one of these models has been happy to ride on the coattails of the hard work of the open movements by calling itself open, while only paying lip service to the ideals and definitions underpinning them.
While RedPajama has yet to commit to a license (from what I can see; it is late at night…), they are making all the right noises, and I am hopeful that my prediction will come true: that we are about to see the floodgates of truly open models blow open, and that OpenAI’s “moat” will prove a lot shallower than they and many others have made us believe over the last six months.
Hi, this is Vipul, I am a co-founder of Together. We plan to release the model weights under Apache 2.0. The amount of creativity that Stable Diffusion unleashed, for instance, is only really possible with permissive licenses!
Thank you Vipul, you and the others are really doing god’s work and have the full support of myself and my academic research team, who are eager to push the boundaries with data, prompts, and investigations of whatever you release (in fact, we have spent the last couple of months working to produce multi-lingual prompts and enriching the few open models we had so far). Just a very quick point of feedback.
While I am not a lawyer and Apache 2.0 is likely to be unproblematic, I have always found it puzzling why people have recently been opting to license non-software using software licenses (Apache 2.0 in particular). Hopefully you have access to sensible lawyers, but I was always under the impression that model weights would fall under a license such as CC-BY rather than Apache 2.0. Sadly, it has been too long since I read the recommendations and justifications for this, so I cannot find a good reference, but I seem to recall that the advice came out of the FSF.
Are you working at all with Stability, Eleuther, or LAION? There have been some rumors that they are doing something similar to this and I'm wondering if this is a duplicated effort.
Either way, huge fan, it would be awesome to have a LLaMA set of weights that are fully open.
“We are appreciative to the work done by the growing open-source AI community that made this project possible.
That includes:
Participants in building the RedPajama dataset including […] LAION.
Meta AI — […].
EleutherAI — This project is built on the backs of the great team at EleutherAI — including the source code they provided for training GPT-NeoX.
An award of computer time was provided by the INCITE program. This research also used resources of the Oak Ridge Leadership Computing Facility (OLCF), which is a DOE Office of Science User Facility supported under Contract DE-AC05-00OR22725.”
The answer to your question is right there at the bottom of the page in the linked-to blog post :/
> not to undermine the national security and national unity
this is a required statement to conform to China’s constitution, or the superseding authoritative social contract there.
think of it as if the Patriot Act were an article of the constitution instead of a random law subservient to the constitution: it would negate other parts of the constitution that we hold near and dear.
this is a useful comparison, as both constitutions contain guarantees of free speech
just one has a fatal, heavily leveraged clause that undermines all other parts of that constitution and dictates all facets of life
This is interesting, thank you. But then how can any entity in the PRC contribute to open source? Alibaba, Baidu, etc. have released plenty of machine learning code under proper open licenses in the past (not to mention that we have hardware vendors in the PRC contributing to, say, Linux). The story I heard about GLM was that it was a high-enough-profile project that it caught the attention of PRC bureaucrats, who pushed for the clause to be included.
Regardless of the cause, though, the clause runs afoul of any definition of open out there.