The license for this [1] prohibits use of the model and its outputs for any comm...

foobiekr · on May 29, 2024

There's some irony in the fact that people will ignore this license in exactly the same way Mistral and all the other LLM guys ignore the copyright and licensing on the works they ingest.

hehdhdjehehegwv · on May 29, 2024

So basically I, as an open source author, had my code eaten up by Mistral without my consent, but if I want to use their code model I’m subject to a bunch of restrictions that benefit their bottom line?

The problem these AI companies have is they live in a glass house and they can’t throw IP rocks around without breaking their own “your content is our training data” foundation.

They only reason I can think of that Google doesn’t go after OpenAI for scraping YouTube is then they’d put themselves in the same crosshairs, and may set a precedent they’d also be bound by.

Given the model is “on the web” I have the same rights as Mistral to use anything online however I want without regard for IP, right?

Utter absurdity.

paulddraper · on May 30, 2024

No. You are welcome to learn from Mistral's works, either as a meatbag or via machine agent.

You are not allowed to reproduce Mistral's works (beyond the usual Fair Use allowances).

Nor is Mistral entitled to reproduce your works (unless you have licensed as such).

If it does, you can sue for copyright infringement.

hehdhdjehehegwv · on May 30, 2024

This is an actively litigated and unsettled area of law. You, and nobody else, can say any of this with confidence until these lawsuits get to a judge, and even then it’s per jurisdiction rulings. The US, EU, and Japan may end up with different rulings. International trade agreements may be updated. Industry may settle on some sort of broadly acceptable revenue sharing model.

The point is: nobody knows and the AI companies are getting well ahead of the law.

paulddraper · on May 30, 2024

Those cases are about applying these principals to specific events/facts.

But what part about what I said do you believe to be undecided?

That a human can learn without violating copyright? That a machine can learn without violating copyright?

hehdhdjehehegwv · on May 30, 2024

Literally everything about ML training and IP is undecided.

paulddraper · on May 30, 2024

Lol thinking those are outside the law

hehdhdjehehegwv · on May 31, 2024

The law is interpreted anew each day. Nothing is outside the law.

Perhaps if all rules were written in stone, and clear of ambiguity, we would not need judges or the legal process. But that’s not how any of this works.

xigoi · on May 31, 2024

Reproducing is a special case of learning.

paulddraper · on May 31, 2024

No. You can reproduce the works of Shakespeare without learning them, and you can learn the works of Shakespeare without being able to reproduce them.

There is some overlap, but if you think you've found some undiscovered loophole in centuries of copyright law, you're mistaken.

ximira · on May 31, 2024

The concept of "intelectual property" is crazy to begin with, you can't own a thing that is not economically scarse. Nobody owns the atmosphere, nobody owns math, nobody owns a sequence of bytes. If you can copy it, you can't steal it. You have the natural right to use their public code the way you want, as they have the natural right to use your public code the way they want. In the positivist game though, wins whoever has more money to spare on lawyers, so you lose by default even if you wanna play the legal game. The deep pockets always win on that front.

numpad0 · on May 31, 2024

IP is cultural maximalism paperclipping. It ties ambiguous substrings not economically explained to as much of human output as mechanically possible and arbitrarily inflate importance of living humans.

Which is a great thing.

esperent · on May 30, 2024

I used to spend a lot of time (thousands of hours) contributing to open source projects. Over the past few years I've stopped contributing (except minor fixes) to any project under MIT/Apache or similar licences.

Has anyone else done this?

hehdhdjehehegwv · on May 30, 2024

Interesting, I think that’s a totally valid response to this trend of capturing “value” of open source via Cloud Services (for a while now) and Code Gen (more recent).

I think SV is just dead set on killing the golden goose of open source and the web by extracting as much as possible with no regard for the wasteland left behind.

IshKebab · on May 29, 2024

> So basically I, as an open source author, had my code eaten up by Mistral without my consent

Not necessarily. You consented to people reading your code and learning from it when you posted it on Github. Whether or not there's an issue with AI doing the same remains to be settled. It certainly isn't clear cut that separate consent would be required.

Liquix · on May 29, 2024

MIT/BSD code is fair game, but isn't the whole point of GPL/AGPL "you can read and share and use this, but you can't take it and roll it into your closed commercial product for profit"? It seems like what Mistral and co are doing is a fundamental violation of the one thing GPL is striving to enforce.

gpm · on May 30, 2024

No. Either MIT/BSD code isn't fair game because it requires attribution, or GPL/AGPL code is fair game because it isn't copyright infringement in the first place so no license is required.

It'll be a court fight to determine which. Worse, it will be a court fight that plays out in a bunch of different countries and they probably won't all come to the same conclusion. It's unlikely the two licenses have a different effect here though. Either they both forbid it, or neither had the power to forbid it in the first place.

hehdhdjehehegwv · on May 29, 2024

Precisely, this is such a basic violation of GPL it’s mind boggling they went for it.

poopypoopington · on May 30, 2024

Is there an updated version of these license(s) that explicitly excludes projects from being used for training of AIs?

rvz · on May 30, 2024

> but isn't the whole point of GPL/AGPL "you can read and share and use this, but you can't take it and roll it into your closed commercial product for profit"?

You can profit from GPL / AGPL code but just also make all your source code open source and available for everyone to see.

bobthecowboy · on May 30, 2024

> You consented to people reading your code and learning from it when you posted it on Github.

And if I never posted my code to github, but someone else did? What if someone had posted proprietary code they had no rights to to github at the same time the scraper bots were trawling it? A few years ago some Windows source code was leaked onto Github - did Microsoft consent then?

hehdhdjehehegwv · on May 29, 2024

I did not give consent to train on my software and the license does not allow commercial use of it.

They have taken my code and now are dictating how I can use their derived work.

Personally I think these tools are useful, but if the data comes from the commons the model should also belong to the commons. This is just another attempt to gain private benefit from public work.

There are legal issues to be resolved, and there is an explosion of lawsuits already, but the fact pattern is simple and applies to nearly all closed-source AI companies.

portaouflop · on May 29, 2024

Mistral is as open as they get, most others are far worse. Here you can use the model without issues, as others are saying it’s doubtful they would sue you if you were to use code generated by the model in a commercial app

hehdhdjehehegwv · on May 29, 2024

This model is more restricted than Mistral and Mixtral - this is a new development from them.

mananaysiempre · on May 29, 2024

Replit’s replit-code[1,2] is CC BY-SA 4.0 for the weights, Apache 2.0 for the sources. Replit has its own unpleasant history[3], but the model’s terms are good. (The model itself is not as good, but deciding whether that’s a worthwhile tradeoff is up to you. The tradeoff exists and is meaningful, is my point.)

[1] https://huggingface.co/replit/replit-code-v1-3b

[2] https://huggingface.co/replit/replit-code-v1_5-3b

[3] https://news.ycombinator.com/item?id=27424195

jakderrida · on May 31, 2024

>They only reason I can think of that Google doesn’t go after OpenAI for scraping YouTube is then they’d put themselves in the same crosshairs, and may set a precedent they’d also be bound by.

It's like killing Caesar. As long as we all stab him, everyone is guilty and no one can prosecute us.

While you might call it absurd, I feel like these glass houses are why we've seen so much rapid progress with AI recently.

htrp · on May 29, 2024

call it an Enterprise poison pill.

hehdhdjehehegwv · on May 29, 2024

But a pill they also have to swallow.

dodslaser · on May 29, 2024

Enterprise suicide cult.

rvz · on May 30, 2024

> They only reason I can think of that Google doesn’t go after OpenAI for scraping YouTube is then they’d put themselves in the same crosshairs, and may set a precedent they’d also be bound by.

It will be the smartphone patent wars all over again with hundreds of lawsuits against big tech and AI companies.

We are already past the 'fair use' excuses at this point especially when OpenAI is slowly striking deals with news companies to train on their content (with their permission) and with intent of commercializing the model.

hehdhdjehehegwv · on May 30, 2024

I think a lot of the license motivation is to have real time information for RAG. I doubt that is being used for foundation training, it’s just not enough volume.

nullc · on May 29, 2024

Five years ago it would not have been at all controversial that these weights would not be copyrightable in the US, they're machine generated output on third party data. Yet somehow we've entered a weird timeline where obvious copyfraud is fine, by the same entities that are at best on the line of engaging in commercial copyright infringement at a hereto unforeseen scale.

esperent · on May 30, 2024

It's clear that when enough money and power is on the line - and fear that other countries will overtake them - all countries are willing to conveniently and pragmatically ignore their laws. I don't think this is any kind of surprise.

belter · on May 29, 2024

And nobody will sue anybody because suing means...discovery....

aitchnyu · on May 30, 2024

Umm, are you saying its mutually assured destruction for the book pirate (AI company) and api wrapper startup?

nicce · on May 29, 2024

In many countries you even can't claim copyright for the output of the AI to use license like this.

hannasanarion · on May 29, 2024

Copyright on the software that produces something isn't the same as copyright on the output.

The library's copyright is intact, as normal, and they can control who uses it and how just like any other software.

The output of AI systems is not copyrightable, but the systems themselves are, and associated EULAs are valid.

nicce · on May 29, 2024

Is that so certain? To be able to make claims for what you can use the output, can you do it without making any claims for about control and ownership of the output?

Of course, they can revoke your right to use the software, but if it goes to court, that would be interesting case.

wrs · on May 29, 2024

If there’s no copyright in the weights to begin with, the only restrictions you have are the ones you agreed to when you accepted the license agreement. Find the weights somewhere else and you don’t have to worry about the license.

I don’t know why there isn’t more discussion on this point and people just assume there’s an underlying copyright basis to the licensing of weights. As far as I know that isn’t settled at all.

numpad0 · on May 31, 2024

I think it's not important if this is enforcable, this license sounds to me like a warning that the output may be radioactive, which is where AI copyright discussions are slowly shifting towards.

wakawaka28 · on May 29, 2024

I'd like to know how they think they'll prove I didn't write whatever code I generate. Unless it is a direct copy of something else available to the investigator, good luck.

rldjbpin · on May 30, 2024

should it be morally ok to not follow these kinds of license, maybe except when you are selling a service without making any changes? i wonder what people visiting this site thinks about this.

behnamoh · on May 29, 2024

From the website:

> licensed under the new Mistral AI Non-Production License, which means that you can use it for research and testing purposes. ...

Which basically means "we give you this model. Go find its weaknesses and report on r/locallama. Then we'll use that to improve our commercial model which we won't open-source."

I'm sick of abusing the word "open-source" in this field.

JimDabell · on May 29, 2024

> I'm sick of abusing the word "open-source" in this field.

They don’t call this open source anywhere, do they? As far as I can see, they only say it’s open weights and that it’s available under their Mistral AI Non-Production License for research and testing. That doesn’t scream “open source” to me.

demosthanos · on May 29, 2024

They do say "open-weight", which is I think still very misleading in this context. Open-weight sounds like it should be the same as open-source, just for weights instead of the full source (for example, training data and the code used to generate the weights may not be released). This isn't really "open" in any meaningful sense.

boulos · on May 29, 2024

This is why I prefer the term "weights available" just like "source available". It makes it clear that you can get your hands on the copy, you could run this exact thing locally if they go out of business, etc. but it is definitely not open in the OSS sense.

Zambyte · on May 29, 2024

The fact that I can downloaded it and run it myself is a pretty meaningful amount of openness to me. I can easily ignore their bogus claims about what I'm allowed to do with it due to their distribution model. I can't necessarily do the same with a propriety service, as they can cut me off if the way I use the output makes them sad :(

demosthanos · on May 29, 2024

> I can easily ignore their bogus claims about what I'm allowed to do with it due to their distribution model.

If you're talking about exclusively personally use, sure. If you're talking about a business setting in a jurisdiction that Mistral can sue in, not so much.

Being able to use it in a business setting is a pretty darn important part of what Open Source has always meant (it's why it exists as a term at all).

Zambyte · on May 30, 2024

> If you're talking about a business setting in a jurisdiction that Mistral can sue in, not so much.

I'm reminded of the Japanese concept called Sosumi :)

> Being able to use it in a business setting is a pretty darn important part of what Open Source has always meant (it's why it exists as a term at all).

I'm quite familiar with the history of that term, but neither I nor Mistral used it. None of their models have been open source; they have been open weight. You can argue that they are actually "weight available" given the terms they write next to the download link, but since there has been no ruling on whether weights themselves are covered by copyright (and I think that would be terribly bogus if they are), I simply choose not to care what they write in their "terms of use".

TeMPOraL · on May 29, 2024

> The fact that I can downloaded it and run it myself is a pretty meaningful amount of openness to me

That's typically called freeware, though.

Zambyte · on May 29, 2024

The inference engine that I use to run open weight language models is fully free software. The model itself isn't really software in the traditional sense. So calling it ____ware seems inaccurate.

TeMPOraL · on May 29, 2024

The interpreter is free software. The model is freeware distributed as a binary blob. Code vs. Data is a matter of perspective, but with large neural nets, more than anywhere, it makes no sense to pretend they're plain data. All the computational complexity is in the weights, they're very much code compiled for an unusual architecture (the inference engine).

Zambyte · on May 30, 2024

Regardless of the distinction of code vs data, putting a limit to the number of inferences you can run on a model is essentially the same as using a copyright license on a PNG to limit the number of times you can "run" the PNG with a photo viewer. Is that enforceable? Does it matter? Does the copyright extend to music that my photo viewer generates when I open the image? IANAL but imo, no.

Rastonbury · on May 29, 2024

No but they do say "empowering developers" and "democratising coding" as the subtitle, I guess only those who pay

gyudin · on May 29, 2024

All their other models are “open source” and it was the selling point they built their brand on. I doubt they made their new model completely different from previous ones so it’s supposed be open source too, unless they found some juridical loophole lol

benreesman · on May 30, 2024

I use the term “available weight”.

This is maybe a debatable claim, but I’ll contend that without the magnificent rebel who leaked the original LLaMA weights the last, what, 15 months would have gone completely differently.

The legislators and courts and lawyers will be years if not decades sorting all this out.

For now there seems to be a productive if slightly uneasy truce: outside of a few groups at a few firms, everyone seems to be maximizing for innovation and generally behaving under a positive sum expectation.

One imagines if some really cool tune of this model shows up as a magnet or even on huggingface, the courteous thing probably happened: Mistral was notified in advance and some mutually beneficial arrangement was agreed to in outline, maybe inked, maybe not.

I don’t work for Mistral, so that’s pure speculation, but the big company I spent most of my career at would have certainly said “can we hire this person? can we buy this company? can we collaborate with people who do awesome stuff with our stuff that we didn’t think of?”

The icky actors kind of dominate the headlines and I’m as guilty as anyone and guiltier than most of letting that be top of mind too often.

In the large this is really cool and kind of new.

I’m personally rather optimistic that we’re well past the point when outright piracy or flagrantly adversarial license violations are either necessary or useful.

To me this license seems like an invitation to build on Mistral’s work and approach them with the results, and given how well a posture of openness with some safeguards is working out for FAIR and the LLaMA group, that’s certainly the outcome I’d be hoping for in their position.

Maybe open AI was an unrealistic goal. Maybe AvailableAI is what we wind up with, and that wouldn’t be too bad.

wrs · on May 29, 2024

If you want to live on the legal edge, it’s unclear whether there is any copyright in model weights (since they don’t have human authorship), so just wait for someone to post the weights someplace where you can get them without agreeing to the license.

isoprophlex · on May 29, 2024

So, it's almost entirely useless with that license, because the average pack of corpo beancounters will never let you use it over whatever Microsoft has already sold them.

croes · on May 29, 2024

It's more like a demo version you can evaluate before you need to buy a commercial license.

On whose code is Mistral trained?

rvnx · on May 29, 2024

Your code, my code, etc. But there is a common case with law; copyright do not apply when you have billions.

Examples: recurring infringement from Microsoft on open-source projects, Google scraping content to build their own database, etc...

meiraleal · on May 29, 2024

> Who cares? It seems you're not allowed to integrate this with anything else and show it to anyone, even as an art project.

Now they just lack the means to enforce it.

localfirst · on May 29, 2024

impossible to enforce

GuB-42 · on May 30, 2024

They could potentially watermark the model in order to identify the output. There are techniques for doing that, for example by randomly assigning token into groups A and B, group A probability is increased over group B, if group A is over-represented, chances are that that the output comes from the watermarked model.

How effective these techniques are and how acceptable as a proof it is is yet to be defined.

I don't think it is the case here, they probably don't really care, and watermarking has a cost.

rohansood15 · on May 29, 2024

That license is just hilarious.

das_keyboard · on May 29, 2024

OT, but 7.2 reads like the description of some Yu-Gi-Oh card or something:

> Mistral AI may terminate this Agreement at any time [...]. Sections 5, 6, 7 and 8 shall survive the termination of this Agreement.

batch12 · on May 29, 2024

If they can make agreements with arbitrary terms, why can't we? [0]

[0] https://o565.com/content-ownership-and-licensing-agreement/