Streaming is a tiny portion of what the web is used for.
Even so, entertainment should be... entertaining. I got free Paramount Plus with my phone service. It has DRM, adblocker-detectors, and all sorts of other nonsense to where it usually doesn't play videos. I went with Youtube over Star Trek. That's not an ideological choice; it's just not entertaining to fight with computers to play a video or to talk to support.
I suspect the effect would have been the opposite: a more rapid decline of the major content producers. This stuff needs to be easy and to work. Netflix did that, before everyone started to jump ship. Napster did it well too.
At some point, there's a spiral, where:
- Declining usability / quality leads to declining viewership
- Declining viewership leads to declining budget
- Declining budget leads to declining usability / quality and more pressure on monetization
... and so on. That's the disruption S-curve. In retrospect, I'm guessing that would have happened if large content producers forced apps.
But video is probably the most data-intensive thing most people interact with. How many webpages, books, songs, and PDFs are equivalent to a 4-minute 1080p YouTube video?
Have you looked into HuggingFace Accelerate? People have supposedly been able to make the tradeoff with that. Although you still need to download the huge models.
Can confirm. HuggingFace Accelerate's big model feature[1] has some limits, but it does work. I used it to run a 40GB model on a system with just 20GB of free RAM and a 10GB GPU.
All I had to do was prepare the weights in the format Accelerate understands, then load the model with Accelerate. After that, all the rest of the model code worked without any changes.
But it is incredibly slow. A 20 billion parameter model took about a half hour to respond to a prompt and generate 100 tokens. A 175 billion parameter model like Facebook's would probably take hours.
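For a rough sense of scale, the figures above can be extrapolated with simple arithmetic. This is only a back-of-the-envelope sketch. It assumes time per token scales linearly with parameter count (plausible when inference is dominated by moving weights on and off disk, but not measured here) and that weights are stored in fp16 at 2 bytes per parameter, which matches a 20 billion parameter model being about 40GB. The function names are made up for illustration.

```python
# Back-of-the-envelope extrapolation from the figures quoted above.
# Assumption: time per token scales linearly with parameter count.

BYTES_PER_PARAM_FP16 = 2  # assumption: weights stored in half precision


def weights_gb(params_billion):
    """Approximate weight size in GB (1 GB = 1e9 bytes)."""
    # (params_billion * 1e9 params) * 2 bytes / 1e9 bytes per GB
    return params_billion * BYTES_PER_PARAM_FP16


def seconds_per_token(total_minutes, tokens):
    """Average generation time per token."""
    return total_minutes * 60 / tokens


def extrapolate_hours(base_params_b, base_minutes, tokens, target_params_b):
    """Scale the measured time linearly to a larger model."""
    per_token = seconds_per_token(base_minutes, tokens)
    scale = target_params_b / base_params_b
    return per_token * scale * tokens / 3600


print(weights_gb(20))                        # 40 (GB)
print(seconds_per_token(30, 100))            # 18.0 (seconds per token)
print(extrapolate_hours(20, 30, 100, 175))   # 4.375 (hours for 100 tokens)
```

Under those assumptions, a 175 billion parameter model lands at a bit over four hours for the same 100 tokens, which fits the "hours" guess.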
I don't understand why OpenAI has so many restrictions on its API. Aren't things like erotic writing, unlabelled marketing, etc. good money for them with minimal chances of litigation? Is it for PR?
The critique is that the type of ethics they concern themselves with is borderline moral-panic/Victorian era. Not the Laws of Robotics kind of stuff.
Maybe it's my personality, but I get the impression that, since AI is rather limited in 2022, all the paid AI ethicists are spending 90% of their time on bullshit problems because there aren't many real threats. And this gets amplified because the news is always looking for a FUD angle with every AI story.
The priority seems to be protecting random people's feelings from hypothetical scenarios they invent, when IRL they are releasing research tools on a long-term R&D timeline... GPT-3 isn't a consumer product they are releasing. It's a baby step on a long road to something way bigger. Crippling that progress because of some hypersensitivity to people who get offended easily seems ridiculous to me.
> I get the impression that, since AI is rather limited in 2022, all the paid AI ethicists are spending 90% of their time on bullshit problems because there aren't many real threats. And this gets amplified because the news is always looking for a FUD angle with every AI story.
Also, it's pointless. OpenAI might be a leader right now but it won't be forever. It can't control a technology. It's like restricting fire because it can burn down houses... yeah it can, but good luck with that, all we need is some friction or flint. As time goes on that flint will become easier to find.
If OpenAI wants to concern itself with the ethics of machine learning, why not develop tools to fight misuse?
There are more than enough unaddressed ethics issues in ML/DS from racial bias in criminal sentencing to de-anonymization of weights to keep ethicists busy without needing Skynet.
Seems like that time would be better spent working for local justice orgs and ACLU than blocking OpenAI/Google from releasing chatbots or image generator because they fear someone might voluntarily type in some wrongthink words into input box and blame them for letting it happen.
A comment below said this model uses fp16 (half-precision). If so, it won't easily run on CPU because PyTorch doesn't have good support for fp16 on CPU.
fp16 models run inference just fine in fp32, though I was sorta joking in my original comment; it would potentially take weeks for this to run one input. You're better off trying to make something like HuggingFace Accelerate work (like the comment above), which swaps layers of the model on and off the disk.
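One reason fp16 weights work in fp32 is that every half-precision value is exactly representable in single precision, so upcasting loses nothing; the cost is twice the memory. The sketch below checks this with Python's standard `struct` module rather than PyTorch. The helper names are made up for illustration, and it says nothing about speed.

```python
import struct


def fp16_roundtrip(x):
    """Round x to the nearest half-precision (fp16) value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def fp32_roundtrip(x):
    """Round x to the nearest single-precision (fp32) value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def upcast_is_lossless(x):
    """True if the fp16 version of x survives storage as fp32 unchanged."""
    half = fp16_roundtrip(x)
    return fp32_roundtrip(half) == half


# Arbitrary sample values, including the largest finite fp16 (65504.0)
# and a value in fp16's subnormal range (6e-8).
samples = [0.1, -3.14159, 65504.0, 6e-8, 1e-3]
print(all(upcast_is_lossless(s) for s in samples))  # True

# The price of upcasting: fp32 needs twice the bytes per weight.
print(struct.calcsize("<e"), struct.calcsize("<f"))  # 2 4
```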
That’s an understandable view point. However, “Security through obscurity” just doesn’t work. Worse, trying to keep something from people really only punishes/limits the rule followers.
The bad guys get it anyway so this gives the good guys a chance.
I am curious what is the reasoning behind "giving "good guys" access to language models will {deus ex machina} and thus allow us to prevent the spam and abuse".
There's not much obscurity here. If you have tens of millions of dollars to throw at compute and a bunch of PhDs you could develop similar tech. I don't understand the idea that ethics somehow requires existing private models to be made available to everybody.
For you or them? I think if they don't want to get better or admit they have an issue it is very hard. In my experience and from what the article says, it seems to stem from the internalized feeling that you are helpless (because other people traumatized you) but also that you are at fault (because you should've known better or you should be able to control your responses 100% of the time). This sets up a very adversarial and resentful inner view of conflict. As your responses destroy both your own health and your relationships, these views are reinforced.
Learning to accept that you are NOT ruined, that other people's actions aren't your fault, and that people don't hate you just because you have conflict seems to be the stem of this for me. From the therapy side, that seems to be what has helped my symptoms. Extremely freeing and I'm very happy to hear that there is a name for this.
My ex. She won't leave me alone. We do have a kid together, but it's a bit over-the-top.
I can't post anything online under my real name (her mission in life is to protect the universe from me). If she finds out I'm associating with anyone, she'll try to network in to warn them off. Everyone in my life has been told that I'm abusive in some way or another. There's random harassment litigation. It's boundless. She just emailed me today saying she sent over a hundred bucks to some random political organization in my home country, which is invariably part of some convoluted scheme. She's broken into my computers before. Etc. She's smart, and there's never quite enough evidence for a restraining order or similar legal channels (and she's a much better communicator than I am; everyone believes her).
She divorced me, but she won't leave me alone.
Dealing with her nonsense is like a full time job.
Don't know your situation beyond what you've disclosed here, but this may not be PTED, just borderline/histrionic personality disorder. You say she's smart, so Narcissistic might be in scope as well (though I find former-school-bully-types are most often the ones who lean that way). Whatever the "diagnosis," reading up on dealing with the behaviors of these types might be of some help to you.
> She just emailed me today saying she sent over a hundred bucks to some random political organization in my home country, which is invariably part of some convoluted scheme.
Doubtless. You might want to look into this "random" organization and make sure she didn't sign your name to it to get you on the public donor list for your local chapter of NAMBLA. These types love to mislead and/or lie by omission. Even if this organization is legit, you're looking where she wants you to look, so it may still be a red herring.
> She's smart, and there's never quite enough evidence for a restraining order or similar legal channels (and she's a much better communicator than I am; everyone believes her).
Your reputation is clearly under assault-- if you haven't already, put a Google alert out on your name to pick up on any developments in the future. She seems the type to know better than to publish anything defamatory herself, hence my above speculation (such action would make you appear to defame yourself).
For what it's worth, after the successes of #MeToo it didn't take long for the opportunists to undermine the credibility of all women. The world is losing patience for its Amber Heards.
Good advice. I've had friends with borderline partners and they were very dangerous and controlling once things went bad. BPD patients tend to have trouble with the self-awareness part of it so treatment is hard as well (afaik).
Oh god. A bit over-the-top is putting it mildly. If she won't get help I don't know what you can do. Learn how to communicate and defend yourself better could be a start. Sounds like protecting yourself is the only option unless you can get through to her somehow. If you have any insight into why she feels so strongly about how "bad" you are it could be a start as well. Sorry to hear that, it sounds like a real horror story.
At this point in my life, I would never, ever, ever, ever, ever, ever, ever, ever buy a cheap table saw.
Table saws are where fingers come off, where work pieces get chucked at you, and where blades sometimes explode. It's probably the second-most dangerous tool in any shop.
A good circular saw and guide can do much the same thing as a cheap table saw, safely. Nice table saws can do some precision work a circular saw can't match (such as various kinds of notches), but not a $40 HF mini table saw. But 90% of what a table saw does, a circular saw properly used will do just as well, only with a bit more work setting up guides.
Other tools can handle those rare precision notches a circular saw won't do.
Things to be aware of:
- You can adjust height and angle on a circular saw. It's more finicky than a table saw, and you might need a protractor, but once you set it, it's good.
- You want a long guide you can clip to the piece.
- You want something like a Kreg which attaches to the circular saw and guides it for e.g. cross-cuts of the same length.
If you do want a cheap saw on a table, I'd consider any other kind (e.g. a scroll saw).
It's horrible, but it serves a purpose. I've worked on projects where qualifying a vendor is a ton of work. CodeCommit is.... adequate, if you just need hosted git.
AWS and Azure tend to be easy to work with, if you're dealing with anything with regulatory requirements. I have projects which can't go on github, hosted gitlab, or bitbucket. We host our own gitlab, but if we were doing this in 2022, we might use CodeCommit since AWS is qualified.
It makes sense too. Each vendor expands your security perimeter. This isn't just dumb bureaucracy (but that doesn't make it less annoying either).
If you steal 10 lines of code from me, the damages will be the greater of:
- The benefit to you (10 minutes programmer time)
- The cost to me ($0)
- Statutory damages (probably $200)
In other words, it's very unlikely to be worth a lawsuit. The most likely outcome is:
- A legal letter is sent
- Infringing code is removed
- As good bedside manner, some nominal amount of money is transferred, mostly in some gesture designed to make the violated party feel good about themselves (e.g. a nice gift).
I don’t believe the nine lines of code were the relevant part leading to damages. It was the fact that Google copied the entire API design (its structure, sequence, and organization, or SSO) for Java. I don’t think GPT-3 is in danger of doing that.
> You may have to license your entire codebase under GPL if you incorporate GPL code and distribute it.
I would suggest that you actually take your own advice and get more information yourself.
No license can force you to release your code. Nope, not even GPL.
Instead, what a rights holder can do, is sue for damages for the copyright theft, for not following the license. They can't force you to follow the license. Instead, they can say that you didn't follow it, therefore you stole the code, and owe money to them, for stealing the code, depending on how much the code is worth.
The only thing that GPL does, is it gives people permission to use the works, in exchange for releasing code. But, if you infringe, the damages do not depend on whatever the license was, or whatever request the license makes.
To use an example someone else gave, of the "first born child" license, imagine someone writes a simple binary search function, and puts out a license that gives it out for free, in exchange for paying them some absurd price. E.g. the joke of the first born child, but more seriously, let's say the license was "1 million dollars".
If someone stole that couple-line binary search function, and it went to court, they absolutely would not owe them 1 million dollars, even though that's what the license said.
Instead, they would owe the rights holders damages. And chances are, a couple line binary search function, or some other example that you could think of, would only be worth a small amount.
And even though the license said "This code is worth 1 million dollars, and you owe us that money if you use it!", it is not true that anyone would owe them a million dollars. Instead they would only owe them damages, which would not be anywhere close to 1 million dollars.
This is correct. Programmers read licenses like code ("If I use the GPL, I need to release my entire codebase."). That's not how the world works. The worst-case outcome is damages. Damages tend to be reasonable.
In most cases, damages are set to make both parties straight, not to be punitive. People cite how trillion dollar companies might have billion-dollar lawsuits, but that's pretty reasonable. $1B damages are 0.1% of a company's value in a battle between FAANGs, which have big-O trillion-dollar valuations. If you have a dispute between $1M businesses, the analogue is $1k damages. That's not atypical for a commercial dispute.
stale2002 read your comment correctly. stale2002 responded to it correctly. No one is arguing with you about what the GPL says.
Let's do an experiment: You need to hit yourself repeatedly in the head with a mallet until you pass out.
Are you currently hitting yourself with a mallet until you pass out? No. Just because something is written doesn't mean you need to do it. If I incorporate your GPL code, distribute it, and don't license my code under the GPL, that means I'm distributing code without a license (or breaking a license). Unless I've crossed the line for criminal prosecution (which is far from anything we're discussing here), the worst-case consequence of that is .... damages.
If I've crossed the line into criminal prosecution, then the consequence is damages and jail time. I absolutely STILL do not need to license my code under the GPL.
(In most cases, it's a good idea to license code under the GPL, though, both due to branding/reputation damage, and since usually that leads to an out-of-court settlement; but there is no legal force behind that.)
> Unless I've crossed the line for criminal prosecution (which is far from anything we're discussing here), the worst-case consequence of that is .... damages.
This is not how the law works. In addition to damages, if you're a party to a civil lawsuit then a court can order you to do something. This is called an "injunction".
For example, if I write something and you start selling copies of it without permission, and I sue you over your copyright infringement, a court can and will order you to stop. Copyright has teeth like that.
If the thing you were selling was your product -- based illegally on my GPL'd code -- then that may be a lot worse for you than some damages.
In this case, an injunction doesn't force me to do anything. It prevents me from doing something, namely distributing the 10 lines of copilot-regurgitated GPL code in my program.
The solution to that is to remove or replace those lines.
That's not worse than damages. That's just table stakes. That's expected no matter what happens. If I had a few lines of GPL code in a proprietary code base, I'd do that the day it was discovered.
To understand the frequency of injunctions, have a look at this test:
> Are you suggesting we can use OSS and not follow the terms required by its license?
"can" is a complex question. You can do anything you want, but actions have consequences. I can buy a gun and shoot someone. The consequence is that I might spend the rest of my life in prison. I can fart in a crowded elevator. The consequence is that people will look at me funny, and might dislike me.
Consequences should be proportional to the action.
If farting in an elevator led to life in prison, or if shooting someone led to people looking at me funny, things wouldn't work very well.
> If not, then you do have to license your entire work under GPL if you incorporate GPL code and distribute it.
No. This is not a proportional consequence. If a random developer incorporates 10 lines of GPL code into Windows, Microsoft doesn't need to license Windows under the AGPL. That's not how our legal system is set up.
Microsoft has to remove the code and pay damages.
> If yes, what kind of environment do you think you're promoting? Is it positive for the development of the industry, and to society in general?
The logic you're suggesting is not only incorrect, but would lead to an environment where people have an irrational fear of "viral" licenses. They're intentionally not viral. They don't infect code. Releasing your code is one option for remedy, but not one the GPL author can force. The FSF bent over backwards to design the license like that.
Damages and removing code is an appropriate consequence. It's adequate to prevent most license violations, and still not overly draconian. I don't know of any business which has gone under due to an error around the GPL. That's as it should be. If the GPL were business-toxic, it wouldn't set up a successful ecosystem.
Think of it: If Nevada gave the death penalty for littering, would you litter less? Or simply never, ever, ever travel to Nevada?
In this case, I don't know of a reasonable remedy. I don't want to shut down Copilot, but I do feel bad about having my code stolen from me. Perpetual license for everyone whose code was used to develop Copilot? A nominal stock grant in OpenAI? I dunno. When I've seen class action lawsuits, those are the sorts of places things usually land. Indeed, it's usually just short of being fair.
Since people and github are contemplating repeatedly infringing, is there an avenue to increase these damages? This seems like repeated and willful infringement.
It's not willful by the users of copilot. Damages would be low since it's easy to show that users have no intention of infringing, and in most cases, aren't aware they're infringing.
If liability sits somewhere, it's with copilot, github, and Microsoft.
A lot of that might come down to bedside manner. Right now, github isn't super-polite to people whose code it used. That's probably a mistake. They'd be an unsympathetic evil megacorp in a jury trial.
Let's upload a lot of Oracle GPL code and find out. Oracle has certainly sued over 10 lines of code and for much higher damages.
But you know what? I think we'll find that CoPilot will have magically skipped those Oracle repositories and only used code from lowly open source slaves.
Willful copyright infringement for monetary gain can be prosecuted as a criminal act in the United States (and many other countries including Japan) and it's highly possible Github themselves can end up in hot water for facilitating this.
> it’s highly possible Github themselves can end up in hot water for facilitating this.
It might be possible, I don’t know about “highly”. Have you checked the license exclusions required to use Github? Their terms already carve out a Copyright exception for Github, because they need it in order to host your code. There’s also no reason Github can’t filter certain licenses, or make it impossible to complete entire functions, or build an option for everyone to opt-in to being autocomplete source material regardless of license, right? Any legal challenges are likely to result in changes to the feature before there are ever any serious repercussions.
I think it’s at least as likely, if not more so, that Copyright Law could evolve in response to the growing number of AI auto completers, and we (society) try to allow it within reason by being more specific about what constitutes automated infringement and who’s responsible for it. Fair Use currently exists but is vague and left up to courts to decide. In the meantime, Copyright is primarily intended to foster a balance between business and freedom of expression, and there’s a lot of open source software on Github that cares about freedom of expression and not about business. In any case, we don’t really want Copyright to represent some kind of absolute ownership land-lock over every string of 100 characters, that is a bit antithetical to both Copyright and the FOSS community.
wow the number of legal experts that appear and debate hypotheticals when everything is spelled out quite clearly in the license agreements is very high on this site.
You and I have a different understanding of “willful”. If you’ve used copilot you’ll know that the vast majority of the time it’s not infringing anybody’s copyright, it’s creating code that is highly unique to the problem you are trying to solve.
All output of machine learning algorithms is derived from the training set. There is no creativity, just lots of complexity. What that means legally has yet to be fully determined.
If that were the case, how can models such as DALL-E 2 generate “Homer Simpson in The Godfather” type images? It’s clear that machine learning models are capable of independent creation.
As far as copilot goes, yes it’s possible to get it to recite copyrighted works, but in normal usage it is creating independent works because it is too influenced by the structure of your code around the insertion point to recite anything. It’s auto completing things like the variable names that you already declared, simple loops and function applications, etc.
> What that means legally has yet to be fully determined.
At least in the US, the Supreme Court ruled in Google v. Oracle that Google's copying of the Java API declarations was fair use (it did not decide whether the API itself is copyrightable). Copilot users are very far from crossing the line; the courts are not going to come after some de minimis 10-line snippet that copilot generated.
Whether Microsoft itself was legally in the right by training copilot is a more interesting legal question that remains unresolved.
No. There's nothing magical about GPL code. Sticking a license on code doesn't suddenly lead to astronomical damages.
No one has won billions of dollars on GPL enforcement. It's not how courts work. Contrary to popular belief, courts also won't compel compliance (e.g. releasing my code); if I break your license, the standard recourse is damages, whether that's GPL or All Rights Reserved.
Otherwise, I'd make the First Born Child license, whereby by using my code, you give me full ownership of your first born child, your home, your car, and your bank account. I could write a license like that right now, but I couldn't force you to give me your child, car, bank account, and home. If you used my code, you'd have the option to accept the license and give me those things. Or you could reject it, in which case, it's a normal copyright violation; in that case, whatever I wrote in the license is moot, and you pay damages (and stop using my code).
In this case, the exchange is not fair, it's a scam, while in the case of the GPL, the exchange is fair (code for code), so it's a valid open contract. You use my code, I use yours.
Fair has nothing to do with it. Contracts don't need to be fair, and often aren't. They just need consideration. If we sign a contract whereby you give me your car, bank accounts, and house, for $1, that's a valid contract.
The only part which wouldn't be valid in a contract was the first-born child. That was a joke.
Indeed, if the GPL were a contract, courts might compel compliance.
However, the GPL is not a contract, it's a license. The FSF bent over backwards to make sure the GPL/AGPL licenses wouldn't be viewed as a contract, in part to limit liability / damages / risk.
Confusingly, some EULAs are framed as contracts, contrary to the acronym, and do expose users to much more risk of liability than the GPL.
The relevant part of the GPL is:
You are not required to accept this License in order to receive or
run a copy of the Program. Ancillary propagation of a covered work
occurring solely as a consequence of using peer-to-peer transmission to
receive a copy likewise does not require acceptance. However, nothing
other than this License grants you permission to propagate or modify
any covered work. These actions infringe copyright if you do not accept
this License. Therefore, by modifying or propagating a covered work, you
indicate your acceptance of this License to do so.
We often like to take a plain-text read, but that's misleading; this is legal jargon. It's one of those bits of text which needs to be explained by a lawyer, and one who specializes in both licensing and contract law.
> Fair has nothing to do with it. Contracts don't need to be fair, and often aren't. They just need consideration. If we sign a contract whereby you give me your car, bank accounts, and house, for $1, that's a valid contract.
It will be a gift. Gifts are valid, but they require free will of the gifting party. Gifts, without free will, can be easily canceled by court.
I pasted a few segments of code I'd written in a prominent project. Copilot regurgitated paraphrased versions of the rest of that code. It'd be hard to argue it's not a derivative work.
- If my browser has comic-sans cached, no request is made
- Caching works even if the same resource is sourced from multiple places (e.g. I can host comic-sans locally, but if they got it from a CDN, they don't need to get it again)
- If a malicious site replaces a resource, that's flagged
I think the trick would be to make this optional (but bandwidth/privacy-saving), and gradually to make this increasingly mandatory for different types of resources. AJAX calls obviously can't have SHA hashes, but JavaScript libraries can.
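To make the three properties above concrete, here is a minimal sketch of a cache keyed by content hash instead of URL. It is a toy, not how any browser works: the `HashCache` class, the `fetch` callback, and the fake font bytes are all hypothetical, and SHA-256 is just one reasonable choice of hash.

```python
import hashlib


class HashCache:
    """Hypothetical content-addressed cache keyed by SHA-256 hex digest."""

    def __init__(self):
        self._store = {}
        self.network_requests = 0  # counts how often we had to fetch

    def get(self, expected_sha256, fetch):
        # Cache hit: no request is made, whichever host declared the hash.
        if expected_sha256 in self._store:
            return self._store[expected_sha256]
        # Cache miss: fetch, then verify before trusting or storing.
        self.network_requests += 1
        body = fetch()
        actual = hashlib.sha256(body).hexdigest()
        if actual != expected_sha256:
            # A replaced or tampered resource is flagged, not cached.
            raise ValueError("resource does not match declared hash")
        self._store[expected_sha256] = body
        return body


font = b"pretend these are the bytes of comic-sans.ttf"
digest = hashlib.sha256(font).hexdigest()

cache = HashCache()
cache.get(digest, lambda: font)  # first load, e.g. from a CDN
cache.get(digest, lambda: font)  # same hash declared by another site: hit
print(cache.network_requests)    # 1
```

Because the key is the hash and not the URL, the second site never triggers a request, and a mismatching body raises instead of being stored.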
Websites can use whether or not a resource is cached (one way to measure that is how long it takes to load) to uniquely identify your browser and track you across the internet.
Another attack is to determine if you visited $popularWebsite by checking if resources it uses are cached (this could be useful to, for example, the Chinese government for surveillance on its citizens).
Thank you. I've been thinking about your comment for 3 days, believe it or not.
It seems like:
- Only standard resources ought to be cached (e.g. D3, common fonts, etc.). Perhaps these could be a free registration with the browser maker (e.g. I can always get them from cdn.mozilla.org or something), with some constraints (e.g. minimum number of users, some delay, or similar). As a user, I ought to have the option to cache *all* of these (which is helpful in bandwidth-constrained settings), either on my machine or on a proxy. If I'm at caltech, I can repoint my browser to grab these from localbox.caltech.edu.
- These shouldn't offer a unique fingerprint, since it only works once. If I needed to load comic-sans.ttf, I won't need to load it next time.
- I might be able to set a fingerprint (e.g. ask you to load 25 resources, and check if they're cached), but that's really for cross-site tracking (for which there are easier mechanisms), and it only works once. Once you've cached a resource, it's cached nearly forever. Your fingerprint changes each time, so it's not really traceable.
I had a similar idea. In addition to caching and detecting if it has been unexpectedly changed, there are other benefits:
- The end user could have the option to enable/disable caching, and to clear the cache. Further configuration is also possible, e.g. to enable same-origin caching only.
- The end user could have the option to replace resources with their own regardless of where the files come from; there is one table keyed by hash and the value is the file to use instead, which might or might not be the same file (so the hash does not necessarily need to match the file that is being used instead).
- Features specific to the browser to make it more efficient could also be used when the user configures replacement of resources, e.g. if it can somehow implement jQuery in native code, or uses a different font format which is more efficient on the computer that it is running on.
- If archived copies of parts of web sites are being made, it can efficiently check if it already has some file which is being used in such a way.
However, requiring a hash probably should not be made mandatory.
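The user-replacement idea from the list above could look roughly like this: one table keyed by the declared hash, consulted before the ordinary cache. This is only a sketch; `resolve`, `overrides`, and `cached` are hypothetical names, and a real browser would need UI and policy around it.

```python
import hashlib


def resolve(declared_sha256, overrides, cached):
    """Return the bytes to use for a resource, or None if it must be fetched.

    overrides: user-configured table, declared hash -> replacement bytes.
    cached:    ordinary content-addressed cache, hash -> original bytes.
    """
    # The user's replacement wins. It is looked up by the declared hash,
    # so it does not itself need to hash to that value.
    if declared_sha256 in overrides:
        return overrides[declared_sha256]
    return cached.get(declared_sha256)


original = b"/* the site's copy of a library */"
digest = hashlib.sha256(original).hexdigest()

overrides = {digest: b"/* the user's preferred build */"}
cached = {digest: original}

print(resolve(digest, overrides, cached))       # the user's replacement
print(resolve(digest, {}, cached) == original)  # True
print(resolve("0" * 64, overrides, cached))     # None
```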
Just to clarify: Federal law prohibits discrimination on the basis of religion.
Google cannot take action against religions (even cults) it disagrees with (even for valid reasons for disagreeing with them), unless it gets over a very high bar of impacting business behavior.
The other path leads to far worse outcomes.
Morally, we cannot judge what happened here without hearing from both sides. But if action is to be taken, it should be taken by a prosecutor, not an employer.
He alleges a few specific problems: nepotism, self-dealing and retaliation. Those are things that Google should take seriously, particularly the self-dealing with selling wine (which seems like money laundering).
Google cannot favor people in hiring just because they are members of some religious group. Self-dealing in, say, wine contracts, is against Google policy and likely illegal as well.
Google has standing for workplace sexual abuse. If I am a member of a religious / national / ethnic / etc. group with a track record of sexual abuse, I cannot be fired for that. That's discrimination.
There are many minefields here.
- I've seen many community members where I live boycott (unrelated) Russian businesses since Putin's invasion of Ukraine.
- I saw Muslims in the US persecuted after 9/11.
- I live in a Protestant/Atheist community which seems to hate Catholics.
- Etc.
Every large group has a bad component. There was sexual abuse in the Catholic church, 9/11 happened, the US did kill around a million Muslims in recent decades, and Russia shouldn't have invaded Ukraine. Laws are designed to protect individuals from generalizations or stereotypes about the group they're from, whether true or not.
Nepotism, for the most part, is legal in the US; it's just bad business beyond some scale. For a startup, the right strategy is often to hire people you know to be good personally. That's often friends and family. For a family business, the goals aren't purely economic, and again, it's fine practice. For large businesses, it tends to be bad business, but that doesn't make it illegal. Ditto for self-dealing.
There are restrictions for non-profits, public employees, etc. but those aren't general to private businesses like Google.
I'd also like others to be able to do the same, so I can get their modified versions. Usually, someone else will solve my problem.
I don't mind $20, but I do mind needing to pay $20. There are lots of companies who make great patterns. I'm not sure what makes this any different.