Hacker News | blip54321's comments

For me, it's not so much about price but about freedom. I'd like to be able to modify them, create derivative works, and otherwise.

I'd also like others to be able to do the same, so I can get their modified versions. Usually, someone else will solve my problem.

I don't mind $20, but I do mind needing to pay $20. There are lots of companies who make great patterns. I'm not sure what makes this any different.


Yes, the web would have survived.

Streaming is a tiny portion of what the web is used for.

Even so, entertainment should be... entertaining. I got free Paramount Plus with my phone service. It has DRM, adblocker-detectors, and all sorts of other nonsense, to the point where it usually doesn't play videos. I went with YouTube over Star Trek. That's not an ideological choice; it's just not entertaining to fight with computers to play a video or to talk to support.

I suspect the effect would have been the opposite: a more rapid decline of the major content producers. This stuff needs to be easy and to work. Netflix did that, before everyone started to jump ship. Napster did it well too.

At some point, there's a spiral, where:

- Declining usability / quality leads to declining viewership

- Declining viewership leads to declining budget

- Declining budget leads to declining usability / quality and more pressure on monetization

... and so on. That's the disruption S-curve. In retrospect, I'm guessing that would have happened if large content producers forced apps.


> Streaming is a tiny portion of what the web is used for.

By bandwidth, streaming is what the majority of the web is used for. PDF warning: https://www.sandvine.com/hubfs/downloads/phenomena/2018-phen...

> Video is almost 58% of the total downstream volume of traffic on the internet


But video is probably the most data-intensive thing most people interact with. How many webpages, books, songs, or PDFs are equivalent to a 4-minute 1080p YouTube video?


On the ethics front:

* Yandex released everything as full open

* Facebook released open with restrictions

* OpenAI is completely non-transparent, and to add insult to injury, is trying to sell my own code back to me.

It seems like OpenAI has outlived its founding purpose, and is now a get-rich-quick scheme.

What I really want is a way to run these on a normal GPU, not one with 200GB of RAM. I'm okay with sloooow execution.


Have you looked into HuggingFace Accelerate? People have supposedly been able to make the tradeoff with that. Although you still need to download the huge models.


Can confirm. HuggingFace Accelerate's big model feature[1] has some limits, but it does work. I used it to run a 40GB model on a system with just 20GB of free RAM and a 10GB GPU.

All I had to do was prepare the weights in the format Accelerate understands, then load the model with Accelerate. After that, all the rest of the model code worked without any changes.

But it is incredibly slow. A 20 billion parameter model took about a half hour to respond to a prompt and generate 100 tokens. A 175 billion parameter model like Facebook's would probably take hours.

1: https://huggingface.co/docs/accelerate/big_modeling
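Rough arithmetic behind those numbers, as a sketch rather than a measurement. It assumes the weights are stored in fp16 (2 bytes per parameter), which the comment above does not state, and it ignores activations and framework overhead. The 20GB RAM and 10GB GPU figures are the ones quoted above; the function name is made up for illustration.

```python
# Assumption: weights are stored as fp16, i.e. 2 bytes per parameter.
BYTES_PER_PARAM_FP16 = 2


def weights_size_gb(n_params: int, bytes_per_param: int = BYTES_PER_PARAM_FP16) -> float:
    """Approximate size of the raw weights in GB (10**9 bytes)."""
    return n_params * bytes_per_param / 10**9


size_20b = weights_size_gb(20 * 10**9)    # the 20-billion-parameter model
size_175b = weights_size_gb(175 * 10**9)  # a 175-billion-parameter model

# Memory that is fast to reach: 10GB of GPU memory plus 20GB of free RAM.
fast_memory_gb = 10 + 20

# Whatever does not fit has to be streamed from disk on every forward pass.
offloaded_20b = max(0.0, size_20b - fast_memory_gb)
offloaded_175b = max(0.0, size_175b - fast_memory_gb)

print(f"20B:  {size_20b:.0f} GB of weights, {offloaded_20b:.0f} GB left on disk")
print(f"175B: {size_175b:.0f} GB of weights, {offloaded_175b:.0f} GB left on disk")
```

Under that assumption the 20B model is 40GB of weights, and a quarter of it has to be read from disk for every generated token, which is consistent with generation being very slow.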


Thank you for the pointer. I've been poking at it with a fork for the past few hours, and realized I forgot to respond.


I don't understand why OpenAI has so many restrictions on its API. Aren't things like erotic writing, unlabelled marketing, etc. good money for them with minimal chances of litigation? Is it for PR?


It's because it was genuinely founded as an organization worried about misaligned AI.


The critique is that the type of ethics they concern themselves with is borderline moral-panic/Victorian era. Not the Laws of Robotics kind of stuff.

Maybe it's my personality, but I get the impression that, since AI is rather limited in 2022, all the paid AI ethicists are spending 90% of their time on bullshit problems because there aren't many real threats. And these get amplified because the news is always looking for a FUD angle with every AI story.

The priority seems to be protecting random people's feelings from hypothetical scenarios they invent, when IRL they are releasing research tools on a long-term R&D timeline... GPT-3 isn't a consumer product they are releasing. It's a baby step on a long road to something way bigger. Crippling that progress because of some hypersensitivity to people who get offended easily seems ridiculous to me.


> I get the impression that, since AI is rather limited in 2022, all the paid AI ethicists are spending 90% of their time on bullshit problems because there aren't many real threats. And these get amplified because the news is always looking for a FUD angle with every AI story.

I think we’re about due for an AI-ethics winter.


Also, it's pointless. OpenAI might be a leader right now but it won't be forever. It can't control a technology. It's like restricting fire because it can burn down houses... yeah it can, but good luck with that, all we need is some friction or flint. As time goes on that flint will become easier to find.

If OpenAI wants to concern itself with the ethics of machine learning, why not develop tools to fight misuse?


There are more than enough unaddressed ethics issues in ML/DS from racial bias in criminal sentencing to de-anonymization of weights to keep ethicists busy without needing Skynet.


Seems like that time would be better spent working for local justice orgs and the ACLU than blocking OpenAI/Google from releasing chatbots or image generators because they fear someone might voluntarily type some wrongthink words into an input box and blame them for letting it happen.


That already exists, depending on your definition of slow. Just get a big SSD, use it as swap, and run the model on CPU.


A comment below said this model uses fp16 (half-precision). If so, it won't easily run on CPU because PyTorch doesn't have good support for fp16 on CPU.


Parent never claimed it was going to be fast.


It would probably just fail with an error "[some function] not implemented for 'Half'"


fp16 models run inference just fine in fp32. That said, I was sorta joking in my original comment: it would potentially take weeks for this to run one input. You're better off trying to make something like HuggingFace Accelerate work (like the comment above), which swaps layers of the model on and off the disk.
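A small sketch of why upcasting works: every fp16 value is exactly representable in fp32, so converting stored fp16 weights to fp32 loses nothing and only doubles the memory. This uses Python's standard `struct` module rather than PyTorch, and the sample weights are made up for illustration.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


# Hypothetical weights, first rounded to what an fp16 checkpoint would store.
weights_fp16 = [to_fp16(w) for w in (0.1, -1.5, 3.14159, 65504.0)]

# Upcasting the stored fp16 values to fp32 changes none of them.
weights_fp32 = [to_fp32(w) for w in weights_fp16]
lossless = weights_fp16 == weights_fp32

# The price is storage: 2 bytes per value in fp16, 4 bytes in fp32.
bytes_fp16 = struct.calcsize("<e")
bytes_fp32 = struct.calcsize("<f")
```

The precision was already lost when the weights were rounded to fp16 in the first place (0.1 is not stored as exactly 0.1); going back up to fp32 just keeps whatever survived.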


On the ethics front Yandex should provide more details on the data they’ve used.


I don't see giving spammers, marketers and scammers more powerful tools as the ethical stance.


That’s an understandable viewpoint. However, “Security through obscurity” just doesn’t work. Worse, trying to keep something from people really only punishes/limits the rule followers.

The bad guys get it anyway so this gives the good guys a chance.


I am curious what is the reasoning behind "giving "good guys" access to language models will {deus ex machina} and thus allow us to prevent the spam and abuse".


Automated tools to distinguish AI generated text from human writing and hide the AI spam.


This ^^ + many other mitigation/analytics use cases.


Can humans be trained en masse to write text that is less distinguishable from the output of a neural network?


There's not much obscurity here. If you have tens of millions of dollars to throw at compute and a bunch of PhDs you could develop similar tech. I don't understand the idea that ethics somehow requires existing private models to be made available to everybody.


Yeah I was responding to a post asking why we should allow open access, given that some of those with access will do bad things.

I agree with you. Ethics doesn't demand that existing private tech be made available. Who's saying that??

OpenAI is just catching shade because their initial founding mission was to democratize access to AI tech and they've gone pretty far the other way.


Almost certainly they are getting it, OpenAI will just get paid for it.


Better take away the internet then


They won't, but the cat is out of the bag. It is data, and data gets leaked, shared in the open, shared in the dark. Researchers can be bribed.

It is like: you can not talk to your kids about drugs and pretend they don't exist ... or you can.


Hey! This sounds like my ex!

Any advice on how to deal with that? It's been a decade and things are increasingly scary.


For you or them? I think if they don't want to get better or admit they have an issue it is very hard. In my experience and from what the article says, it seems to stem from the internalized feeling that you are helpless (because other people traumatized you) but also that you are at fault (because you should've known better or you should be able to control your responses 100% of the time). This sets up a very adversarial and resentful inner view of conflict. As your responses destroy both your own health and your relationships, these views are reinforced.

Learning to accept that you are NOT ruined, that other people's actions aren't your fault, and that people don't hate you just because you have conflict seems to be the stem of this for me. From the therapy side, that seems to be what has helped my symptoms. Extremely freeing and I'm very happy to hear that there is a name for this.


My ex. She won't leave me alone. We do have a kid together, but it's a bit over-the-top.

I can't post anything online under my real name (her mission in life is to protect the universe from me). If she finds out I'm associating with anyone, she'll try to network in to warn them off. Everyone in my life has been told that I'm abusive in some way or another. There's random harassment litigation. It's boundless. She just emailed me today saying she sent over a hundred bucks to some random political organization in my home country, which is invariably part of some convoluted scheme. She's broken into my computers before. Etc. She's smart, and there's never quite enough evidence for a restraining order or similar legal channels (and she's a much better communicator than I am; everyone believes her).

She divorced me, but she won't leave me alone.

Dealing with her nonsense is like a full time job.


Don't know your situation beyond what you've disclosed here, but this may not be PTED, just borderline/histrionic personality disorder. You say she's smart, so Narcissistic might be in scope as well (though I find former-school-bully-types are most often the ones who lean that way). Whatever the "diagnosis," reading up on dealing with the behaviors of these types might be of some help to you.

> She just emailed me today saying she sent over a hundred bucks to some random political organization in my home country, which is invariably part of some convoluted scheme.

Doubtless. You might want to look into this "random" organization and make sure she didn't sign your name to it to get you on the public donor list for your local chapter of NAMBLA. These types love to mislead and/or lie by omission. Even if this organization is legit, you're looking where she wants you to look, so it may still be a red herring.

> She's smart, and there's never quite enough evidence for a restraining order or similar legal channels (and she's a much better communicator than I am; everyone believes her).

Your reputation is clearly under assault-- if you haven't already, put a Google alert out on your name to pick up on any developments in the future. She seems the type to know better than to publish anything defamatory herself, hence my above speculation (such action would make you appear to defame yourself).

For what it's worth, after the successes of #MeToo it didn't take long for the opportunists to undermine the credibility of all women. The world is losing patience for its Amber Heards.


Good advice. I've had friends with borderline partners and they were very dangerous and controlling once things went bad. BPD patients tend to have trouble with the self-awareness part of it so treatment is hard as well (afaik).


Oh god. A bit over-the-top is putting it mildly. If she won't get help I don't know what you can do. Learn how to communicate and defend yourself better could be a start. Sounds like protecting yourself is the only option unless you can get through to her somehow. If you have any insight into why she feels so strongly about how "bad" you are it could be a start as well. Sorry to hear that, it sounds like a real horror story.



OT: The terminal on your web page is broken. I type and nothing happens.


At this point in my life, I would never, ever, ever, ever, ever, ever, ever, ever buy a cheap table saw.

Table saws are where fingers come off, where work pieces get chucked at you, and where blades sometimes explode. It's probably the second-most dangerous tool in any shop.

A good circular saw and guide can do much the same thing as a cheap table saw, safely. Nice table saws can do some precision work a circular saw can't match (such as various kinds of notches), but not a $40 HF mini table saw. But 90% of what a table saw does, a circular saw properly used will do just as well, only with a bit more work setting up guides.

Other tools can handle those rare precision notches a circular saw won't do.

Things to be aware of:

- You can adjust height and angle on a circular saw. It's more finicky than a table saw, and you might need a protractor, but once you set it, it's good.

- You want a long guide you can clip to the piece.

- You want something like a Kreg which attaches to the circular saw and guides it for e.g. cross-cuts of the same length.

If you do want a cheap saw on a table, I'd consider any other kind (e.g. a scroll saw).


It's horrible, but it serves a purpose. I've worked on projects where qualifying a vendor is a ton of work. CodeCommit is.... adequate, if you just need hosted git.

AWS and Azure tend to be easy to work with, if you're dealing with anything with regulatory requirements. I have projects which can't go on github, hosted gitlab, or bitbucket. We host our own gitlab, but if we were doing this in 2022, we might use CodeCommit since AWS is qualified.

It makes sense too. Each vendor expands your security perimeter. This isn't just dumb bureaucracy (but that doesn't make it less annoying either).


Damages. This question will come back to damages.

If you steal 10 lines of code from me, the damages will be the greater of:

- The benefit to you (10 minutes programmer time)

- The cost to me ($0)

- Statutory damages (probably $200)

In other words, it's very unlikely to be worth a lawsuit. The most likely outcome is:

- A legal letter is sent

- Infringing code is removed

- As good bedside manner, some nominal amount of money is transferred, mostly in some gesture designed to make the violated party feel good about themselves (e.g. a nice gift).


An example of how much copied code is worth:

https://en.wikipedia.org/wiki/Google_LLC_v._Oracle_America,_...

For this content:

   a nine-line rangeCheck function, several test files, the structure, sequence and organization (SSO) of the Java (API), and the API documentation.

The cost was: "statutory damages up to a maximum of US$150,000".


I don’t believe the nine lines of code was the relevant part leading to damages. It was the fact that Google copied this entire API design (SSO) for Java. I don’t think GPT-3 is in danger of doing that.


Also don't forget that the Supreme Court has ruled that APIs aren't copyrightable after all (or at least fall within fair use).


Now I understand why US IT consulting corporations have expanded into multinationals


> - The benefit to you (10 minutes programmer time)

That's an incomplete view. You're judging the value by the time it'd take to rewrite it.

The real value is in knowing what to type and why.

When Copilot suggests GPL code to you, its main value is the knowledge, not the typing.

That piece of knowledge may have taken a LOT of effort from an OSS team to acquire.

Depending on the context, this knowledge would be worth millions.

Worth a lawsuit.


> Depending on the context, this knowledge would be worth millions. Worth a lawsuit.

But it probably won't be worth millions of dollars. And that is why the lawsuit won't be worth it.

> That piece of knowledge may have taken a LOT of effort from an OSS team to acquire.

Anything "may" be possible. But it probably won't be worth that much.


> Anything "may" be possible. But it probably won't be worth that much.

I'd suggest getting more information about the repercussions associated with appropriating GPL code into proprietary closed source.

This is a big deal. You may have to license your entire codebase under GPL if you incorporate GPL code and distribute it.


> You may have to license your entire codebase under GPL if you incorporate GPL code and distribute it.

I would suggest that you actually take your own advice and get more information yourself.

No license can force you to release your code. Nope, not even GPL.

Instead, what a rights holder can do, is sue for damages for the copyright theft, for not following the license. They can't force you to follow the license. Instead, they can say that you didn't follow it, therefore you stole the code, and owe money to them, for stealing the code, depending on how much the code is worth.

The only thing that GPL does, is it gives people permission to use the works, in exchange for releasing code. But, if you infringe, the damages do not depend on whatever the license was, or whatever request the license makes.

To use an example someone else gave, of the "first born child" license, imagine someone writes a simple binary search function, and puts out a license that gives it out for free, in exchange for paying them some absurd price. For example: the joke of the first born child, but more seriously, let's say the license was "1 million dollars".

If someone stole that couple-line binary search function, and it went to court, they absolutely would not owe them 1 million dollars, even though that's what the license said.

Instead, they would owe the rights holders damages. And chances are, a couple line binary search function, or some other example that you could think of, would only be worth a small amount.

And even though the license said "This code is worth 1 million dollars, and you owe us that money if you use it!", it is not true that anyone would owe them a million dollars. Instead they would only owe them damages, which would not be anywhere close to 1 million dollars.


This is correct. Programmers read licenses like code ("If I use the GPL, I need to release my entire codebase."). That's not how the world works. The worst-case outcome is damages. Damages tend to be reasonable.

In most cases, damages are set to make both parties straight, not to be punitive. People cite how trillion dollar companies might have billion-dollar lawsuits, but that's pretty reasonable. $1B damages are 0.1% of a company's value in a battle between FAANGs, which have big-O trillion-dollar valuations. If you have a dispute between $1M businesses, the analogue is $1k damages. That's not atypical for a commercial dispute.
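The proportionality arithmetic above, spelled out as a sketch. The figures are the ones from the comment ($1B damages against a $1T company); the function name is made up for illustration, and nothing here is a legal formula.

```python
def proportional_damages(reference_damages: int, reference_value: int, business_value: int) -> int:
    """Scale a reference damages award by company value, using integer arithmetic."""
    return reference_damages * business_value // reference_value


# $1B damages against a company valued at $1T is 0.1% of its value.
share_percent = 100 * 10**9 / 10**12

# The same 0.1% share applied to a $1M business.
small_business = proportional_damages(10**9, 10**12, 10**6)

print(f"{share_percent}% of value; analogue for a $1M business: ${small_business}")
```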


Please, read my comment again.

I did not say it forces you to distribute it. That's absurd.

What I said is: "if you incorporate GPL code and distribute it"

If you do those two things, yes, you have to license your code under GPL.

It's not just me saying this; please take a look at Sections 5-b and 5-c of the license. [1]

[1] https://www.gnu.org/licenses/gpl-3.0.en.html#section5


stale2002 read your comment correctly. stale2002 responded to it correctly. No one is arguing with you about what the GPL says.

Let's do an experiment: You need to hit yourself repeatedly in the head with a mallet until you pass out.

Are you currently hitting yourself with a mallet until you pass out? No. Just because something is written doesn't mean you need to do it. If I incorporate your GPL code, distribute it, and don't license my code under the GPL, that means I'm distributing code without a license (or breaking a license). Unless I've crossed the line for criminal prosecution (which is far from anything we're discussing here), the worst-case consequence of that is .... damages.

If I've crossed the line into criminal prosecution, then the consequence is damages and jail time. I absolutely STILL do not need to license my code under the GPL.

(In most cases, it's a good idea to license code under the GPL, though, both due to branding/reputation damage, and since usually that leads to an out-of-court settlement; but those carry no legal force.)


> Unless I've crossed the line for criminal prosecution (which is far from anything we're discussing here), the worst-case consequence of that is .... damages.

This is not how the law works. In addition to damages, if you're a party to a civil lawsuit then a court can order you to do something. This is called an "injunction".

For example, if I write something and you start selling copies of it without permission, and I sue you over your copyright infringement, a court can and will order you to stop. Copyright has teeth like that.

If the thing you were selling was your product -- based illegally on my GPL'd code -- then that may be a lot worse for you than some damages.


In this case, an injunction doesn't force me to do anything. It prevents me from doing something, namely distributing the 10 lines of copilot-regurgitated GPL code in my program.

The solution to that is to remove or replace those lines.

That's not worse than damages. That's just table stakes. That's expected no matter what happens. If I had a few lines of GPL code in a proprietary code base, I'd do that the day it was discovered.

To understand the frequency of injunctions, have a look at this test:

https://en.wikipedia.org/wiki/Injunction#Permanent_injunctio...

Injunctions generally only happen if other means (like damages) have been exhausted.


Are you suggesting we can use OSS and not follow the terms required by its license?

If not, then you do have to license your entire work under GPL if you incorporate GPL code and distribute it.

If yes, what kind of environment do you think you're promoting? Is it positive for the development of the industry, and to society in general?


> Are you suggesting we can use OSS and not follow the terms required by its license?

"can" is a complex question. You can do anything you want, but actions have consequences. I can buy a gun and shoot someone. The consequence is that I might spend the rest of my life in prison. I can fart in a crowded elevator. The consequence is that people will look at me funny, and might dislike me.

Consequences should be proportional to the action.

If farting in an elevator led to life in prison, or if shooting someone led to people looking at me funny, things wouldn't work very well.

> If not, then you do have to license your entire work under GPL if you incorporate GPL code and distribute it.

No. This is not a proportional consequence. If a random developer incorporates 10 lines of GPL code into Windows, Microsoft doesn't need to license Windows under the GPL. That's not how our legal system is set up.

Microsoft has to remove the code and pay damages.

> If yes, what kind of environment do you think you're promoting? Is it positive for the development of the industry, and to society in general?

The logic you're suggesting is not only incorrect, but would lead to an environment where people have an irrational fear of "viral" licenses. They're intentionally not viral. They don't infect code. Releasing your code is one option for remedy, but not one the GPL author can force. The FSF bent over backwards to design the license like that.

Damages and removing code is an appropriate consequence. It's adequate to prevent most license violations, and still not overly draconian. I don't know of any business which has gone under due to an error around the GPL. That's as it should be. If the GPL were business-toxic, it wouldn't set up a successful ecosystem.

Think of it: If Nevada gave the death penalty for littering, would you litter less? Or simply never, ever, ever travel to Nevada?

In this case, I don't know of a reasonable remedy. I don't want to shut down Copilot, but I do feel bad about having my code stolen from me. Perpetual license for everyone whose code was used to develop Copilot? A nominal stock grant in OpenAI? I dunno. When I've seen class action lawsuits, those are the sorts of places things usually land. Indeed, it's usually just short of being fair.


Since people and github are contemplating repeatedly infringing, is there an avenue to increase these damages? This seems like repeated and willful infringement.


It's not willful by the users of copilot. Damages would be low since it's easy to show that users have no intention of infringing, and in most cases, aren't aware they're infringing.

If liability sits somewhere, it's with copilot, github, and Microsoft.

A lot of that might come down to bedside manner. Right now, github isn't super-polite to people whose code it used. That's probably a mistake. They'd be an unsympathetic evil megacorp in a jury trial.


With Copilot it's 10 lines of code by thousands of users.

It adds up.


Let's upload a lot of Oracle GPL code and find out. Oracle has certainly sued over 10 lines of code and for much higher damages.

But you know what? I think we'll find that CoPilot will have magically skipped those Oracle repositories and only used code from lowly open source slaves.


Willful copyright infringement for monetary gain can be prosecuted as a criminal act in the United States (and many other countries including Japan) and it's highly possible Github themselves can end up in hot water for facilitating this.


> it’s highly possible Github themselves can end up in hot water for facilitating this.

It might be possible, I don’t know about “highly”. Have you checked the license exclusions required to use Github? Their terms already carve out a Copyright exception for Github, because they need it in order to host your code. There’s also no reason Github can’t filter certain licenses, or make it impossible to complete entire functions, or build an option for everyone to opt-in to being autocomplete source material regardless of license, right? Any legal challenges are likely to result in changes to the feature before there are ever any serious repercussions.

I think it’s at least as likely, if not more so, that Copyright Law could evolve in response to the growing number of AI auto completers, and we (society) try to allow it within reason by being more specific about what constitutes automated infringement and who’s responsible for it. Fair Use currently exists but is vague and left up to courts to decide. In the meantime, Copyright is primarily intended to foster a balance between business and freedom of expression, and there’s a lot of open source software on Github that cares about freedom of expression and not about business. In any case, we don’t really want Copyright to represent some kind of absolute ownership land-lock over every string of 100 characters, that is a bit antithetical to both Copyright and the FOSS community.


wow the number of legal experts that appear and debate hypotheticals when everything is spelled out quite clearly in the license agreements is very high on this site.

Triply so when Microsoft is involved.


You and I have a different understanding of “willful”. If you’ve used copilot you’ll know that the vast majority of the time it’s not infringing anybody’s copyright, it’s creating code that is highly unique to the problem you are trying to solve.


All output of machine learning algorithms is derived from the training set. There is no creativity, just lots of complexity. What that means legally has yet to be fully determined.


If that were the case, how could models such as DALL-E 2 generate “Homer Simpson in The Godfather” type images? It’s clear that machine learning models are capable of independent creation.

As far as copilot goes, yes it’s possible to get it to recite copyrighted works, but in normal usage it is creating independent works because it is too influenced by the structure of your code around the insertion point to recite anything. It’s auto completing things like the variable names that you already declared, simple loops and function applications, etc.

> What that means legally has yet to be fully determined.

At least in the US, the Supreme Court ruled in Google v. Oracle that Google's copying of the Java API was fair use (it did not decide whether the API is copyrightable). Copilot users are very far from crossing the line; the courts are not going to come after some de minimis 10-line snippet that copilot generated.

Whether Microsoft itself was legally in the right by training copilot is a more interesting legal question that remains unresolved.


Do you see scope for GPL code trolls, something along the lines of patent trolls?


No. There's nothing magical about GPL code. Sticking a license on code doesn't suddenly lead to astronomical damages.

No one has won billions of dollars on GPL enforcement. It's not how courts work. Contrary to popular belief, courts also won't compel compliance (e.g. releasing my code); if I break your license, the standard recourse is damages, whether that's GPL or All Rights Reserved.

Otherwise, I'd make the First Born Child license, whereby by using my code, you give me full ownership of your first born child, your home, your car, and your bank account. I could write a license like that right now, but I couldn't force you to give me your child, car, bank account, and home. If you used my code, you'd have the option to accept the license and give me those things. Or you could reject it, in which case, it's a normal copyright violation; in that case, whatever I wrote in the license is moot, and you pay damages (and stop using my code).


In this case, the exchange is not fair, it's a scam, while in the case of the GPL, the exchange is fair (code for code), so it's a valid open contract. You use my code, I use yours.


Fair has nothing to do with it. Contracts don't need to be fair, and often aren't. They just need consideration. If we sign a contract whereby you give me your car, bank accounts, and house, for $1, that's a valid contract.

The only part which wouldn't be valid in a contract was the first-born child. That was a joke.

Indeed, if the GPL were a contract, courts might compel compliance.

However, the GPL is not a contract, it's a license. The FSF bent over backwards to make sure the GPL/AGPL licenses wouldn't be viewed as a contract, in part to limit liability / damages / risk.

Confusingly, some EULAs are framed as contracts, contrary to the acronym, and do expose users to much more risk of liability than the GPL.

The relevant part of the GPL is:

    You are not required to accept this License in order to receive or
    run a copy of the Program. Ancillary propagation of a covered work 
    occurring solely as a consequence of using peer-to-peer transmission to 
    receive a copy likewise does not require acceptance. However, nothing 
    other than this License grants you permission to propagate or modify 
    any covered work. These actions infringe copyright if you do not accept 
    this License. Therefore, by modifying or propagating a covered work, you 
    indicate your acceptance of this License to do so.

We often like to take a plain-text read, but that's misleading; this is legal jargon. It's one of those bits of text which needs to be explained by a lawyer, and one who specializes in both licensing and in contract law.


> Fair has nothing to do with it. Contracts don't need to be fair, and often aren't. They just need consideration. If we sign a contract whereby you give me your car, bank accounts, and house, for $1, that's a valid contract.

It will be a gift. Gifts are valid, but they require free will of the gifting party. Gifts, without free will, can be easily canceled by court.


Copilot doesn't reuse code. None of the code it regurgitates has the required license.


I pasted a few segments of code I'd written in a prominent project. Copilot regurgitated paraphrased versions of the rest of that code. It'd be hard to argue it's not a derivative work.


Thank you for pointing that out!

I should have put "reuse" in quotes, since I meant Copilot takes reuse one step further and replicates or regurgitates code.


If it were me, I'd make third-party font sources require a SHA hash. In pseudocode:

    url("https://fonts.googleapis.com/comic-sans", sha="abcd1234")
This way:

- If my browser has comic-sans cached, no request is made

- Caching works even if the same resource is sourced from multiple places (e.g. I can host comic-sans locally, but if they got it from a CDN, they don't need to get it again)

- If a malicious site replaces a resource, that's flagged

I think the trick would be to make this optional (but bandwidth/privacy-saving), and gradually to make this increasingly mandatory for different types of resources. AJAX calls obviously can't have SHA hashes, but JavaScript libraries can.
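
To make the idea concrete, here is a minimal Python sketch of a cache keyed by content hash instead of URL. Everything in it is an assumption for illustration: the `HashPinnedCache` and `IntegrityError` names, the choice of SHA-256, and the `FAKE_WEB` dictionary standing in for the network. It is not how any browser actually works.

```python
import hashlib
from typing import Callable, Dict


class IntegrityError(Exception):
    """Raised when downloaded bytes do not match the pinned hash."""


class HashPinnedCache:
    """Cache keyed by content hash rather than by URL (hypothetical design)."""

    def __init__(self, fetch: Callable[[str], bytes]) -> None:
        self._fetch = fetch  # injected so the sketch runs without a network
        self._by_hash: Dict[str, bytes] = {}
        self.requests_made = 0

    def load(self, url: str, sha256_hex: str) -> bytes:
        # Cache hit: the URL is irrelevant, only the hash matters.
        if sha256_hex in self._by_hash:
            return self._by_hash[sha256_hex]
        self.requests_made += 1
        body = self._fetch(url)
        # Refuse content that does not match the hash the page asked for.
        if hashlib.sha256(body).hexdigest() != sha256_hex:
            raise IntegrityError(f"hash mismatch for {url}")
        self._by_hash[sha256_hex] = body
        return body


# Stand-in for the network: two hosts serve the same font, one serves a tampered copy.
FONT = b"pretend these are the bytes of comic-sans"
FAKE_WEB = {
    "https://cdn.example/comic-sans": FONT,
    "https://self-hosted.example/comic-sans": FONT,
    "https://evil.example/comic-sans": b"tampered bytes",
}

cache = HashPinnedCache(FAKE_WEB.__getitem__)
pin = hashlib.sha256(FONT).hexdigest()

first = cache.load("https://cdn.example/comic-sans", pin)
second = cache.load("https://self-hosted.example/comic-sans", pin)  # served from cache
```

The second `load` makes no request even though it names a different host, and a fresh cache asked to load the `evil.example` copy with the same pin raises `IntegrityError`.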


Sounds like you're basically reinventing SRI: https://en.wikipedia.org/wiki/Subresource_Integrity
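
For reference, an SRI `integrity` value is an algorithm name, a hyphen, and the base64-encoded digest of the file. A small Python sketch of computing and checking one follows; the helper names `sri_value` and `sri_matches` are made up here, and the checker only handles a single `alg-digest` value rather than the full attribute syntax.

```python
import base64
import hashlib


def sri_value(content: bytes, algorithm: str = "sha384") -> str:
    """Return an SRI integrity string such as 'sha384-<base64 digest>'."""
    if algorithm not in ("sha256", "sha384", "sha512"):
        raise ValueError("SRI uses sha256, sha384 or sha512")
    digest = hashlib.new(algorithm, content).digest()
    return f"{algorithm}-{base64.b64encode(digest).decode('ascii')}"


def sri_matches(content: bytes, integrity: str) -> bool:
    """Check content against a single 'alg-base64' integrity value."""
    # Standard base64 never contains '-', so the first hyphen ends the algorithm name.
    algorithm, _, _digest = integrity.partition("-")
    return sri_value(content, algorithm) == integrity


script = b"console.log('hi');"
tag = sri_value(script)
```

The resulting `tag` is what would go in the `integrity` attribute of a `<script>` or `<link>` element.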

One issue with cross-site caching, though, is that it may enable timing-based attacks on privacy.


No, I'm not reinventing it, but extending it by:

1) Mandating it for certain types of resources

2) Extending caching to cover the cross-site case.

Can you please explain the proposed timing-based attack?


Websites can use whether or not a resource is cached (one way to measure that is how long it takes to load) to uniquely identify your browser and track you across the internet.

Another attack is to determine if you visited $popularWebsite by checking if resources it uses are cached (this could be useful to, for example, the Chinese government for surveillance on its citizens).
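
A toy Python model of that second attack, to show the mechanism. Nothing here touches a real browser or network: `SimulatedBrowser`, the millisecond constants, and the `example` hosts are all invented, and load times are hard-coded rather than measured.

```python
from typing import Dict, Set

CACHED_MS = 2.0      # assumed load time for a cache hit
NETWORK_MS = 80.0    # assumed load time for a network fetch
THRESHOLD_MS = 20.0  # anything faster is taken to be a cache hit


class SimulatedBrowser:
    """Toy model of a shared (cross-site) cache; no real timing or network."""

    def __init__(self) -> None:
        self.cache: Set[str] = set()

    def timed_load(self, url: str) -> float:
        """Load a resource and return the simulated time it took, in ms."""
        if url in self.cache:
            return CACHED_MS
        self.cache.add(url)  # loading a resource also caches it
        return NETWORK_MS


def probe_visited(browser: SimulatedBrowser, site_resources: Dict[str, str]) -> Dict[str, bool]:
    """Guess which sites were visited from how fast their resources load."""
    return {
        site: browser.timed_load(url) < THRESHOLD_MS
        for site, url in site_resources.items()
    }


browser = SimulatedBrowser()
browser.timed_load("https://popular.example/app.js")  # the user visits one site

guesses = probe_visited(browser, {
    "popular.example": "https://popular.example/app.js",
    "other.example": "https://other.example/app.js",
})
```

In this model the probing page learns that `popular.example` was visited and `other.example` was not, purely from load times.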


Thank you. I've been thinking about your comment for 3 days, believe it or not.

It seems like:

- Only standard resources ought to be cached (e.g. D3, common fonts, etc.). Perhaps these could be a free registration with the browser maker (e.g. I can always get them from cdn.mozilla.org or something), with some constraints (e.g. minimum number of users, some delay, or similar). As a user, I ought to have the option to cache *all* of these (which is helpful in bandwidth-constrained settings), either on my machine or on a proxy. If I'm at Caltech, I can repoint my browser to grab these from localbox.caltech.edu.

- These shouldn't offer a unique fingerprint, since it only works once. If I needed to load comic-sans.ttf, I won't need to load it next time.

- I might be able to set a fingerprint (e.g. ask you to load 25 resources, and check if they're cached), but that's really for cross-site tracking (for which there are easier mechanisms), and it only works once. Once you've cached a resource, it's cached nearly forever. Your fingerprint changes each time, so it's not really traceable.
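
The "it only works once" point can be shown with a few lines of Python. This is a sketch under one loud assumption: the probing site cannot check whether a resource is cached without loading it, so checking also caches it. The `probe` function and the `cdn.example` URLs are hypothetical.

```python
from typing import List, Set


def probe(cache: Set[str], resources: List[str]) -> List[bool]:
    """Return which resources were already cached, caching each as a side effect."""
    seen = []
    for url in resources:
        seen.append(url in cache)
        cache.add(url)  # the act of checking loads (and so caches) the resource
    return seen


resources = [f"https://cdn.example/lib-{i}.js" for i in range(25)]
cache = {resources[3], resources[7], resources[19]}  # what this user happened to have

first_read = probe(cache, resources)
second_read = probe(cache, resources)
```

The first read sees a distinctive pattern of three hits; the second read sees 25 hits, the same as every other browser that has been probed once.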

So the more I think about this:

(1) You raised a valid (and hard!) problem

(2) There seem to be reasonable solutions


I had a similar idea. In addition to caching and detecting if it has been unexpectedly changed, there are other benefits:

- The end user could have the option to enable/disable caching, and to clear the cache. Further configuration is also possible, e.g. to enable same-origin caching only.

- The end user could have the option to replace resources with their own regardless of where the files come from; there is one table keyed by hash and the value is the file to use instead, which might or might not be the same file (so the hash does not necessarily need to match the file that is being used instead).

- Features specific to the browser to make it more efficient could also be used when the user configures replacement of resources, e.g. if it can somehow implement jQuery in native code, or uses a different font format which is more efficient on the computer that it is running on.

- If archived copies of parts of web sites are being made, it can efficiently check if it already has some file which is being used in such a way.
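
The first two options above might look roughly like this in Python. It is only a sketch: `ResourcePolicy`, its three caching modes, and the `a.example` / `b.example` hosts are invented names, and the cache remembers just one origin per hash to keep the example short.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple
from urllib.parse import urlsplit


def origin(url: str) -> str:
    """Scheme plus host, e.g. 'https://a.example'."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}"


@dataclass
class ResourcePolicy:
    """User-controlled settings for hash-addressed resources (hypothetical)."""
    caching: str = "cross-site"  # "off", "same-origin" or "cross-site"
    replacements: Dict[str, bytes] = field(default_factory=dict)  # hash -> user's file
    _cache: Dict[str, Tuple[str, bytes]] = field(default_factory=dict)  # hash -> (origin, bytes)

    def lookup(self, page_url: str, sha256_hex: str) -> Optional[bytes]:
        """Return local bytes for a hash, or None if the browser must fetch."""
        # A user replacement wins, and need not hash to the key it is filed under.
        if sha256_hex in self.replacements:
            return self.replacements[sha256_hex]
        if self.caching == "off" or sha256_hex not in self._cache:
            return None
        cached_origin, body = self._cache[sha256_hex]
        if self.caching == "same-origin" and cached_origin != origin(page_url):
            return None
        return body

    def store(self, page_url: str, body: bytes) -> str:
        """Record a fetched resource under its hash and return that hash."""
        key = hashlib.sha256(body).hexdigest()
        if self.caching != "off":
            self._cache[key] = (origin(page_url), body)
        return key

    def clear(self) -> None:
        """Drop everything cached; user replacements are kept."""
        self._cache.clear()


policy = ResourcePolicy(caching="same-origin")
key = policy.store("https://a.example/page", b"original jquery")
same_site = policy.lookup("https://a.example/other", key)   # cache hit
other_site = policy.lookup("https://b.example/page", key)   # None under same-origin
policy.replacements[key] = b"user's own build"
replaced = policy.lookup("https://b.example/page", key)     # replacement wins everywhere
```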

However, requiring a hash probably should not be made mandatory.



Just to clarify: Federal law prohibits discrimination on the basis of religion.

Google cannot take action against religions (even cults) it disagrees with (even for valid reasons for disagreeing with them), unless it gets over a very high bar of impacting business behavior.

The other path leads to far worse outcomes.

Morally, we cannot judge what happened here without hearing from both sides. But if action is to be taken, it should be taken by a prosecutor, not an employer.


He alleges a few specific problems: nepotism, self-dealing and retaliation. Those are things that Google should take seriously, particularly the self-dealing with selling wine (which seems like money laundering).


I would say that this is evidence of discrimination on the basis of religion.


Google cannot favor people in hiring just because they are members of some religious group. Self-dealing in, say, wine contracts is against Google policy and likely illegal as well.


> Google cannot take action against religions (even cults) it disagrees with

"Disagreement" is quite a euphemistic way of describing the situation. The said cult has a documented history of sexual abuse.


I don't think Google has standing to prosecute sexual abuse. Isn't that the DA's job?


Precisely.

Google has standing for workplace sexual abuse. If I am a member of a religious / national / ethnic / etc. group with a track record of sexual abuse, I cannot be fired for that. That's discrimination.

There are many minefields here.

- I've seen many community members where I live boycott (unrelated) Russian businesses since Putin's invasion of Ukraine.

- I saw Muslims in the US persecuted after 9/11.

- I live in a Protestant/Atheist community which seems to hate Catholics.

- Etc.

Every large group has a bad component. There was sexual abuse in the Catholic church, 9/11 happened, the US did kill around a million Muslims in recent decades, and Russia shouldn't have invaded Ukraine. Laws are designed to protect individuals from generalizations or stereotypes about the group they're from, whether true or not.

Nepotism, for the most part, is legal in the US; it's just bad business beyond some scale. For a startup, the right strategy is often to hire people you know to be good personally. That's often friends and family. For a family business, the goals aren't purely economic, and again, it's fine practice. For large businesses, it tends to be bad business, but that doesn't make it illegal. Ditto for self-dealing.

There are restrictions for non-profits, public employees, etc. but those aren't general to private businesses like Google.

