riskable · 2025-06-25T19:12:27 1750878747

It can recite something like 80% of Harry Potter with carefully crafted prompts. If you take half a sentence from Harry Potter then tell the LLM to predict the rest it will complete it. That's what they did in that study you're referring to.
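A rough sketch of that kind of half-sentence test, to make the setup concrete. Everything here is an assumption for illustration: `completion_overlap` is a made-up helper, `toy_model` stands in for a real LLM call, and the similarity score is a plain character-level ratio rather than whatever metric the study used.

```python
from difflib import SequenceMatcher
from typing import Callable

def completion_overlap(sentence: str, complete: Callable[[str], str]) -> float:
    """Give the model the first half of a sentence (split by words) and
    score how closely its continuation matches the real second half."""
    words = sentence.split()
    mid = len(words) // 2
    prefix = " ".join(words[:mid])
    expected = " ".join(words[mid:])
    predicted = complete(prefix).strip()
    # 1.0 means a verbatim match, 0.0 means nothing in common.
    return SequenceMatcher(None, expected, predicted).ratio()

# Hypothetical stand-in for an LLM: it has "memorized" exactly one sentence.
MEMORIZED = "the quick brown fox jumps over the lazy dog"

def toy_model(prefix: str) -> str:
    if MEMORIZED.startswith(prefix):
        return MEMORIZED[len(prefix):]
    return ""

popular = completion_overlap(MEMORIZED, toy_model)
obscure = completion_overlap("an obscure line nobody ever quoted online", toy_model)
print(popular, obscure)
```

Run over many sentences from many works, a score like this separates text the model can continue verbatim from text it cannot.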

It's not even remotely the same thing as "can recite whole Harry Potter." If you ask an LLM to regurgitate Harry Potter it won't be able to do so because that's not how they work. They're prediction engines, and it just so happens that Harry Potter quotes/excerpts are so pervasive on the Internet that the LLM's training ranks that style of wording higher than other styles.

Ask it to regurgitate some other, less-popular work. Do it for hundreds or thousands of them. You'll quickly find that those two examples you gave are the exceptions and that LLMs can't pull it off. They won't even get close.

rasz · 2025-06-26T07:02:21 1750921341

>predict

unpack, unless you are going to convince me LLMs are predicting '0x5f3759df' :). Lossy compression is still compression.
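For context, `0x5f3759df` is the magic constant from the fast inverse square root trick popularized by the Quake III Arena source. A rough Python sketch of that trick (the name `fast_inv_sqrt` is illustrative, and the original is C operating on 32-bit floats):

```python
import struct

def fast_inv_sqrt(x: float) -> float:
    """Approximate 1/sqrt(x) for positive x using the 0x5f3759df bit trick."""
    # Reinterpret the float32 bit pattern as an unsigned 32-bit integer.
    i = struct.unpack("<I", struct.pack("<f", x))[0]
    # The magic constant minus half the bit pattern gives a first guess.
    i = 0x5F3759DF - (i >> 1)
    # Reinterpret the integer bits back as a float32.
    y = struct.unpack("<f", struct.pack("<I", i))[0]
    # One Newton-Raphson step refines the guess.
    return y * (1.5 - 0.5 * x * y * y)

print(fast_inv_sqrt(4.0))  # close to 0.5
```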

riskable · 2025-06-25T19:08:21 1750878501

Google trained their AI on stuff they scraped without knowing whether it was pirated content. Why should it be different for Anthropic?

Google literally scrapes pirated content all day every day. When they do that they have no idea if the content was legally placed on that website. Yet, they scan and index it anyway because there's actually no way to know (at all!). There's no great big database of all copyrighted works they can reference.

I'm not saying Meta and Anthropic didn't know they were pirating content. I'm saying that it should be moot since they never distributed it. You can't claim a violation of copyright for content that was never actually "copied" (aka distributed). The site/seeders that uploaded the content to Meta/Anthropic are the violators since copyright is all about distribution rights.

riskable · 2025-06-25T19:03:33 1750878213

Your take on how copyright infringement works only counts for unregistered copyrights. If the copyrighted works are registered with the Copyright Office, statutory damages apply:

https://www.law.cornell.edu/uscode/text/17/504

riskable · 2025-06-25T19:01:38 1750878098

Completely untrue. If some clever engineer or consortium of engineers designed a 3D-printable car for 3D printing-and-manufacturing companies to make then it surely would exist. If you buy one from a Ford dealership you'd be getting the Ford-branded version which may have their own tweaks to the design.

It makes perfect sense to me that the big carmakers could get together some day and develop a handful of car platforms that all their cars will be built upon. That way they can buy the parts from any number of manufacturers (on-demand!) and save themselves a ton of money.

They kind of already do that, actually =)

riskable · 2025-06-24T20:25:05 1750796705

A judge already ruled that models themselves don't constitute copyright infringement in Kadrey v. Meta Platforms, Inc. (https://casetext.com/case/kadrey-v-meta-platforms-inc). The EFF has a good summary about it:

> the court dismissed “nonsensical” claims that Meta’s LLaMA models are themselves infringing derivative works.

See: https://www.eff.org/deeplinks/2025/02/copyright-and-ai-cases...

qoez · 2025-06-25T14:46:17 1750862777

Time to overfit on some books and publicize them as a libgen mirror.

londons_explore · 2025-06-26T07:22:15 1750922535

I think this could lead to interesting results outside the legalities.

Imagine you're getting it to spit out lord of the rings, but midway through you inject into the output 'Suddenly, the ring split in two. No longer one ring to rule them all, but two!'.

You then let the model write the rest of the story!

esperent · 2025-06-26T07:36:36 1750923396

I'm sure many people have imagined this - supposing that LLMs, while making no great strides towards AGI, consciousness, or any of that, nonetheless keep getting better and better at what they do now. Imagine a decade or two of steady improvements, throw in at least a couple of major breakthroughs. Much longer context by a few orders of magnitude. Much better quality, in terms of tone, consistency, hallucinations.

Maybe we'll actually be able to say things like: write me a trilogy in the style of Lord of the Rings but with these changes:

* Make it scifi

* Add more female characters with greater depth

* At least five rings

* Hobbits are the bad guys

... Or whatever, specifying a version of the story tailored to your interests, and you would get really high-quality results, similar in quality to the source material.

Imagine you could do the same with movies, games, music.

I'm not trying to assign a value judgement here. There's good and bad sides. However, this reality is becoming easier to imagine with each new model released.

For sure, anyone who is a writer or artist will see this as bad. But perhaps our whole concept of what art is will become more fluid and personalized.

falcor84 · 2025-06-26T14:47:46 1750949266

I love that, and only wanted to nitpick that not only are there "at least five rings" in the story, but a full score (20): 9 for men, 7 for dwarves, 3 for elves and the one ring.

esperent · 2025-06-26T15:13:32 1750950812

Damn you're right, it's been too many years since I read it. I was following on from the previous comment talking about splitting the ring in two.

riskable · 2025-06-20T18:31:51 1750444311

A thousand tokens.

riskable · 2025-06-20T15:21:44 1750432904

How is Perplexity different from running a Jupyter Notebook or anything, really, that lets you download a web page programmatically? I can spin up an AWS instance, log in, then run `python` and scrape the BBC's content as much as I want. Why aren't they suing Amazon (and every other company that lets you download stuff via their systems) for providing the same functionality?
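To show how little is involved, here is a minimal sketch using only Python's standard library. The function name `fetch_text`, the injectable `opener` parameter, and the placeholder URL are illustrative assumptions, not anything Perplexity or AWS provides.

```python
import io
from urllib.request import urlopen

def fetch_text(url, opener=urlopen, encoding="utf-8"):
    """Download a URL and return the response body as text.

    `opener` defaults to urllib's urlopen; it is injectable (an assumed
    design choice) so the function can be tried without network access.
    """
    with opener(url) as response:
        return response.read().decode(encoding, errors="replace")

# Offline demonstration: a stand-in that returns canned bytes.
def fake_opener(url):
    return io.BytesIO(b"<html><title>Example</title></html>")

page = fetch_text("https://example.com/", opener=fake_opener)
print(page)
```

Dropping the `opener=fake_opener` argument makes the same call go out over the network.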

A very old argument: If you don't want people scraping or downloading your content don't put it on the (public) Internet!

Imagine we had LLM-like functionality in the 1980s: Sony announces a new VCR that can read a recorded news show and print out a summary on a connected Imagewriter II. People start using it to summarize the publicly-broadcast BBC news programs.

Today's scenario would be like the BBC suing Sony for providing that functionality.

ethbr1 · 2025-06-20T15:33:27 1750433607

Because copyright is intrinsically linked to scale.

1000000x'ing fair use... might no longer be fair use.

The balances between society and copyright need to change when scale changes drastically.

To address the elephant in the room -- what happens when there are only leechers and no sources, because we've let them hijack first-party news revenue without creating a replacement?

riskable · 2025-06-20T13:50:32 1750427432

No. If you live in an apartment building and you stop by the coffee shop on the first floor on your way to work every day you get to know the people that work there. Or rather, you would get to know the people that work there if the corporation that owned the franchise paid/cared enough to retain people.

It's the same story everywhere—regardless of whether or not you live in a big city or a small town: It's "jobs", not people.

riskable · 2025-06-16T15:42:53 1750088573

Based on the theory of gravity in the article it's actually, "Archimedes' principle all the way down."

https://en.wikipedia.org/wiki/Archimedes%27_principle

riskable · 2025-06-11T13:57:01 1749650221

How are you using it as a beta reader? What prompts do you use? I'd love to try it.

CuriouslyC · 2025-06-11T18:51:22 1749667882

Just dump your manuscript into Google's AI Studio, tell it you'd like it to serve as a beta reader/editor, and tell it what your objectives are with your manuscript so it can give you targeted feedback.