
Anna's Archive announced they intended to infringe on the label's copyrights by distributing their music without a license. The law allows the court "to prevent or restrain infringement of a copyright" (emphasis mine).

https://www.law.cornell.edu/uscode/text/17/502#:~:text=Any%2...


Spotify does not own that copyright, only a distribution license. How can they get away with it?

The plaintiffs are actually record companies; Spotify is tacked on at the end for some reason, and the article decided to confuse matters :)

Rights can be extended through contracts. A lawyer at Spotify might think to put in: "we distribute the music for you, your right to enforce copyright or otherwise litigate on behalf of that music is also extended to us as if we also own it".

The legal language would be different, that's a dumbed down version.


I do understand what can happen (I'm an IP lawyer), but this basically requires enabling Spotify to act as your attorney, since they still do not in fact own the rights, even with this. You can't manufacture standing here - only folks who are exclusive rightsholders can sue. Period. So it would require giving them power of attorney enabling them to sue on your behalf, since you (or whoever) still own the exclusive rights.

I strongly doubt their contract terms have this in there, it would be fairly shocking.

I say this having seen tons of these kinds of contracts, even with Spotify, and never having seen something like this.


What I have seen in practice (not with Spotify) is that a law firm cozy with both entities will be delegated standing: the "powers" in power of attorney, but with clauses defining a limited scope and "escape hatch" and "kill switch" clauses.

With the amount of content that has been described, it's not unlikely that Spotify actually owns some tiny fraction of it. They probably have some half-assed record label that owns two songs by a nobody.

Apparently you can win anything you want in a default judgement, no matter how ridiculous. When you know the other side won't show up because they'd be handcuffed, this is a useful way to achieve your goals.

Nah - the plaintiffs include record companies, who do have rights here.

We need to abolish copyright laws entirely. This is just the ten millionth instance of them being abused by the 1% to harm the majority.

In times of AI this doesn't sound like the ideal solution either

Wikipedia says

  Ek's initial pitch to Lorentzon was not initially related to music, but rather a way for streaming content such as video, digital films, images or music to drive advertising revenue.
So yes, they were always intending to get revenue from ads. And yes, the initial pitch included other types of media too. But I don't think we can call Spotify "an ad platform" that "never actually cared about music" any more than we could call Ars Technica "an ad platform that never actually cared about tech news."

Did you know that Ars in Ars Technica stands for Ass, showing how badly they really thought of technology?

/s


Sarcasm doesn't mean bad jokes.

And this is exactly why I had to use /s. Because some people would not understand that it was written tongue-in-cheek, while some others would fail to see the larger context and confuse my sarcasm with a simple joke (sure, as a joke it is bad; and that's precisely because it was optimized to be sarcastic, not joke-funny).

Where does it talk about his use of English or his lawyers?


I can't find anything on his Wikipedia article for "English" or "lawyer." Can't we assume good faith?


OP is Korean, and will be using the Korean wiki article, which ranks squarely first when Googling his name in Korea from inside Korea. Would you ask for the same in the US case, with the parallels I drew? Of course not.


> The betting markets were not impressed by GPT-5.

I am reading this graph as "there is a high expectation that Google will announce Gemini-3 in August", and not as "Gemini 2.5 is better than GPT-5".

This is an incorrect interpretation. The benchmark which the betting market is based upon currently ranks Gemini 2.5 higher than GPT-5.


EDIT: I updated the article to account for this perspective.

------

This can't be right -- they're using LMArena without style control to resolve the market, and GPT-5 is ahead, right? (https://lmarena.ai/leaderboard/text/overall-no-style-control)

> This market will resolve according to the company which owns the model which has the highest arena score based off the Chatbot Arena LLM Leaderboard (https://lmarena.ai/) when the table under the "Leaderboard" tab is checked on August 31, 2025, 12:00 PM ET.

> Results from the "Arena Score" section on the Leaderboard tab of https://lmarena.ai/leaderboard/text with the style control off will be used to resolve this market.

> If two models are tied for the top arena score at this market's check time, resolution will be based on whichever company's name, as it is described in this market group, comes first in alphabetical order (e.g. if both were tied, "Google" would resolve to "Yes", and "xAI" would resolve to "No")

> The resolution source for this market is the Chatbot Arena LLM Leaderboard found at https://lmarena.ai/. If this resolution source is unavailable at check time, this market will remain open until the leaderboard comes back online and resolve based on the first check after it becomes available. If it becomes permanently unavailable, this market will resolve based on another resolution source.
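
In other words, the resolution is purely mechanical: take the top Arena Score, break ties alphabetically. A minimal sketch of that rule in Python (the function is mine; the scores match the leaderboard figures cited in the reply below):

  def resolve_market(scores):
      # Highest arena score wins; ties go to the alphabetically
      # first company name, per the quoted rules.
      top = max(scores.values())
      return sorted(name for name, s in scores.items() if s == top)[0]

  # Overall Text, style control off, as of the leaderboard linked above:
  print(resolve_market({"Google": 1471, "OpenAI": 1462}))  # -> Google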


You may have already figured this out, but the leaderboard you linked to (https://lmarena.ai/leaderboard/text/overall-no-style-control) shows gemini-2.5-pro ahead with a score of 1471 compared to gpt-5 at 1462.


It is very interesting that among the top-20 models, all the non-proprietary ones are from China.


gpt-5 was ahead on that last night


The leaderboard hasn't changed since it was updated to add gpt-5. Here's what it looked like yesterday https://archive.is/XIrbN

If you saw gpt-5 was ahead, you might have been looking at the leaderboard with style control https://lmarena.ai/leaderboard/text/overall


> This is an incorrect interpretation. The benchmark which the betting market is based upon currently ranks Gemini 2.5 higher than GPT-5.

You can see from the graph that Google shot way up from ~25% to ~80% upon the release of GPT-5. Google’s model didn’t suddenly get way better at any benchmarks, did it?


It's not about Google's model getting better. It is that gpt-5 already has a worse score than Gemini 2.5 Pro had before gpt-5 came out (on the particular metric that determines this bet: Overall Text without Style Control).

https://lmarena.ai/leaderboard/text/overall-no-style-control

That graph is a probability. The fact that it's not 100% reflects the possibility that gpt-5 or someone else will improve enough by the end of the month to beat Gemini.


GPT-5 knowledge cutoff: Sep 30, 2024 (10 months before release).

Compare that to

Gemini 2.5 Pro knowledge cutoff: Jan 2025 (3 months before release)

Claude Opus 4.1 knowledge cutoff: Mar 2025 (4 months before release)

https://platform.openai.com/docs/models/compare

https://deepmind.google/models/gemini/pro/

https://docs.anthropic.com/en/docs/about-claude/models/overv...


It would be fun to train an LLM with a knowledge cutoff of 1900 or something


Someone tried this; I saw it in one of the Reddit AI subs. They were training a local model on whatever they could find that was written before $cutoffDate.

Found the GitHub: https://github.com/haykgrigo3/TimeCapsuleLLM
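
The core mechanic is just aggressive corpus filtering before training; a minimal sketch of the idea (the document schema is hypothetical, not TimeCapsuleLLM's actual code):

  from datetime import date

  CUTOFF = date(1900, 1, 1)  # train only on text written before this

  def filter_corpus(docs):
      # Each doc is assumed to carry a 'published' date and raw 'text'.
      for doc in docs:
          if doc["published"] < CUTOFF:
              yield doc["text"]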


That’s been done, to see if it could extrapolate and predict the future. Can’t find the link to the paper right now.


This one? "Mind the Gap: Assessing Temporal Generalization in Neural Language Models" https://arxiv.org/abs/2102.01951


The idea matches, but 2019 is a far cry from, say, 1930.


In 1930 there was not enough information in the world for consciousness to develop.


You mean information in digestible form.


I think this is a meta-allusion to the theory that human consciousness developed recently, i.e. that people who lived before [written] language did not have language because they actually did not think. It's a potentially useful thought experiment, because we've all grown up not only knowing highly performant languages, but also knowing how to read / write.

However, primitive languages were... primitive. Were they primitive because people didn't know / understand the nuances their languages lacked? Or were those things that simply didn't get communicated (effectively)?

Of course, spoken language predates writing, which is part of the point. We know an individual can have a "conscious" conception of an idea if they communicate it, but that consciousness was limited to the individual. Once we have written language, we can perceive a level of communal consciousness of certain ideas. You could say that the community itself had a level of shared consciousness.

With GPTs regurgitating digestible writings, we've come full circle in terms of proving consciousness, and some are wondering... "Gee, this communicated the idea expertly, with nuance and clarity... but is the machine actually conscious? Does it think independently of the world, or is it merely a kaleidoscopic reflection of its inputs? Is consciousness real, or an illusion of complexity?"


I’m not sure why it’s so mind-boggling that people in the year 1225 (Thomas Aquinas) or 1756 (Mozart) were just as creative and intelligent as we modern people are. They simply had different opportunities than we have now. And what some of them did with those opportunities is beyond anything a “modern” person can imagine doing in the same circumstances. _A lot_ of free time over winter in the 1200s for certain people. Not nearly as many distractions either.


Saying early humans weren’t conscious because they lacked complex language is like saying they couldn’t see blue because they didn’t have a word for it.


Well, Oscar Wilde argues in “The Decay of Lying” that there were no stars before an artist could describe them and draw people’s attention to the night sky.

The basic assumption he attacks is that “there is a world we discover” vs “there is a world we create”.

It is a hard paradigm shift, but there is certainly reality in a “shared picture of the world”, and convincing people of a new point of view has real implications for how the world appears in our minds and what we consider “reality”.


It should be almost obligatory to state which definition of consciousness one is talking about whenever they discuss consciousness, because I, for example, don't see what language has to do with our ability to experience qualia.

Is it self-awareness? There are animals that can recognize themselves in a mirror, and I don't think all of them have a form of proto-language.


Llama are not conscious


Not sure we have enough data for any pre-internet date.


That would be hysterical


With web search, is the knowledge cutoff really relevant anymore? Or is this more of a comment on how long their post-training took?


In my experience, web search often tanks the quality of the output.

I don't know if it's because of context clogging, or that the model can't tell a high-quality source from garbage.

I've defaulted to web search off and turn it on via the tools menu as needed.


Web search often tanks the quality of MY output these days too. Context clogging seems a reasonable description of what I experience when I try to use the normal web.


THIS. I do my best work after a long vigorous walk and contemplation, while listening to Bach sipping espresso. (Not exaggerating much.) If I go on HN or slack or ClickUp or work email, context is slammed and I cannot do /clear so fast. Even looking up something quick on the web or an LLM causes a dirtying.


I feel the same. LLMs using web search ironically seem to have less thoughtful output. Part of the reason for using LLMs is to explore somewhat novel ideas. I think with web search it aligns too strongly to the results rather than the overall request, making it a slow search engine.


That makes sense. They're doing their interpretation on the fly, for one thing. For another, even when they pull in data that is 10 months more recent than their cutoff, they don't have any of the intervening information. That's gotta make it tough.


Web search is super important for frameworks that are not (sufficiently?) in the training data. o3 often pulls info from Swift forums to find and fix obscure Swift concurrency issues for me.


In my experience none of the frontier models I tried (o3, Opus 4, Gemini 2.5 Pro) was able to solve Swift concurrency issues, with or without web search. At least not sufficiently for Swift 6 language mode. They don’t seem to have a mental model of the whole concept and how things (actors, isolation, Tasks) need to play together.


> They don’t seem to have a mental model of the whole concept and how things (actors, isolation, Tasks) need to play together.

to be fair, does anyone ¯\_(ツ)_/¯


This. It’s a bunch of rules you need to juggle in your head.


I haven't tried ChatGPT web search, but my experience with Claude web search is very good. It's actually what sold me and made me start using LLMs as part of my day to day. The citations they leave (I assume ChatGPT does the same) are killer for making sure I'm not being BSd on certain points.


How often do you actually check the citations? They seem to cite things confidently, but then the source often says something different.


It depends on the question. I was having a casual chat with my dad and we wondered how Apple's revenue was split amongst products, and it was just to chat about so I didn't check.

On the other hand, I got an overview of Postgres RLS and I checked the majority of those citations since those answers were going to be critical.


That’s interesting. I use the API and there are zero citations with Claude, ChatGPT, and Gemini. Only Kagi Assistant gives me some, which is why I prefer it when researching facts.

What software do you use? The native Claude app? What subscription do you have?


Claude directly (web and mobile) with the Pro ($20) subscriptions.

I found it very similar to Kagi Assistant (which I also use).


Kagi really helps with this. They built a good search engine first, then wired it up to AI stuff.


I also find that it gets way more snarky. The internet brings that taint with it.


Completely opposite experience here (with Claude). Most of my googling is now done through Claude; it can find, digest, and compile information much quicker and better than I'd do myself. Without web search you're basically asking an LLM to pull facts out of its ass. Good luck trusting those results.


It still is. Not all queries trigger web search, and it takes more tokens and time to do research. ChatGPT will confidently give me outdated information, and unless I know it's wrong and ask it to research, it won't know it is wrong. Having a more recent knowledge base can be very useful (for example, knowing who the president is without looking it up, or making references to newer Node versions instead of old ones).


The problem, which perhaps only looks easy to fix, is that the model will choose solutions that are a year old, e.g. thinking database/logger versions from December '24 are new and usable in a greenfield project despite newer quarterly LTS releases superseding them. I try to avoid humanizing these models, but could it be that in training/post-training one could make the timestamp be fed in via the system prompt and actually respected? I've begged models to choose "new" dependencies after $DATE, but they all still snap back to 2024.
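
Feeding the date in is easy enough; whether the model respects it is exactly the open question. A minimal sketch with the official openai package (the model name and prompt wording are placeholders, not a known fix):

  from datetime import date
  from openai import OpenAI

  client = OpenAI()
  resp = client.chat.completions.create(
      model="gpt-5",  # placeholder model name
      messages=[
          # Inject the timestamp via the system prompt, as suggested above.
          {"role": "system",
           "content": f"Today is {date.today().isoformat()}. Prefer the "
                      "newest stable releases of any dependencies."},
          {"role": "user",
           "content": "Pick a logging library for a greenfield project."},
      ],
  )
  print(resp.choices[0].message.content)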


The biggest issue I can think of is code recommendations with out-of-date versions of packages. Maybe the quality of code has deteriorated in the past year and scraping GitHub is not as useful to them anymore?


Knowledge cutoff isn’t a big deal for current events. Anything truly recent will have to be fed into the context anyway.

Where it does matter is for code generation. It’s error-prone and inefficient to try teaching a model how to use a new framework version via context alone, especially if the model was trained on an older API surface.


I wonder if it would even be helpful, given that they'd be avoiding the increasing amount of AI-generated content.


This is what I was thinking. Eventually most new material could be AI produced (including a lot of slop).


Still relevant, as it means that a coding agent is more likely to get things right without searching. That saves time, money, and improves accuracy of results.


It absolutely is; for example, even in coding, new design patterns or language features aren't easy to leverage.

Web search enables targeted info to be "updated" at query time. But it doesn't get used for every query and you're practically limited in how much you can query.


Isn’t this an issue with eg Cloudflare removing a portion of the web? I’m all for it from the perspective of people not having their content repackaged by an LLM, but it means that web search can’t check all sources.


Web pages become part of the prompt, so you still need the model to analyze them.


I've been having a lot of issues with ChatGPT's knowledge of DuckDB being out of date. It doesn't think DuckDB enforces foreign keys, for instance.
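
For what it's worth, current DuckDB does enforce them; a quick check with the duckdb Python package (the exact exception class is from memory, so treat it as an assumption):

  import duckdb

  con = duckdb.connect()
  con.execute("CREATE TABLE authors (id INTEGER PRIMARY KEY)")
  con.execute("CREATE TABLE books (id INTEGER PRIMARY KEY, "
              "author_id INTEGER REFERENCES authors(id))")
  con.execute("INSERT INTO authors VALUES (1)")
  con.execute("INSERT INTO books VALUES (10, 1)")   # OK: parent row exists
  try:
      con.execute("INSERT INTO books VALUES (11, 99)")  # no author 99
  except duckdb.ConstraintException as e:
      print("foreign key enforced:", e)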


Yes, totally. The model will not know about new versions of libraries, features recently deprecated, etc..


Question: do the web search results that GPT kicks back get "read" and backpropagated into the model?


Right now nothing affects the underlying model weights. They are computed during pretraining at enormous expense, adjusted incrementally during post-training, and then left untouched until the next frontier model is built.

Being able to adjust the weights will be the next big leap IMO, maybe the last one. It won't happen in real time but periodically, during intervals which I imagine we'll refer to as "sleep." At that point the model will do everything we do, at least potentially.


Falling back to web search is a crutch; it's slower and often bloats the context, resulting in worse output.


Yes, because it may not know that it needs to do a web search for the most relevant information.


Gemini does cursory web searches for almost every query, presumably to fill in the gap between the knowledge cutoff and now.


I had 2.5 Flash refuse to summarise a URL that had today's date encoded in it because "That web page is from the future so may not exist yet or may be missing" or something like that. Amusing.

2.5 Pro went ahead and summarized it (but completely ignored a # reference so summarised the wrong section of a multi-topic page, but that's a different problem.)


I always pick Gemini if I want more current subjects / info


A funny result of this is that GPT-5 doesn't understand the modern meaning of Vibe Coding (maximising LLM code generation); it thinks it means "a state where coding feels effortless, playful, and visually satisfying" and offers more content around adjusting IDE settings and templating.


And the GPT-5 nano and mini cutoff is even earlier: May 30, 2024.


Maybe OpenAI has a terribly inefficient data ingestion pipeline (wild guess): taking in new data is tedious, so they do it infrequently and keep using old data for training.


Does this indicate that OpenAI had a very long pretraining process for GPT-5?


Maybe they have a long data cleanup process


Perhaps they want to extract the logic/reasoning behind language rather than remembering facts, which can be retrieved with a search.


Does the knowledge cutoff date still matter all that much, since all these models can do real-time searches and RAG?


The model can do web search, so this is mostly irrelevant, I think.


That could mean OpenAI does not take any shortcuts when it comes to safety.


  > GPT-5 knowledge cutoff: Sep 30, 2024
  > Gemini 2.5 Pro knowledge cutoff: Jan 2025
  > Claude Opus 4.1: knowledge cutoff: Mar 2025
A significant portion of the search results available after those dates is AI generated anyway, so what good would training on them do?


The latest tech docs for a library you want to use in your code.


So, JavaScript vibe coding. Got it.

Honestly, maintaining software for which the AI knowledge cutoff matters sounds tedious.


> The purpose of Stallman’s open source movement

My understanding is that the purpose of Stallman's free software movement is "that the users have the freedom to run, edit, contribute to, and share the software." The FSF is focused on "defending the rights of all software users." It's about the users, not the developers.


Sounds like https://news.ycombinator.com/item?id=27454589

"I currently have 10 fully remote engineering jobs"


One of the comments there raises an interesting question: this is plain old criminal fraud, right? (Whether this person is actually telling the truth is another matter, of course.)


chess.com ranks #195 whereas lichess.org ranks #1546 on Alexa's top sites.

https://www.alexa.com/siteinfo/chess.com

https://www.alexa.com/siteinfo/lichess.org


If anyone is looking for a way to merge games from lichess.org and chess.com: I'm developing a website where you can link your accounts to view stats for all of your games. It's free and currently in Beta: https://www.chessmonitor.com/

Here is an example for the current world champion: https://www.chessmonitor.com/u/kcc58R9eeGY09ey5Rmoj
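
For anyone curious, the fetch side only needs the two documented public APIs; a rough sketch (no error handling, and chess.com may require a User-Agent header):

  import requests

  def lichess_games(user):
      # Lichess exports a user's games as PGN from its public API.
      r = requests.get(f"https://lichess.org/api/games/user/{user}",
                       headers={"Accept": "application/x-chess-pgn"})
      return r.text

  def chesscom_games(user, year, month):
      # Chess.com publishes monthly PGN archives per player.
      url = (f"https://api.chess.com/pub/player/{user}"
             f"/games/{year}/{month:02d}/pgn")
      return requests.get(url).text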


I've been wishing someone would make something like this! Always frustrated me that tools like aimchess and openingtree only allow you to look at the two sources separately.. supposedly, at least in aimchess's case, the reason being that the rating systems are different.

How do you unify ratings?

Edit: ah I see you don't, makes sense I guess. Might be interesting to have them both on the same graph even if the y axis is different..


As you say, I currently don't. In the future I might check the correlation between chess.com and lichess ratings (when enough users register). Then I should even be able to calculate the chess.com rating from a lichess rating and vice versa.

But there are many other features I want to implement first. I'm currently more focused on the statistics part than on the chess.com/lichess relation.
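
The correlation idea above could start as a simple least-squares fit over users who link both accounts; a minimal sketch with made-up numbers:

  import numpy as np

  # Hypothetical paired ratings from users with both accounts linked.
  lichess = np.array([1500, 1800, 2100, 2400])
  chesscom = np.array([1350, 1650, 1980, 2310])

  a, b = np.polyfit(lichess, chesscom, deg=1)  # chesscom ~ a*lichess + b

  def lichess_to_chesscom(rating):
      return a * rating + b

  print(round(lichess_to_chesscom(2000)))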


Yeah definitely should be the focus! Would be excited to see some of the stats aimchess has implemented.. performance in various phases of the game.. tactics.. advantage capitalization etc.

I also found that when clicking through to an opening on your site, it would always say "no games find at this position". Bug?


Thanks for the feedback. It's not really a bug. More of a communication problem... ;)

The openings page lists your openings for white and black. If you click on an opening, it takes you to the explorer, which shows the stats for only one color (white by default). Therefore, if you play an opening for black a lot, it will appear in the list of your openings. But when you click on the opening, the explorer will show your stats for white.

There is also another problem, that the detection of openings (on the openings page) respects transpositions [1] while the explorer does not.

Maybe I'll remove the link to the explorer as this seems to cause a lot of confusion...

[1] https://en.wikipedia.org/wiki/Transposition_(chess)


Previous discussion on the original article (447 comments) https://news.ycombinator.com/item?id=27456500

