> Now many site owners are trying to put technical obstacles to competitors who ...

grepfru_it · on Jan 29, 2020

Yes, you can scrape them; no, you cannot republish them. Everything you listed is protected by copyright. This ruling does not let you infringe on copyrights.

>hiQ argued that LinkedIn’s technical measures to block web scraping interfere with hiQ’s contracts with its own customers who rely on this data. In legal jargon, this is called “malicious interference with a contract”, which is prohibited by American law

Does this mean that Google's random recaptcha check is interference?

peeters · on Jan 29, 2020

I think any ruling that says LinkedIn can't put in protective measures against automated requests is doomed to be overturned, as long as they're not applying them in a discriminatory way. Captcha, rate limiting, user agent testing, etc. are all common tools to protect against malicious or unintentional denials of service. The question is what LinkedIn was doing, and whether it specifically targeted hiQ while permitting others of the same class of traffic.
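To make the "same rules for the same class of traffic" idea concrete, here is a minimal sketch of a token-bucket rate limiter that gives every client an identical budget. The class names, the default limits, and the injectable `clock` parameter are all hypothetical; nothing here reflects what LinkedIn actually runs.

```python
import time


class TokenBucket:
    """Allows `rate` requests per second on average, with bursts up to `capacity`."""

    def __init__(self, rate, capacity, now):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start with a full bucket
        self.updated = now

    def allow(self, now):
        # Refill in proportion to elapsed time, capped at capacity.
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class UniformLimiter:
    """One bucket per client, with the same limits for every client."""

    def __init__(self, rate=1.0, capacity=5.0, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock  # injectable so the limiter can be tested
        self.buckets = {}

    def allow(self, client_id):
        now = self.clock()
        bucket = self.buckets.get(client_id)
        if bucket is None:
            bucket = TokenBucket(self.rate, self.capacity, now)
            self.buckets[client_id] = bucket
        return bucket.allow(now)
```

A limiter like this never looks at who the client is, only at how fast it is sending requests. A targeted block would be a separate, explicit rule keyed on one client's identity, which is the case the question above is about.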

buboard · on Jan 29, 2020

Why would it be an issue if it is discriminatory? LinkedIn can use its servers any way it likes, unless it has promised its users that their data can be scraped indiscriminately.

matttb · on Jan 29, 2020

Because of the court case. This is just an injunction pending an actual decision.

gyvastis · on Jan 29, 2020

I'm curious how entities like https://www.omdbapi.com/ can continue their activity, get $$$ and not get shut down.

securingsincity · on Jan 29, 2020

Yeah, what is the line here? Would it be against the rules to block known user agents or throttle traffic?

papln · on Jan 29, 2020

No, because what one side of a case argues is not the law. What judges decide is the law.

JaceLightning · on Jan 29, 2020

Probably not. Facts aren't copyrightable but creative works are.

So prices on Amazon.com are facts. User reviews are creative so probably copyrighted.

Similarly the videos on YouTube are copyrighted. However the number of views and the number of likes are probably scrapable.

kabacha · on Jan 29, 2020

See, that's where I have a problem with this. Isn't data just _data_?

Let's draw some parallels to real life. If I go to a public space like a town square, can't I take pictures, notes and recordings, then go home and draw my analytics from them? What if I read something in a book I bought, can't I quote it?

The same should apply to web resources, even if they are creative: as long as I don't publish them, I should be able to scrape whatever public resources I want and use them in my analytics, machine learning or whatever.

contravariant · on Jan 29, 2020

This is why I strongly prefer the Dutch term 'auteursrecht' (author's rights) as opposed to copyright. Copyright has this annoying incorrect connotation that it has anything to do with copying when it's really publishing that it should be limiting.

Downloading publicly available data should (by definition of public) not be a violation of someone's rights. However it's easy to see why it wouldn't be desirable for someone to republish creative works as their own, so it's reasonable to give the author control over how their work should be published.

And in the case of price data or similar you would be hard pressed to deem anyone the 'author' of it, hence it would be weird to enforce the author's rights.

pbhjpbhj · on Jan 29, 2020

>Copyright has this annoying incorrect connotation that it has anything to do with copying when it's really publishing that it should be limiting. //

Copyright does make _copying_ tortious. Broad personal use exceptions in the USA, for example, make this appear not to be true, but it is the act of copying - even without publication - that is protected in general.

Ripping a CD in the UK, for example, is copyright infringement, as there is no general personal use exception (there are exceptions, under Fair Dealing, but whatever you're doing almost certainly doesn't fall into them).

See eg UK CDPA1988, Chapter II, section 16(1)(a); or USC17, Chapter 1, 106(1).

jermaustin1 · on Jan 29, 2020

You are discussing the fair use provisions of copyright law.

Not a lawyer, but:

You can do all of that, but:

You cannot scan the book you bought and put it on your website for sale or even for free - unless its copyright is up or you are given permission by the copyright holder.

You cannot take a picture of someone's painting in high detail, then sell prints of it - unless its copyright is up or you are given permission by the copyright holder.

Angostura · on Jan 29, 2020

In addition, there are some buildings and landmarks that you can't simply photograph and then resell the photos of:

https://www.rd.com/advice/travel/eiffel-tower-illegal-photos... http://www.photographers-resource.co.uk/photography/Legal/Ac...

lopmotr · on Jan 29, 2020

Your examples are really about wanting greater freedom to copy rather than about the distinction between data and creative work. Copyright is supposed to encourage people to make creative work, not to encourage people to record existing facts. I think this distinction is important because a creative work isn't actually necessary to anyone else - they could create their own different one if they wanted. But data might only have one correct value, and if that were locked away by copyright, it would limit other people's ability to do things that can't be done with some different data.

greenshackle2 · on Jan 29, 2020

As far as law is concerned, data is not just data -- bits have colour:

https://ansuz.sooke.bc.ca/entry/23

dillonmckay · on Jan 29, 2020

Additionally, some public areas prohibit photography of architecture because of copyright.

https://www.diyphotography.net/10-famous-landmarks-youre-all...

SAI_Peregrinus · on Jan 29, 2020

> Isn't data just _data_?

Think of the law around data as using dependent types. The legal protections depend on the type of the data, and the type depends on the content (among other things). You have to determine the type BEFORE you can tell what the law says about it, since the law only cares about the type. You could probably encode the law nicely with something like Idris, but any "code as law" type governance system without dependent types won't be able to express existing law.
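As a rough illustration of "determine the type first, then look up the rule", here is a small runtime sketch. Python has no dependent types, so this only approximates the idea with a classification step followed by a dispatch on the result. The `Kind` enum, the `classify` heuristic, and the rule strings are all made up for illustration and are not a legal test.

```python
from enum import Enum


class Kind(Enum):
    FACT = "fact"
    CREATIVE = "creative"


# Hypothetical rule table, paraphrasing the guesses made in this thread.
RULES = {
    Kind.FACT: "probably not copyrightable",
    Kind.CREATIVE: "probably copyrighted",
}


def classify(value):
    """Toy heuristic (an assumption, not law): numbers are facts, text is creative."""
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return Kind.FACT
    return Kind.CREATIVE


def rule_for(value):
    # The rule is chosen from the kind alone; the kind is derived from the content.
    return RULES[classify(value)]
```

With this sketch, `rule_for(19.99)` (a price) and `rule_for("Great blender, would buy again")` (a review) land on different rules even though both are just values scraped from the same page. A real classifier would be far messier, which is roughly the point being made above.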

pc86 · on Jan 29, 2020

> Isn't data just data?

No. At the risk of just repeating the comment you didn't understand, creative works are not "just data" - they are copyrightable works, and the owner controls who can use them, not just for profit but for any reason, with few exceptions.

You don't just get to drop someone else's work product into your algorithm without their permission.

rpedela · on Jan 29, 2020

There are cases where "dropping into your algorithm" would count as fair use, such as a search engine over copyrighted content.

kabacha · on Jan 29, 2020

> You don't just get to drop someone else's work product into your algorithm without their permission.

Why not?

ncallaway · on Jan 29, 2020

Because copyright law exists.

dkarras · on Jan 30, 2020

I don't think using data as input to an algorithm necessarily breaks copyright law.

I can read a book and post my impression of it somewhere, right? I can read it and say "it was beautiful" on Twitter.

I can then automate my "taste meter" through machine learning: it reads a given book character by character and spits out what I'd think of it if I actually read it. Then it posts that on Twitter, saying "it was beautiful".

Did I break copyright law? I don't think so.

wtetzner · on Jan 29, 2020

You can't take something copyrighted by someone else and re-distribute it without their permission. However, I suspect you can capture it freely if you don't re-distribute it.

amelius · on Jan 29, 2020

I think the fashion industry should exert their right to have their work removed from photographs.

pc86 · on Jan 29, 2020

Neat straw man, but you're actually proving my point. There are scenarios under which they can't do that (fair use) but there are also many scenarios where they would be entirely within their rights to do so.

chii · on Jan 29, 2020

> User reviews are creative so probably copyrighted.

I wonder if the number of stars is copyrighted. It's not creative, but a fact.

mantap · on Jan 29, 2020

Probably not since each star review is a separate "work" by a separate author. Mechanically combining multiple non-copyrightable things into one doesn't make it copyrightable. If Amazon arranged their users' star reviews into an infographic that would be copyrightable.

amelius · on Jan 29, 2020

Why would a review be a copyrightable creative work, while a LinkedIn resume wouldn't be?

dec0dedab0de · on Jan 29, 2020

I think perhaps the layout, cover letter, and maybe any flourishing notes are copyrightable, but the actual details of work experience and education are not.

wtetzner · on Jan 29, 2020

Yeah, I would think the "description" section for each job would be copyrightable, but the simple "title", "company", "year" fields would not be.

ikeboy · on Jan 29, 2020

There are some huge datasets of Amazon reviews available. Stanford has a big scrape out there, plus there's one from Amazon themselves in the AWS datasets.

Jolter · on Jan 29, 2020

YouTube videos are definitely protected by copyright, though.

dillonmckay · on Jan 29, 2020

In theory, right?

See the South Park WWITB issue.

I believe South Park used a video clip from YouTube, and YouTube’s Content ID system removed the video South Park had used, because YouTube considered it a violation of South Park’s copyright.

dylan604 · on Jan 29, 2020

Just because YouTube gets it wrong doesn't mean it's just theory. YouTube is not the only site that has automated content scanning for copyright violations. Getty and other photo sites have gotten this wrong in the same way by sending C&D letters for violations to the actual copyright holders.

dillonmckay · on Jan 29, 2020

I was specifically discussing YouTube.

KaoruAoiShiho · on Jan 29, 2020

Shouldn't the copyright belong to the creator, not to YouTube? Basically YouTube shouldn't be able to sue you; it should be up to the creator to do so.