
Kyra Health | Python/Typescript | Remote (USA) | Full Time

We believe there is a significant opportunity to improve healthcare quality and cost for employers, employees, and insurance companies; regulatory changes and LLMs mean it is time to build. We think we can provide a better experience for less money, and have multiple positive impacts on society. We have the idea, the funding, and the skills and knowledge to make it happen. We're looking for a few more strong engineers to help us build it.

Responsibilities

- Research, plan, scope, and design solutions to difficult problems while navigating laws, regulations, and standards in money movement and healthcare.

- Produce elegant and simple code that is architected and written for the long haul.

- Speed up overall development efforts.

- Help build a development environment and operational tooling that will be the envy of companies many times our size.

Must Haves

- Strong Python or TypeScript skills, or a strong background in another dynamic language.

- A burning desire to get things out the door and in the hands of users.

- Strong communication skills.

Reach out to me at n@kyra.health, or kick the process off at https://enghiring.kyra.health


Spotify actually has (at least) a few hundred pools of money; one of the pools might be "Free listeners in the US", one might be "Moldovan listeners who have Spotify as a bundle with this particular cell phone provider". A fixed percentage of that money is set aside for the rights holders for the performance (master rights) and the rights holders for the music (publishing rights).

The allocated money is then passed to the rights holders, based on the percentage of listens they received within users in that pool. Individual rights holders can negotiate additional bumps, and Spotify has some exceptions for what needs to be counted.
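
For concreteness, here is a minimal sketch of that pro-rata split (Python; the pool name, the 70% rights-holder allocation, and the stream counts are invented for illustration and ignore negotiated bumps and counting exceptions):

    def pro_rata_payout(pool_revenue, royalty_rate, streams_by_rights_holder):
        # Split one pool's rights-holder allocation by share of streams.
        allocated = pool_revenue * royalty_rate
        total_streams = sum(streams_by_rights_holder.values())
        return {
            holder: allocated * count / total_streams
            for holder, count in streams_by_rights_holder.items()
        }

    # A hypothetical "US free tier" pool.
    print(pro_rata_payout(
        pool_revenue=1_000_000.00,
        royalty_rate=0.70,
        streams_by_rights_holder={"Label A": 6_000_000, "Indie B": 150_000},
    ))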

I know that the idea that money should go from a listener directly to the bands that user listens to, instead of splitting up the pool, feels like a pretty interesting one. Unfortunately, folks who listen to a lot of music tend to listen to smaller and more interesting bands as well. This change wouldn't necessarily result in more money in the pockets of smaller artists; it's not inconceivable at all that the rich would get richer.

It's also worth mentioning that the average person who spent money on music at all, prior to the invention of streaming, spent far less annually than they spend on streaming now.

I think artists are reacting to a lot of issues. The entire music industry is built around signing absolutely disgusting contracts with deeply talented and fundamentally naive 18 year olds who are effectively replaceable by the next new thing. This is compounded by the fact that most artists don't have a relationship with Spotify; they have a relationship with their record label who has a relationship with Spotify, and the label isn't really interested in giving an artist a full accounting of what they've brought in from Spotify.

Finally, and most importantly, there is a real disconnect between the social impact of music and the actual size of the industry. There's a lot of money there, sure, but the music industry is like a professional sports league in terms of total revenue. A few superstars are going to suck up a lot of the money and not leave much for the bench players, let alone the folks three leagues down from the "show".


There are a multitude of different processes and procedures to deal with different kinds of fertility issues. It isn't uncommon to do a "fresh" transfer after a handful of days. Genetic testing is generally only going to be offered if you do a frozen transfer. Not all embryos are of high enough quality to survive being frozen, and a live transfer might be the only option.


Spotify has internal fingerprinting stuff that can handle this easily enough. The shutdown is entirely due to pressure from the record labels. It's a shame; this was a surprisingly difficult product to get off the ground, and a lot of folks put in heroic efforts to make it work.


Apparently not, because its number-one streamed track, "Old Town Road", never cleared the Nine Inch Nails sample it used.

https://www.digitalmusicnews.com/2019/05/29/lil-nas-x-sample...


Think about it from the artist's point of view. What financial incentive does an unknown artist have to clear samples?

On one end of the spectrum, you create a song using an uncleared sample and it goes nowhere. Nine Inch Nails never hears about your song, and neither does the public.

On the other end of the spectrum, your song goes viral and makes you one of the most popular artists in the country.

Even if Nine Inch Nails claims every cent you make from the song, you have already accomplished your goal. You're a popular artist and now have a platform. You can sell merch, sign sponsorship deals, and get paid to advertise for brands on social media. You can put out new music with all your ducks in a row and go on to have a very successful career.

Why go through the trouble of clearing your sample when the chances of your song becoming popular are very slim, and even if it does become a hit, the reward far outweighs the risk?

Outside of ethics, there is no reason for an unknown artist to clear samples when they are trying to make it in the music industry.

It should also be noted that both the creator of "Old Town Road" (Lil Nas X) and Nine Inch Nails are currently under Columbia Records.


You make an excellent point, and even if the song goes viral they are still unlikely to make any money from the song itself, regardless of whether they cleared samples. Presumably any physical merchandise they sell is clear of anyone else's copyright.


There are theoretical risks, depending on where you are based: court costs, injunctions, and the risk of punitive damages.

They may share a record label, but do they share the same publisher?


Music labels are strangling culture in America. Please pirate music to avoid funding music industry lobbyists in their efforts.


I donate lots of money monthly to my private music trackers because it's such a superior experience for music lovers, far better than Spotify and the like. I wish the labels would let these ecosystems emerge into the light, because I'd pay $100 a month for a legitimate version of Redacted...


Pedantic nitpick: it's not just music labels, but copyright trolls/racketeers in general; music is just the industry where it's most obvious/egregious.


Parasites exist in every industry. Left unchecked, they multiply and eventually kill the host.

Instead of pirating: support local artists, go to concerts, buy their merch from the stalls and their self-pressed CDs.


This is uninformed and off-topic.


How so?


Maybe labels are bad, but piracy is far worse for artists -- it reduces their incentive to produce. That's why, for instance, big bands have become such an endangered species. Almost nobody can afford to hire, say, a horn section today.


Big bands were essentially killed [0] by cheaper multitrack recording, synthesizers and samplers in the 80s, when there were no such things as torrent trackers. Even the wonderful although hugely inefficient Napster was more than a decade away. Today artists get most of their money from gigs, while record labels profit the most from recorded music sales and royalties.

[0] The correct term should be "relegated to niche performances". If you love big bands you can indeed still listen to them somewhere today; the difference is that, mostly for economic reasons, they are no longer the default choice for, say, a film score.


> Big bands were essentially killed by cheaper multitrack recording, synthesizers and samplers

Excellent point.


Independent distribution is still extremely cheap and easy - there are still platforms out there that will upload music to Spotify and many others for free or for a small annual fee. Spotify offering the service directly isn't necessary for distribution to be accessible to all.


>Spotify has internal fingerprinting stuff that can handle this easily enough.

I'll admit I don't know about Spotify's inner workings but my understanding is that Spotify itself does not have its own fingerprinting algorithm. Instead, they rely on their distribution partners like DistroKid and CD Baby to use fingerprint services and prevent copyright violations. (DistroKid and CD Baby license the fingerprint algorithm from Audible Magic[1]. Google also initially licensed Audible Magic software before they wrote their own version of ContentID.)

I'm also not sure audio fingerprinting technology is enough. Even though YouTube's Content ID is a state-of-the-art algorithm with the largest fingerprint database, Google was still afraid it wouldn't be enough to satisfy the EU, which is why they fought the passage of Article 13.

Maybe the EU Article 13 isn't the sole reason but it certainly gives more ammunition to copyright holders to use against Spotify's catalog of potentially unlicensed content and copyright violations.

>The shutdown is entirely due to pressure from the record labels.

Ok, let's say that the primary reason was pressure from the record labels. We need to dissect that "pressure" into smaller concepts.

To add precision to the discussion we can separate 2 different types of entities:

- record labels[2] : Universal, Sony, Warner, and EMI

- distribution partners[3] : DistroKid, CDBaby, EmuBands, etc

Look at the webpage[3] of non-record-labels that are distribution partners and they mention "protecting against infringement" and "infringement" -- 4 times. Also, Spotify's blog post[4] again mentions "protect artists from infringement".

For Spotify to offer direct uploads, they'd have to manage the extra logistical hassle of policing copyright violations that was previously outsourced to DistroKid & CD Baby.

I agree that record labels (Universal/Sony/Warner) can exert pressure... but what exactly is that pressure about? I believe it's preventing copyright infringement from crowdsourced random uploaders. If it's not about "protecting copyrights" but some other reason that's keeping record labels happy, what's that "other reason"?

[1] https://www.audiblemagic.com/

[2] https://www.google.com/search?q=biggest+record+labels

[3] https://artists.spotify.com/guide/your-music

[4] https://artists.spotify.com/blog/we're-closing-the-upload-be...


Record labels are gatekeeping and defending their moat. The existence of their industry relies on friction in the music distribution system - the easier it is to release music on your own, the less money they will make.


You're not obliged to sign to a record label.


Correct, which is why the labels try to do things like this. If the labels pressure Spotify to kill the ability for independent artists to upload their own music, then the only way to get on Spotify is to sign with a record label. And if being on Spotify is important to you as an artist, then yes, you are indeed obliged to sign with a label.


> Restrict processing – in your admin panel where there’s a list of users, there should be a button “restrict processing”. The user settings page should also have that button. When clicked (after reading the appropriate information), it should mark the profile as restricted. That means it should no longer be visible to the backoffice staff, or publicly. You can implement that with a simple “restricted” flag in the users table and a few if-clauses here and there.

The simple hubris in this statement is jaw-dropping. “Just a flag and a few if clauses! Easy peasy!”


This article is one of the best I've seen for describing actual features that you need to build.

I agree that the specific language here is poorly chosen ("simple" and "a few if-clauses" are perilously close to the word "just") but I don't think that should detract from the enormous value the article itself provides.


If only it was that easy. A reasonable reading of GDPR makes standard web server logs (which contain IP addresses) a punishable offense, even if you don’t have a nexus in Europe.

GDPR is a wonderful idea that will be insanely expensive to comply with, act as a continuous drag on developing new technologies, and end up offering only nominal protection to end users. This is just going to be another way for EU regulators to smack around Google and Facebook. They probably deserve it, but the potential fallout for the rest of us is really going to hurt.

Don’t get me wrong, treating user data with respect is the right thing to do. But we’re all going to be paying for this overly broad and under specified legislation for years to come.


Web logs are not a punishable offence under the GDPR, if you have a legal basis for retaining those logs and reasonable retention and data minimisation policies. If those are in place and you've documented them, you have nothing to worry about.

Why? You have a legitimate interest (one of the six legal bases under the GDPR) to combat fraud and maintain information security. That's the primary reason you have those IPs in your logs in the first place.

If you're using those logs for analytics purposes, things get slightly murkier, but if you're just using IP addresses to enrich your log data with GeoIP, you should be fine. You might even be able to get away with more granular third-party databases, but the more detailed you get, the closer you get to profiling (which is not where you want to be, if you want to minimise your legal fees).

More to the point, I don't understand all this talk about web logs being illegal. If people have collected and processed personal data without thinking about the whys and wherefores, isn't it just a good thing this makes one think about what one is logging and what it's used for? Granted, IP addresses are far from sensitive (depending on your threat model), but I've seen things in technical logs that make me happy about reliable automated retention policies. Also, granted, it's a hassle - that's the price you pay for privacy.

I'd still be glad if nginx et al shipped with more GDPR-compatible defaults.


> If people have collected and processed personal data without thinking about the whys and wherefores, isn't it just a good thing this makes one think about what one is logging and what it's used for

If people are creating software that burns fossil fuels without thinking about the whys, wouldn't it be a good thing to have a law that regulates how we use electricity? Shouldn't an EU regulator have input on whether you can release your new blockchain app? You should be fine if its purpose falls into one of the covered categories...

People are creating online communities that enable abuse of members. Do we need statutes and regulations to mandate abuse protections in online interactions and punish platforms that allow users to abuse other users?


> If people are creating software that burns fossil fuels

They aren't. Only hardware burns fossil fuels, and computing hardware doesn't inherently do so, for the most part, only if you choose to hook it up to a fossil fuel power plant rather than something else; the software isn't the thing directly to address.

OTOH, the personal data use you are drawing a poor analogy to is the direct point of concern.


I don't want to torture this metaphor any further, but you're kinda proving my point that software developers do not consider the energy and environmental impact of their work. Software that uses significant CPU time uses more electricity and is worse for the environment.

Misuse of personal data is a problem. Wasting electricity is a problem. Online harassment is a problem.


If wasting electricity becomes as big a problem for society as misuse of personal data already is, sure, let's introduce regulations on that, too.

In some European countries, there are regulations already on how to insulate new buildings to avoid energy waste.


> If wasting electricity becomes such a big problem for the society as misuse of personal data already is, sure, let's introduce regulations on that, too.

Sure, what could go wrong there? Regulator, "We're going to need to look closer at that for-loop to see if it complies. And you do realize that n+1 queries are a violation of EU law?"


Your example is obviously unrealistic, but even when buying it, it rather supports my position. Imagine a regulator indeed pointing out where you can optimize your algorithms and thus save energy, money and achieve faster query processing. What is the problem with that?

Fire safety regulator: "we're going to need to look closer at that door seal glue component to see if it complies...". Nobody complains here about a regulator looking into details.


When stuff like this comes up it always seems so weird to me that with all the work that regulators put into this, why can't they at least scratch the surface of providing some specific examples? Of course there are legal documents, and maybe some "for dummies" versions written up about it.

But would it be so crazy for these regulators to hire someone who knows something about commonly used open source software and building web apps, to help provide a little bit of actionable technical advice? For instance, the majority of the internet is running on Apache or Nginx, why not have an official, EU-sponsored blog post explaining "here's how to set up a LAMP stack, or nginx and rails on a linux server, that complies with GDPR". Of course they can't cover every obscure language or framework, but it would be a starting point. And it would probably end up a lot cheaper than having to investigate and/or penalize people who didn't read the fine print of the law and/or didn't understand how it translates to actually running software.

Because despite how "simple" this post says these laws are, there still seems to be quite a bit of confusion on this thread, among smart developers, about questions like whether or not we're allowed to keep collecting web server logs in the default format.


As others have noted: laws with examples would be too specific to survive fast technological change. Laws mostly contain the 'spirit' of the idea and are applicable to many different situations and times.

But the European Commission does give examples: https://ec.europa.eu/info/law/law-topic/data-protection/refo...

This is of course not an nginx configuration. But the thing is, there is no one-size-fits-all example configuration. The situation depends on: 1) What do you use the data for? 2) How long do you really need it? 3) Can you securely handle it? 4) Has the user consented?

Saving IP addresses in log files can be fully compliant IF you only use them for legal reasons (suing an attacker, ...), have strict access restrictions on the files, delete them as fast as possible, and get consent from the user prior to saving the logs.

It depends on your goal, workflow and abilities whether you are allowed to store this data, and you must decide for yourself. If in doubt, don't store it.
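
As an illustration of the "delete them as fast as possible" point, a minimal retention sketch in Python (the log path and the 30-day window are assumptions, not requirements from the regulation; use whatever matches your documented policy and run it from cron or a timer):

    # Delete rotated web server logs older than the documented retention window.
    import time
    from pathlib import Path

    RETENTION_DAYS = 30                  # assumed window, adjust to your policy
    LOG_DIR = Path("/var/log/nginx")     # assumed location

    cutoff = time.time() - RETENTION_DAYS * 24 * 3600
    for log_file in LOG_DIR.glob("access.log.*"):
        if log_file.stat().st_mtime < cutoff:
            log_file.unlink()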


>Saving IP addresses in log files can be fully compliant IF you only use them for legal reasons (suing an attacker, ...), have strict access restrictions on the files, delete them as fast as possible, and get consent from the user prior to saving the logs.

You do not need consent for saving the IP, user agent and URL (including GET values) in Apache logs because, as someone said above, you have a "legitimate interest to combat fraud and maintain information security".

Legitimate interest and consent are only 2 of the 6 legal bases under which you can collect and store (process) personal data. Art. 6 contains all 6 https://gdpr-info.eu/art-6-gdpr/ .


> When stuff like this comes up it always seems so weird to me that with all the work that regulators put into this, why can't they at least scratch the surface of providing some specific examples?

Technology is something that constantly changes. From the point of view of the legislator, legal text that is too concrete would stifle innovation and progress by "locking" people into current technological assumptions. The text becomes inappropriate or outdated when the next wave of technologies comes along.

Thus legislators try to document the spirit behind legislation and try to stay away from concrete implementation details as much as possible, in order to give people maximum freedom to decide how they should implement things, and maximum freedom in technology choices.

So yes, to us implementors it is a hassle because we have no idea what we should concretely do. But we can also see this as freedom to explore how to best implement an idea.

I expect that in the next few months/years, domain experts such as us will debate and decide on implementation best practices.


That doesn't work though. Sure, if it was some industry initiative then a broad statement of intent and people figure out the details as they go would be OK.

But this one comes with massive, company destroying fines attached.

If you and other domain experts debate and decide on a best practice, and then some EU commissioner disagrees and destroys your company with a fine you cannot pay, will you be so sure that vague laws are a good idea then? Will it seem like freedom to explore, or will it seem more like walking through a minefield?

The EU wants to regulate the precise details of data handling in software firms. It can do that. But it's trying to have its cake and eat it - micromanaging the tech industry at the same time as refusing to be precise about what it wants. It just expects everyone to intuit what they want, on pain of corporate death if you fail.


> company destroying fines

They are not company destroying for large companies though. By raising fixed cost (and risk) of doing business, regulations of this kind are an absolute godsend for large companies.


There's more to it. Startups now exist as a constellation of services, and it's quite hard to tell what goes into a PIA document and what doesn't.

Say our landing web page contains an intercom chat widget and google analytics tracking.

At that point we have collected the user's IP at most, which would become sensitive only if connected with data from two other business entities.

What the heck am I supposed to write into the damn thing now?


Ask your chat provider whether they are GDPR compliant; they will provide the confirmations that you need to add to your page. Regarding Google Analytics, you are risking getting banned if you feed it with personal data (including IPs).

https://gdpr.report/news/2018/02/01/gdpr-google-analytics-2/

If I were you, I would add my own chat (there is bunch of them on github) and use piwik instead of google analytics.

(As a rule of thumb, for each third-party provider, ask them about GDPR compliance and purge all the data for which you are not getting user consent - the GDPR applies retroactively.)


There are several grounds on which you can legally process data in addition to consent, so it is unhelpful to talk in general terms about purging data where you are not getting user consent. If you are using data to provide a service, then generally it will not be consent-based processing, for example.

You have to assess each use to which you put any personal data and determine the correct processing basis for that usage. Often there are more relevant bases than consent.

I do appreciate that the definition of 'consent' in this regard is often thought of in different terms though. When I think of consent I think of the narrow data protection consent, whereas I think often in layman's terms it has a broader definition which is often linked to disclosure requirements in relation to privacy policies etc.


> But would it be so crazy for these regulators to hire someone...to help provide a little bit of actionable technical advice?

Didn't you know? They do. Large corporations are always happy to help regulators write laws in such a way as to benefit them and fend off those pesky innovators. [1] Raising compliance costs as high as possible is highly desired by large companies.

[1] https://en.wikipedia.org/wiki/Regulatory_capture


And give up all those expensive multi-year court cases for their lawyer friends :-)


> This is just going to be another way for EU regulators to smack around Google and Facebook.

Actually, it's more like a giant gift to Google and Facebook: GDPR borders on regulatory capture, with only the giants really having the resources to comply properly. This will hurt startups and smaller firms far more than it will the big dogs with their armies of compliance lawyers.


That assumes enforcement will be homogeneous.


I'm sure there will be a lot of hipster-trolls suing left and right, trying to make a name for themselves.


This isn't the US, the law is enforced by governments, not lawsuits.


Actually one new thing about the GDPR is that consumer rights organisations can sue companies/organisations to enforce privacy rights.

Max Schrems (whose case killed Safe Harbour) has set up an org, None of Your Business (https://noyb.eu), to do exactly this.


I lack legal expertise, but I'd assume you will easily be able to sue any company and claim they somehow infringe on your rights as stated by the GDPR; maybe I'm wrong.

I attended a GCP event and could practically see the hipsters' pupils dilate and mouths foam as they went into a hipster frenzy: "this GDPR is a huuuge opportunity".


You can write a complaint to the responsible institutions, which then choose how to act (send a warning to the violating company, issue a fine, start an investigation, etc.). You cannot sue companies yourself, unless you can prove that (big) damage was done to you as a direct consequence of a violation; then you can seek compensation via a civil lawsuit.


Only without consent from the user. Previously it was an ethically grey area to be logging IP addresses anyway. If you are preventing malicious use, then that is allowed as long as you are not using that data outside of the bounds of the user's consent.

If, however, a company is storing IP addresses to identify users without their consent and are found to be specifically targeting them without their consent, then that is a misuse of data.

You are right that companies will be paying for this for a long time and it does take effort to comply, but if that's what it takes to protect user data, increase security across the board to prevent data breaches and kill off the players that never should be in the business to begin with then I'm all for it.


You appear to be suggesting that "intent" defines the shape of law here, but I really don't think that's the case.

By my reading, information becomes personal —and therefore subject to GDPR— when it can be used to identify people. If you've got login timestamps, IP addresses and user records, for legitimate reasons, any other logging that includes IPs is tainted because it takes anybody with that data two minutes to munge them together.

Intent, and actual business use-case play second fiddle to the worst-case, or "what could that data be used for?".


Worst case usage determines what information is subject to GDPR, but actual business use-case is what determines what data you are allowed to collect.

IP addresses are subject to GDPR, but that just means that you have to have either a legitimate business need for keeping them or to have the user's consent to keep them and you need to disclose to the user that you are keeping them and for how long.

You probably do have a legitimate need to keep IP address logs for some period of time to allow troubleshooting and possibly for a longer period of time to allow for fraud detection. As long as you are disclosing to the user that you are collecting that information and are abiding by the retention period that you are disclosing to users, then you will be allowed to collect logs of IP addresses.


Your intention and how you actually use the data are critical to an entity's compliance with the GDPR. If I am only using IP addresses for legitimate purposes of monitoring/protecting my network then that is very different to using IP addresses to assist in my tracking of users for advertising purposes for example.

The classification of data of personal data is likely beyond dispute but you are then under obligations on how you actually make use of that data.

Entities should have in place relevant protective measures to ensure that if you have only collected data for a limited purpose, it should not be used for purposes beyond that.


In my experience of having lived all my life in the EU, and mostly in 3 countries of the union, all law enforcement here is about intent, unlike in the US for instance (as far as I read online, of course, like the Nintendo copyright case linked here a week ago). Copyright, drugs, bankrupting your company, etc.: judges look at intent, not literally at what the law says. So this will not be different. Nothing will change if you are not actually trying to go against what the law intends to protect.


Mens rea (i.e. intent) is part of common law criminality (along with actus reus, which is the actual doing of something illegal). The United States, having its legal system derived from that of England’s (and thus being a common law legal system), absolutely requires intent when considering whether or not someone or some organization has committed a crime.

I’m not familiar with the referenced Nintendo case, but mens rea is usually only considered in criminal cases. Unless you’re prosecuting someone for illegally downloading copyrighted material or some such thing, intent wouldn’t be considered (it can increase liability in civil cases, though).


Now that you mention it, I do see it in crime shows. But the case I mentioned was about whether a Nintendo modchip could be used for good or only for evil according to the EU, while the US court just yelled copyright infringement and put some hacker in jail. Those are the cases we read about in the press over here, and most people here find it ridiculous to go to jail (aka ruin lives) over something as small as copyright infringement. Courts agree, as they usually just slap on a fine based on the intent.


And it even gets more interesting: the question is not whether you can identify a user by merging your different data sets. The question is whether you can identify a user if you merge one of your data sets with any other data set, even if that set is not currently in your possession. (This can happen if the provider is able to match IP addresses to personal information.)


Intent comes into play when you determine the appropriate processing basis; for, say, preventing abuse, the basis isn't consent and therefore consent is not required. So GP is partially wrong. If your intent is to use the data for marketing purposes, then you are much more likely to require consent. See LI balancing tests.


>"Previously it was an ethically grey area to be logging IP addresses anyway."

wat.

Standard log formats capture IP, and have ~forever. Who claims this is an ethical quandary?


I won't address "ethical", but collecting full IP addresses has been discussed as possibly illegal in Germany for years now.

And since the recent European court decision, I suppose it is settled: yes, illegal.


What's a 'not full' IP address?


Leaving the last octet out of logs gives you all the information you might need without making it possible to tie it back to a specific user. Google Analytics actually has an option for that.
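
A rough sketch of that truncation idea (IPv4 only; zeroing the last octet mirrors the spirit of Google Analytics' anonymisation option, but this snippet is an illustration, not their implementation):

    def truncate_ipv4(addr: str) -> str:
        # Replace the final octet with 0 before the address is written to a log.
        octets = addr.split(".")
        octets[-1] = "0"
        return ".".join(octets)

    assert truncate_ipv4("203.0.113.54") == "203.0.113.0"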


Holy shit what.

IP logging is not ethically ambiguous in any way. It's 100% okay. You chose to connect to that IP. If you don't want your IP logged don't send an IP packet to that address. It's very simple.

This is beyond ridiculous. The entitlement I see here is cancerous in the literal sense.


Holy shit, can't you read up before complaining without knowing the details? There is an exception that you may use and store data that is necessary for providing the service. Thus, since the IP is necessary for talking to a server, you don't need to explicitly ask for consent. However you MUST NOT do anything else with that IP, like logging it for longer than necessary or tracking users across sites (without consent).

Why do you need to log IPs? To prevent abuse? That's OK. For how long? That's up to you to decide, but it must be motivated and documented.

What's so hard to understand? How is this not perfectly reasonable already? Why are you entitled to not respect other's personal data?


You knock on my door and I write down that you visited me.

Why is it somehow reasonable to compel me to forget that interaction existed?


Because 1) your analogy is off. People forget, a machine does not 2) GDPR is about privacy; tracking people's behaviour, linking things together without explicit consent is not allowed according to GDPR.


1. If I am writing it down, as my analogy suggests, it is not forgotten.

2. I understand what it's about.

If you want to make tracking people and linking things together illegal, great.

However, my argument in response to the OP was intended to illustrate that recording information about someone's actions, particularly when the party creating the recording is part of the interaction, does not seem to have some preexisting moral expectation attached to it.

Hence, to me at least, the GDPR's directives are not objectively reasonable or obvious in some way as suggested by the OP.

I also think forbidding certain uses of the data is more reasonable than to regulate its collection and storage. But yes, that's probably riskier and harder to enforce.


It's ok to take a picture of the street out of your front window.

It's not ok to take a picture of everyone who walks in front of your house, timestamp it, and on top of that search their picture on Facebook (supposing you could do that) and keep all that info forever.


> It's not ok to take a picture of everyone that walks in front of your house, timestamped and on top of that you search their picture on Facebook (supposing you could do that) and keep all that info forever

Why not? It's certainly not obvious why this is the case.


I guess that's a fair question. Two reasons come to mind:

1. If the passers-by were to discover what you've done, they might feel violated. This is why there are laws against stalking. Thus in this example it would be all about intent.

2. What if your database leaks? Have you considered that event, the probability of it happening, and the impact? How can you minimize the risk? Is it encrypted? How long do you need to store it for? Can it be anonymized? Do you even need to look up name? Is the potential privacy intrusion proportional to the purpose of collecting the data?

To be GDPR-compliant you must have answered all those questions and documented it.


The EU is very consumer protection focused, which is very different from the US. It's just a point of view; I don't see how you can perceive it as entitlement. Not everyone knows how privacy works and how much data is available by not using a VPN.


Keeping a log of visitors to your own computing resources is an ethical grey area since when? IP Addresses themselves are most useful for deanonymization/violating privacy when shared across organizations, or converted to accounts from ISPs. Why does GDPR target the storage of them and not the sharing or conversion of this type of data?

One other interesting thing to keep in mind is that GDPR does not exempt public, government organisations. It will be interesting to see what happens with that, if anything.


Standard server logs with IP addresses must be disclosed in a privacy policy but you do not have to seek consent for them because you collect them as part of a business critical need to prevent fraud. See Recital 47, which includes the language: "The processing of personal data strictly necessary for the purposes of preventing fraud also constitutes a legitimate interest of the data controller concerned." https://www.privacy-regulation.eu/en/r47.htm


The user can request I delete all of the data related to them without “undue delay”. Are you ready to purge all references to certain IP addresses in your logs? Don’t forget backups.

GDPR blows up a lot of assumptions we make about writing software and managing servers.

https://www.privacy-regulation.eu/en/article-17-right-to-era...


Again, you do not have to if it is business critical and used for fraud prevention. You must routinely delete logs before they get too old (60-90 days maybe), but you do not need to take special action beyond that. I’m not saying the GDPR isn’t troublesome, but having spent the better part of the last 6 months combing through the law and interpretations of it, I think the concern over IP addresses in log files that can be used for security and fraud prevention is unfounded. Data portability requests are likely where it will get onerous and expensive. Even with those, there is flexibility built in to prevent users from repeatedly requesting their data at short intervals.


Other countries mandate that we keep logs for 7 years. This is unworkable.


I believe that you're okay in that case. Some countries in the EU require that you have financial records stored for five years, and they will always contain personal identifiable information. The GDPR states, if I recall correctly, that because some other law requires you to store the information for X number of years, the customer can't force you to delete it.

Similarly credit agencies aren't required to comply with deletion requests either. You can't simply GDPR your way out of a bad credit score.

But it's a total mess; when you read the GDPR it's clear that it's written by people with a limited understanding of IT. Of course it has to be extremely strict, otherwise you'll end up with a Cookie Law 2.0. The cookie law from the EU was read by the industry in a way that clearly wasn't intended. It made zero difference to user tracking; we just got a bunch of pop-ups stating that the site uses cookies. If you read that law as I believe it was intended, the idea would be that you could say yes to cookies or no. If you chose no, the site would disable the use of tracking cookies. But that was too much work, so people just slapped a cookie pop-up on their sites.


The cookie law was never well thought out, that's why nobody read it in the way it was "intended" (what was the intent anyway). The distinction between a regular cookie and a tracking cookie doesn't exist except in the minds of the EU regulators, so no surprise that all they achieved with this was making the EU web experience horrible by default instead of opt-in horrible - browsers have let you request notification of cookies being set since forever, after all, and you can create extensions to notify you in whatever way you like.


Thanks for the clarification. This is definitely going to make lawyers rich and make it much harder to start up. The legal cost overhead benefits the establishment at the cost of startups and SMEs.


Data portability is no more work than a SAR. Nothing says you have to give them the data in a convenient format; you can perfectly well hand them a pg or mysql dump that is nothing more than the unformatted output of your SAR process and call it a day. In particular, it doesn't have to be convenient at all for data interchange; it just has to be in a "structured, commonly-used and machine-readable format."
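
For illustration, a bare-bones portability export along those lines (Python; the SQLite path, the table names, and the assumption that every table has a user_id column are all invented for the sketch):

    import json
    import sqlite3

    def export_user_data(db_path, user_id):
        # Dump every row belonging to the user as JSON, which is a
        # "structured, commonly-used and machine-readable" format.
        conn = sqlite3.connect(db_path)
        conn.row_factory = sqlite3.Row
        export = {}
        for table in ("users", "orders", "support_tickets"):  # hypothetical tables
            rows = conn.execute(
                f"SELECT * FROM {table} WHERE user_id = ?", (user_id,)
            ).fetchall()
            export[table] = [dict(row) for row in rows]
        conn.close()
        return json.dumps(export, indent=2, default=str)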


You'd also have to argue why you need to store the IP and not just a hash of the IP if it's just for fraud prevention.


If you're using an IPv4 address as a seed then that's pretty useless due to the small address space.


I wouldn’t worry about individuals requesting access or deletion of the Apache logs concerning a certain IP as I see no possible solution for the “reasonable measure to verify the identity of a data subject” against an IPv4 IP and, as per the GDPR, providing data to the wrong person could “affect the rights and freedoms of others” in which case you shouldn’t provide the data.


Look at the paragraph you're quoting - you have to comply with the request if "one of the following applies", which means:

1) if you have a legitimate reason that allows you to process the data without consent (which would be the expected scenario; if not, then any sane organization would likely just choose to don't have that data in their logs at all), then none of the following applies, and you can refuse the request;

2) if you had a legitimate reason but "the personal data are no longer necessary", then you must comply.... but that's just duplication, you should not have had that data anymore since if you're compliant, you should have cleared the data out already. E.g. if you believe that you need (and are allowed) to store data for 6 months for purpose X; then you'd ignore the request for 5 month old data as you need it, and ignore the request for 7 month old data as you already purged it as a routine operation.

3) if you didn't have a legitimate reason and actually needed consent, then you follow the same process as you do for scrubbing references to all IP addresses which didn't give you consent. If you're compliant with the other requirements (which is tricky in this case), then the deletion request doesn't add anything meaningfully different.


The user can request whatever. If you are processing the data via consent, you have to obey. If you are processing the data under a different basis, then you may well not have to; you have to figure it out. I hope you like paying lawyers!


"act as a continuous drag on developing new technologies"

I don't see this as a bad thing. For far too long, we've not cared at all about user data and privacy.


Yes, this shifts things to privacy first, instead of second/never.


Of all the wonderful things that we're capable of as technologists, I think we can figure out a way to strip raw IP addresses from log files once we don't need them any more.

I'll need to figure out how to handle this for the data I'm responsible for at the moment. It's boring and it doesn't help the product, but it's not supposed to. In idlewords' terms, I feel like I'm finally purging toxic waste: http://idlewords.com/talks/haunted_by_data.htm
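
For archived logs, the stripping itself can be fairly mechanical. A minimal sketch, assuming common-log-format files where the client IP appears as a plain IPv4 address (the path below is only an example):

    import re
    from pathlib import Path

    IPV4 = re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b")

    def scrub_log(path):
        # Mask every IPv4 address in place once the raw values are no longer needed.
        text = path.read_text()
        path.write_text(IPV4.sub("0.0.0.0", text))

    # scrub_log(Path("/var/log/nginx/access.log.1"))  # example path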


It's left ambiguous, but it's likely that any aggregate computed from personal data may also be considered personal data (e.g. how many unique IPs you've seen).


If you are looking to derive aggregated insights from data then you need to be clear on your anonymisation processes and understand whether or not any derived dataset is capable of identifying individuals, whether in isolation or through reasonable means. To me, a tally of the volume of unique IPs alone would never be sufficient to identify a person, but maybe I don't have the full context?


What you're saying makes sense. Any data derived from PII should be considered PII itself if it can be used to identify users, and even if it cannot be used for that, it needs to be cleared frequently enough that you don't end up with data derived from information for which you received a deletion request, for instance.

In practice, you can achieve this by simply refreshing your derived data frequently (every ~30-60 days), and for aggregated data k-anonymity is a good way to enforce this privacy constraint.

https://en.wikipedia.org/wiki/K-anonymity
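
A tiny illustration of the suppression side of k-anonymity (real k-anonymity also involves generalising quasi-identifiers; the data and the k value here are made up):

    from collections import Counter

    def k_anonymous_counts(rows, k=5):
        # rows are quasi-identifier tuples, e.g. (country, age_band);
        # only release buckets containing at least k individuals.
        counts = Counter(rows)
        return {bucket: n for bucket, n in counts.items() if n >= k}

    data = [("DE", "30-39")] * 5 + [("FR", "20-29")]
    print(k_anonymous_counts(data, k=5))   # the lone FR bucket is suppressed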


You need the IP records for jurisdictions that require long term retention for law enforcement requests including copyright infringement.

So you must delete them and also keep them.


Do you know which jurisdictions and laws that includes?

This sounds like the Investigatory Powers Act in the UK, though I haven't heard of similar laws in other liberal democracies.



>A reasonable reading of GDPR makes standard web server logs (which contain IP addresses) a punishable offense...

You need retention policies, and if you use the web logs for (let's say) detecting malicious behavior or troubleshooting, you are in the clear.


Also, you can keep just a hash(seed + IP address) - enough to uniquely identify user session (so you can debug possible problems) but not enough to pinpoint a specific user.

Of course in reality nothing is that simple, but it can be done, and it can be done automatically. I am sure there will be GDPR nginx plugins/configs available soon.


Unless you use IPv6, hashing is pointless: the IPv4 address space is way, way too narrow. With hash+seed it is trivial to recover the original IP, so whoever advises that has no idea how hashing (and brute-forcing it) works.

(Brute-forcing a few billion hashes, in the age of cryptocurrencies, is a walk in the park.)
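
To make the point concrete, a sketch of the brute force (Python; scanning a /24 here so it runs instantly, but the same loop over the full IPv4 space is only ~4.3 billion hashes):

    import hashlib
    import ipaddress

    def recover_ip(target_digest, seed, network="203.0.113.0/24"):
        # With a known or leaked seed, enumerate candidate addresses and
        # compare each digest against the stored one.
        for ip in ipaddress.ip_network(network):
            if hashlib.sha256((seed + str(ip)).encode()).hexdigest() == target_digest:
                return str(ip)
        return None

    seed = "example-seed"
    digest = hashlib.sha256((seed + "203.0.113.54").encode()).hexdigest()
    print(recover_ip(digest, seed))   # -> 203.0.113.54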


Welcome to every other industry, where "breaking things" and doing whatever you want with reckless abandon isn't considered acceptable behavior.

It's not like you couldn't say the same thing x1000 with respect to finance laws.


> A reasonable reading of GDPR makes standard web server logs (which contain IP addresses) a punishable offense, even if you don’t have a nexus in Europe.

Can you expand on that?


IP addresses are deemed personally identifiable information. All web servers log these by default - before asking users for permission to do so - and that default logging is therefore, bafflingly, about to become illegal.


How does this work out for Git repos and other things with cryptographically verifiable histories? If I run a software project and a developer wants an identifying section of a repo back-edited, do I have to edit and rebase the whole repo, and what does this do to the trust in a project that is based on a verifiable history?

Also, I can't help but notice that currently there is a hell of a lot of money being bet on immutable public ledgers.


> Also, I can't help but notice that currently there is a hell of a lot of money being bet on immutable public ledgers.

I've been pondering the same thing. You have to be extremely careful about building a new product on blockchain technology, because, depending on what you're building, you may be required to delete stuff from it in the future.


Why are you accepting PII into your software projects' source repository in the first place?


Source repositories in many (most?) companies include the full names of the employees who authored every particular commit. This is PII. GDPR refers to all personal information you're handling, not excluding information of your employees.


The simple logical answer to that is that it is clearly impossible to blacklist.

The more specific answer is:

git config --global user.name "Your Name Comes Here"

git config --global user.email you@yourdomain.example.com

Also, looking it up, you can undo a rebase with the reflog, so even editing commits with an interactive rebase may not be enough to purge a git repo of identifiable information that people have entered.


I presume email addresses are PII?


Yes they are.


How it works out? Badly.

But you generally cannot build a system that intentionally does not have a certain capability and then successfully claim that laws don't apply to you because your beautiful system does not accommodate them.


If this is an open source repo on GitHub/GitLab, I think you could argue that the developer "made the data public" in giving it to you in the first place. That's an exception to the requirement to delete data. The same goes for public ledgers.

The tricky situation is when someone puts personal data not about themselves, but about a third party into a public ledger...


Is 'making data public' a plausibly reasonable defense, though? It's sad that that's not obvious.


I've been reading through the text of the act, and while there is an exception allowing you to process data that has been made explicitly public by the person it relates to without asking them for permission, it seems to indicate that you still have to give them the ability to edit it later.


I don't think it is quite that bad. It is certainly less than going full ISO 27001, and less than a major breach.


> act as a continuous drag on developing new technologies

Or foster new technologies around privacy and user management.


You need to crawl through all your webserver logs (the zipped ones as well) and remove entries by IP.

I seriously don't get what the huge deal is about this. Of course it sucks, but it's not THAT hard to implement.


No you don't.

As per the GDPR, I see no possible solution for the "reasonable measure to verify the identity of a data subject" against an IPv4 address, and thus no way to reliably act on IPv4-related data subject access/deletion requests.

Also per the GDPR, providing data to the wrong person could “affect the rights and freedoms of others” in which case you shouldn’t provide the data.


> a continuous drag on developing new technologies

Any and all of them? Because of anonymized IP addresses in server logs? I wouldn't even buy that when it comes to the web, let alone for computers and software in general, or even "tech in general", whatever that would be.


What's an anonymized IP address? As others have pointed out, there is a sufficiently small number of IP (v4) addresses that any hash function output of them can be easily brute-forced nowadays. So the only 'anonymous' IP address is the one you never collect in the first place.


Dave Cutler was a lead for VMS, then left DEC for Microsoft and led the development of Windows NT.


I've never heard it stated quite so plainly, but that makes a lot of sense. I wonder if the billion dollars in funding demands more.


This seems to me to be the problem. They took too much money to monetize in this way. A few bucks a month from a few million users, plus whatever they can make by occasionally inserting ads into the stream (which can't be that much considering how many people would never even see an ad because their streams are too noisy) just doesn't seem like enough to justify all the VC money...


It's improbable they would have the number of users they have now if users had to pay a subscription. Do you think Facebook would?


Agreed, that's why I said "a few bucks a month from a few million users". I can't imagine that more than a couple million of their hundreds of millions of users would pay to remove the ads from their feeds.


Sorry, I mis-read that.

Perhaps they can charge for certain items that can be considered premium, such as verified accounts. I still doubt there'll be a couple million users willing to pay, though.

