Reminder: Linux is too big (millions of lines of code, and megabytes of object code) for it to be anywhere near bug-free.
As Linux runs with full privileges, this is a major architectural issue.
We should be looking at deploying well-architected systems based on microkernels (such as seL4). Some examples of such systems include Fuchsia, Genode and Managarm.
Yet these escalation CVEs are getting rarer and rarer, while production systems generally ship hardened with SELinux and AppArmor out of the box, making them much harder to exploit in practice.
The gist is that, to succeed, it requires a local user who can run processes and who isn't confined by SELinux or AppArmor.
It should be patched, but it's not Heartbleed.
P.S.: I didn't want to open the can of worms that is microkernel performance, yet somebody did, and yes, performance matters.
I don't see how SELinux or AppArmor would help here. From the details provided, it looks like all that is needed is mmap. At most it "requires minimal capabilities to trigger" based on the article.
Granted, it is possible to deny mmap permissions, but I don’t think most policy writers would consider mmap a particularly dangerous permission to grant.
I just checked my CentOS 8 system running stock policy, and every process type has permission to map every file type. Granted, this is gated behind a boolean, so an administrator could disable it if they wanted to, but the default is true.
Not that it is relevant to this issue where it doesn't matter what you mmap; but since I know someone will ask, having mmap permission does not bypass the check on, for example, read permission.
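For anyone curious what that looks like from the application side, here's a minimal sketch (the file path is purely illustrative): the read check happens when the file is opened, and the map check is applied separately when it is mmap'd, so having one permission doesn't substitute for the other.

    /* Minimal sketch: to map a file you still have to open it first, so the
     * read check at open() is not bypassed by having the map permission;
     * mmap() then gets its own map check. Path is purely illustrative. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(void) {
        /* DAC and the SELinux read permission are checked here */
        int fd = open("/etc/os-release", O_RDONLY);
        if (fd < 0) { perror("open"); return 1; }

        struct stat st;
        if (fstat(fd, &st) < 0 || st.st_size == 0) { close(fd); return 1; }

        /* the SELinux map permission (the boolean mentioned above) is checked here */
        void *p = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        if (p == MAP_FAILED) { perror("mmap"); close(fd); return 1; }

        printf("mapped %lld bytes\n", (long long)st.st_size);
        munmap(p, st.st_size);
        close(fd);
        return 0;
    }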
> while production systems generally ship hardened with SELinux and AppArmor out of the box, making them much harder to exploit in practice.
Is it really so? I get the impression that SELinux is so complicated that nobody wants to bother setting it up and everyone disables it as soon as possible.
RHEL systems run SELinux in enforcing mode by default. Most applications run in a domain called "unconfined_t", which is about as secure as it sounds. [0]
Ubuntu runs AppArmor in its version of enforcing mode. I've only ever run into issues with it once though, so I presume it is a similarly lax policy.
[0] The policy is called "targeted", to reflect the fact that you are supposed to write targeted policy modules for specific applications you are concerned about.
Debian’s AppArmor policy is pretty strict, yet well refined. As long as you use standard folders and paths, it’s pretty invisible.
Last time, while I was migrating a DNS server, our zone folders were not in a standard path and Bind was unable to find them. Ten minutes later I remembered AppArmor and moved the files.
The only problematic hardening was Thunderbird's, which broke attachments. They disabled Thunderbird's profile after failing to fix it.
Currently I’m on mobile, so I can’t give you a list of active profiles, yet the list is not short and the policy is not lax.
I worked with both of them back in the day and developed profiles for them.
Update: Stats are as follows:
78 packages ship AppArmor profiles in Debian 12. Notable are:
- mariadb-server
- redshift (f.lux alternative)
- bind
- ioquake3 (yes, the game)
- squid
- tcpdump
among others.
Also there are "apparmor-profiles" and "apparmor-profiles-extra" packages which provides profiles for vsftpd, dovecot, dhcpd amongst others (141 in profiles and 10 in extras).
So the AppArmor ecosystem is neither lax nor unmaintained.
We don't disable SELinux unless the software explicitly requires it. Also, none of the machines running without SELinux are accessible from outside.
SELinux is complicated, but it's not impossible to work with. Configuring new rules is another story, though.
However, there is a large-scale exception to this: Android. Android uses SELinux extensively, and it is also used as part of further sandboxes in Chrome and Android System Intelligence.
SELinux is enabled by default on many production-grade distros, most notably Fedora. It's the distro maintainer's duty to ship policies. Yes, you need to write policy files for bespoke software, but SELinux has gotten better at helping to generate these.
Honestly I can't remember the last time I had an issue with SELinux on any of my desktop systems. I've had to work past some issues with systems I manage for work, but those weren't too bad to deal with.
I've been running Fedora for about 2 years, and have never been tempted yet to turn off selinux. (And I just used the 'sestatus' command to verify that it is enabled. "Current mode: enforcing", it elaborates.)
It's worse than Heartbleed for anyone doing shared-kernel multitenancy (running Docker containers from mutually untrusting sources). There are a bunch of those!
Does browsing the web with JavaScript turned on, on the same kernel I ssh from (and consequently keep my ssh credentials on), count as multitenant here?
> P.S.: I didn't want to open the can of worms that is microkernel performance
Why are there performance issues? Are there any reasons for them except for context switch costs (which are only due to legacy CPU architectures that were optimized for outdated insecure OSes)? Is microkernel architecture somehow incompatible with good performance?
Microkernel architecture pretty much requires that processes communicate via explicitly defined interfaces, usually message passing. And it becomes much more difficult for subsystems to coordinate low-level activities, which is crucial for performance optimizations. And yes, context switches matter a lot too.
1. A sel4 context switch does significantly less than a Linux one: it saves and restores fewer hardware registers, etc.
2. You will need many more context switches to achieve the same thing in a microkernel due to its distributed architecture (see the sketch after this list).
3. A lot of "complexity" in the kernel is simply performance optimization that does not exist in sel4. Back in the days Minix and Hurd promised to achieve optimization by replacing generic servers with custom ones but I haven't seen this actually working in 20 years
> Reminder: Linux is too big (millions of lines of code, and megabytes of object code) for it to be anywhere near bug-free.
I don't necessarily disagree, but I also think that almost every piece of software in common use is probably too big to actually be bug free. The bar for "too big to not have bugs" is far lower than millions of lines of code; I'd expect that line to be crossed in the tens of thousands, if not lower.
Most of those millions of lines of code are drivers, which are a largely irrelevant factor here. The actual core of Linux, while still large, is a significantly smaller and tighter codebase.
Native POSIX is incompatible with good APIs. Just look at the mess you need to go through to ensure data is actually written to disk.
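To spell out that mess: to be reasonably sure a freshly created file survives a crash, you have to write the data, fsync the file, and then fsync the containing directory so the directory entry itself is durable. A minimal sketch (the function name and paths are mine, purely illustrative):

    /* Sketch of the POSIX "make sure this file is really on disk" dance. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    int write_durably(const char *dir, const char *path,
                      const void *data, size_t len) {
        int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0) return -1;

        if (write(fd, data, len) != (ssize_t)len) { close(fd); return -1; }

        /* flush the file's data and metadata to stable storage */
        if (fsync(fd) < 0) { close(fd); return -1; }
        if (close(fd) < 0) return -1;

        /* the new directory entry is only durable after syncing the directory */
        int dfd = open(dir, O_RDONLY | O_DIRECTORY);
        if (dfd < 0) return -1;
        if (fsync(dfd) < 0) { close(dfd); return -1; }
        close(dfd);
        return 0;
    }

    int main(void) {
        const char msg[] = "hello\n";
        if (write_durably("/tmp", "/tmp/example.txt", msg, strlen(msg)) < 0) {
            perror("write_durably");
            return 1;
        }
        return 0;
    }

And that still glosses over the historically murky semantics of what an fsync error actually guarantees.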
More importantly, what is the point of a new OS if you support just POSIX? The architecture choices involved will push the OS to be a worse Linux/BSD/Mac, and whatever new thing you add will eventually be ported to Linux. Nowadays, a new OS can only be technically justified by trying to be better than POSIX.
It is true that POSIX emulation might be useful; Microsoft showed one way to do so. A new OS could use its equivalent of WSL2, Fuchsia's starnix, or Wine. But I don't think there is any circumstance in which it is a good idea for a new kernel to natively support POSIX.
posix isn't a kernel API. It is a runtime that can largely be implemented in userspace. There is no need for any kernel to do anything specific to support posix other than for possible performance optimizations. Also posix isn't even the API people target anymore - the Linux uapi is. This is why wsl and starnix target that instead. There are security issues that come along with such APIs, but you can sandbox things in such a way that it need not be as problematic.
There is no getting away from the need for backwards compatibility with existing applications. Applications won't start supporting your platform natively until it matters and getting big enough to matter requires lots of software working on your platform.
However, the userspace assumptions leak: e.g., POSIX's file APIs don't sit quite well with Windows, and Microsoft eventually had to abandon WSL1.
My point was that we really don't need yet another Unix clone. If we're going for a new OS, we should fix the core APIs as well. Emulation ain't bad per se, but it better be done 'outside' without compromising the original design.
The problem is that sel4 isn't turnkey enough for the average product maker to use. It's good for larger firms who can invest in building out their own platform on top, which they can then use in products. There doesn't seem to be anyone building a public platform on top of sel4 which product makers could then use. This is similar to building a product with a custom Linux distro vs an existing one, though a custom Linux distro is still far easier to do because of the great ecosystem around it. I'm not an advocate for using Linux, but unless someone invests in the missing pieces, sel4 usage will never become widespread.
> unless someone invests in the missing pieces, sel4 usage will never become widespread.
There are multiple efforts, such as Makatea[0], Genode[1] and sel4cp[2].
The seL4 foundation is finally well-established and funded thanks to increasing commercial interest[3]. There's been a lot of activity lately[4] and I'm very optimistic for it.
Not every device using Linux is a server. Many devices can deal with hypothetical peak-throughput losses for the sake of improved security. Power and latency are probably going to be the key metrics that must not regress, but even then, some devices probably still don't care as much about those.