StackRot (CVE-2023-3269): Linux kernel privilege escalation vulnerability (github.com/lrh2000)
73 points by simonpure on July 6, 2023 | 64 comments


Reminder Linux is too big (MLoCs, and megabytes of object code) for it to be anywhere near bug-free.

As Linux runs with full privileges, this is a major architectural issue.

We should be looking at deploying well-architected systems based on microkernels (such as seL4). Some examples of such systems include Fuchsia, Genode and Managarm.


Yet these escalation CVEs are getting rarer and rarer, while production systems generally ship hardened with SELinux or AppArmor out of the box, making them much harder to exploit in practice.

The gist is that, to succeed, it requires a local user who can run processes and is not confined by SELinux or AppArmor.

It should be patched, but it's not Heartbleed.

P.S.: I didn't want to open the box of performance issues of microkernels, yet somebody did, and yes, performance matters.


I don't see how SELinux or AppArmor would help here. From the details provided, it looks like all that is needed is mmap. At most it "requires minimal capabilities to trigger" based on the article.

Granted, it is possible to deny mmap permissions, but I don't think most policy writers would consider mmap a particularly dangerous permission to grant.

I just checked my CentOS 8 system running stock policy, and every process type has permission to map every file type. Granted, this is gated behind a boolean, so an administrator could disable it if they wanted to, but the default is true.

Not that it is relevant to this issue where it doesn't matter what you mmap; but since I know someone will ask, having mmap permission does not bypass the check on, for example, read permission.


If you "disable" mmap, you will break:

  - dynamic loading
  - malloc
  - any just-in-time compilers
It is a core feature of any modern operating system.


> while production systems are generally hardened with SELinux and AppArmor out of the box, making practically exploiting them much harder.

Is it really so? I get the impression that SELinux is so complicated that nobody wants to bother setting it up, and everyone disables it as soon as possible.


RHEL systems run SELinux in enforcing mode by default. Most applications run in a domain called "unconfined_t", which is about as secure as it sounds. [0]

Ubuntu runs AppArmor in its version of enforcing mode. I've only ever run into issues with it once though, so I presume it is a similarly lax policy.

[0] The policy is called "targeted", to reflect the fact that you are supposed to write targeted policy modules for specific applications you are concerned about.


Debian’s AppArmor policy is pretty strict, yet well refined. As long as you use standard folders and paths, it’s pretty invisible.

Last time, while I was migrating a DNS server, our zone folders were not in a standard path, and BIND was unable to find them. Ten minutes later I remembered AppArmor and moved the files.

The only problematic hardening was Thunderbird's profile, which broke attachments; the maintainers disabled it after failing to fix it.

Currently I’m on mobile, so I can’t give you a list of active profiles, yet the list is not short and the policy is not lax.

I worked with both of them back in the day and developed profiles for them.

Update: Stats are as follows:

78 packages ship AppArmor profiles in Debian 12. Notable are:

    - mariadb-server
    - redshift (f.lux alternative)
    - bind
    - ioquake3 (yes, the game)
    - squid
    - tcpdump
    among others.
Also there are "apparmor-profiles" and "apparmor-profiles-extra" packages which provide profiles for vsftpd, dovecot, and dhcpd, among others (141 in profiles and 10 in extras).

So the AppArmor ecosystem is neither lax nor unmaintained.


We don't disable SELinux unless the software explicitly requires it to be disabled. Also, none of the machines running without SELinux are accessible from outside.

SELinux is complicated, but it's not impossible to work with. Configuring new rules is another story, though.


You are correct about SELinux in general terms.

However, there is a large scale exception to this: Android. Android uses SELinux extensively, and it is also used as part of further sandboxes in Chrome and Android System Intelligence.


SELinux is enabled by default on many production-grade distros, most notably Fedora. It's the distro maintainer's duty to ship policies. Yes, you need to write policy files for bespoke software, but SELinux has gotten better at helping to generate these.


Honestly I can't remember the last time I had an issue with SELinux on any of my desktop systems. I've had to work past some issues with systems I manage for work, but those weren't too bad to deal with.


I've been running Fedora for about 2 years, and have never been tempted yet to turn off selinux. (And I just used the 'sestatus' command to verify that it is enabled. "Current mode: enforcing", it elaborates.)


It's worse than Heartbleed for anyone doing shared-kernel multitenant (running Docker containers from mutually untrusting sources). There are a bunch of those!


If you're running multi-tenant Docker, you're doing something wrong to begin with. There are better solutions like Apptainer [0].

[0]: https://apptainer.org/


I don't know what this is, but it appears to be shared-kernel.


Apptainer works the same way as Docker. It just makes using it for distributing applications more convenient.


Does browsing the web with Javascript turned on on the same kernel where I ssh from (and consequently keep my ssh credentials) count as multitenanted here?


> P.S.: I didn't want to open the box of performance issues of microkernels

Why are there performance issues? Are there any reasons for them except for context switch costs (which are only due to legacy CPU architectures that were optimized for outdated insecure OSes)? Is microkernel architecture somehow incompatible with good performance?


Microkernel architecture pretty much requires that processes communicate via explicitly defined interfaces, usually message passing. And it becomes much more difficult for subsystems to coordinate low-level activities, which is crucial for performance optimizations. And yes, context switches matter a lot too.


Lots of myths around microkernel "slowness"[0].

>And yes, context switches matter a lot too.

seL4 is significantly (100x-1000x) faster than Linux at context switching[1]. At the same time, it does not require that many more context switches.

0. https://archive.is/30hhi

1. https://sel4.systems/About/Performance/


This is a very unfair comparison:

1. An seL4 context switch does significantly less than a Linux one; it saves fewer hardware registers, etc.

2. You will need many more context switches to achieve the same thing in a microkernel due to its distributed architecture.

3. A lot of "complexity" in the kernel is simply performance optimization that does not exist in seL4. Back in the day, Minix and Hurd promised to achieve the same optimizations by replacing generic servers with custom ones, but I haven't seen that actually work in 20 years.


> Reminder Linux is too big (MLoCs, and megabytes of object code) for it to be anywhere near bug-free.

I don't necessarily disagree, but I also think that almost every piece of software in common use is probably too big to actually be bug-free. The bar is far lower than millions of lines of code for "too big to not have bugs"; I'd expect that line to be crossed in the tens of thousands, if not lower.


Most of those millions of lines of code are drivers, which are a largely irrelevant factor here. The actual core of Linux, while still large, is a significantly smaller and tighter codebase.


While I agree in principle, I have tried a couple of times (admittedly not very hard) to learn how to use seL4, and it seems unready for prime time.

I don't think POSIX is the end-all-be-all, but it is what I and many others are familiar with, and seL4 has practically nothing in common with it.

The design is just too foreign to get going quickly, and the opportunity cost of learning their way to do things is too high in almost all cases.

One day I hope to have the time to dedicate to learning it properly!


Native POSIX is incompatible with good APIs. Just look at the mess you need to go through to ensure data is actually written to disk.

More importantly, what is the point of a new OS if you support just POSIX? The architecture choices involved will push the OS to be a worse Linux/BSD/Mac, and whatever new thing you add will eventually be ported to Linux. Nowadays, a new OS can only be technically justified by trying to be better than POSIX.

It is true that POSIX emulation might be useful; Microsoft showed one way to do so. A new OS could use its equivalent of WSL2, Fuchsia's starnix, or Wine. But I don't think there are any circumstances in which it is a good idea for a new kernel to natively support POSIX.


POSIX isn't a kernel API. It is a runtime that can largely be implemented in userspace. There is no need for any kernel to do anything specific to support POSIX other than possible performance optimizations. Also, POSIX isn't even the API people target anymore; the Linux uapi is. This is why WSL and starnix target that instead. There are security issues that come along with such APIs, but you can sandbox things in such a way that it need not be as problematic.

There is no getting away from the need for backwards compatibility with existing applications. Applications won't start supporting your platform natively until it matters and getting big enough to matter requires lots of software working on your platform.


However, the userspace assumptions leak. For example, POSIX's file API doesn't sit well with Windows, and Microsoft eventually had to abandon WSL1.

My point was that we really don't need yet another Unix clone. If we're going for a new OS, we should fix the core APIs as well. Emulation isn't bad per se, but it had better be done 'outside', without compromising the original design.


POSIX also hardly matters on Windows, mobile OSes, legacy mainframes/micros or game consoles.

What is needed is to have a product that makes the OS relevant enough for devs to consider targeting the platform.


The problem is that sel4 isn't turnkey enough for the average product maker to use. It's good for larger firms who can invest in building out their own platform on top which they can then use in products. There doesn't seem to be anyone building a public platform on top of sel4 which can be then used by product makers. This is similar to building a product with a custom Linux distro vs an existing one, albeit a custom Linux distro is still far easier to do because of the great ecosystem around it. I'm not an advocate for using Linux, but unless someone invests in the missing pieces, sel4 usage will never become widespread.


> unless someone invests in the missing pieces, sel4 usage will never become widespread.

There's multiple efforts, such as Makatea[0], Genode[1] and sel4cp[2].

The seL4 foundation is finally well-established and funded thanks to increasing commercial interest[3]. There's been a lot of activity lately[4] and I'm very optimistic for it.

0. https://trustworthy.systems/projects/TS/makatea

1. https://www.genode.org/

2. https://trustworthy.systems/projects/TS/sel4cp/

3. https://sel4.systems/Foundation/Membership/

4. https://trustworthy.systems/news/


Yes, but during that debate ages ago Linus won on points and as a result we now have an OS focused on throughput rather than on security.


Not every device using Linux is a server. Many devices can deal with hypothetical peak throughput losses for the sake of improved security. Power and latency are probably going to be the key metrics that must not regress, but even then, some devices probably don't care as much about those.


Linus was never correct about microkernels[0], and the world is worse for it.

0. https://www.cosy.sbg.ac.at/~clausen/PVSE2006/linus-rebuttal....


And the winner is...

"A flaw was found in the handling of stack expansion in the Linux kernel 6.1 through 6.4, aka "Stack Rot". The maple tree, responsible for managing virtual memory areas, can undergo node replacement without properly acquiring the MM write lock, leading to use-after-free issues. An unprivileged local user could use this flaw to compromise the kernel and escalate their privileges."


> An unprivileged local user...

Lots of hoops to jump through for a well-configured & internet connected box.

IOW, not a grave concern at first blush, yet warrants a 24H fix.


> Lots of hoops to jump through for a well-configured & internet connected box.

This is a pretty typical assumption for most local privilege escalation bugs.


As a system administrator, I think I can make some assumptions and observations about the systems I manage.


Those are some pretty famous last words, speaking as someone with over 35 years of hardware and software experience.


Assuming that local user access is simply impossible is a problem. But assuming that local user access means you're already fucked can be pretty damn honest. And sysadmins are the people who will know this the best.


Yes. If somebody can login to a machine which they shouldn’t, local exploits are least of our concerns, because this means we have bigger failures in many layers up to that point.


>But assuming that local user access means you're already fucked can be pretty damn honest.

Compartmentation through VMs or containers (docker et al) tends to be assumed sufficient.

The world relies on this to work yet Linux, unlike seL4[0], cannot guarantee separation.

0. https://sel4.systems/About/


When you're running malicious code on your server, it's way, way different. And in fact, supply chain attacks are a great example of this.

I think seL4 is interesting, but nobody actually runs production servers on this research project.


>on this research project.

seL4 is much more than a research project today, with a well-established seL4 foundation that has a lot of high-profile commercial members[0].

It is already used in the most critical applications that require levels of assurance only seL4 can provide.

0. https://sel4.systems/Foundation/Membership/


> It is already used in the most critical applications that require levels of assurance only seL4 can provide.

This sounds to me like embedded control systems (airplanes, life support, etc.), not production servers for everyday internet services.

seL4 isn't exactly an everyday pull from a security toolkit though? It's like an entire paradigm. You build a business around the fact that you use seL4, you don't pull it off the shelf for an internet server.

In order to make a pie from scratch you must first create the universe. That's seL4.


>In order to make a pie from scratch you must first create the universe. That's seL4.

While that's still somewhat true, much progress has been made, and the situation isn't the same as it was five years ago.

Furthermore, there's multiple ongoing efforts to bring seL4 closer to turn-key, such as Makatea[0], Genode[1] and sel4cp[2].

0. https://trustworthy.systems/projects/TS/makatea

1. https://www.genode.org/

2. https://trustworthy.systems/projects/TS/sel4cp/


I wasn't aware of these projects, thanks


Yes you're right. This doesn't mean I ignore vulnerabilities and play PacMan. Instead, I harden my systems first, observe and patch my systems second, and play PacMan then.

Jokes aside, we never take vulnerabilities lightly. However, we stick to our best practices and yet stay diligent.


FYI, there are already follow-up patches in 6.1.38, 6.3.12 and 6.4.2. This is often the case when bugfixes happen in a rush and with limited testing. So if you haven't started integrating, just make the jump to the latest version.


Limited? There are no tests anywhere near this code.

To reiterate: the Linux memory management subsystem contains a core data structure implemented in C without any tests.


What kernel versions are affected by this? It's difficult to parse from the page on a skim.


First line of paragraph 2: Linux kernel 6.1 through 6.4


Also at the end, the fixed versions:

> These patches were subsequently backported to stable kernels (6.1.37, 6.3.11, and 6.4.1), effectively resolving the "Stack Rot" bug on July 1st.


It's not stated in the article whether this was because the vulnerability only affects 6.1 and up or because Linux only supports 6.1 and up.


Seems pretty clear, though the info is haphazardly scattered throughout the article.

> The StackRot vulnerability has been present in the Linux kernel since version 6.1 when the VMA tree structure was changed from red-black trees to maple trees.


Cheers! I was looking for a table or something in bold


Why ask? There has never been a release of Linux without a known local privilege escalation bug. If you take the union of all known exploits, they cover the entire history of Linux. Knowing the interval of this flaw is pointless.


Because I'm curious as to when the affected code was introduced and if it affected older kernels that I run. Pure curiosity, mainly, as I know chances are it will have been patched. I still have PTSD from the Spectre kernel fun, so cut me some slack ;)


Luckily this is JUST a local exploit, but it yet again shows that C should just die as a language.


What about the commit message for the fix [0] makes you think this code, if it were written in say... rust, wouldn't have just been put in an unsafe block to do the "opportunistic" e.g. fast and dirty thing, anyways?

When your development practices normalize cheating here and there for performance wins where it matters, any language capable of writing kernels isn't going to stop you.

[0] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/lin...


The keyword `unsafe` makes it very obvious that it is supposed to be a method of last resort. In C, it is very easy to accidentally drift into undefined behavior, and reviewers have to be aware of all the potential issues, which can occur anywhere. Rust's `unsafe` blocks at least make it obvious where exactly dangerous things are going on, so reviewers can focus their attention there. And the Rust compiler still performs many checks inside `unsafe` blocks, most importantly borrow checking.


> Luckily this is JUST a local exploit

It's very rare for only one exploit to be used when a system is compromised. Multiple exploits will be used: one for entry, one for escalation of privileges, etc.


Rust hardly deals with memory race conditions better than C, and managed languages are not fit for a kernel. This isn't the fault of C at all. These kinds of issues are an unavoidable part of kernel programming in multiprocessor settings and not something you can solve with a different programming language.

This comment just can't be more wrong.


Rust deals with race conditions 100 times better than C. Have you tried it? Just because it can't guarantee that they never happen doesn't mean it isn't any better.

You can't solve this with a different programming language but you can definitely make bugs much less likely.


> yet again shows that C should just die as a language

It's a logic error. Nothing to do with the language.

"now that the vma layout uses the maple tree code, we *really* don't just change vm_start and vm_end any more, and the locking really is broken" - https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/lin...



