IMO OOM killing should be reserved for single processes misbehaving. When a lot of different applications each use a decent amount of memory and together exhaust the system RAM, swapping to disk is the appropriate thing to do.
> To be honest I don't know why it's such an issue on Linux. Mac and Windows don't have this issue at all. Windows presumably because it doesn't over-commit memory
To be fair, my Windows system grinds to a halt (not really, but it becomes very noticeably less responsive in basically anything) when JetBrains is installing an update (mind you I only have SSDs, with all JetBrains stuff being on an NVMe). I don't know what JetBrains is doing, but it consistently makes itself noticeable when it is updating.
I have had this happen in the past (not very often though), and another saving grace of Windows is you can press ctrl-alt-del, which somehow seems to pause the rest of the system activity, and then see a process list and choose which one to kill.
Linux doesn't have anything like that. KDE seems to have a somewhat functional Ctrl-alt-del menu - I have been able to access it when the rest of the shell gets screwed up (not due to OOM). But inexplicably the only options it has are Sleep, Restart, Shutdown or Log out!! Where is the "emergency shell", or "process manager" or even "run a program"? Ridiculous.
I think Linux GUIs often have this weird fetish with designing as if nothing will ever go wrong, which is clearly not how the real world works. Especially on Linux. I've genuinely heard people claim that most Linux users will never need to use a terminal for example.
> Systemd for some reason seems to uniquely be the epicenter of giant facepalm bugs like LEAVING THE SYSTEM FIRMWARE VULNERABLE TO AN RM -RF COMMAND
I am very sorry to inform you but efivarfs is something coming from the Linux kernel. Being able to rm -rf it is squarely something that is entirely on the kernel implementation, WHICH THE AUTHOR OF EFIVARFS EVEN ADMITS[0]
Would have made more sense to say Ctrl+Shift+Esc since that just directly brings up the task manager. All in all I would say it is a slightly weird title, but I assume enough people get what they want to say with it.
> It's mainly re-hashed. I think I've seen the same talk twice before? At least once.
He held a similar talk at All Systems Go I think (I missed the talk here at FOSDEM).
> It's a very "I've made a cool thing. This is what I think is cool about it" type of talk.
Varlink isn't something he just made up, he merely "adopted it" (started making use of it). It existed before, but I don't know of anything that really made use of it before.
The official-looking website at https://varlink.org doesn't give any information about who the authors are, as far as I can tell, but the screenshots show the username "kay". There's a git repo for libvarlink [1] where the first commits (from 2017) are by Kay Sievers, who is one of the systemd developers.
An announcement post [2] from later in 2017, by Harald Hoyer, says that the varlink protocol was created by Kay Sievers and Lars Karlitski in "our team", presumably referring to the systemd team.
So the systemd developers "adopted" their own thing from themselves?
While I guess you aren't wrong, I also wouldn't say you are entirely correct that Kay is a systemd developer. He used to work on udev, but hadn't been active on it in any meaningful way for 2 years before varlink's release[1]. I can't really say what it was made for, but Lennart hadn't started integrating Varlink until a while after its release (I remembered it being around 2021 when he started making use of it, but after another check it seems the varlink work in systemd started in 2019[2]).
Kay Sievers' Wikipedia page cites a blog post by Lennart Poettering [1] which says that systemd was designed in "close cooperation" with Kay Sievers and that Harald Hoyer was also involved, so it seems pretty clear that he's on the team that develops systemd, the team that Harald Hoyer referred to as "our team". All three of them gave a talk [2] together in 2013 about what they were developing.
If Lennart Poettering "adopted" varlink, he seems to have done so from members of his own team ("our team") who created varlink and who are also fellow co-creators of systemd.
AFAIU (I haven't looked much into it) shim basically exists so that MS signs the shim once (or only a few times when updated), which has the distro public key embedded, which does further verification of the chain (bootloader/kernel) which gets updated more frequently.
That's basically my understanding too. But since you can still boot any shim-supported distro, Secure Boot + shim practically gains you nothing. An adversary can simply boot their own copy of shim with whatever OS they like.
I don't know all the ins and outs, but because of the Machine Owner Key (MOK) mechanism in shim, it should be possible to boot arbitrary OSes without MS signing anything.
Your step of removing the MS keys works of course :) Although I've heard that can be risky on various systems that need to load MS-signed EEPROMS. Also I think that firmware updates can be problematic?
> Although I've heard that can be risky on various systems that need to load MS-signed EEPROMS
Yea, I bricked a Gigabyte board and still haven't been able to fix it. I just replaced it with an Asrock board and that has settings for what to do with option-rom when secureboot is enabled (always execute, always deny, allow execute, defer execute, deny execute and query user) and I have no clue what half of them specifically do (like, does "allow execute" only execute if a matching key exists and doesn't execute if it doesn't? and what is the difference between "always deny" and "deny execute"? and defer to when??). But I just set it to always execute and my problem is solved.
This reminds me of when I enrolled only my own keys into a Gigabyte AB350 and I just soft-bricked it, presumably because some opt-rom required MS keys.
I exchanged it for an Asrock board and there I can enable secure boot without MS keys and still have it boot cuz they actually let you choose what level of signing the opt-rom needs when you enable secure boot.
What I want to say with this is that it requires the company to actually care to provide a good experience.
How do I have nsresourced work in a regular systemd service or quadlet so that I can have an ephemeral user run a container? I am trying to find information and am just seeing it as part of nspawn, which seems to require a container specifically built around a root filesystem.
I am not going to struggle with systemd if I have to build containers specifically for it. If I have to rearrange everything I am doing I would just learn to do it on a minimal Kubernetes install instead.
nspawn containers aren't really any different to regular system images/archives other than they don't need a kernel.
I don't think the setting is exposed to regular service units (it might be in the future, I don't know) and I don't think podman has any integration with it.
What kinda service do you have where you need a full range of UIDs?
I don't need a full range. I would just like to run podman under a non-root user using regular system services. Especially where a persistent volume or bind mount is involved.
Let's say Home Assistant. It would be nice to have some system user "homeassistant" with no home directory that owns the process and owns its /var/whereever/config.conf . It would be nice to have the isolation on host in addition to the isolation via container. But I don't want to be rebuilding any containers to get that, unless I am misunderstanding something on nsresourced.
I'd be really pleased with that setup. MQTT could be its own system user. And HA could depend on MQTT so I have nice startup behavior. Etc.
IDK how to have system users like this run a container without the subuid range. Even when I create the users with ranges in the file, there seems to be problems with informing systemd (as a non-root user) that the running process is different from the one it started.
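For anyone debugging the subuid side of this: entries in /etc/subuid have the form `name:start:count`, so a quick check of what range a system user actually got can be sketched like this. It is only a rough helper, not something podman or systemd ships; the function name and the sample user names are made up for illustration, and skipping `#` lines is my own defensive assumption.

```python
def subuid_ranges(text, user):
    """Return the (start, count) subordinate ID ranges listed for `user`.

    `text` is the content of a subuid-style file, one `name:start:count`
    entry per line. Hypothetical helper for illustration only.
    """
    ranges = []
    for line in text.splitlines():
        line = line.strip()
        # Assumption: ignore blank lines and '#' lines rather than fail on them.
        if not line or line.startswith("#"):
            continue
        parts = line.split(":")
        if len(parts) != 3:
            continue  # not a name:start:count entry
        name, start, count = parts
        if name == user:
            ranges.append((int(start), int(count)))
    return ranges


# Made-up sample content; on a real system you would read /etc/subuid instead.
sample = "homeassistant:100000:65536\nmqtt:165536:65536\n"
print(subuid_ranges(sample, "homeassistant"))
```

An empty list for the user means there is no subordinate range for rootless podman to map into, which would be worth ruling out before blaming systemd.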
podman quadlet doesn't seem to support running at a "system level" as a non-root user, at least according to their docs[0]. I assume they make some assumptions which wouldn't hold up if the user actually changed when running at a system level, dunno.
> But I don't want to be rebuilding any containers to get that, unless I am misunderstanding something on nsresourced.
Setting up the user namespace would be the job of the container manager and not the containers themselves, so they shouldn't need any rebuilding or special handling (possibly the files might need to be shifted into the "foreign ID" range[1, 2], but I might be wrong about this and it may not be necessary for this use case), but the container manager needs to specifically make use of nsresourced.
I really think the best option currently is to go with either systemd as your "container manager" (e.g. just regular system files with sandboxing, nspawn images, or maybe systemd-portabled[3]) or podman as your container manager. As much as I too would love to mix them, I don't think it's the best idea (at least in the current state); I'd just go with whatever is more suited for the task (in your case it sounds like podman would be the most suited option).
> there seems to be problems with informing systemd (as a non-root user) that the running process is different from the one it started.
Yea, I don't think systemd likes double forking. The best option would be to keep the process that spawned your actual process alive until the child exits and just bubble up the exit code. There is the `PIDFile=` option with `Type=forking`, but I haven't used it, nor looked much into it.
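A minimal sketch of the "keep the parent alive and bubble up the exit code" idea, assuming a small Python wrapper is acceptable as the thing systemd starts. The function name is made up, and mapping a signal death to 128 + signal number is just the usual shell convention, not something systemd requires.

```python
import subprocess
import sys


def run_and_propagate(argv):
    """Run `argv` as a child, wait until it exits, return a status to exit with.

    The wrapper stays in the foreground the whole time, so the service
    manager keeps tracking the process it started itself.
    """
    completed = subprocess.run(argv)
    code = completed.returncode
    if code < 0:
        # On POSIX, subprocess reports "killed by signal N" as -N.
        # Follow the shell convention of 128 + N for the wrapper's status.
        return 128 + (-code)
    return code


# In a wrapper script the last line would be something like:
#   sys.exit(run_and_propagate(sys.argv[1:]))
```

With something like this the unit would not need `Type=forking` at all, since the process systemd launched never goes away before the real workload does.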
> For example I have no idea what they mean by the bullet "runtime integrity".
This is for example dm-verity (e.g. `/usr/` is an erofs partition with matching dm-verity). Lennart always talks about either having files be RW (backed by encryption) or RX (backed by kernel signature verification).
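To make "runtime integrity" a bit more concrete, here is a toy hash tree in Python. It is only an illustration of the general idea behind dm-verity (a single trusted root hash vouches for every block of a read-only image) and is not the real on-disk format: the binary tree layout, the lack of a salt, and the function names are all simplifying assumptions of mine.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size for this toy example


def block_hashes(data):
    """Hash each fixed-size block of the image separately."""
    return [
        hashlib.sha256(data[i:i + BLOCK_SIZE]).digest()
        for i in range(0, len(data), BLOCK_SIZE)
    ]


def root_hash(data):
    """Fold the block hashes pairwise into a single root hash."""
    level = block_hashes(data) or [hashlib.sha256(b"").digest()]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # duplicate the last hash on odd levels
        level = [
            hashlib.sha256(level[i] + level[i + 1]).digest()
            for i in range(0, len(level), 2)
        ]
    return level[0]


image = bytes(3 * BLOCK_SIZE)      # stand-in for a read-only /usr image
trusted = root_hash(image)         # this is the value that would be signed

tampered = bytearray(image)
tampered[5000] ^= 1                # flip one bit in the second block
print(root_hash(bytes(tampered)) == trusted)
```

The point is that only the root hash has to be signed and trusted; any change to any block changes the root, so modified files are detected when they are read rather than only at install time.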