Docker runs a daemon with root privileges that starts all containers. So if you start a container with "docker run -d ...." you are talking to a privileged process. That in turn means all spawned containers can have root privileges (docker run -v /etc/shadow ... to change the root password of your host). "Rootless" actually means running the container process as a normal user (less attack surface because of fewer permissions). So if you ran "podman run -v /etc/shadow ..." as a normal user, you wouldn't have the permissions needed to open the file.
As simple as possible:
Docker ("normally"):
every command inside the container is started by a daemon that has full root permissions on the host
$root-> Docker -> container
Docker/Podman ("rootless"):
run every command as the current user
$user-> container
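To make the permission point concrete, here is a minimal sketch of the check the kernel effectively performs for you. The helper name describe_access is made up for illustration; it only reports who you are and whether that identity may read a file, which is the same limit a rootless container process inherits.

```python
import os


def describe_access(path="/etc/shadow"):
    """Return (real_uid, readable) for the current process.

    os.access() checks against the real UID/GID, so the reported UID
    is the real one as well.
    """
    return os.getuid(), os.access(path, os.R_OK)


if __name__ == "__main__":
    uid, readable = describe_access()
    print(f"uid={uid} can read /etc/shadow: {readable}")
```

Run as a normal user on a typical Linux box this should report that the file is not readable, while the same call as root would succeed; the exact result depends on the file permissions of your distribution.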
The other big piece is capabilities (specifically CAP_SYS_ADMIN), which as I understand it are related but kind of orthogonal to the question of root/rootless.
For example, buildah (the container-building part of podman) is daemonless and can use the fuse-overlayfs storage driver to build containers rootlessly: you appear as root inside the container, but from the outside, those processes and any files they create are owned by the original invoking user or some shim UID/GID based on a mapping table.
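A rough sketch of how such a mapping table resolves, assuming the three-column format the kernel exposes in /proc/<pid>/uid_map (ID inside the namespace, ID outside, range length). The numbers in EXAMPLE_MAP are assumptions: an invoking user with UID 1000 and a subordinate range starting at 100000, which is a common default but not guaranteed on your system.

```python
def parse_uid_map(text):
    """Parse uid_map-style lines into (inside_start, outside_start, count) tuples."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        inside, outside, count = (int(f) for f in fields)
        entries.append((inside, outside, count))
    return entries


def to_host_uid(container_uid, entries):
    """Translate a UID seen inside the container to the UID seen on the host.

    Returns None when the UID falls outside every mapped range.
    """
    for inside, outside, count in entries:
        if inside <= container_uid < inside + count:
            return outside + (container_uid - inside)
    return None


# Assumed example: container root maps to the invoking user (UID 1000),
# every other container UID maps into a subordinate range.
EXAMPLE_MAP = """\
0 1000 1
1 100000 65536
"""

if __name__ == "__main__":
    table = parse_uid_map(EXAMPLE_MAP)
    for uid in (0, 1, 33):
        print(uid, "->", to_host_uid(uid, table))
```

So "root" inside the container is just the invoking user outside, and everything else lands on the shim UIDs.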
But critically, this doesn't mean it's possible to just run buildah inside any Kubernetes pod and build a container there, because buildah needs to be able to start a user namespace, and must have the /dev/fuse device mapped in. I believe work in this area is still ongoing (for example, Linux 5.11 allows overlayfs in unprivileged containers), but the issue tracking it [1] is closed without really being, IMO, fully resolved, since the linked article [2] from July 2021 still describes the different scenarios as distinct special cases that each require their own special sets of flags/settings/mounts/whatever.
Yup, and based on that mapping table the process inside the container is not allowed to create another namespace and/or use fuse-overlayfs. That's why you need to mount /dev/fuse into the container (you might also need CAP_SYS_ADMIN and CAP_MKNOD). There is another link from Red Hat which also explains it:
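If you want to see quickly whether a given container even has the two prerequisites mentioned above, something like this sketch could help. The function name is made up; it only checks that /dev/fuse is present and that /proc/sys/user/max_user_namespaces is non-zero. Those are necessary conditions, not sufficient ones: missing capabilities or a seccomp profile can still block the build.

```python
import os


def rootless_build_prereqs(
    fuse_path="/dev/fuse",
    userns_limit_path="/proc/sys/user/max_user_namespaces",
):
    """Report whether the FUSE device and user namespaces look available."""
    try:
        with open(userns_limit_path) as fh:
            limit = int(fh.read().strip())
    except (OSError, ValueError):
        limit = None  # file missing or unreadable: cannot tell
    return {
        "fuse_device_present": os.path.exists(fuse_path),
        "max_user_namespaces": limit,
        "user_namespaces_enabled": limit is not None and limit > 0,
    }


if __name__ == "__main__":
    for key, value in rootless_build_prereqs().items():
        print(f"{key}: {value}")
```

The paths are parameters only so the check can be pointed at test files; inside a real pod you would call it with the defaults.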
well, building stuff with flatpak is not THAT intuitive. no package format before docker was, from my point of view. on the other hand, packaging also forces you to clean up, and the insides of docker images are often not cleaned up :D
depends on your use case. I wanted a way of removing network access for my text editors and starting ephemeral firefox instances that are completely independent from each other. It's just an easier way to hack around an application tbh.
I am actually running a few of my daily applications, such as firefox, vscode, or spotify, inside a podman container (rootless makes me feel a little safer). I built a small python script around it, which creates a desktop icon, tags the current version (so you can roll back), and updates the images after x amount of time. I'll clean it up and put it on github if someone is interested :)
i think they use bwrap as mentioned in a comment below.
my use case is to restrict network access, for example. Or running multiple firefox instances in parallel (so they don't have the same parent process / cookies etc.). or restricting memory to 2G per container. there were just a few things i wanted to do that didn't quite work with flatpak or snap.
I will! cleaning up now and going to publish it later on github. The main idea was a least-privilege approach to running simple desktop applications independent from the host OS and being able to control filesystem/network access on a per-app basis. (spotify on fedora without flatpak or rpm-fusion repos, not even sudo needed to install)
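For anyone wondering what that looks like in practice, here is a minimal sketch that only builds the podman command line, using the --network and --memory options of podman run. This is not the script mentioned in this thread; the function name and the image name localhost/firefox:latest are made up, and a real GUI app would also need display/audio sockets passed in, which is left out here.

```python
def podman_run_args(image, memory="2g", network="none", extra=()):
    """Build an argv list for a throwaway, resource-restricted container.

    --rm            remove the container when it exits (ephemeral instance)
    --network none  no network access inside the container
    --memory 2g     cap the container's memory
    """
    args = ["podman", "run", "--rm", "--network", network, "--memory", memory]
    args.extend(extra)  # any additional options supplied by the caller
    args.append(image)  # the image always comes after the options
    return args


if __name__ == "__main__":
    # Hypothetical image name, for illustration only.
    print(" ".join(podman_run_args("localhost/firefox:latest")))
```

Each invocation gives you a separate container, so two firefox instances started this way don't share a parent process or profile. As far as I understand, memory limits in rootless mode depend on cgroups v2 being set up on the host, so treat that part as an assumption to verify.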
No real need for full Flatpak, Bubblewrap (bwrap) is intended to be a lightweight sandbox providing this out of the box, with Flatpak (and other stuff besides) building upon it. The Arch wiki has a nice introductory page: https://wiki.archlinux.org/title/Bubblewrap
Oh, didn't know about bwrap yet! If i understand the wiki page correctly, you still need to get those binaries onto your pc. So that's why i went with plain and simple dockerfiles.
that's probably what happened, i reached out to github and hope to get it changed.
It just left me uneasy. The mighty Linus involved in an organization that redirects to a crypto scam? Can't be!
%wheel ALL=(ALL) NOPASSWD: ALL
effectively disabling sudo completely.