ZFSBootMenu (zfsbootmenu.org)
189 points by denysonique on Aug 13, 2023 | 52 comments



Kudos to everyone involved in this! Love everything about this. Using it on my notebook, on dedicated servers rented at Hetzner, on Hetzner Cloud, and on a bunch of dedicated servers in a rack. Solves almost all problems related to ZFS and Linux. Booting this from SYSLINUX works very well, as does UEFI - it's extensible, and you can run it with the ZFS git version if you use the generate-zbm command. Saved my ass quite a few times already.
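
Regenerating the image after a config or ZFS change is a one-liner, assuming the stock /etc/zfsbootmenu/config.yaml layout from the docs:

  # rebuild the ZFSBootMenu kernel/initramfs (and/or UEFI bundle)
  # according to /etc/zfsbootmenu/config.yaml
  generate-zbm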


May I ask how you use this on the dedicated server?


Hetzner's dedicated servers give you KVM access for stuff like tweaking your BIOS settings, installing an OS, or I guess using your boot menu. You have to request it via support ticket, but last time I did that I got it within 5 minutes, no questions asked.

If you have your own server in a rack somewhere, chances are you bought one with a similar web interface (IPMI/BMC/whatever your brand calls it) on a separate always-on NIC on the mainboard.

https://docs.hetzner.com/robot/dedicated-server/maintainance...


When I installed FreeBSD on my Hetzner servers, I did so by booting the servers into the Linux-based rescue mode, and then I think I used dd to write the mfsBSD media onto one of the hard drives.

This way I didn’t have to request KVM access for my servers.

Perhaps a similar method can be used to install ZFSBootMenu.


Nice tip, but if you fail, your server is toast - and that's an interesting support ticket.


Their Linux-based rescue mode boots from the network.

I should probably specify though that I do this when setting up a new server.

If you fail then it should still be possible to select their Linux rescue image again from the Hetzner control panel and force reboot the server.

I seem to recall that I had to try a couple of times before I got it right. And that I did exactly that; select their rescue image again and force reboot.


Not a big deal though when you're doing a fresh install. Just cancel the server and provision a new one.


The rootfs is typically network-mounted via a NIC feature on most clouds, so it should just be a matter of clicking "Force shutdown" and "Reset disk to default install".


As others have pointed out, you can request a KVM console (called LARA there, IIRC) for 3 hours for free - however, if you install it correctly you can set up everything in the rescue system without requesting a KVM console. We have lots of data to store; I'm using it with additional 16 TB disks in a server and two enterprise SSDs for the special metadata vdev - it's really nice to do a setfacl -R on 5 million files and 12 TB of data in 15 seconds.
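
The special vdev setup is roughly this (pool and device names are placeholders):

  # mirror two SSDs as the special (metadata) vdev of an existing pool
  zpool add tank special mirror /dev/disk/by-id/ssd-one /dev/disk/by-id/ssd-two
  # optionally keep small file blocks on the SSDs as well
  zfs set special_small_blocks=64K tank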


Void is my favorite distro, and I'm interested in giving this setup a try. So far I've been reluctant to use ZFS on Linux out of reliability concerns. Have you had any issues with this setup, or have any suggestions?


The setup works fine. The biggest issue was messing up the initramfs and the kernel-module build (I've used the current 2.2.0 RC, and git before that, because I wanted overlayfs support and idmapped mounts - the machines run Docker and LXD - so everything is native in ZFS).

One issue to look out for is the set of activated pool features versus the ZFS module in the kernel and in ZFSBootMenu - these should match, because depending on which features are enabled, it may otherwise be impossible to import the pool.
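
A quick way to check, and to guard against drift (pool name is a placeholder; the compatibility value is just an example of the feature-set files shipped in /usr/share/zfs/compatibility.d):

  # see which feature flags are active/enabled on the pool
  zpool get all zroot | grep feature@
  # optionally pin the pool to a known feature set so upgrades can't outrun ZBM
  zpool set compatibility=openzfs-2.1-linux zroot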

ZFSBootMenu just looks for kernels and boots them. If something fails, you can go back to an older snapshot or use an older kernel - and you have an emergency shell.


How do you use this with Hetzner Cloud? Do you use it as a VM or as part of, say, k8s?


I wanted to have snapshots and installed plain Ubuntu on ZFS using debootstrap - basically following the tutorial on the page here almost verbatim. Hetzner Cloud does not use UEFI afaik, so you need to set up a small SYSLINUX configuration to boot. But there is no ready cloud image for that. Another idea I haven't implemented yet is a minimal image that I can bootstrap using send/recv, but at the moment there's no need for that.
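
The SYSLINUX side is tiny - something like this extlinux.conf, with the ZBM components copied onto a small non-ZFS /boot partition (the file names depend on what generate-zbm or the release tarball produced, and zroot is a placeholder pool name):

  DEFAULT zfsbootmenu
  PROMPT 0

  LABEL zfsbootmenu
    KERNEL /zfsbootmenu/vmlinuz-bootmenu
    INITRD /zfsbootmenu/initramfs-bootmenu.img
    APPEND zbm.prefer=zroot ro quiet loglevel=4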


It seems lame that UEFI firmware needs to 'mount' a filesystem to load a bootloader.

That bootloader needs to mount a filesystem to find the kernel.

The kernel needs to mount the filesystem to run the system.

Each of those mount operations is done with different code, and normally each involves some config or search process to find the right disk/partition. If any of the searches finds the wrong partition or is misconfigured, you get a boot failure.

It really feels like the boot process is more complex than it needs to be, with more opportunities for failure than necessary.


The firmware can execute UEFI binaries from any filesystem it can read. The spec mandates FAT32 as a supported filesystem, but nothing prevents additional filesystems from being supported - Apple's firmware for example understands APFS (and previously, HFS+).

The Linux kernel can act as an EFI binary/application via a mechanism called EFISTUB and thus can be loaded directly. The concept of a "bootloader" in a UEFI-based Linux system is mostly vestigial (to enable feature parity with non-UEFI systems or to work around broken firmware). With non-defective UEFI firmware, a bootloader is unnecessary: the firmware can load your kernel and initrd directly from the EFI system partition.
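
For example, registering an EFISTUB kernel directly with the firmware (disk, partition, and root= are placeholders):

  # create a firmware boot entry that loads the kernel straight from the ESP
  efibootmgr --create --disk /dev/sda --part 1 \
    --label "Linux (EFISTUB)" --loader /vmlinuz-linux \
    --unicode 'root=/dev/sda2 rw initrd=\initramfs-linux.img'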

Ideally you'd want your UEFI to be able to read and understand your main filesystem (ZFS in this case), which means you no longer need a separate EFI system partition to store your kernel/initrd (and can enjoy the redundancy and features provided by your filesystem of choice). UEFI is actually extensible, so you can have third-party filesystem drivers (you'd need to store those drivers somewhere, but a USB stick/memory card would do, or you could technically embed them in your firmware).

The problem when it comes to ZFS specifically is that there is no UEFI driver with feature parity to the main ZFS-on-Linux project. GRUB has a primitive implementation (extracted as stand-alone EFI drivers here: https://efi.akeo.ie) but it lacks support for many features, effectively forcing you to have a separate boot-time ZFS partition with all unsupported features disabled (and if you're going to use a separate partition anyway, why not just use FAT32, which is natively supported?).


There's not really a way around it unless you hardcode the bootloader rather than store it on disk.

That said, there are only two steps in the modern boot process on a PC: the UEFI firmware loading a basic FAT driver and the kernel mounting the other filesystems. The UEFI bootloader can use the existing FAT driver to load the kernel and the initramfs which will use the same code to mount partitions.

You can skip the UEFI bootloader and directly boot unified kernel images after putting them on the UEFI partition.
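
Building such a unified kernel image by hand looks roughly like this (the classic objcopy recipe; the stub path varies by distro, and newer systemd ships ukify to do the same job):

  objcopy \
    --add-section .osrel=/etc/os-release --change-section-vma .osrel=0x20000 \
    --add-section .cmdline=cmdline.txt --change-section-vma .cmdline=0x30000 \
    --add-section .linux=/boot/vmlinuz-linux --change-section-vma .linux=0x2000000 \
    --add-section .initrd=/boot/initramfs-linux.img --change-section-vma .initrd=0x3000000 \
    /usr/lib/systemd/boot/efi/linuxx64.efi.stub esp/EFI/Linux/linux.efi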


I'd quite like UEFI to be able to pass some kind of argv[0] to the direct-loaded kernel, so the kernel knows exactly which file on which partition of which disk it was loaded from. That would then become the default root filesystem.

That effectively removes all config from the process - and means that any disk with a UEFI-executable kernel can be booted without the mystery step of "let's try to figure out where we're booting from".


> I'd quite like UEFI to be able to pass some kind of argv[0] to the direct-loaded kernel, so the kernel knows exactly which file on which partition of which disk it was loaded from. That would then become the default root filesystem.

I think UEFI gives an EFI application (such as the Linux kernel) everything it would need for this already - but I guess Linux doesn't use it.

The EFI application entry point includes a handle to your own image [0], and the EFI_LOADED_IMAGE_DEVICE_PATH_PROTOCOL_GUID [1] protocol allows you to query the path it was loaded from. It is possible for an image to be loaded without a path, but it looks like EDK2 provides it at least [2].

[0] - https://uefi.org/specs/UEFI/2.10/07_Services_Boot_Services.h...

[1] - https://uefi.org/specs/UEFI/2.10/09_Protocols_EFI_Loaded_Ima...

[2] - https://github.com/tianocore/edk2/blob/991515a0583f65a64b3a6...


This does work inside of rEFInd, which can load a kernel EFI blob directly, and supply it args.


I think discoverable partitions are intended to solve this problem.

https://uapi-group.org/specifications/specs/discoverable_par...
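
For example, tagging the root partition with the x86-64 root GUID so the initrd/systemd can find it with zero configuration (disk and partition number are placeholders):

  sgdisk --typecode=2:4f68bce3-e8cd-4db1-96e7-fbcaf984b709 /dev/sda
  # newer sgdisk also accepts the short alias 8304 for the same type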


> That bootloader needs to mount a filesystem to find the kernel.

> The kernel needs to mount the filesystem to run the system.

> Each of those mount operations is done with different code...

Not on FreeBSD. Our bootloader reuses kernel code (because, you know, developing the entire operating system together makes this possible).


I always feel one of the underappreciated features of ZFS on FreeBSD is how nicely it's integrated into the whole operating system - that is, it's easy to boot off a ZFS drive in FreeBSD.


ZFSBootMenu "reuses kernel code", too. It actually reuses kernel binaries, because it is nothing more than a collection of init scripts in a small Linux initramfs that enumerate ZFS filesystems and allow you to kexec one of the kernels found within. When using `generate-zbm` to build a custom image, your copy of ZFSBootMenu will use the very same kernel and ZFS driver that your system is running. When using the prebuilt UEFI image or the separate kernel/initramfs that we provide as a convenience, the image is built from a Void Linux container using whatever kernel and ZFS driver was current at the time the container image was created.
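
Conceptually, the hand-off after you pick a boot environment is just kexec - something like this (names and flags simplified; the real scripts discover all of these dynamically):

  # import read-only, find the chosen kernel, and kexec into it
  zpool import -o readonly=on zroot
  mount -t zfs zroot/ROOT/void /mnt
  kexec -l /mnt/boot/vmlinuz \
    --initrd=/mnt/boot/initramfs.img \
    --command-line="root=zfs:zroot/ROOT/void ro"
  kexec -e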

I'm not sure what the complaints are about mounting filesystems. Unless you want to raw-dump your kernel and other components necessary for booting to some known offset on the disk, you'll always have to walk some filesystem to find the kernel. This even happens in the FreeBSD boot process, where one of the stages has to go looking at the filesystem for a kernel.


This is precisely why UEFI is a heap of garbage, and things like coreboot, u-boot are much more appealing. It is far too complicated, and the complexity of the standard, coupled with half-assed vendor code, mean that not only is the boot process more fragile, it's also much less secure.


It's so, so, SO unnecessarily complicated while providing me no new features (like a faster boot). LILO worked, even with LVM. As the bootloader, of course it exists outside of the kernel, the initrd, and my rootfs. That code would run, find the kernel, and run that.

GRUB has some slight improvements, like being able to boot other operating systems, but that's about it. Unsure if it's worth the price...

None of this crazy modern boot-time ouroboros. Too many layers, too much software.

At least we should _get_ something for this. How about a 2s cold boot on my thinkpad?


>It really feels like the boot process is more complex than it needs to be

You can use EFISTUB kernels directly (through efibootmgr) and use the UEFI firmware itself as the bootloader. There is no automatic kernel discovery with this method, of course.


More people than just you are starting to feel that way:

"I have come to bury the BIOS, not to open it: The need for holistic systems" https://www.osfc.io/2022/talks/i-have-come-to-bury-the-bios-...

This also talks about how you need a boot processor to do things like train up the RAM interface just so you can boot the main processor.


Not to mention the initrd - another filesystem that needs to be mounted and must be kept in sync with the rest.

I long ago switched to compiling my own kernel with all needed modules built in and the EFI stub enabled. I don't have an initrd or bootloader anymore, and can boot in a few seconds.
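
The relevant kernel config is small (root device is a placeholder; this assumes the root filesystem driver is built in):

  CONFIG_EFI_STUB=y
  # bake in the command line so nothing has to pass one at boot
  CONFIG_CMDLINE_BOOL=y
  CONFIG_CMDLINE="root=/dev/nvme0n1p2 rw quiet"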


Is the bootloader itself considered as an operating system?

Does it have a kernel too?


It can be. A bootloader is just a bare-metal application, so called not because of its architecture but because of its primary purpose: loading a much larger image, jumping to it, and not coming back. If the CPU comes back, it's called a monitor program; if it goes back and forth, it's a supervisor, or a kernel. Old mainframe folks sometimes call an operating system kernel a supervisor program.

Theoretically it should be possible to flash the Linux kernel onto the BIOS flash ROM and directly load and run it from there. But x86 being x86, such a hypothetical kernel image would also need a bunch of circus tricks to initialize the motherboard, get out of 16-bit 8086-compatible mode, and load the rest of the system from disk. BIOS/UEFI and the bootloader each do parts of that today.


Does ZFSBootMenu allow entering the encryption password remotely on an encrypted root?


Yes - both dracut and mkinitcpio allow you to embed an SSH server (Dropbear or OpenSSH) in the ZFSBootMenu initramfs and connect to it. Once you connect, you can access the main interface and unlock any datasets prior to kexec.

https://docs.zfsbootmenu.org/en/v2.2.x/guides/general/remote...
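
The dracut side is just a drop-in config, roughly like this (module and option names per the dracut-crypt-ssh project; see the linked guide for the full setup, including networking via ip= on the kernel command line):

  # /etc/zfsbootmenu/dracut.conf.d/dropbear.conf
  add_dracutmodules+=" crypt-ssh "
  # keys allowed to log in to the dropbear instance inside the ZBM initramfs
  dropbear_acl=/etc/dropbear/authorized_keys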


It's an aside - but I wish there was a way to do a kexec and keep the ZFS datasets unlocked. I don't think there's anything that stops it being technically possible, but I'm pretty sure it would require kernel mode changes...


I just add the keyfile to the initramfs. It sits under an encryptionroot and is only readable by the root user, so it's largely as safe as native encryption can be.
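
In practice that's roughly (dataset names and paths are placeholders):

  # the key file unlocks the children; the encryptionroot protects the key itself
  zfs set keylocation=file:///etc/zfs/zroot.key zroot/ROOT
  # make sure the initramfs generator packs the key in (dracut example)
  echo 'install_items+=" /etc/zfs/zroot.key "' > /etc/dracut.conf.d/zfskey.conf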


Yes - you can add an SSH server or set up networking in ZFSBootMenu and use keylocation=https.
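
e.g. something like (URL is a placeholder; https keylocations need OpenZFS 2.0+):

  zfs change-key -o keylocation=https://keys.example.com/zroot.key \
    -o keyformat=raw zroot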


I was looking at using this for my Arch zfs-on-root setups, but I've instead just been hacking on /etc/grub.d/10_linux and /lib/initcpio/hooks/zfs to get the boot menu setup I want with GRUB. I like the simplicity of it this way, with fewer dependencies (especially since I'd otherwise need to use the AUR for the zfsbootmenu build, or use the pre-built binary blob).

One concern I had with zfsbootmenu was that I couldn't figure out how to load microcode. With kexec, zfsbootmenu can only load one image, and late-loading microcode may be "dangerous" [1]. I don't know whether that is a practical security issue or not. I tried cat'ing my images together as below, but it still didn't work for me:

  mv initramfs-linux.img initramfs-linux.img.orig
  cat intel-ucode.img initramfs-linux.img.orig > initramfs-linux.img

[1] https://docs.kernel.org/arch/x86/microcode.html#why-is-late-...


There shouldn't be any issues catting the real initramfs with microcode into another file. I do that, as does another ZBM developer. What do you see when you try it?

I started ZBM years ago by hacking on the same grub script, then progressed to what it is now!


When I booted normally with the concatenated image (after removing the original microcode image from the grub.cfg initrd line), I confirmed the microcode loaded with dmesg | grep microcode.

Then, switching to ZBM: while it did boot with the concatenated image, I didn't see the microcode loaded in dmesg.


Nice project. In evaluating this for possible use, I have a question about workflow integration with snapshots. On Gentoo-esque distros I believe there are two common ZFS configurations, namely: (1) keeping the package repo (Portage tree) on a separate ZFS dataset from the system-root dataset, and (2) combining them on one dataset.

In case 2, Portage changes are automatically synchronized with system snapshots, but multiple system instances (VMs, diskless nodes, etc.) will have to redundantly update Portage. In case 1, however, they are de facto desynced, which can cause Gentoo issues (yet it saves duplicate network operations, so it is nominally desirable). Does ZFSBootMenu have a built-in system for managing ZFS root-system snapshots with co-dependent dataset snapshot versions to enable case 1?


No, the ZFSBootMenu GUI does not do recursive operations on datasets / try to infer dependencies. However, there's a Bash shell that's a key press away with a full set of OpenZFS binaries. You can also do a custom build of ZFSBootMenu and add in whatever additional scripts you'd like so that you can manage snapshot rollback/promotion as you see fit.


Thanks, that's what I suspected.

After writing the grandparent comment I realised some people (in particular distro devs, I suspect) may treat kernel module datasets in a similar fashion, which provides a very similar use case.

I wonder if - ZFS dataset mountpoints and snapshot timings aside (both of which are already embedded in ZFS) - it could be worth considering adding some ZFS dataset properties as hints.

One (half-baked) idea is "zfsbootmenu-boot-significant" = "true", as a hint for ZFSBootMenu to infer default latest/greatest combos. Such a binary flag should elegantly cover both the kernel-module and Portage-tree use cases by indicating to ZFSBootMenu that the dataset in question (and snapshots thereof) provides system-integrity-critical state and should be paired with a root dataset selection. ZFSBootMenu could then make appropriate assumptions around default selections / temporally proximal pairings.

A second idea is a ZFS dataset property "zfsbootmenu-last-successful-boot" = "<datetime>" which combined with snapshot times should provide a useful inference. This could be rolled in to the ZFSBootMenu userland as an rc script updating the property on boot. Additional information such as "zfsbootmenu-last-successful-boot-options" and "zfsbootmenu-last-successful-boot-dataset-<dataset>-snapshot" could then be added. Alternatively or in addition a zfsbootmenu dataset could be added with more detailed log files.
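
One wrinkle: ZFS user properties require a namespace colon, so the hints would probably look more like this (property and dataset names here are just my proposal, not anything ZFSBootMenu reads today):

  zfs set org.zfsbootmenu:boot-significant=true zroot/gentoo/portage
  zfs set org.zfsbootmenu:last-successful-boot=2023-08-13T10:11:12Z zroot/ROOT/gentoo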

The result should be a cross-distribution (indeed cross-OS) methodology for the autonomous inference and validation of boot configurations involving multiple datasets in all use cases, a plethora of debugging information in a standard location, and even a mechanism to iteratively and autonomously fall back toward a functional boot configuration in the event of problems (which may be further improved using watchdog drivers, i.e. if boot does not complete in X minutes, reboot and try another configuration).


I used the FreeBSD version of this. I'm a shill at this point, but I find NixOS booting to an ephemeral tmpfs to be much better.

This wouldn't apply if you needed to keep divergent state, though it's hard to imagine a use case for that which isn't handled by filesystem snapshots.


In what way do you find NixOS to be better?

I'm actually thinking of going the other way, from NixOS to Void+ZFS. I've been using NixOS on 2 machines for a few months now, so I'm relatively new to it, but I still struggle with basic things, and don't really grok the Nix language. If ZFS+ZFSBootMenu can give me easy snapshotting and rollback functionality, then I might prefer it over NixOS.

Sure, Nix does many more things besides snapshots, but for my use case it's the main benefit, so I wouldn't be missing much.


NixOS comes alive when you have ephemeral root. It's generating all of /var and /etc as it boots, from the config, and then just mounting your home directory at the end.

So if I want to switch from pulse to pipewire, or some other messy change, I just edit the config, boot into it and it's like pulse never existed. If I don't like pipewire (I love pipewire) then I just choose the prior config option at boot and equally, pipewire never existed.

I don't want my bootloader handling snapshots of my homedir, I can do that myself if I need (zfs auto snapshots, etc.). So with an ephemeral root, there is no other state to manage, making this kind of boot menu redundant.

Think of it this way: you can carefully manage filesystem state using snapshots, or you can slap it all in the /nix store and have your system live-assemble itself to spec on every boot. Like using git for your source tree instead of tar'ing up the whole thing every time you make a scary change.


I think you misunderstand what ZFSBootMenu does. It doesn't manage any snapshots. It refuses to do anything to any file system that isn't clearly marked as an operating system root. (There are a few well-defined criteria that must be met before ZBM will even attempt to determine if a filesystem has Linux kernels that it will allow you to boot.) Once it identifies one or more file systems that contain bootable Linux kernels, it allows the user to select a kernel from one of those file systems for booting. It also allows the user to enumerate snapshots of those bootable file systems and boot from them via ZFS cloning (with or without promotion) or a send-receive duplicate that avoids interdependencies.

Yes, Nix manages a history of past system instances and NixOS modifies the bootloader to present each of these states as a bootable option. This maps loosely to the ability to elevate ZFS snapshots to boot environments in ZBM, but the functionality is not redundant. In fact, it isn't even a compatible alternative - we haven't found a good way to make ZBM boot NixOS. If you want NixOS, you're booting the NixOS way.

Nix is a very interesting concept that offers several advantages. It also has drawbacks. For example, it can be inordinately complex to manage small deviations from upstream configurations that aren't represented by pre-existing options. (Ever try to add a single line to a PAM configuration file in Nix?)

The Nix way of booting falls apart should you want to have multiple Linux distributions coexisting on a single pool. NixOS works best when you have complete buy-in. ZFSBootMenu doesn't care; if it can find kernels in the `/boot` directory of a ZFS filesystem, it will show you the filesystem and let you boot it.


In case you weren't aware:

Timeshift

> System restore tool for Linux. Creates filesystem snapshots using rsync+hardlinks, or BTRFS snapshots. Supports scheduled snapshots, multiple backup levels, and exclude filters. Snapshots can be restored while system is running or from Live CD/USB.

https://github.com/linuxmint/timeshift


Which Linux distribution provides this out of the box in its installer, like FreeBSD does - along with optional LUKS encryption, as FreeBSD offers optional GELI encryption?

I will wait ...


openSUSE Tumbleweed has btrfs+snapper, and the installer sets it up automatically. I guess they could (technically) boot other operating systems, but their focus is booting into read-only snapshots of the same OS.
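
After booting into a read-only snapshot from that menu, making it the new default is a single command (snapshot number is a placeholder):

  snapper rollback 42
  reboot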

The same setup is possible on all Linux distros but the user has to set it up.


Can btrfs snapshots on openSUSE Tumbleweed (btrfs+snapper) provide the same functionality as ZFS Boot Environments?

Nope.

Quoting from "System Recovery and Snapshot Management with Snapper" for openSUSE Linux:

● Limitations

A complete system rollback, restoring the complete system to the identical state as it was in when a snapshot was taken, is not possible.

With ZFS Boot Environments you are bulletproof.

With btrfs+snapper you are crossing your fingers, hoping it will work the way you need.

Not the same thing.


It's okay for FreeBSD if Linux can do some things it can, and vice-versa. If you're that insecure about running FreeBSD after all this time, you'd do well to think about why that is.


Running an unencumbered FreeBSD system will never make any user insecure.


Been using this for Arch (https://github.com/prabirshrestha/simple-arch-installer) and for servers (https://github.com/prabirshrestha/simple-ubuntu-installer), with remote SSH unlock for ZFS encryption.





