I'm not a filesystem person, but this sets off similar red flags to rolling your own encryption.
Isn't writing a robust file system something that routinely takes on the order of decades? E.g. reiserfs, bcachefs, btrfs.
Not to rain on anyone's parade. The project looks cool. But if you're writing an OS, embarking on a custom ZFS-inspired file system seems like the ultimate yak shaving expedition.
Sometimes doing things because they are hard is a great reason to do them to see if the reasons those things are hard are still valid. Doing a filesystem in Rust potentially mitigates some of those things. Most existing filesystems have gone through a lengthy stabilization phase where using them meant exposing yourself to nasty data corruption bugs, obscure race issues, and other issues that, when you root cause them, have a lot to do with the kinds of things Rust explicitly addresses (memory safety, safe concurrency, etc.). So there's a great argument to just try to leverage those features to make things easier and try to build an awesome file system.
Worst case this doesn't work. Best case, this works amazingly well. I think there's some valid reason for optimism here given other hard things that Rust has been used for in the past few years.
I see Redox as an incubator of new developments for a low level Rust ecosystem. It's not a production ready OS, its purpose is to spark new ideas, propose alternative implementations, try on new paths, etc. I see them implementing a ZFS variant as completely in-line with this objective.
There needs to be projects like that for any kind of innovation to happen.
You're talking about a project to write their own OS for ultimately the fun of it. It probably shouldn't be too surprising that attitude extends elsewhere.
The people who just think OSes would be better with more Rust in them, but aren't looking to reinvent from first principles are in the Rust for Linux universe.
And you know what, that's fine. Linux started out as a hobby project with similar origins before it became a big serious OS.
I don't believe in "never roll your own encryption"; it's literally giving up. Does it make economic sense, or is it just for a hobby? That's more debatable. It's also like a foil of 'don't use regex to parse html' or whatever, where the thread gets closed for comments.
The filesystem is so deeply connected to the OS I bet there's a lot of horror around swapping those interfaces. On the contrary, I've never heard anything bad about DragonFlyBSD's HAMMER. But it's basically assumed you're using DragonFlyBSD.
Would I keep a company's database on a new filesystem? No, nobody would know how to recover it from failed disk hardware.
This isn't really my area but a Rust OS using a ZFS-like filesystem seems like a lot of classic Linux maintainer triggers. What a funny little project this is. It's the first I've heard of Redox.
Edit: reminds me of The Tarpit chapter from the Mythical Man Month
> The fiercer the struggle, the more entangling the tar, and no beast is so strong or so skillful but that he ultimately sinks.
The "never create your own encryption" advice is specifically because crypto is full of subtle ways to get it wrong, which you will NOT catch on your own. It's a special case of "never use encryption that hasn't been poked at for years by hundreds of crypto specialists" — because any encryption you create yourself would fail that test.
Filesystems, as complex as they are, aren't full of traps like encryption is. Still plenty of subtle traps, don't get me wrong: you have to be prepared for all kinds of edge cases like the power failing at exactly the wrong moment, hardware going flaky and yet you have to somehow retrieve the data since it's probably the only copy of someone's TPS report, that sort of thing. But at least you don't have millions of highly-motivated people deliberately trying to break your filesystem, the way you would if you rolled your own encryption.
That matches what I've heard, so I think you stated the trope perfectly. Your response is a good point about the actual difficulty. Perhaps I'm confused about what 'rolling your own encryption' means at an abstraction level. I just think it's weird that it comes up in an OS thread. Anyone who is serious about encryption is serious about the encryption hardware. At a higher level, WolfSSL limits the ciphers to a small, modern suite, which reduces the attack surface. Replacing OpenSSL is a fool's errand, I think; it's clearly the perfect implementation of OpenSSL, and it's a perfect security scapegoat. However, this is still about the x86 OS topic. Perhaps it's some TPM politics, similar to the decade-old stigma surrounding ZFS. Maybe I'm just questioning the limits of the x86 platform on any new operating system. Anyway, thanks for the response.
> I just think it's weird that it comes up in an OS thread
The only connection is that writing custom encryption is a thing that smart people like to try their hand at, but its success is defined by the long tail of failure cases not by the cleverness of the happy path. I agree 100% with what rmunn said.
As I said I'm not a filesystem person, but my sense is that filesystem difficulty is also dominated by the long tail of failure cases and for similar reasons. Failure in encryption means you lose control of your data, failure in filesystems means you lose your data (or maybe you lose liveness/performance) [0]
But really I just meant it in the sense that it's a journey people often go down underestimating just how long it takes. So it's a sort of trap from the project management perspective.
> I'm confused about what 'rolling your own encryption' means at an abstraction level
It cuts through many abstractions. You should definitely not define your own crypto primitives. You also shouldn't define your own login flow. You shouldn't design a custom JWT system, etc. You probably shouldn't write your own crypto library unless there's not one in your language, in which case you should probably be wrapping a trusted library in C or C++, etc. The higher you go in abstraction, the more it's okay to design an alternative. But any abstraction can introduce a weakness so the risk is always there.
[0] Ordinarily you still have backups, which makes file system failures potentially less final than encryption failures. But what if the filesystem holding your backup root keys fails. Then the encryption wasn't a failure but you've potentially crypto shredded your entire infrastructure.
i don't think it has to be all that robust yet as it mostly runs in vms (even though it may be!).
an internet community project to write an entire operating system from scratch using some newfangled programming language is literally the final boss of yak shaving. there is no reason to do it other than "it's fun" and of course writing a filesystem for it would be fun.
Rust really is attractive to a filesystem developer. Over C, it brings generics for proper data structures, iterators (!), much better type safety, error handling - all the things Rust is good at are things you want.
For me, the things that would make it just perfect would be more ergonomic Cap'n Proto support (eliminate a ton of fiddly code for on disk data structures), and dependent types.
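To make the list above a bit more concrete, here's a minimal sketch of what those features look like for on-disk parsing. The layout (a 4-byte magic followed by little-endian 64-bit block pointers, with 0 meaning "unallocated"), the `MAGIC` value, and all the names are made up for illustration; this is not Redox's or any real filesystem's format.

```rust
/// Newtype: a block number can't be confused with a byte offset or a length.
#[derive(Debug, Clone, Copy, PartialEq)]
struct BlockNr(u64);

/// Every way parsing can fail is spelled out in the type.
#[derive(Debug, PartialEq)]
enum ParseError {
    TooShort,
    BadMagic(u32),
}

/// Hypothetical magic number for this sketch.
const MAGIC: u32 = 0x5244_4653;

/// Parse a header and return the allocated block pointers that follow it.
fn parse_pointers(raw: &[u8]) -> Result<Vec<BlockNr>, ParseError> {
    if raw.len() < 4 {
        return Err(ParseError::TooShort);
    }
    let (head, body) = raw.split_at(4);
    let magic = u32::from_le_bytes([head[0], head[1], head[2], head[3]]);
    if magic != MAGIC {
        return Err(ParseError::BadMagic(magic));
    }
    // Iterators: walk the body in 8-byte chunks, decode each one, and drop
    // unallocated (zero) pointers. A trailing partial chunk is ignored.
    let pointers = body
        .chunks_exact(8)
        .map(|chunk| {
            let mut bytes = [0u8; 8];
            bytes.copy_from_slice(chunk);
            BlockNr(u64::from_le_bytes(bytes))
        })
        .filter(|nr| nr.0 != 0)
        .collect();
    Ok(pointers)
}
```

The caller can't ignore the error case without saying so, and there's no index arithmetic to get wrong in the loop. Whether this compiles down to something as tight as hand-written C is a separate question.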
it remains an open question as to how reliable, performant and efficient a system built with these higher level constructs would compare to the highly optimized low level stuff you'd see in a mature linux filesystem project.
i suspect the linux stuff would be far more space and time efficient, but we won't know until projects like this mature more.
i'd be curious how many of the higher level features and libraries would be best avoided if attempting to match the performance and space efficiency of a filesystem implemented in purpose designed highly optimized c.
I'm rewriting some of my Arduino projects into Rust (using Embassy and embedded-hal).
It's _so_ _much_ _better_. I can use async, maps, iterators, typesafe deserialization, and so on. All while not using any dynamic allocations.
With full support from Cargo for repeatable builds. It's night and day compared to the regular Arduino landscape of random libraries that are written in bad pseudo-object-oriented C++.
sure, i believe it. the question i have is: if one were to try to match the resilience, storage, memory and time efficiency of the well optimized linux c implementations of mature filesystems, and one were to use rust, would they be using all these high level language features and libraries out of the box or would non-canonical use of the language be necessary? (and if so, (or not) how would the resulting implementation compare from a readability perspective?)
Calling it "optimized" is a stretch. A veeeery big one. The low-level code in some paths is highly optimized, but the overall kernel architecture still bears the scars of C.
The most popular data structure in kernel land is the linked list. AKA the most inefficient structure for modern CPUs. It's so popular because it's the only data structure that is easy to use in C.
The most egregious example is the very core of Linux: the page struct. The kernel operates on the level of individual pages. And this is a problem in case you need _a_ _lot_ of pages.
For example, when you hibernate the machine, the hibernation code just has a loop that keeps grabbing swap-backed pages one by one and writing the memory to them. There is no easy way to ask: "give me a list of contiguous free page blocks". Mostly because these kinds of APIs are just awkward to express in C, so developers didn't bother.
There is a huge ongoing project to fix it (folios). It's been going for 5 years and counting.
> It's so popular because it's the only data structure that is easy to use in C.
Is this reasoning really true? A quick search reveals the availability of higher-level data structures like trees, flexible arrays, hashtables, and the like, so it's not as if the linux kernel is lacking in data structures.
Linked lists have a few other advantages - simplicity and reference stability come to mind, but they might have other properties that make them useful for kernel development beyond how easy they are to create.
Well, yes. The kernel _now_ has all kinds of data structures, but you can look at the documentation from 2006 and see the horror of even the simplest rbtrees back then: https://lwn.net/Articles/184495/
A simple generic hashtable was added only in 2013!
> Linked lists have a few other advantages - simplicity and reference stability come to mind, but they might have other properties that makes them useful for kernel development beyond how easy they are to create.
The main property is that it's easy to use from C. Think about growable vectors as an exercise.
> It's so popular because it's the only data structure that is easy to use in C.
I don't understand that statement. Linked lists are no easier or harder to use than other data structures. In all cases you have to implement it once and use it anywhere you want?
Maybe you meant that linked lists are the only structure that can be implemented entirely in macros, as the kernel likes to do? But even that wouldn't be true.
Think about a growable vector. Another basic structure that everyone uses in the userspace.
You can iterate through it fine, it's just an array after all. But then you want to add an element to it. At this point you need to handle reallocation. So you need to copy or move the elements from the old array. But this will break if the elements are not simple data structures.
So you need to have a copy function. There are copy constructors in C++, but not in C. So you need to reinvent them.
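For contrast, here's a rough sketch of the same exercise in Rust, using only the standard library. `Vec<T>` handles reallocation by moving elements bitwise into the new buffer, so an element type that owns heap data needs no user-written copy function. The `Inode` type and its fields are invented for the example.

```rust
/// Hypothetical element type that owns heap data, i.e. not a "simple" struct.
#[derive(Debug, PartialEq)]
struct Inode {
    number: u64,
    name: String, // owns its own heap allocation
}

/// Push `count` inodes into a vector that starts with zero capacity,
/// forcing several reallocations along the way.
fn build_inodes(count: u64) -> Vec<Inode> {
    let mut inodes = Vec::new();
    for number in 0..count {
        // When capacity runs out, Vec allocates a bigger buffer and moves
        // the existing elements over. A move is a plain byte copy and the
        // old slots are never used again, so no copy constructor is needed.
        inodes.push(Inode {
            number,
            name: format!("file-{}", number),
        });
    }
    inodes
}
```

Calling `build_inodes(1000)` gives back a vector of 1000 elements with every `name` still intact. The language rules out leftover references to the old buffer at compile time, which is the part you'd have to enforce by convention in C.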
Many C++ libraries are unfortunately not ergonomic, because they are tainted by C culture.
We see this in the compiler frameworks that predated the C++ standard, and whose ideas lived on in Java and .NET afterwards.
Too many C++ libraries are written for a C++ audience, when they could be just as ergonomic as in other high level languages, and being C++, there could be a two level approach, but unfortunately that isn't how it rolls.
Several FS problems that are difficult to fix with incremental changes to legacy code bases:
* inotify implementation is insufficiently atomic, so event subscribers can fail to receive notifications under certain conditions.
* License/patent encumbrance prevents modern operating systems from implementing/distributing a common next-gen FS.
* ZFS has native encryption in theory but it was bolted on later and has numerous buggy interactions with the powerful zfs send/recv.
* ZFS native encryption is architecturally incapable of protecting metadata like filenames.
There is lots of room for innovation, and with modern tools it shouldn't take decades to build a production-ready driver for any one ISA (presumably x86_64). If this project wants to pilot a superior option, I'm all for it!
Looks like commits go back to 2016. Maybe there was code before then but that's when the repo was created, I can't tell. But it's not exactly new. Also, I don't think it's exactly intended for production usage at this time.
Redox OS is a microkernel operating system, completely different from monolithic kernels like Linux or BSD. I doubt it'll be easy to get existing ZFS drivers working on it at all.
Normally I'd say the same, but it is a matter of what your goals are. If your goal is to try new things and learn about computers on the way, why the hell not write your own filesystem as well?
If your goal is to gain wide adoption fast, that is a bad idea.
> If your goal is to try new things and learn about computers on the way, why the hell not write your own filesystem as well?
Yeah, to clarify, that was how I took it. I'm not concerned about Redox OS failing to get adoption. And I love that they're exploring new things and following their interests.
I was thinking more as someone with ADHD and widespread interests. It feels a bit like saying "I'm learning the oboe. But first I want to learn how to make an oboe." Both are cool goals, but if you do the second you may never get to the first in a meaningful way.
That may be okay, I was just intending to express surprise at a detour that could last a significant fraction of the development teams' lives. Especially since as the default file system, delays or breakages in the filesystem can delay or break the OS project as a whole.
The only thing btrfs took from ZFS was the featureset - COW, data checksumming, snapshots, multi device. ZFS was a much more conservative design, btrfs is based on COW b-trees (with significant downsides) and if you can put it in any lineage it would be Reiserfs.
Quotas that didn't work, or quotas that worked perfectly fine, but blew up your system in an unexpected way?
The reason I ask is because I'm trying to tease out if you have architectural problems with the way the filesystem is designed, or if you simply think it's unreliable.
I never wanted quotas at all, but a systemd tool turned them on unbeknownst to me. That was one problem. The second and worse problem was that the performance with quotas on was so terrible that the machine bricked.
I can only speculate, but maybe they're referring to the same thing Andrew Morton meant when he described ZFS as a rampant layering violation.
ie ZFS isn't just a file system. It's a volume manager, raid and file system rolled into one holistic system vs for example LVM + MD + ext4.
And (again I'm only speculating) in their micro kernel design want to have individual components running separately to layer together a complete solution.
It's only a rampant layering violation if you mandate the use of external layers like Linux device mapper as the only allowed way... Or you haven't actually read through the code and assume based on external user interface.
No, ZFS is not "monolithic".
It's just that on the outside you have a well integrated user interface that does not expose you to SPA (block layer), ZIO (IO layer, that one is a bit intersectional but still a component others call), DMU (object storage), and finally ZVOL (block storage emulated over DMU) and ZPL (POSIX-compatible filesystem on top of DMU) or Lustre-ZFS (Lustre metadata and object stores implemented on top of DMU). There are also a few utility components that are effectively libraries (AVL trees, key-value data serialization library, etc).
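A toy sketch of that shape, for anyone who hasn't looked at the code: one object layer, with a file-style consumer and a volume-style consumer sitting on top of it independently. The trait and struct names are made up and only mirror the roles named above (DMU, ZPL, ZVOL); none of this is the actual ZFS API, and the store is just an in-memory map.

```rust
use std::collections::HashMap;

/// Stand-in for the object layer (the DMU's role): numbered byte objects.
trait ObjectStore {
    fn write(&mut self, object: u64, offset: usize, data: &[u8]);
    fn read(&self, object: u64, offset: usize, len: usize) -> Vec<u8>;
}

/// Toy in-memory store; the real layer sits on pooled block devices.
#[derive(Default)]
struct MemStore {
    objects: HashMap<u64, Vec<u8>>,
}

impl ObjectStore for MemStore {
    fn write(&mut self, object: u64, offset: usize, data: &[u8]) {
        let buf = self.objects.entry(object).or_default();
        let end = offset + data.len();
        if buf.len() < end {
            buf.resize(end, 0); // grow, zero-filling any gap
        }
        buf[offset..end].copy_from_slice(data);
    }

    fn read(&self, object: u64, offset: usize, len: usize) -> Vec<u8> {
        // Bytes that were never written read back as zero.
        let stored = self.objects.get(&object);
        (offset..offset + len)
            .map(|i| stored.and_then(|b| b.get(i)).copied().unwrap_or(0))
            .collect()
    }
}

/// File-style consumer (the ZPL's role): names mapped to objects.
struct FileLayer {
    names: HashMap<String, u64>,
    next_object: u64,
}

impl FileLayer {
    fn create(&mut self, store: &mut dyn ObjectStore, name: &str, data: &[u8]) {
        let object = self.next_object;
        self.next_object += 1;
        self.names.insert(name.to_string(), object);
        store.write(object, 0, data);
    }

    fn read(&self, store: &dyn ObjectStore, name: &str, len: usize) -> Option<Vec<u8>> {
        self.names.get(name).map(|&object| store.read(object, 0, len))
    }
}

/// Volume-style consumer (the ZVOL's role): one object seen as fixed-size blocks.
struct VolumeLayer {
    object: u64,
    block_size: usize,
}

impl VolumeLayer {
    fn write_block(&self, store: &mut dyn ObjectStore, block: usize, data: &[u8]) {
        store.write(self.object, block * self.block_size, data);
    }

    fn read_block(&self, store: &dyn ObjectStore, block: usize) -> Vec<u8> {
        store.read(self.object, block * self.block_size, self.block_size)
    }
}
```

Neither consumer knows about the other, and either could be dropped without touching the store. That's the sense in which the integrated CLI hides a layered design rather than a monolith.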
In the Linux world you need to be hard to use in order to prove how pure you are. Anything that is actually easy to use is always considered impure and bad.
Not sure why you're getting downvoted, considering how people torture themselves with calculating SSD cache sector offsets by hand so they can imitate 1% of ZFS's feature set with LVM2.
> when he described ZFS as a rampant layering violation
I read some blog posts back in the day about why they did this and it sounded a lot like those layers were more historical accidents or something.
You can turn it around and say that ZFS is a full stack filesystem (or vertically integrated if you will) and it should be pretty obvious that a rethink on that level can have big advantages.
Good question. I don't know about other microkernels, but NetBSD is a small kernel that supports ZFS. The support has been there since the 4.0.5 and 5.3[0], possibly earlier too. I'm not adept at navigating the mailing lists here, but I imagine a good place to learn about the challenges of porting ZFS to a smaller kernel would be the NetBSD and ZFS lists from that era (2008-2009). What NetBSD does today is use a 'zfs' module that depends on a 'solaris' kernel module. The dependency on Solaris primitives is probably one of the major challenges with porting ZFS to any kernel. FWIW, somehow a ZFS port for the "hybrid" kernel in Windows also exists[1].
Who is calling it a microkernel? The post you're replying to calls it a “small kernel” - that does not imply it’s a microkernel tho, right? I didn’t think size has anything to do with it.
I came back to maybe delete my comments as I felt I might have come off harsh, esp before I saw the dead comment chain. No ill will, was confused as well I think.
“I don’t know about other microkernels” implies that NetBSD is also a microkernel. It is not.
Microkernel is not a size distinction. NetBSD kernel may even be smaller in terms of LOC or binary size than some microkernels. Idk. But that is beside the point.
Microkernel is an *architecture*. It is a name for a specific type of kernel design, which NetBSD is not.
"Maybe you should read it more careful, instead adding nothing to the discussion" is crossing into personal attack. So is "Read more, comment less." from your other post in this thread.
Generally speaking, it's not against HN's rules to be wrong. How could it be? We're all wrong about nearly everything. But it is against HN's rules to post disrespectfully, put others down, etc. - for the obvious reason that it poisons community discussion, plus is unnecessary.
It's especially important to follow those rules when one is right about something, because when a post contains both correctness and poison, the poison has the side effect of discrediting the truth. That is bad for all of us (https://hn.algolia.com/?dateRange=all&page=0&prefix=true&sor...).
I don’t think it’s microkernels in general but their microkernel design which wants as much as possible in userspace. They want each component to have its own memory space. ZFS blurs the layers between filesystem and the volume management. This kinda bothers layers of abstraction model folks. And I assume combined with their posix like model it just sorta clashes with what they want to do. Not impossible to integrate, but they want something a little different.
I particularly don't buy it because ZFS used to have a FUSE build, and I'm pretty sure there's at least one company still running it in userspace in some form (something for k8s, IIRC?)
I guess? I'm not personally convinced that even 1 ZFS service per zpool would be a serious problem, but I can see why it would be considered unideal to a microkernel designer.
If I had to guess, it is because ZFS likes to insert itself into things beyond just being a filesystem. It is one of the reasons ZFS notoriously works poorly with database engines, which have a tendency to bypass or ignore the filesystem (for good reason). It is a design choice on the part of ZFS.
Yeah I've always written this off as a fun side project for a group of people but after seeing consistent updates and improvements over the last several years I've been so impressed by how far this project has been going.
Note the binaries are not specific to the kernel, so anything built for Genode will work on Genode systems of compatible ISA irrespective of kernel.
I am surprised to hear 2008, I could swear they have been active far longer. Maybe I am conflating it with TUD:OS.
They are indeed quite active. Just see their backlog of release notes. They release 4 times a year, on the clock, and always document what they've done.
Linux didn't win because it was GPL'd, it won because it was the only real alternative back in '92. The BSDs were all caught up in the moronic SCO lawsuits of the time, otherwise we'd all be using FreeBSD or some other 386BSD variant today instead of Linux. The GPL was a nice bonus but it isn't the real secret sauce that has powered Linux's growth, it was mostly good timing.
That doesn't mean that I wouldn't rather see some form of copyleft in place (like the MPLv2) or at least a licence with some kind of patent protection baked in (like the Apache 2.0); the X11/MIT licences are extremely weak against patent trolls.
other licenses being more sane doesn't imply MIT is _insane_ per se. It's just not a very sane option for cooperation and has a very real possibility of driving someone insane. Imagine working on Redox OS for years with your friends and then Microsoft takes your work, rebrands it as Windows 19, completely steals all of the market from you and silences you through legal pressure without even crediting your work. All of this is very much possible and similar scenarios have happened before.
I am not native speaker but saying something is more sane doesn't mean the person means/thinks other option is insane (which is the extreme on the scale).
It can mean both of the options might be sane (reasonable) one is just more reasonable. It might also mean both of the options are insane (unreasonable) one is just less so.
None of the competition on the embedded space of FOSS operating systems, including Linux Foundation Zephyr, makes use of GPL.
Unfortunately the license is seen as tainted by all businesses, and plenty of OSes are already seen as Linux alternatives in some spaces.
In others Android is the only one being used, where the only thing left from GPL is the Linux kernel itself, and only because Fuchsia kind of went nowhere, besides some kitchen devices.
It's far more active than redox and it's actually running on real consumer devices. There are more than a hundred monthly active committers on the repo you were looking at, and that's not the only repo fuchsia has. Calling it dead or prone to dying is simply not based on any objective reality.
Okay, I take that back. Maybe I shouldn't say it is dead, but it is more on life support, where there are no new features being developed. Simply put, it is dead to me not in the sense that the project ceased to function, but in the sense that it is out of relevancy, just like Hong Kong.
What are the 100+ daily commits doing if not adding new features? Google is not spending any effort marketing the roadmap for the project, but it's very much still alive and in active development. There are RFCs published fairly often about technical designs for various problems being solved and you can see lots of technical discussions happening via code review.
Some new things that I can think of off the top of my head:
* More complete support for linux emulation via starnix.
* Support for system power management
* Many internal platform improvements including a completely overhauled logging system that uses shared memory rather than sockets
Most project happenings are not that interesting to the average person because operating system improvements are generally boring, at least at the layers fuchsia primarily focuses on. If you've worked in the OS space, a lot of the things fuchsia is doing are really cool though.
Fuchsia is literally a Google project to avoid using Linux.
Look at their other "Open Source" projects like Android to understand why they would want to ensure they would avoid GPL code. It's all about control, and appearances of OS through gaslighting by source available.
Fuchsia would be far more valuable to everyone, including Google, if multiple parties participated in its development. If control was all that was desired, a hard fork of Linux would have made more sense. GPL doesn't compel companies to work with upstream. Just because you don't understand why fuchsia exists doesn't mean you need to invent fiction about it. Is it hard to believe there might be technical advantages to an alternative architecture to Linux and that a company might be willing to invest in trying to bring that innovation to the world?
Ext4 and NTFS both have a 2^32-1 limit on number of files as well. Realistically, you never actually want to make tons of files, so I have a pretty hard time seeing this being an issue in practice.
Files in nested folders are primarily an abstraction for humans. They are a maximally flexible and customizable system. This has substantial costs (especially in environments with parallel work). As such, no one really has millions of pieces of fully separate, unstructured, hierarchical data. Once you have that much data, there is almost always additional structure that would be better represented in something like a database where you can actually express the invariants that you have.
Filesystem is essentially a "simple" database. If it is not performing, then it is not a good db. It shouldn't really matter how many files you have if metadata, and indexing of that metadata is done properly (i.e. like in good db). It also has additional benefits to DB that usually do not even exist there as they aren't practical at all (like random access).
The problem with file systems is that even if it's a competently implemented DB, it's a DB where you can't (easily) change the schema, put type restrictions on the columns, or customize the indexing. File systems are great, but if you have a lot of data, using the right tool for the job is a lot better.
Aren’t block sizes (and minimum file size) normally around 4kB? So a max number of 1-byte files would take up around 16 TB, without adding any overhead. Those drives are available these days
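The arithmetic behind that, as a quick sketch. It assumes every file occupies exactly one 4096-byte block and ignores metadata and any inline small-file storage a real filesystem might do; the function names are just for this example.

```rust
/// File-count limit quoted in the thread: 2^32 - 1.
const MAX_FILES: u64 = u32::MAX as u64;
/// Assumed block size; 4 KiB is a common default, not a universal one.
const BLOCK_SIZE: u64 = 4096;

/// Bytes used if every one of MAX_FILES files takes one full block.
fn min_bytes_at_file_limit() -> u64 {
    MAX_FILES * BLOCK_SIZE
}

/// Bytes expressed in binary tebibytes (2^40 bytes).
fn as_tib(bytes: u64) -> f64 {
    bytes as f64 / (1u64 << 40) as f64
}

/// Bytes expressed in decimal terabytes (10^12 bytes), as on drive labels.
fn as_tb(bytes: u64) -> f64 {
    bytes as f64 / 1e12
}
```

That comes out to just under 16 TiB, or about 17.6 TB in the decimal units drive vendors use, so the "around 16 TB" figure holds.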
Piles of small files are unpleasant to deal with. Going over millions of files even without touching the contents gets annoying. Trying to back up or move big directories gets worse. If you have a hard drive involved it really gets bad, it can probably seek 10 million times in an entire day.
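A back-of-the-envelope check of the seek figure. The 100 to 120 random seeks per second used below is an assumed rule of thumb for a spinning disk, not a measured number, and one seek per file is the optimistic case.

```rust
const SECONDS_PER_DAY: u64 = 24 * 60 * 60;

/// Random seeks a drive can perform in one day at a given rate.
fn seeks_per_day(seeks_per_second: u64) -> u64 {
    seeks_per_second * SECONDS_PER_DAY
}

/// Days needed to visit `files` files if each visit costs `seeks_per_file` seeks.
fn days_to_visit(files: u64, seeks_per_file: u64, seeks_per_second: u64) -> f64 {
    (files * seeks_per_file) as f64 / seeks_per_day(seeks_per_second) as f64
}
```

At 100 seeks per second that's 8,640,000 seeks a day, and at 120 it's 10,368,000, which matches the "10 million" ballpark. Walking 2^32 - 1 files at one seek each and 120 seeks per second would take roughly 414 days.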
> Redox had a read-only ZFS driver but it was abandoned because of the monolithic nature of ZFS that created problems with the Redox microkernel design.
Curious about the details behind those compatibility problems.
If it relied on OpenZFS, then I wouldn't be too surprised.
The whole ARC thing for example, sidestepping the general block cache, feels like a major hack resulting from how it was brutally extracted from Solaris at the time...
The way zfs just doesn't "fit" was why I had hope for btrfs... ZFS is still great for a file server, but wouldn't use it on a general purpose machine.
Solaris had a unified page cache, and ARC existed separately, along side of it there as well.
One huge problem with ZFS is that there is no zero copy due to the ARC wart. Eg, if you're doing sendfile() from a ZFS filesystem, every byte you send is copied into a network buffer. But if you're doing sendfile from a UFS filesystem, the pages are just loaned to the network.
This means that on the Netflix Open Connect CDN, where we serve close to the hardware limits of the system, we simply cannot use ZFS for video data due to ZFS basically doubling the memory bandwidth requirements. Switching from UFS to ZFS would essentially cut the maximum performance of our servers in half.
I also imagine you wouldn't benefit from ZFS there either, even if the ARC wasn't there. You have a single application and can presumably accept occasional data loss (just fetch content upstream). Just need to handle bitrot detection, but there's ways to get around that application-side.
Better to just have the filesystem get out of the way and just focus on being good at raw I/O scheduling.
I wonder if FreeBSD is going to get something io_uring-esque. That's one of the more interesting developments in kernel space...
There are benefits to ZFS for spinning drives. Eg, a metadata-only L2 ARC on an NVME drive, with the data coming from a spinning drive would likely perform better than UFS. This is because with UFS, the head has to move around at times to read metadata, where with ZFS and metadata cached on NAND, it could just read data in an ideal case.
FreeBSD has "fire and forget" behavior in the context of several common hot paths, so the need for io_uring is less urgent. Eg, sendfile is "fire and forget" from the application's perspective. If data is not resident, the network buffers are pre-allocated and staged on the socket buffer. When the io completes, the disk interrupt handler then flips the pre-staged buffers (whose pages now contain valid data) to "ready" and pokes the TCP state machine.
Similarly, FreeBSD has OpenBSD inspired splice which is fire-and-forget once 2 sockets are spliced.
For filesystems where in-memory cache is insufficient, maybe a generic ephemeral inode/dentry cache system - L2ARC without zfsisms - would be useful...
But to be fair, we are approaching the point where spinning rust stops making sense for even the remaining use cases, and so designing new optimizations specifically for it might be a bit silly now.
Even on Solaris the ARC existed. ZFS replaces a lot of systems traditionally not directly related to a Filesystem implementation.
For instance using the `zfs` tool one wouldn't only configure file system properties, but also control NFS exports, which traditionally was done using /etc/exports.
This was done as part of a major UI/UX reshaping in Solaris 10 to make sysadmin lives easier; what it ultimately does is... edit the exports file.
ZFS and ZPOOL tools provide accesses to multiple different subsystems in ways that make more sense to end user, a lot like LVM and LUKS do on top of device mapper these days
Can you elaborate the last paragraph? In what way doesn't zfs fit? (I couldn't make it out from the first two paragraphs.) Where did btrfs fall short of your expectations? Why would you avoid zfs on general purpose machines if you deem it good enough for file servers?
ZFS managing its own cache that sidesteps the existing pagecache infrastructure and requires dedicating more memory for it to function well than you otherwise would with any other filesystem.
ZFS sidestepping conventional device and mount handling with the way it "imports"/"exports" ZFS pools, usually auto-mounting all datasets straight to your system root by default - if you have a dataset named "tank", it automounts to "/tank".
ZFS operation itself being an inexact science of thousands of per-dataset flags and tunables (again not set through the common path with mount flags), and unless you run something like TrueNAS that sets them for you it's probably best to pretend it's not there.
Common configuration with decent performance commonly involving complexities like L2ARC.
It's far too invasive and resource consuming for a general purpose machine, and does not provide notable benefit there. A dedicated file server won't care about the invasiveness and will benefit from the extra controls.
btrfs fell short by still not having its striped RAID game under control, only being "production grade" on single disks and mirrors for the longest time - probably still?
ZFS relies on Solaris (Unix) kernel primitives IIRC ... I remember hearing that to get ZFS to work with an OS you basically have to implement a good portion of the Solaris kernel interface as shims.
"[..] In this project we will replace Redox's internal file descriptor representation with capability descriptors, optimized for both security and performance. This will provide a foundation for capability-based security on Redox, and possibly capability extensions from other UNIX-like systems, while also supporting POSIX-style file descriptors for application compatibility".
I've been dogfooding bcachefs for a few months. Aside from a NixOS kernel regression and LKML drama :-(, it's been good (anecdote +1). I was early on the reiser4 bandwagon back on Gentoo; a glutton for data loss is what I am...
I run nixos unstable, at some point in the last few weeks the kernel supplied with
* boot.supportedFilesystems = [ "bcachefs" ];
in my config, went from version 6.16.0 to 6.12.45 and I had very long boot times (30 minutes+) with a lot of messages. My solution was to switch to the latest kernel
* boot.kernelPackages = pkgs.linuxPackages_latest;
Which bumped me back up to kernel 6.16.8 and smooth sailing.
This would be a significant problem with my use case in the very near future. I already have double-digit-TB files, and that doesn't look like much margin on top of that.
I've occasionally pondered how feasible it would be to write an APFS implementation from the specs[0] alone. Is it harder or easier to create an implementation when the layout and mechanisms are already specified? Would it be easy to keep compatibility, and would it be a dead-end design for any extensions you'd like to add?
It doesn't currently have any GPU support (for example). Even for a pretty simple desktop, CPU rendering is rather incompatible with battery life or performance in a laptop form factor.
Innovation is wonderful, but it’s hard to believe this has enough users to flush out the challenging bugs. Maybe if it had some kind of correctness proof, but it just seems like there are way too many subtle bugs in file systems in general for me to try a new FS.
Building out test infrastructure for correctness to support the project sounds like a fantastic idea.
That said, while it's compatible with Linux via fuse, unless you're helping to build RedoxOS, I don't think there's any real expectation that you would try it.
Well they’d have to write their own driver anyway for one. If they were going to take an existing design and write a new driver, ZFS would be the better choice by far. Much longer and broader operational history and much better documentation.
And you might not get sued by Oracle! RedoxOS seems to use the MIT license while OpenZFS is under the CDDL. Given Oracle's litigious nature, they'd have to make sure none of their code looked like OpenZFS code, or even better, make sure none of the developers had ever even looked at the ZFS code.
It's much better to hope that OpenZFS decides to create a RedoxOS implementation themselves than to try to make a clean room ZFS implementation.
Fair enough, though you can’t really understand how BTRFS works without reading the GPLed Linux source while ZFS has some separate disk format documentation. Don’t know that anyone would sue you though.
It's not unreasonable to look at the source code to understand the disk format and then create an independent driver, so long as you are not directly copying code (or in this case, paraphrasing C to Rust).
More importantly though, Linux or the Linux Foundation are unlikely to file a lawsuit without clear evidence of infringement, whereas Oracle by their nature will have filed lawsuits and a dozen motions if they catch even a whiff of possible infringement.
I wouldn't touch Oracle IP with a 50' fibreglass pole while wearing rubber boots.
It's certainly not as bad as some licences, but there are still some gotchas to the CDDL. The 2 big ones here (AFAICT) are that it's not compatible with MIT, and that any reimplementation of CDDL code would not have the patent protections of the original.
Also, and again, the validity of a lawsuit isn't going to stop Oracle from filing it.
License is the obvious blocker, aside from all the technical issues[0]. Btrfs is GPL, RedoxOS is MIT, ZFS is CDDL. You can integrate CDDL into an MIT project without problems[1], but due to the viral nature of the GPL, integrating btrfs would have impacts on the rest of the project.
What I'm wondering is what about HAMMER2? It's under a copyfree license and it is developed for a microkernel operating system (DragonflyBSD). Seems like a natural fit.
[0] btrfs holds the distinction of being the only filesystem that has lost all of my data, and it managed to do it twice! Corrupt my drive once, shame on you. Corrupt my drive twice, can't corrupt my drive again.
[1] further explanation: The CDDL is basically "the GPL but it only applies to the files under the CDDL, rather than the whole project". So the code for ZFS would remain under the CDDL and it would have all the restrictions that come with that, but the rest of the code base can remain under MIT. This is why FreeBSD can have ZFS fully integrated whereas on Linux ZFS is an out-of-tree module.
> Corrupt my drive twice, can't corrupt my drive again.
Exact same drive? You might want to check that drive isn't silently corrupting data.
I still blame btrfs, something very similar happened to me.
I had a WD Green drive with a known flaw where it would just silently zero data on writes in some random situations. EXT4 worked fine on this drive for years (the filesystem was fine, my files had random zeroed sections). But btrfs just couldn't handle this situation and immediately got itself into an unrecoverable state; scrub and fsck just couldn't fix the issue.
In one way, I was better off. At least I now knew that drive had been silently corrupting data for years. But it destroyed my confidence in btrfs forever. Btrfs didn't actually lose any additional data for me, it was in RAID and the data was all still there, so it should have been able to recover itself.
But it simply couldn't. I had to manually use a hex editor to piece a few files back together (and restore many others from backup).
Even worse, when I talked to people on the #btrfs IRC channel, not only was nobody surprised that btrfs had borked itself due to bad hardware, but everyone said that a btrfs filesystem that had been borked could never be trusted again. Instead, the only way to get a trustworthy, clean, and canonical btrfs filesystem was to delete it and start from scratch (this time without the stupid faulty drive).
Basically, btrfs appears to be not fit for purpose. The entire point of such a filesystem is that it should be able to run in adverse environments (like faulty hardware) and be tolerant to errors. It should always be possible to repair such a filesystem back to a canonical state.
> Basically, btrfs appears to be not fit for purpose. The entire point of such a filesystem is that it should be able to run in adverse environments (like faulty hardware) and be tolerant to errors. It should always be possible to repair such a filesystem back to a canonical state.
Pretty sure all file systems and their developers are unsurprised by file system corruption occurring on bad hardware.
There are also drives that report successful flush and fua, but the expected (meta)data is not yet on stable media. That results in out of order writes. There's no consequence unless there's a badly timed crash or power failure. In that case there's out of order writes and possibly dropped writes (what was left in the write cache).
File system developers have told me that their designs do not account for drives miscommunicating flush/fua succeeding when it hasn't. This is like operating under nobarrier some of the time.
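To make the flush assumption concrete, here is a minimal Python sketch of what a "durable write" looks like from userspace. The helper name `durable_write` is hypothetical and used only for illustration. The point is that `os.fsync` returning without error only means the kernel issued the flush and the drive acknowledged it; if the drive acknowledges early, nothing at this level can tell.

```python
import os
import tempfile


def durable_write(path, data):
    """Write data, then ask the kernel to flush it to stable storage.

    Hypothetical helper for illustration. A successful return means the
    drive *reported* the flush as done, not that the data is provably on
    stable media.
    """
    with open(path, "wb") as f:
        f.write(data)
        f.flush()             # move Python's userspace buffer into the kernel
        os.fsync(f.fileno())  # ask the kernel to flush the file to the device

    # Also fsync the containing directory so the new directory entry
    # itself is requested to be durable.
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
    return len(data)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        target = os.path.join(d, "journal.bin")
        written = durable_write(target, b"commit record")
        print(written, "bytes written and flush requested")
```

Everything above the block layer has to trust that acknowledgement, which is why a drive that misreports flush/FUA completion behaves like running under nobarrier some of the time.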
In overwriting file systems, metadata has fixed locations, so repair can make quite a lot of assumptions about what should be there, inferring it from metadata in other locations.
Btrfs has no fixed locations for metadata. This leads to unique flexibility, and repair difficulty. Flexible: Being able to convert between different block group profiles (single, dup, and all the raids), and run on unequal sized drives, and conversion from any file system anybody wants to write the code for - because only the per device super blocks have fixed locations. Everything else can be written anywhere else. But the repair utility can't make many assumptions. And if the story told by the metadata that is present, isn't consistent, the repair necessarily must fail.
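As a small illustration of how little is fixed: the per-device superblock copies live at 64 KiB, 64 MiB and 256 GiB, and a copy is only present if the device is large enough to hold it. The Python sketch below just computes which copies fit on a device of a given size. The function name is made up for this example, and the bounds check is a simplification, not the exact kernel logic.

```python
# Fixed btrfs superblock offsets: primary plus two mirrors.
SUPERBLOCK_OFFSETS = [64 * 1024, 64 * 1024**2, 256 * 1024**3]
SUPERBLOCK_SIZE = 4096  # size of one superblock copy in bytes


def superblock_copies(device_size):
    """Return the offsets of superblock copies that fit on the device.

    Simplified sketch: a copy is assumed present if the whole 4 KiB
    superblock fits before the end of the device.
    """
    return [
        offset
        for offset in SUPERBLOCK_OFFSETS
        if offset + SUPERBLOCK_SIZE <= device_size
    ]


if __name__ == "__main__":
    for size in (32 * 1024**2, 1024**3, 1024**4):
        print(size, superblock_copies(size))
```

A repair tool can always look at those offsets; for everything else it has to follow pointers, starting from the roots the superblock references.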
With Btrfs the first step is read-only rescue mount, which uses backup roots to find a valid root tree, and also the ability to ignore damaged trees. This read-only mount is often enough to extract important data that hasn't been (recently) backed up.
Since moving to Btrfs by default in Fedora almost 10 releases ago, we haven't seen more file system problems. One problem we do see more often is evidence of memory bitflips. This makes some sense because the file system metadata isn't nearly as big a target as data. And since both metadata and data are checksummed, Btrfs is more likely to detect such issues.
To be clear, I'm not expecting btrfs (or any filesystem) to avoid corrupting itself on unreliable hardware. I'm not expecting it to magically avoid unavoidable data loss.
All I want is an fsck that I can trust.
I love that btrfs will actually alert me to bad hardware. But then I expect to be able to replace the hardware and run fsck (or scrub, or whatever) and get back to the best-case healthy state with minimal fuss. And by "healthy" I don't mean ready for me to extract data from, I mean ready for me to mount and continue using.
In my case, I had zero corrupted metadata, and a second copy of all data. fsck/scrub should have been able to fix everything with zero interaction.
If files/metadata are corrupted, fsck/scrub should provide tooling for how to deal with them. Delete them? Restore them anyway? Manual intervention? IMO, failure is not a valid option.
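For what it's worth, the behaviour I'm describing is conceptually simple. Here is a toy Python sketch of a scrub over one mirrored block: pick whichever copy matches the stored checksum and rewrite the others from it. This is not how btrfs implements it; `scrub_block` is a hypothetical name, and `zlib.crc32` is only a stand-in for whatever checksum the filesystem really uses.

```python
import zlib


def checksum(block):
    """Stand-in checksum (plain CRC-32), used only for this sketch."""
    return zlib.crc32(block)


def scrub_block(copies, expected):
    """Repair mirrored copies of one block in place.

    copies   -- list of bytes objects, one per mirror
    expected -- checksum recorded in (intact) metadata
    Returns the indices of the copies that had to be rewritten.
    """
    good = next((c for c in copies if checksum(c) == expected), None)
    if good is None:
        # No mirror matches: this is the genuinely unrecoverable case.
        raise ValueError("no copy matches the stored checksum")

    repaired = []
    for i, copy in enumerate(copies):
        if checksum(copy) != expected:
            copies[i] = good  # overwrite the bad mirror with the good data
            repaired.append(i)
    return repaired


if __name__ == "__main__":
    original = b"important data" * 256
    zeroed = bytes(len(original))  # a drive that silently zeroed the write
    mirrors = [zeroed, original]
    print("repaired mirrors:", scrub_block(mirrors, checksum(original)))
```

With intact metadata and one good copy per block, every block falls into the repairable branch, which is why I expected zero interaction.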
I wrote a tool to try to attack this specific problem (subtle, random drive corruption) in the general sense: https://github.com/pmarreck/bitrot_guard. It has to be re-run for any modified files, though, which makes it mainly suitable for long-term archival purposes. I'm not sure why one of these filesystems doesn't just invisibly include/update some par2 or other parity data, so you at least get some unexpected corruption protection/insurance (plus notification when things are starting to go awry).
I too have had data loss from BTRFS. Had a RAID-1 array where one of the drives started flaking out, sometimes it would disappear when rebooting the system. Unfortunately, before I could replace the drive, one time when booting my array had been corrupted and it was unrecoverable (or at least it was unrecoverable with my skill level). This wasn't a long time ago either, this was within the last 2-3 years. When I got the new drive and rebuilt the array, I used ZFS and it has been rock solid.
> License is the obvious blocker, aside from all the technical issues. Btrfs is GPL
WinBtrfs [1], a reimplementation of btrfs from scratch for Windows systems, is licensed under the LGPL v3. Just because the reference implementation uses one license doesn't mean that others must use it too.
Last time I looked at DragonflyBSD, it was kind of an intermediate between a traditional kernel and a microkernel. There certainly was a lot more in the kernel as compared to systems built on e.g. L4.
There certainly is a continuum. I've always wanted to build a microkernel-ish system on top of Linux that only has userspace options for block devices, file systems and tcp/ip. It would be dog-slow but theoretically work.
You mean because the CDDL files would have to be licensed under GPL, and that's not compatible with the CDDL? I assume MIT-licensed files can be relicensed as GPL, that's why that mix is fine?
Yes, if ZFS (CDDL) was integrated into Linux (GPL) then the GPL would need to apply to the CDDL files, which causes a conflict because the CDDL is not compatible with the GPL.
This isn't a problem when integrating MIT code into a GPL project, because MIT's requirements are a subset of the GPL's requirements, so the combined project being under the GPL is no problem. (Going the other way by integrating GPL code into an MIT project is technically also possible, but it would convert that project to a GPL project, so most MIT projects would be resistant to this.)
This isn't a problem combining MIT and CDDL because both lack the GPL's virality. They can happily coexist in the same project, leaving each other alone.
And that's why zfs inches along with a fraction of the progress it could have had for decades.
This lack of required reciprocity and virtuous sounding "leave each other alone" is no virtue at all. It doesn't harm anyone else at least, which is great, but it's also shooting itself in the foot and a waste.
It's not an issue of GPL virality because the CDDL-ed code is not derivative of GPLed code and thus out of scope for GPL.
The problem is, IIRC, that GPLv2 does not allow creating a combined work where part of it is covered by license that has stricter requirements than GPL.
This is the same reason why you can't submit GPLv3 code to the Linux kernel: GPLv3 falls under the same issue, and IIRC even on the same kind of clause (mandatory patent licensing).
> The problem is, IIRC, that GPLv2 does not allow creating a combined work where part of it is covered by license that has stricter requirements than GPL
That is part of the virality: the combined work is under the terms of the GPL and therefore cannot have additional restrictions placed on it. If the GPL wasn't viral then the GPL code and CDDL code would both be under their respective licenses and leave each other alone. The GPL decided to apply itself to the combined work which causes the problems.
For that matter, the "./file" pattern is only required to disambiguate executables in the local directory so it doesn't try to look them up in the PATH. For arguments like here it's redundant.
I hope someone can bring this issue to the redox-os project about its package management command "pkgar". Reading it aloud in Spanish sounds like "pa cagar". "pa" is a very common contraction of "para", so we have "para cagar", which translated back is "to shit".
Sorry for commenting this here, Redox is using a private gitlab instance I have no access to.