My 2023 all-flash ZFS NAS (Network Storage) build (stapelberg.ch)
158 points by signa11 on Oct 27, 2023 | 93 comments


Am I misreading or did he create a non-redundant pool spanning the 2 SSD drives? I don't think scrubbing will keep him from losing data when one of those drives fails or gets corrupted.

Edit: Looked again and he's getting redundancy by running multiple NASes and rsyncing between them. Still seems like a risky setup though.


In the just-released 2.2.0 you can correct blocks from a remote backup copy:

> Corrective "zfs receive" (#9372) - A new type of zfs receive which can be used to heal corrupted data in filesystems, snapshots, and clones when a replica of the data already exists in the form of a backup send stream.

* https://github.com/openzfs/zfs/releases/tag/zfs-2.2.0

* https://github.com/openzfs/zfs/pull/9372
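
Assuming you keep the backup around as a send stream (or can regenerate one from the replica), healing looks roughly like this; the dataset and paths are hypothetical, and zfs-receive(8) in 2.2.0 has the exact semantics:

    # keep a replica of the snapshot as a send stream
    zfs send tank/data@backup > /backup/tank-data-backup.zstream

    # later, after a scrub reports corruption in that snapshot's blocks,
    # feed the same stream back through the corrective receive:
    zfs receive -c tank/data@backup < /backup/tank-data-backup.zstream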


-- if using `zfs send` to create the replica, which the author of the post is not.


> I don't think scrubbing will keep him from losing data when one of those drives fails or gets corrupted.

It pains me to see a ZFS pool with no redundancy: instead of being "blissfully" ignorant of bit rot, you'll be alerted to its presence and then have to recover the affected files manually from the replica on the other NAS.
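
On a pool with no redundancy a scrub can only tell you which files are bad; the recovery is then manual, roughly like this (pool, dataset and paths hypothetical):

    zpool scrub tank
    zpool status -v tank   # "Permanent errors have been detected in the following files:" plus paths
    # no redundant copy in the pool, so pull each listed file back from the other NAS
    rsync -a othernas:/tank/photos/2023/IMG_0001.jpg /tank/photos/2023/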

I appreciate that the author recognizes what his goals are, and that pool level redundancy is not one of them, but my goals are very different.


In my setup I combine one machine running ZFS with another running btrfs, using rsync/restic between them. And the Apple devices use APFS. I'd rather use ZFS everywhere (or, in future, bcachefs), but unfortunately Apple went with their own next-gen CoW filesystem, and I don't think ZFS is available for Synology DSM. Though perhaps I should simply replace DSM with something more to my liking (Debian-based, Proxmox).


DSM doesn't support ZFS. It does support btrfs (the default), FWIW.


On the other hand, I myself use a similar approach. For home purposes I prefer having redundancy across several cheaper machines without RAID 1/10/5/6/Z rather than a single, more expensive machine with disk redundancy. And I'd rather spend the additional money on putting ECC RAM in all machines.


For home use this is a reasonable tradeoff. Imagine my non-tech-savvy wife for some reason has to get access to the data when one of the NASes has malfunctioned because the ZFS pool encryption was badly configured. Explaining "zfs receive", let alone what ZFS, Linux and SSH are, is going to be grounds for divorce. Heck, I don't even want to read man pages about ZFS myself on weekends; I have enough such problems at work. Besides, you still want 2 physical locations.

It's going to be less optimal and less professional. That's OK; for something as important as backups, keep it boring. Simply starting up the second box is simple, stupid and has well-understood failure modes. Maybe someone like me should just buy an off-the-shelf NAS.


That seems like a pretty reasonable plan actually. If I ever do a NAS I was thinking of having one disk for storing files, and a pair of disks for backing up both the first disk and my laptop.

That way everything has 3 copies, but I'm not backing up a compressed backup, which might not deduplicate well, and 4 copies might be a little excessive.


It wasn't mentioned in the blog, but you can set `copies=N` on ZFS filesystems and ZFS will keep N copies of user data. This provides redundancy that protects against bitrot or minor corruption, even if your zpool isn't mirrored. Of course, it provides no protection against complete drive failure.
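
For example (dataset name hypothetical); note it only affects data written after the property is set:

    zfs set copies=2 tank/important
    zfs get copies tank/important
    # existing files only gain the extra copy once they are rewritten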


Curious if something like Ceph would help.


At a massive performance cost.


Performance isn’t the only requirement in storage.

Are there any other options worth looking at? Thanks!


Clustered filesystems are old. Red Hat at some point took over the company behind GFS; perhaps worth looking into. I've also seen people in the edge computing space go for Longhorn instead of Ceph. But I don't know anything about it apart from the fact that Ceph is available in Proxmox.

My homelab with a Turing Pi 2 is running RPi4 compute modules and an Nvidia Jetson. Each also has a small SSD (one native, two miniPCI-to-SATA, and one USB). They could be used with k8s or a clustered filesystem, but I haven't played with that as of yet.


Ceph is never the answer. It's a nice idea, but in practice it's just not that useful.

If you want speed: Lustre.

If you want anything else: GPFS.


> Ceph is never the answer. It's a nice idea, but in practice it's just not that useful.

Depends. At my last job we used it for our OpenStack block/object store and it was performant enough. When we started it was HDD+SSD, but after a while (when I left) the plan was to go all-NVMe (once BlueStore became a thing).


Sacrificing 10-30% of performance for other things that might be useful might be well worth it.


My understanding after reading and testing is that we're talking at least two orders of magnitude difference unless you have a lot of disks (certainly more than the 6 I tried with) and quite beefy hardware.

Haven't tried all-SSD cluster yet though, only spinning rust.


Our OpenStack storage back-end had ten-plus storage servers, each with (IIRC) a dozen-plus disks.


Appreciate the options to check out.


I noticed that the author is using ZFS-native encryption, which in my experience is not particularly stable. I've even managed to corrupt an entire pool with ZFS send/receives when trying to make backups. I'd strongly recommend using ZFS-on-LUKS instead if encryption is required.
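
For reference, the ZFS-on-LUKS layering is straightforward; a minimal sketch with hypothetical device names (a real setup would also sort out keyfiles/crypttab so the mappings open at boot):

    cryptsetup luksFormat /dev/disk/by-id/nvme-ssd1
    cryptsetup open /dev/disk/by-id/nvme-ssd1 crypt-ssd1
    # repeat for each drive, then build the pool on the /dev/mapper devices
    # (striped here, like the post's pool; add 'mirror' for redundancy)
    zpool create tank /dev/mapper/crypt-ssd1 /dev/mapper/crypt-ssd2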

The list of open issues on GitHub in the native encryption component is quite telling: https://github.com/openzfs/zfs/issues?q=is%3Aopen+is%3Aissue...


I have to agree, the leadership's response and handling of errors with regard to encryption has been… honestly pretty disappointing to see as a ZFS fan and user.

While it's not common, the number of people running into edge cases and killing both their sending and sometimes even receiving pools (better have another backup!) is frankly unacceptable. "Raw sends" seem to be especially at risk, though sending in general seems to be where the issues mostly lie. My thoughts mirror the comments here: https://discourse.practicalzfs.com/t/is-native-encryption-re...

Here's a currently unmaintained document by Rincebrain that he was using to try to track things before he got burned out by the lack of response: https://docs.google.com/spreadsheets/d/1OfRSXibZ2nIE9DGK6sww...


I was quite surprised when I learned that ZFS-native encryption is so underdeveloped, and that the developers still sometimes seem to forget about it when coding new features. I had presumed it would be table stakes for enterprise storage by now. Is it just the case that everyone uses it on GELI or LUKS, or am I wrong to presume that encryption is as widespread in enterprise deployments as I thought?

It's a shame, because the main draw of the native encryption for me is being able to have a zero-trust backup target with all the advantages of zfs send over user-level backup software. But I've heard of several people running into issues doing this (though luckily no actual data loss that I've heard of, just headaches).
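
For anyone curious, the zero-trust pattern relies on raw sends (-w / --raw), which ship the already-encrypted blocks so the backup box never needs the key; hostnames and dataset names here are hypothetical:

    zfs send -w tank/secret@2023-10-27 | ssh backup-host zfs receive -u backuppool/secret
    # the destination stores ciphertext only; loading the key there is optional and
    # only needed if you ever want to read the data on that machine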


I think large enterprises usually do encryption on the application level and not storage level these days. A KMS or encryption service, then storage services just hold encrypted blobs.

I run the 50 zfs disks in my house on LUKS.


I use it with snapshots and ZFS send, and so far it's been fine. TrueNAS users may also use it.

But I’m concerned reading these comments. Anyone else experiencing issues?


The fundamental issue with ZFS encryption is that the primary developer who created it is no longer contributing significantly to the project. It's good code, with good tests, but it's not getting any additional love.

The utilities and tooling surrounding encryption are also weak: there are ways you can throw away critical, invisible keydata without realizing it, and there is no tool to correct the issue, even if you have the missing keydata on another system.


It's interesting that there hasn't been anyone since to continue developing Tom's code. I wonder how long the situation might continue. If no one wants to take over in the near future, they might have to remove the feature.


I've been using ZFS with native encryption (Ubuntu Server) but also ZFS with LUKS (Arch, Ubuntu Desktop). Zero issues (though the inability to run the latest kernel can be annoying, especially on a rolling distro). Wouldn't surprise me if the write cache has a role in this issue though.


At EuroBSDcon 2022, Allan Jude gave the presentation "Scaling ZFS for NVMe":

> Learn about how ZFS is being adapted to the ways the rules of storage are being changed by NVMe. In the past, storage was slow relative to the CPU so requests were preprocessed, sorted, and coalesced to improve performance. Modern NVMe is so low latency that we must avoid as much of this preprocessing as possible to maintain the performance this new storage paradigm has to offer.

> An overview of the work Klara has done to improve performance of multiple ZFS pools of large numbers of NVMe disks on high thread count machines.

[…]

> A walkthrough of how we improved performance from 3 GB/sec to over 7 GB/sec of writes.

* https://www.youtube.com/watch?v=v8sl8gj9UnA

When ZFS was created, spinning rust was still the main thing and SSDs were just gaining popularity, so ZFS introduced "hybrid storage" pools:

* https://cacm.acm.org/magazines/2008/7/5377-flash-storage-mem...

* https://web.archive.org/web/20080613124922/http://blogs.sun....

* https://web.archive.org/web/20080615042818/http://blogs.sun....

* https://www.brendangregg.com/blog/2009-10-08/hybrid-storage-...

* https://en.wikipedia.org/wiki/Hybrid_array

Still useful for use cases that lean towards bulk storage (versus IOps).
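
The "hybrid" part in practice just means dedicated cache (L2ARC) and log (SLOG) devices in front of the spinning rust; a minimal sketch with hypothetical device names:

    zpool add tank cache /dev/disk/by-id/nvme-fast-ssd      # L2ARC read cache
    zpool add tank log mirror /dev/disk/by-id/nvme-slog-a \
                              /dev/disk/by-id/nvme-slog-b   # mirrored SLOG for sync writes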


Oh, I remember this guy! Michael has a nice blog where he writes interesting posts about hardware, networks and all sorts of computer stuff. Sometimes I feel a little bit jealous of all these "toys" he has, but he writes about them in a good way (without wanting to show off). If you haven't seen it, read about when he upgraded his internet to 25 Gbit/s fiber: https://michael.stapelberg.ch/posts/2022-04-23-fiber7-25gbit...

Coming back on topic, the idea of having 3 custom-made NASes is surely interesting, but apart from learning and experimenting with all this, I don't see a very big advantage from a backup/security point of view compared to commercial NASes (Synology, QNAP).

For sure we can all argue here about the choice of filesystem (ZFS, btrfs, ext4...), choice of CPU, RAM type and all the rest, but it all boils down to what each one wants (and has the money to spare). IMHO I wouldn't go with QVO SSDs and non-redundant volumes (especially spanning 2 disks), but hey, that's just me :-)


Has anyone built or attempted to build their own flash array, as in 20+ NVMe U.2 disks in a hot-swap chassis, consumed via NVMe-oF or NVMe/TCP?

I see there are tri-mode backplanes, but I'm looking for something that skips the HBA and connects via something like OCuLink directly.


We are currently testing a number of systems with 12x 30TB NVMe drives with Debian 12 and ZFS 2.2.0. Each of our systems has 2x 128G EPYC CPUs, 1.5TB of RAM, and dual-port 100GbE NICs. These systems will be used to run KVM VMs plus general ZFS data storage. The goal is to add another 12x NVMe drives and create an additional storage pool.

I have spent an enormous amount of time over the past couple of weeks tuning ZFS to give us the best balance of reads vs. writes, but the biggest problem is trying to find the right benchmark tool to properly reflect real-world usage. We are currently using fio, but the sheer number of options (queue depth, numjobs, libaio vs io_uring) makes the tool unreliable.

For example, comparing libaio vs io_uring with the same options (numjobs, etc.) makes a HUGE difference. In some cases, io_uring gives us double (or more) the performance of libaio; however, io_uring can produce numbers that don't make any sense (e.g. 105GB/sec reads for a system that maxes out at 72GB/sec). That said, we were able to push > 70GB/sec of large-block reads (1M) from 12x NVMe drives, which seems to validate that ZFS can perform well on these servers.
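
For what it's worth, a typical large-block sequential read job, repeated once per ioengine, looks something like this (directory and sizes are hypothetical; note that --direct=1 historically did not behave as expected on ZFS, so buffered I/O with a cleared ARC is the safer comparison):

    fio --name=seqread --directory=/tank/fio --rw=read --bs=1M --size=16G \
        --numjobs=8 --iodepth=32 --ioengine=libaio --group_reporting
    # rerun with --ioengine=io_uring and compare; wildly different numbers usually
    # mean one run was served from the ARC rather than the drives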

OpenZFS has come a long way from the 0.8 days, and the new O_DIRECT option coming out soon should give us even better performance for the flash arrays.


If you are seeing unreasonably fast read throughput, it is likely that reads are being served from the ARC. If your workload will benefit from the ARC, you may be seeing valid numbers. If your workload will not benefit from the ARC, set primarycache=metadata on the dataset and rerun your test, potentially with a pool export/import or reboot to be sure the cache is cleared.
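
Concretely (pool and dataset names hypothetical):

    zfs set primarycache=metadata tank/fio     # stop caching file data in the ARC
    zpool export tank && zpool import tank     # drop whatever the ARC already holds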

The fact that fio has a bunch of options doesn’t make the tool unreliable. Not understanding the tool or what you are testing makes you unreliable as a tester. The tool is reliable. As you learn you will become a more reliable tester with it.


After seeing some of the unrealistic numbers, I set primarycache=metadata just like you pointed out. And, you are correct, I need to learn to be a better tester...


There are definitely better ways to benchmark.

I design similar NVMe-based ZFS solutions for specialized media+entertainment and biosciences workloads and have put massive time into the platform and tuning needs.

Also think about who will be consuming data. I've employed the use of an RDMA-enabled SMB stack and client tuning to help get the best I/o characteristics out of the systems.


What methods/tools are you using to benchmark your ZFS systems?


It depends on the use case. For high-speed microscopes, I may get a request that says, "we need to support 4.2 Gigabytes/second of continuous ingest for an 18-hour imaging run." - In those situations, it's best to test with realistic data.

For general video and media workloads, it may be something like, "we have to accommodate 40 editors working over 10GbE (2 x 100GbE at the server) and minimize contention while ingesting from these other sources".

I work with iozone to establish a baseline. I also have a "frametest" utility that helps when mimicking some of the video characteristics.


I did but bought a full box, didn't assemble myself.


Why disable swap? And... only 8GB of RAM (am I misreading that?) with ZFS? It's been years since I've needed a bunch of storage, but I remember ZFS being much happier with gobs of RAM.


ZFS doesn't need much RAM unless you have specific needs. People have run basic ZFS storage machines on 4GB of memory total and it works fine. Even 2GB has been done before, but that's a bit too low for me to suggest. Broadly, 16GB of total system memory is my general recommended baseline, because honestly it's simply too cheap not to have it.

ZFS needs some amount of RAM in order to even load the pool; this only becomes a practical concern when your pool gets into the hundreds of TiB.

Deduplication, which should generally not be used, used to be awful in terms of RAM consumption, but these days it isn't nearly as bad and can be surprisingly viable. Dedupe can still cause unexpected performance issues unless your skillset extends to digging into system analysis and tuning.

You need some amount of RAM buffer for the write TXGs to coalesce efficiently. Generally not a concern.

Finally there’s ARC, which is where all the nice things that improve your experience happen. The more the better, but just like system ram, once you have enough for your usage profile, you stop noticing much benefit beyond that. For dumb file storage, not that much is needed. Ideally you want enough to keep all the metadata, plus whatever your actual repeated read access would be. Working on video editing and VMs would require more RAM for a more optimal experience.
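
If you do want to bound the ARC on a small-RAM box, the zfs_arc_max module parameter is the knob on Linux; a sketch, with the 4 GiB value purely as an example:

    # persistent across reboots (value in bytes)
    echo "options zfs zfs_arc_max=4294967296" >> /etc/modprobe.d/zfs.conf
    # or apply immediately:
    echo 4294967296 > /sys/module/zfs/parameters/zfs_arc_max
    # see what the ARC is currently doing:
    arc_summary | head -n 25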


Like I mentioned it's been a long time since I touched this stuff. I do think I set up my (underpowered) system aggressively with ARC now that you mention it. Probably was experimenting with dedup as well. I'm sure some of those things have had improvements over the years.


I've been using XigmaNAS with two ZFS pools and only 4GB of RAM for like 8 years, also on a seemingly underpowered machine (Atom D410). Granted, it doesn't fly, but for home use it is more than enough, say for serving multiple HD movies on the home LAN at the same time, provided one doesn't load it with other, heavier services. Mine is currently only serving files through NFS, SMB and BitTorrent; all else is turned off.

https://ibb.co/BNMWFXm

BTW, I'm currently working on a bigger one that will run on a faster mini PC and an 8-bay USB 3.1 enclosure. I'm a little wary about USB connectors in this context, and I'll likely have to secure the cables firmly to avoid wear and accidental pulls, but so far results on the bench are promising.


IIRC it's not recommended to have swap on a zvol. So you would need a non-ZFS-managed partition/device for swap.

Not sure about the current status?

https://github.com/openzfs/zfs/issues/7734

https://github.com/openzfs/zfs/issues/342

https://utcc.utoronto.ca/~cks/space/blog/solaris/ZFSForSwapM...


He already has a separate NVMe drive for the OS, he could put the swap on that.
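
Since the OS disk isn't on ZFS, a plain swap file there sidesteps the swap-on-zvol problem entirely; a minimal sketch assuming an ext4 root:

    fallocate -l 4G /swapfile
    chmod 600 /swapfile
    mkswap /swapfile
    swapon /swapfile
    echo '/swapfile none swap sw 0 0' >> /etc/fstab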


Only if he's not running rootfs on zfs?

Edit: I see he doesn't, but he could (optionally with a slice for a swap partition): https://openzfs.github.io/openzfs-docs/Getting%20Started/Ubu...


Swap on a zvol I understand, disabling it entirely seems questionable, no?


This is a dedicated NAS. What workload do you expect to be paging to disk? Your primary workload is already reading and writing bits off disk.


Because we live in the real world and no system is truly in a bubble if it's connected to a network. But mostly because seeing oom-killer messages in dmesg makes me sad :).


If something gets leaky enough then it's still going to hurt performance for a while and die.

Swap delays that but extra RAM delays it too. If you take a use case where 2GB of memory is fine, and give it 8GB, then you already solved the problem swap would solve. You can always add more but you're past the point of diminishing returns.


Since there's already a non-ZFS system disk, it's a little weird to not put a swap partition on it.

But I wouldn't say it's questionable. On a NAS I doubt you'd ever use even one full gigabyte of swap space. Keeping it simple is fine.


I did something similar this year with 12x 4TB NVMe drives. I had a 12x 2TB setup in a Flashstor 12, but the CPU and RAM limitations led me to traditional RAID there. For the second NAS I just went with standard parts but with extreme performance, a 13900K and the like. To get the necessary PCIe lanes (since not every port would bifurcate) I used 3 four-way switches from AliExpress. Choosing a motherboard with 2 x8 CPU slots made the bandwidth not a problem; the third does hang off the DMI 4.0 lanes though. The boot drive hangs directly off the CPU in the x4 slot. The 13900K actually supports ECC in the workstation motherboards as well. I just expose everything via NFS and call it a day.
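
On the NFS point, ZFS can manage the exports itself via the sharenfs property (dataset name and subnet hypothetical), or you can list the mountpoints in /etc/exports as usual:

    zfs set sharenfs="rw=@192.168.1.0/24,no_root_squash" tank/media
    zfs get sharenfs tank/media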

All in all I'm happy with it. I still use spinning rust for the backup though; no sense worrying about that. At the time it was cheaper to get 12x 4TB budget NVMe drives and the extra parts than to go for any reasonable count of SATA SSDs, not sure if that's still true or not. SATA definitely would have been easier, e.g. in the Flashstor 12x2TB build I had to submit a kernel patch because the cheap drives were duplicating their NSIDs.


> the CPU and RAM limitations led me to traditional RAID there

IIRC there was some talk that RAID-Z leads to write amplification compared to mirrored drives, and is thus not good for cheaper SSDs. Haven't had time to sit down and think it through; does anyone know if that's right or wrong?

> I used 3 4 way switches from aliexpress

Did you find any that were significantly cheaper than ~100 USD?


In general I've found that cheap SSDs these days have better endurance than cheap spinning disks anyway. https://www.servethehome.com/discussing-low-wd-red-pro-nas-h.... I think the drives I got were rated for 1.6 PBW; no errors or failures yet. Beyond that, my workload isn't write-heavy enough for this to be a concern of mine, whether or not there is amplification. That SSDs don't count reads or active drive time against the endurance rating was more important. For the Flashstor build I did RAID4 instead of RAID5, just because I could though.

I want to say that sounds about right on price. If I didn't also want high single-core performance at the same time, a used/old Threadripper/EPYC build and bifurcation would probably make more sense. I also disconnected the onboard tiny, "definitely going to be noisy as hell in 3 months" low-quality fans from them and just rested a 140mm fan blowing down across the top of the 3 cards at 30% speed. Temps of the controllers and SSDs became better and it's dead silent.


Can you expand more on the hardware used? Specifically, which 4 way switches you used?

I have a bunch of unused M.2 drives from decommissioned servers that I'd love to use as additional storage.


I started with this page for adapters people had tried and recommended: https://forums.servethehome.com/index.php?threads/multi-nvme... but I think I ultimately went with the LinkReal x8-to-quad adapter.

If you don't specifically want the high per-core performance of something like a 13900K, a used/old Threadripper or EPYC system and bifurcation might make more sense. It'll also enable you to get maximum per-drive bandwidth, if that's a concern for you (when you have 12 drives in some form of stripe and parity, the per-drive bandwidth ends up not being that important for most sane workloads though).


What brand of 4TB NVMe drives are you using?


I scooped up some MP34s on sale. They have something like a 1.6 PBW endurance rating and I haven't had a single one fail or start throwing errors yet (though my workload isn't as write-heavy as many might have).


That's cool. I use some 4TB Silicon Power brand drives. I didn't research them much. Prices are definitely going to fall soon on 8TB, and I'll likely retrofit things pretty quickly with those.


> My main motivation for using ZFS instead of ext4 is that ZFS does data checksumming, whereas ext4 only checksums metadata and the journal, but not data at rest.

There's also dm-integrity [1]:

> The dm-integrity target emulates a block device that has additional per-sector tags that can be used for storing integrity information.
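
For standalone use (i.e. without layering it under LUKS), the cryptsetup package ships integritysetup; a minimal sketch with a hypothetical device:

    integritysetup format /dev/sdb           # writes per-sector checksum metadata (crc32 by default)
    integritysetup open /dev/sdb integ-sdb
    mkfs.ext4 /dev/mapper/integ-sdb          # any filesystem on top now gets read-time verification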

One more thing: where is the ECC RAM?

[1]: https://docs.kernel.org/admin-guide/device-mapper/dm-integri...


I'd be interested to see a follow up in time to see what the IO wear-n-tear looks like.

Also "Using gokrazy instead of Ubuntu Server would get rid of a lot of moving parts. The current blocker is that ZFS is not available on gokrazy. Unfortunately that’s not easy to change, in particular also from a licensing perspective."

I don't pretend to get how licensing works, but is OpenZFS and their licensing not an option here? I know it's been really tricky with ZFS in general and I don't think it's fully answered, but I'm not up to date with it.


An off-the-shelf NAS might be better. Some of them allow installing your own operating system, such as TrueNAS. These boxes come with the right form factor, CPU, RAM, number of NICs, power consumption, etc.

Solutions such as Synology provide a web interface, phone apps, and applications for syncing, photo management and backup. The units come with software for domain names, SSL certificates, monitoring, drive management, WoL, etc. There is a lot of software functionality useful for a NAS built in.


And why not buy server-grade SSDs from eBay?

You can get them new at the same price point; the only caveat is that you need a motherboard that supports bifurcation.


Depending on the vendor you might end up with drives that run custom firmware, which cannot be crossflashed, that might not even work at all in a non-OEM system, that might not be compatible with normal block storage use, that might not support stuff like power management etc. (and allegedly might have reset SMART logs)


You can easily buy the correct piece of hardware by searching the forums and following the advice. The FreeNAS/TrueNAS forum is full of hardware guides and threads on which 10Gb NICs or controllers to buy and when you need them flashed into IT mode.

Most of the resellers will answer questions or even flash the firmware. Local Craigslist guys even offered to do it, but they don't know why you're doing it; the eBay people understand.


GP is specifically about Enterprise SSDs from eBay, which comes with specific caveats that don’t apply to OEM Mellanox cards.


It does look harder than I thought for SSDs. It looks like a 5-year-old thread was commented on this year with some details about Samsung drives. [1]

[1]https://forums.servethehome.com/index.php?threads/firmware-e...


This is really cool. I did a similar thing years ago with FreeBSD + 1TB WD Green drives on an HP MicroServer. It had an AMD CPU similar to an Intel Atom, so power consumption was pretty low. Overall I was really happy with ZFS and it worked well. I ended up using jails to have it perform various tasks, like using pf as a firewall.


Why not ECC RAM? Lots of AM4 boards support it.


I'd go with ECC RAM even if it costs a little bit more, same with an enterprise NVMe SSD. The latter is worth it in case of a power cut. It allows you to enable the write cache and use the SSD as a ZFS cache, massively increasing performance if you are using mechanical HDDs. This guy goes full SSD, but for the content he serves that isn't required at all.

Also, just have enough RAM, but don't disable swap. Put swap on an enterprise SSD and let Linux handle it; it will do so cleverly. With RAM only, you have no swap to fall back on at all. A minor disadvantage is that you should encrypt your swap, but with modern hardware that shouldn't be a large penalty.
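
Encrypted swap with a throwaway random key is a one-line crypttab entry plus the matching fstab line; the partition path is hypothetical:

    # /etc/crypttab -- re-keyed with a fresh random key on every boot
    swap  /dev/nvme0n1p3  /dev/urandom  swap,cipher=aes-xts-plain64,size=512

    # /etc/fstab
    /dev/mapper/swap  none  swap  sw  0  0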


Doesn't look like the 3000G supports it. https://en.wikichip.org/wiki/amd/athlon/3000g


AFAIK, all Ryzen CPUs unofficially support ECC; it's up to the motherboard IIRC.


No, the 3XXXG series does not support ECC; it simply does not work. This is because the iGPU doesn't support ECC RAM.


> This is because the iGPU doesn't support ECC RAM.

What, how? The only part of the chip that touches the ECC bytes should be the memory controller itself. The reads and writes going across the internal fabric should be exactly the same.

Also the PRO versions support ECC and seem to have the same iGPU.


Only PRO CPUs do though, and they're only available as OEM parts of complete systems, or second-hand units that have been ripped out of them. :)

On top of that, finding compatible ECC sticks is not easy and they can be quite pricey.


My Ryzen 3900X supports it fine, on ASUS Prime Pro 370 and 570 boards.

Using 64 gigs of this: https://media.kingston.com/pcn/PCN_KSM26ED8_16ME_C1.pdf


Right, it looks like among APUs only the PROs are ECC-compatible [0].

Given that integrated graphics was one of the requirements of the OP, my point, even though technically imprecise, still stands.

I have the same memory sticks. They cost £157.86 per 32GB stick. That's over 2x the price of non-ECC sticks of otherwise similar specs.

[0] https://www.asus.com/support/faq/1045186


You buy that memory once, then run your precious data through it for years.

Sounds like money well spent.


Assuming the data being accessed by that RAM is precious, you're probably right. In fact, I agree so much that I spent that money.

I wasn't making any value judgements though, only posting observations.


You could use that same logic to justify a $30 power cable, though.


No. Power either is there or it isn't.


That's not quite true. Power fluctuations can cause instability or crashes, either of which could lead to data corruption.

That said, it's much more likely that a faulty or low quality PSU will be the cause of that rather than the cable, and there's a lot already written out there about why a high quality PSU is important.


RAM prices went down quite a bit over the last year or so and a 32GB stick of 3200 Kingston RAM with ECC here is now 70-80€, not that much more expensive than non-ECC (about 60€ for 3200, more for faster modules).

I replaced 2 of the 16GB ECC sticks in my main PC with 32GB ones a while ago and put them in my NAS, will probably do the same with the remaining 2 sticks to max out my RAM.


Not that I'm doubting you, but I'm having a hard time finding unbuffered ECC RAM in the correct configuration (at least in the UK).

This shop [0] sells the same ones I have, but it's even more expensive than it was last year, at £206.17 for 32GB.

[0] https://harddiskdirect.co.uk/ksm26ed8-16me-kingston-technolo...


I think most Zen CPUs support it, but among APUs only the PRO ones do. I just upgraded my NAS a while ago after upgrading my main PC and replaced the 2200G with my old 1800X and some ECC RAM, but I also had to install an old GPU because that one doesn't have an iGPU.


A few notes:

1. A Raspberry Pi was picked as a single point of failure. This will be a headache within 24 months.

2. Daily power/thermal cycles might age some things more than the steady thermal load of always being on. Not a big deal, and optimizing power consumption is totally reasonable, but a trade-off to track.


> The goals of my NAS setup are:

> When hardware breaks, I can get replacements from the local PC store the same day.

That wouldn't be my M.O. I don't want to go to a local PC store. The ones remaining are all either expensive, sell shit, or both. I'd order from Amazon or whoever is cheapest according to a local price comparison website, and receive a replacement the next day. It also saves time spent shopping (heck, I even order groceries online, saving time). Besides, going through warranty would take longer than same-day replacement anyway.

Now, if the guy were running mission-critical data only, I'd say 'OK', but nope:

> For over 10 years now, I run two self-built NAS (Network Storage) devices which serve media (currently via Jellyfin) and run daily backups of all my PCs and servers.

As such, it seems a silly requirement for the average content someone serves with Jellyfin.

I use 4x 4TB HDDs in ZFS RAIDZ2 (running Ubuntu under Proxmox) with one enterprise-grade SSD for the OS and ZFS cache. All that Jellyfin data doesn't have to reside on any SSD; not in this setup, nor in most home-user setups. You also don't require HA for such data, to the point where even RAID is kind of silly.

I use restic for daily backups to a NAS in the same city, a Synology with RAID1 btrfs. But I don't back up any media served via Jellyfin; that seems silly, and I don't have fiber upload speeds as of yet. The internet has backups of that; Usenet servers, for example.

None of these servers has a UPS, but in case of a power cut the enterprise SSD (a Samsung NVMe costing more than 2x as much as a consumer version with the same amount of storage) ensures the data is consistent. Both systems use FDE.

I also have an offsite, offline backup of our most important data. It is encrypted at rest, but I don't regularly refresh it, which is kind of stupid. But that is on me.

Either way, this fellow wants a silent setup (for reasons) and yeah, then SSDs make sense. But if you want value in GB/EUR or similar, even with the extremely low SSD prices of late and even with the higher energy prices, HDDs are still worth it. Especially for data such as content served via Jellyfin.

I'm pondering upgrading to more than 1 Gbit for the LAN, but for now I don't see that it's worth it. Also, given my server (MicroServer Gen10 Plus) only has two PCIe slots, and one is used for PCIe-to-NVMe and the other for iLO, I don't have the bandwidth available, a disadvantage of my current setup. I suppose I could use a USB to 2.5 Gbit adapter if USB could saturate it, but then I'd be using USB for high-volume data transfers. It appears to me that because my switch is managed, link bonding would be a better approach, which the Synology also supports.


> with one enterprise grade SSD for OS and ZFS cache. All that Jellyfin data doesn't have to reside on any SSD

Lately I've been experimenting with just putting the OS on spinning rust too. It seems like it's just so ingrained in us to put it on flash because of the order-of-magnitude gains in boot time, but for servers, is boot-up time really a factor (damn Supermicro motherboards take 1 minute+ to POST...)? Once your services are loaded into RAM, how much does the OS actually hit the drives? If your OS supports ZFS-on-root, then you have one less point of failure.


Nah, boot time is irrelevant. If boot time does matter, one would use other measures, like failover, redundancy, or k8s (though I don't have experience with that as of yet).

I've had various consumer-grade (but branded) USB sticks running Raspberry Pi OS, EdgeOS and Proxmox fail on me, to the point where I now use a USB-to-NVMe or USB-to-SATA adapter. Why? I've tried industrial-grade aMLC/MLC/SLC flash, and performance is worse and the cost is high. For example, for Proxmox or EdgeOS you need at least 4 GB of storage. Those have never failed me, but SATA or NVMe drives have a high MTBF anyway. On Raspberry Pis I've opted for log2ram on consumer-grade (but branded) microSD cards, with great effect. Another option is to not log at all, or to use e.g. rsyslog.


If you're happy putting your OS on your data drives, sure.

Otherwise I can get a good quality SSD for $20. A hard drive for the OS is more money for worse performance.


You just said the SSD is $20 more. So an HDD for the OS is less money, and maybe worse performance.


$20 is more than nothing, if you're okay putting the OS on the data drives.

If you're not okay with that, then $20 is cheaper than a hard drive.

"A hard drive for the OS" is referring to the latter situation.


Every time I visit Switzerland, I notice how wealthy this country is indeed.


This guy's apartment must be packed with servers, PCs, network cables and the like.



