I've been running a 3-node k3s cluster (NUC-style Ryzen mini PCs) on openSUSE MicroOS https://microos.opensuse.org/ for my homelab for a while, and I really like it. They made some really nice decisions on which parts of k8s to trim down and which networking / LB / ingress components to use.
The option to use sqlite in place of etcd makes it super interesting for even lighter-weight single-node homelab container setups.
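For reference, a sketch of what that datastore choice looks like at install time, based on the k3s quick-start (run as root on the server node):

```shell
# Single server: k3s defaults to sqlite (via kine), no datastore flags needed.
curl -sfL https://get.k3s.io | sh -

# Opting into embedded etcd instead (e.g. to later grow to 3 servers):
curl -sfL https://get.k3s.io | sh -s - server --cluster-init
```

The sqlite default is what keeps single-node setups so light; etcd only becomes worth its overhead once you actually want multiple servers.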
I even use it with Longhorn https://longhorn.io/ for shared block storage on the mini cluster.
If anyone uses it with MicroOS, just make sure you switch to kured https://kured.dev/ for the transactional-updates reboot method.
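For anyone setting that up, a hedged sketch of the kured install via its Helm chart. The sentinel path below is an assumption about where transactional-update drops its reboot flag — verify it against your MicroOS version before relying on it:

```shell
# Sketch: install kured and point it at the reboot sentinel file.
# /var/run/reboot-needed is an assumed path, not verified for your distro.
helm repo add kubereboot https://kubereboot.github.io/charts
helm install kured kubereboot/kured \
  --namespace kube-system \
  --set configuration.rebootSentinel=/var/run/reboot-needed
```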
How has your experience been with Longhorn? Performance, flexibility, issues, maintenance...? I'm interested in moving away from a traditional single-node NAS to a cluster of storage servers. Ceph/Rook seem daunting, and I'd prefer something easy to set up and maintain, that's performant, reliable, and scales well. Discovering issues once you're fully invested in a storage solution is a nightmare I'd like to avoid. :)
Ceph is a nightmare if you don’t set it up exactly how the docs say - and in fairness, the docs are excellent.
My advice, having done Ceph/Rook, Longhorn, and now Ceph via Proxmox is the latter, assuming you have access to an actual host. Proxmox-managed Ceph is a dream, and exposing it to VMs and then K8s via RBD is easy.
Longhorn is fairly easy to set up, but its performance is terrible in comparison.
Thanks for the insight. I've tried Proxmox before, and as much as I appreciate what it does, it's mostly a black box. I prefer a more hands-on approach, as long as the docs are comprehensible, and the solution is simple enough to manage without a special degree. Ceph always seemed daunting, and I've mostly ruled it out, which is why these k8s-native solutions are appealing.
Good to know about the performance. This is not critical for my use case, but I guess I'll need to test it out for myself, and see whether it's acceptable.
Ceph is ... well, it's an amazing journey. Especially if you were there when it started, and watched as each release made it more capable & crazier.
From the early days of trying to remember WTF the CRUSH and "rados" acronyms actually meant, and focusing on network and storage concerns [0] ... architectural optimizations [1] ... a Red Hat acquisition ... adjusting to the NVMe boom [2] ... and then Rook, probably one of the first k8s operators, trying to somehow operate this underwater beast in a sane manner.
If you are interested in it ... set it up manually once (get the binaries, write the config files, generate keyrings, start the monitors, set up OSDs and RGW, and then you can use FileZilla or any S3 client). For more production-ish usage there's a great all-in-one docker image https://quay.io/repository/ceph/ceph?tab=tags .
[0] oh, don't forget to put the intra-cluster chatter on an internal network, or use a virtual interface and traffic shaping, to have enough bandwidth for replication and restore; and figure out what to do when PGs are not peering and OSDs are not coming up
[1] raw block device support, called BlueStore, which basically creates a GPT partitioned device, with ~4 partitions, stores the object map in LevelDB - and then later RocksDB
[2] SeaStore, using the Seastar framework, SPDK and DPDK, optimize everything for chunky block I/O in userspace
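If you do go the manual route, a few sanity checks with the standard ceph CLI once the daemons are up (the pool and user names here are made up for illustration):

```shell
ceph -s                       # overall cluster health at a glance
ceph osd tree                 # OSDs up/in, mapped to their hosts
ceph osd pool create demo 32  # hypothetical pool with 32 PGs
ceph df                       # per-pool usage
# S3 credentials for talking to RGW with FileZilla or any S3 client:
radosgw-admin user create --uid=demo --display-name="Demo User"
```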
> I prefer a more hands-on approach ... which is why these k8s-native solutions are appealing
"K8s-native" here implies Rook, which is in no way hands-on for Ceph.
> as long as ... the solution is simple enough to manage without a special degree
Ceph is not simple, that's my point. I assume you've read the docs [0] already; if not, please do so _in their entirety_ before you use it. There are so many ways to go wrong, from hardware (clock skew due to BIOS-level CPU power management, proper PLP on your drives...), to configuration (incorrect PG sizing, inadequate replication and/or EC settings...) and more.
I'm not trying to dissuade you from tackling this, I'm just saying it is in no way easy or simple. Statements like "k8s-native solutions" always make me cringe, because it usually means you want to use an abstraction without understanding the fundamentals.
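On PG sizing specifically: the classic rule of thumb from the docs is roughly 100 PGs per OSD, divided by the replica count, rounded up to a power of two (modern clusters can also just enable the PG autoscaler instead). A quick sketch of the arithmetic:

```shell
# Rule-of-thumb PG count: (OSDs * 100) / replicas, rounded up to a power of 2.
osds=12
replicas=3
target=$(( osds * 100 / replicas ))   # 400
pg=1
while [ "$pg" -lt "$target" ]; do pg=$(( pg * 2 )); done
echo "$pg"   # prints 512
```

Getting this badly wrong (e.g. far too few PGs for the OSD count) is exactly the kind of configuration mistake that's painful to fix after the cluster is full of data.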
To be clear, I have read the docs, set it up on my own, and decided I didn't want to try to manage it. I've run a ZFS pool for a few years on Debian and shifted over to TrueNAS Scale last week; not because I was unable to deal with ZFS' complexity (knock on wood, the only [temporary] data loss I ever had was an incorrect `rm -rf`, and snapshots fixed that), but because of continual NFS issues. I may yet switch back; I don't know - I just no longer had the time to troubleshoot the data layer. Ceph makes ZFS look like child's play in comparison.
By "k8s-native" I meant solutions that were built from the ground up with k8s in mind. Rook is built on Ceph, and as such requires knowledge and maintenance of both, which seems much more difficult than managing something like Longhorn.
I think you misunderstood me. I'm not inclined to give Ceph/Rook a try because of its complexity, regardless of the state of its documentation. I was referring to Proxmox in my previous comment, in the sense that I don't need a VM/container manager/orchestrator with a pretty UI. If I'm already running k3s, I can manage the infrastructure via the CLI and IaC, and removing that one layer of abstraction that Proxmox provides is a positive to me. Which is why adding just a storage backend like Longhorn to a k3s cluster seems like the path of least resistance for my use case.
Ultimately, I don't _want_ to deal with the low-level storage details. If I did, I'd probably be managing RAID, ZFS, or even Ceph. For my current NAS I just use a single JBOD server with SnapRAID and MergerFS. This works great for my use case, but I want to have pseudo-HA and better fault tolerance, and experiment with k8s/k3s in the process.
So I'm looking for a k8s-native tool that I can throw a bunch of nodes and disks at, easily configure to serve a few volumes, and have it give me somewhat performant, reliable, and hopefully maintenance-free block- or object-level access. Persistent storage has always been a headache in k8s land, but I'm hoping that nowadays such user-friendly and capable solutions exist that will allow me to level up my homelab.
Also, I feel obligated to ask “are you me?” at your stated homelab journey. I also went from mergerfs + SnapRAID, albeit shifting to ZFS. I also went with k3s, but opted for k3os which is now dead and thus leaving me needing to shift again (I’m moving to Talos). Finally, everything in my homelab is also in IaC, with VMs built by Packer + Ansible, and deployed with Terraform.
Happy to discuss this more at length if you have any questions; my email is in my profile.
I've run Rook/Ceph, and I run Longhorn right now. I wish I didn't, and I'm actively migrating to provider-managed PVs.
My advice for on-prem is to buy storage from a reliable provider with a decent history of hybrid flash/HDD, so that you can take advantage of storage tiering (unless you just want to go all flash, which is a thing if you have money).
If you must use some sort of in-cluster distributed storage solution, I would advise you to exclude members of your control plane from taking part, and I would also dedicate entirely separate drives and volumes for the storage distribution so that normal host workload doesn't impact latency and contention for the distributed storage.
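Concretely, keeping storage replicas off the control plane can be done with a standard taint, and Longhorn can be told to only place disks on labeled nodes (node names below are placeholders, and the Longhorn label assumes its create-default-disk-labeled-nodes setting is enabled — check your version's docs):

```shell
# Keep general workloads (incl. storage replicas) off the control-plane node.
kubectl taint nodes cp-0 node-role.kubernetes.io/control-plane=:NoSchedule

# Hypothetical: only nodes labeled like this get Longhorn default disks,
# when the create-default-disk-labeled-nodes setting is turned on.
kubectl label node worker-1 node.longhorn.io/create-default-disk=true
```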
Good points, thanks. What makes you wish you didn't use Rook/Ceph/Longhorn?
In a professional setting, and depending on scale, I'd probably rely on a storage provider to manage this for me. But since this is for my homelab, I am interested in a DIY solution. As a learning experience, to be sure, but it should also be something that ideally won't cause maintenance headaches.
Keeping separate volumes makes sense. I can picture three tiers: SSDs outside of the distributed storage dedicated to the hosts themselves, SSDs part of distributed storage dedicated to the services running on k3s, and HDDs for the largest volume dedicated to long-term storage, i.e. the NAS part. Eventually I might start moving to SSDs for the NAS as well, but I have a bunch of HDDs currently that I want to reuse, and performance is not critical in this case.
> Good points, thanks. What makes you wish you didn't use Rook/Ceph/Longhorn?
It seems like my volumes are constantly falling into a degraded state and then rebuilding. Resizing a volume requires taking down the workload that's attached, and then it seems to take forever (15m+) for my clusters to figure out that the pod is gone and a new pod is trying to attach.
Really, it's a PITA and all of the providers' storage classes seem better than Longhorn. Ceph I had less experience with but very similar problems - long-gone pods held a lock on PVCs that had to be manually expunged, or wait for a very long timeout.
I've had similar issues with Mayastor (another in-cluster storage solution). It's under heavy development, so I've assumed the more mature options were better.
I'm working on v2 of my homelab cluster, and I'm going with plain old NFS to a file server with a ZFS pool. Yes, I will have a single node as a point of failure, but with how much pain I've had so far I think I'll be coming out ahead in terms of uptime.
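For what it's worth, the ZFS side of that is pleasantly small — a sketch, where the pool/dataset names and subnet are made up, and the host needs an NFS server installed:

```shell
# Carve out a dataset for k8s volumes and export it over NFS using ZFS's
# built-in sharenfs property (options are passed through to exportfs).
zfs create tank/k8s
zfs set sharenfs="rw=@10.0.0.0/24,no_root_squash" tank/k8s
zfs get sharenfs tank/k8s   # verify the export options took effect
```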
I can't speak to performance because the workloads aren't really intense, but I run a small 3 node cluster using k3s and Longhorn and Longhorn has been really great.
It was easy to set up and has been running reliably with very minimal maintenance since.
If you use Longhorn, make sure to enable the network policies when installing the helm chart. For some odd reason, these are disabled by default, which means ANY pod running on your cluster has full access to the Longhorn manager, API, and all your volumes.
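A hedged sketch of that install — recent Longhorn charts expose this as `networkPolicies.enabled`, but double-check the values for your chart version:

```shell
helm repo add longhorn https://charts.longhorn.io
helm upgrade --install longhorn longhorn/longhorn \
  --namespace longhorn-system --create-namespace \
  --set networkPolicies.enabled=true
```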
I wouldn't really treat it as a replacement for a NAS, mostly only for the container workloads running on kubernetes itself... Ideally, any apps you develop should use something more sane like object storage (Minio etc) for their data.
I push it pretty minimally right now, so no great performance testing myself, and I do run it in synchronous mode, so its write performance is likely going to be limited by the 1 Gbps network it syncs over.
Funny: I've been running a Talos cluster for the past six months, and just today decided to look into k3s. Talos has a lot of really nice things, but I have found that the lack of shell access can be frustrating at times when trying to troubleshoot.
I had a situation where the etcd cluster got hosed, making it basically impossible (at least with the ways I know) to interact with the k8s API at all. So I didn't have any way to get a privileged pod running.
Ah gotcha. Haven’t had to deal with that yet. Maybe possible to add a static pod via the machine config? But yeah it’s basically throwing out a bunch of linux admin muscle memory.
Kudos to you. I feel like setting things up on real hardware is somehow needed to make things concrete enough to fully understand. At least for me (I fully admit this may be a personal flaw), working with a VM in the cloud is a little too abstract - even though eventually this is where things will land.
Re: Talos persistent storage, why not run it as a VM and pass in block devices from the hypervisor? You also then gain the benefit of templated VMs that you can easily recreate or scale as needed.
I'd love to compare k3s against Talos https://www.talos.dev/ but Talos's lack of support for a persistent storage partition (only a separate storage device https://github.com/siderolabs/talos/issues/4041 ) really hurts most small home / office usage I'd want to try.