People who decry graphical admin interfaces in favor of command line are missing the wood for the trees.
Sure, clickops is no way to run a server - but neither, if we’re honest, is ssh.
For a working machine, server state should be reproducible from scratch. Install an OS, add software, apply configuration, leave well alone. If you’re going in with ssh or cockpit you’re just going to screw something up.
So the only reason to be working on a server directly is that you’re doing something exploratory. And in that case GUI vs. command line isn’t as clear-cut as people want to make it. GUIs emphasize discoverability and visibility, which can be helpful in that experimental phase when you’re trying to figure out how to get something set up right.
Why? I'm not necessarily disagreeing, but too often are these kinds of statements thrown about without any qualification, as if they are self-evident truths. But they're not -- there are engineering trade-offs behind any choice, and it's no different here. So, in order to guide this discussion away from dogmatic platitudes: why should server state be reproducible from scratch? What does "from scratch" mean? Why is clickops no way to run a server?
Install an OS, add software, apply configuration
Do you think this captures "server state" completely? Software patch levels are not part of server state? What about application data? User data?
So here's my counterstatement: for any working machine, I can reproduce the server state exactly by performing a restore from backup. Backup/restore is perfectly compatible with clickops, and it's faster and more reliable than reinstalling an OS, adding software and applying configuration -- even when the software and configuration are scripted. And if your server stores non-volatile data, as is often the case in clickops environments, you will need to have a backup system anyway to restore the user data after deploying a new server.
> too often are these kinds of statements thrown about without any qualification, as if they are self-evident truths
It's because different people think at different levels of abstraction. One admin might be thinking about a handful of servers and another about an entire fleet of VMs. The way you manage each is very different. Clickops can work well for a small number of servers, and a full orchestration setup can be over-engineering.
But your real issue is that blanket statements never work in such scenarios. However, I think it's pretty well established that reproducible server state is a best practice. How you get there is up to you.
But as an argument against backup/restore -- you can't use backup/restore to generate new servers from an existing template without some kind of extra scripting (if for no other reason than to avoid address/naming conflicts). And if you're already scripting that...
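To make that concrete, here is a dry-run sketch of the per-clone fixups a restored template image still needs before it can coexist with the original. This script, the `web02` hostname, and the systemd assumption are all hypothetical; it prints the commands instead of running them.

```shell
#!/bin/sh
# Print (not run) the fixups a restored template image still needs
# before the clone can coexist with the original: a unique hostname,
# a fresh machine-id, and fresh SSH host keys.
# Assumes a systemd-based distro; "web02" is a made-up example name.
new_hostname="${1:-web02}"
fixups=$(cat <<EOF
hostnamectl set-hostname $new_hostname
rm -f /etc/machine-id && systemd-machine-id-setup
ssh-keygen -A
EOF
)
echo "$fixups"
```

None of these steps exist in a plain backup/restore flow, which is exactly the scripting gap being pointed at.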
There are a lot of reasons we arrived here over decades of struggling to keep servers in good working order in a sea of change. One is that backup and restore is inherently fragile, and we have many instances where restorability degrades for many reasons over a long life. Backup/restore verification is not a regular part of hygiene because it’s intrusive, tedious, and slow; if it’s ever done, it’s usually done once. Reproducible builds allow for automated verification and testing offline.
Changes are only captured at snapshot intervals and are neither coherent nor atomic, so you can easily miss changes that are crucial while capturing destructive changes in between deltas. Worse are flaws that are introduced but not observed for a long time and are now hopelessly intermixed with other changes. Reproducible build systems let you use a revision control system to manage change and cherry-pick changesets to resolve intermixed flaws, and even if they’re deeply intermixed you can work them out on an offline server until it’s healthy enough to rebuild your online server.
The issue with reproducible build systems isn’t that they aren’t superior to backup and restore in every way. It’s that the interfaces we provide today are overly complex compared to the simple interface of “backup and restore” -- which, despite its promise, always works on the backup side but often fails on the restore. These ideas of hermetic server builds are relatively new and the tooling hasn’t matured.
I would actually say clickops is an ideal way to solve that issue. Clickops that serializes resiliently to a revision-controlled metadata store that drives the build solves the usability problem. If the metadata store is plain text configs that can be modified directly without breaking the user interface, you get a way to deal with the tedium of making complex changes in a UI, while still getting a nice rendering of state for simple exploratory changes. Backup and restore would then only be necessary for stateful data, and since that state isn’t at the OS layer, you won’t end up with a bricked server.
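As an entirely hypothetical sketch of that metadata store: each click serializes to a flat key=value file that lives in revision control, and a renderer turns the file into the commands that would build the server. The file format, keys, and `render` function below are all invented for illustration.

```shell
# Hypothetical format: one key=value pair per UI action. The file is
# plain text, so it diffs cleanly in git and can be hand-edited
# without breaking the UI that wrote it.
conf=/tmp/web01.conf
printf '%s\n' 'package=nginx' 'service=nginx' > "$conf"

# Render the declared state into the commands that would realize it.
render() {
  while IFS='=' read -r key val; do
    case "$key" in
      package) echo "apt-get install -y $val" ;;
      service) echo "systemctl enable --now $val" ;;
    esac
  done < "$1"
}
render "$conf"
```

The point of the design is that the UI and a text editor are writing the same store, so neither path is second-class.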
This assumes that you're running in an environment where your servers are cattle and not pets, and in all fairness, not everyone is running large scale web platforms on some orchestration platform. I don't disagree that, even in a pets world one should know how to restore/rebuild a system, because without that, you don't have a sound BDR strategy.
Arguably, about 80% of those running their app on a cattle farm should really have gone with a pet cafe instead. Resumes would certainly be a lot less impressive, but they'd also have a lot fewer fires to put out and a significantly smaller infra bill.
But regarding the topic at hand, I don't think being able to manage these things with a graphical interface is necessarily a bad thing. It's basically user-space iDRAC/IPMI.
There's no reason you can't use puppet/chef/ansible/whatever on pets!
The reason that (some) people don't do this is the cost/benefit analysis looks kind of weird. You'll spend a lot of time mucking around in puppet/chef/ansible/whatever for a single snowflake server, and it would be a lot faster to just go edit that config file directly.
In reality, proper backups and shell history can get you pretty far if you ever find you need to replicate a snowflake.
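As a sketch of the shell-history half of that: mining history for the state-changing commands gives you a rough rebuild script almost for free. The filter list below is illustrative, not exhaustive, and the demo fakes the history file so it runs anywhere; on a real snowflake you'd read `~/.bash_history`.

```shell
# Fake a history file so the example is self-contained.
hist=/tmp/demo_history
printf '%s\n' 'cd /tmp' 'apt-get install -y nginx' 'ls -la' \
  'systemctl enable nginx' > "$hist"

# Keep only the commands that plausibly changed server state,
# dropping navigation and inspection noise like cd and ls.
notes=$(grep -E '^(apt-get|systemctl|useradd)' "$hist")
echo "$notes"
```

The output isn't a playbook, but it's a decent starting inventory of what the snowflake actually had done to it.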
I have a homelab that is mostly pets (one or two servers that do a job, e.g. one DNS server, one VPN server), and I absolutely spend my time mucking about with Ansible to set them up. But it's awesome when I need to upgrade a server to a new OS version and I can just delete the entire VM and re-configure from scratch relatively fearlessly. Before my silly HaC (Homelab as Code) kick, it wasn't a huge deal to rebuild a server during an afternoon, reference docs and old notes, etc., but I prefer it this way.
Also Ansible is incredibly useful at my work and there's a very large overlap. Which is obviously the main motivation.
In my homelab, I use Portainer to manage my hosts. All of my workloads are installed as collections of Docker containers, and I'm slowly but surely migrating even single container installs to Compose stacks. With some real bare bones GitOps, those stack files can be in Git, and deploy to the host in Portainer, thus at least giving me the recipes to rebuild my environment should it ever be lost.
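The redeploy half of that recipe is small enough to sketch. This is a dry run under invented paths -- the repo layout is hypothetical and the docker commands are printed rather than executed, so it runs anywhere.

```shell
# Dry-run sketch of "rebuild from the recipes": stacks live in a
# cloned git repo, redeploy is "for each stack, compose up".
repo=/tmp/homelab
mkdir -p "$repo/stacks/dns" "$repo/stacks/vpn"
for stack in "$repo"/stacks/*/; do
  echo "docker compose --project-directory $stack up -d"
done
```

Portainer's git-backed stacks do essentially this loop for you, which is what makes the setup "bare bones GitOps".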
I've also stumbled into the same paradigm - everything as compose files checked into git, deployed onto portainer. IMO pretty nice and low maintenance.
> For a working machine, server state should be reproducible from scratch. Install an OS, add software, apply configuration, leave well alone.
I'm curious if you have a specific tool or tools in mind. I've been using Ansible in my home lab, particularly for configuring Raspberry Pis. The OS install part (only?) works because it involves a bitwise copy of the image to the boot media (and some optional configuration).
I'd like to see a tool, maybe Cockpit-like or a wrapper around SSH, that would build Ansible playbooks for you as you clicked around or typed commands.
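I don't know of one either, but the core of such a tool could be a wrapper that both runs each command and appends it to a playbook as a raw task. A speculative sketch -- the `record` function and the file name are invented, and a real tool would map commands onto proper idempotent modules (`ansible.builtin.apt`, `ansible.builtin.service`, ...) rather than raw shell tasks:

```shell
# Run a command AND log it as an Ansible shell task.
playbook=/tmp/recorded.yml
: > "$playbook"
record() {
  # "$*" is the whole command line; log it, then execute it.
  printf -- '- name: replay %s\n  ansible.builtin.shell: %s\n' "$*" "$*" >> "$playbook"
  "$@"
}
record echo hello   # prints "hello" and appends a task to the playbook
cat "$playbook"
```

The interactive session stays interactive, but you walk away with a replayable (if crude) record of what you did.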