So I've created ~300k ec2 instances with SadServers, and my experience was that starting an ec2 VM from stopped took ~30 seconds and creating one from an AMI took ~50 seconds.
Recently I decided to actually look at boot times, since I store in the db when the servers are requested and when they become ready, and it turns out for me it's really bimodal: some take about 15-20s and many take about 80s, see the graph at https://x.com/sadservers_com/status/1782081065672118367
Pretty baffled by this (same region, pretty much same everything), any idea why? Definitely going to try this trick in the article.
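For the curious, measuring "requested → ready" boils down to something like this boto3 sketch (not my actual code; the region, AMI and instance type are placeholders, and "ready" here just means the instance_status_ok waiter passes, which is close to but not exactly what my db records):

    import time
    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")  # placeholder region

    requested_at = time.monotonic()

    # Placeholder AMI / instance type; add
    # InstanceMarketOptions={"MarketType": "spot"} to request Spot instead of on-demand.
    resp = ec2.run_instances(
        ImageId="ami-xxxxxxxxxxxxxxxxx",
        InstanceType="t3.micro",
        MinCount=1,
        MaxCount=1,
    )
    instance_id = resp["Instances"][0]["InstanceId"]

    # "Ready" = both EC2 status checks pass; swap in an SSH/agent check if that's
    # closer to what you actually record.
    ec2.get_waiter("instance_status_ok").wait(
        InstanceIds=[instance_id],
        WaiterConfig={"Delay": 5, "MaxAttempts": 120},
    )
    print(f"ready after {time.monotonic() - requested_at:.1f}s")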
My guess is it's probably related to AWS Spot capacity.
The second and third spikes, at 80 and 140 seconds, line up nicely with this kind of behavior.
The second spike would be optimised workloads that can respond to spot interruption in under 60 seconds.
The third spike would be Spot workloads that are being force-terminated.
The reason it falls on those boundaries is that whatever is scheduling your workload only re-checks for free capacity once a minute, so a ~20s baseline boot plus one or two 60-second waits lands you right around 80s and 140s.
I used to be able to spin up spot instances and basically never get interruptions. They'd stay on for weeks/months.
In my experience, it used to be fairly safe to use Spot instances for most workloads; you'd almost never get Spot interruptions. Now, in some regions and for some instance types it's difficult to run Spot instances at all.
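If you want a rough read on where Spot capacity is tight, I believe EC2 exposes Spot placement scores per region/instance type; something like this with boto3 (the instance types and regions here are just examples, and the 1-10 score is a likelihood estimate, not a guarantee):

    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")  # any region can make this call

    # Example instance types/regions; scores range from 1 (unlikely to get
    # Spot capacity) to 10 (very likely).
    resp = ec2.get_spot_placement_scores(
        InstanceTypes=["t3.micro", "m5.large"],
        TargetCapacity=1,
        SingleAvailabilityZone=False,
        RegionNames=["us-east-1", "eu-west-1"],
    )
    for score in resp["SpotPlacementScores"]:
        print(score["Region"], score["Score"])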
Thanks, Spot capacity being scheduled differently would explain the behavior.
Almost all my ec2 instances are spot, and actually I can compare the distribution with the on-demand ones.
My spot instances are very short lived (15-30 mins max) and AFAIK I've never seen a spot instance force-terminated (this would be hard to find I think).
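(If I ever do want to check, my understanding is that the two-minute interruption notice shows up in the instance metadata, so something like this from inside the instance would catch it; it uses the IMDSv2 token flow, and a 404 just means no interruption is scheduled:)

    import urllib.error
    import urllib.request

    # IMDSv2: grab a session token first, then query the spot metadata path.
    token_req = urllib.request.Request(
        "http://169.254.169.254/latest/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "300"},
    )
    token = urllib.request.urlopen(token_req, timeout=2).read().decode()

    action_req = urllib.request.Request(
        "http://169.254.169.254/latest/meta-data/spot/instance-action",
        headers={"X-aws-ec2-metadata-token": token},
    )
    try:
        notice = urllib.request.urlopen(action_req, timeout=2).read().decode()
        print("interruption notice:", notice)
    except urllib.error.HTTPError as e:
        if e.code == 404:
            print("no interruption scheduled")
        else:
            raise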
Perhaps in one case you are getting a slice of a machine that is already running, versus AWS powering up a machine that was offline and getting a slice of that one?
I'm happy that a lot of people use SadServers and find it beneficial, while it doesn't cost me a lot of money. Still a lot of features to implement on the website and a shrinking backlog of scenario ideas to materialize (if somebody has ideas for Linux/Docker/Kubernetes etc. troubleshooting scenarios, please let me know).
I've just tried several of your challenges and they're all painfully accurate for real-world scenarios. I will definitely point people at these the next time I get asked how I learned to fix <random Linux configuration problem>!
As for suggestions, here are some random things I needed to do recently:
- resize the boot partition of an OS (don't know how doable this is with your vserver setup, maybe use one of those WASM Linux emulators?)
- set up a systemd service/timer/socket that starts at the right time and responds correctly to reloads/restarts
- set up IPv6 correctly
- troubleshoot why a device wasn't connecting to the WiFi (DHCP service problem!)
- set up a VPN (wireguard/openvpn/etc). Expert mode: make the remote endpoint have an A/AAAA record that the server isn't listening on
- troubleshoot why some of my devices couldn't ssh into a server despite the pubkeys being in the authorized_keys file (old sshd version didn't understand the most recent key algorithm!). Bonus problem: ~/.ssh had the wrong permissions so the authorized keys weren't loading.
- renew an ACME/letsencrypt certificate in nginx in proxy mode (location / was proxied but location /.well-known/... shouldn't have been!)
- check your preferred smtp daemon to see if it's set as an open relay (see the sketch after this list)
- upgrade postgres from an old version to a new version without data loss (hard mode: the partition postgres uses by default doesn't have the free space to make a copy and migrate the data)
- figure out why the firewall isn't blocking port 1234 despite UFW being enabled and a block-all rule being present (it was because of Docker iptables rules overriding UFW rules)
- update a package that has some kind of dependency issue (e.g. an external repository that is no longer needed)
- make Ubuntu shut up about Ubuntu Pro and stop it from fetching ads on ssh login
- alter a systemd service file so that it no longer runs as root (hard mode: set up dynamic users and other hardening features)
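For the open-relay item, this is roughly the check I had in mind, sketched in Python (the hostname and addresses are made up; it only probes MAIL FROM/RCPT TO and backs out before DATA, so nothing is ever sent):

    import smtplib

    # Hypothetical values - point these at your own test server and use
    # sender/recipient domains the server is NOT responsible for.
    HOST = "mail.example.com"
    EXTERNAL_FROM = "probe@external-a.example"
    EXTERNAL_TO = "someone@external-b.example"

    with smtplib.SMTP(HOST, 25, timeout=10) as smtp:
        smtp.ehlo()
        smtp.mail(EXTERNAL_FROM)
        code, msg = smtp.rcpt(EXTERNAL_TO)
        smtp.rset()  # back out without ever reaching the DATA phase
        # Accepting a RCPT for a foreign domain from a foreign sender is the
        # classic open-relay symptom.
        if code in (250, 251):
            print("looks like an open relay:", code, msg)
        else:
            print("relay refused:", code, msg)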
Yes, good catch (I should forbid internet access to this endpoint); the poor queue is waiting on a VM to come up but there's no quota left until other VMs are garbage-collected.
Didn't know about this one. There are quite a few lab/sandbox SaaS offerings, but what I've seen so far is that they're more for training with a "follow the recipe" model (do this, do that to configure something), rather than "this (real) server is broken, fix it (with possibly different solutions)", which imho is more real-life and useful.
I believe the company was founded by some coworkers of mine way back when at Rackspace, who often interviewed Linux admins with a lab VM, and I assume they just automated the setup and spun it off as their own business. At least that's what happened as far as I can tell; I didn't know the parties involved.