Show HN: SadServers – Test your Linux troubleshooting skills

apawloski · on Oct 26, 2022

Based on your architecture diagram, it looks like you're spinning up an instance per user? As you're probably finding now, you will hit AWS limits quickly.

You might instead want to have a smaller pool of (larger) servers that you run co-resident VMs on with https://firecracker-microvm.github.io/. That will avoid account limits and also keep your AWS costs more predictable.

ilyt · on Oct 26, 2022

That's kind of a nice use case for WASM machine/Linux emulators: then you just need to provide the image and the user can run it in the browser.

> You might instead want to have a smaller pool of (larger) servers that you run co-resident VMs on with https://firecracker-microvm.github.io/. That will avoid account limits and also keep your AWS costs more predictable.

I'd imagine (still waiting for it to load lmao) most of it could be containers too.

twalla · on Oct 26, 2022

Someone else linked https://github.com/copy/v86 which seems really neat.

I like making jokes with coworkers about implementing this or that bit of infra with WASM-based tools, mostly to get a rise out of them, but each time I make the joke I look into some of the tools or projects, and the balance of joke to "I'm actually serious" shifts a little bit to the right.

encryptluks2 · on Oct 26, 2022

So then the user experience will be poor due to the slowness and non-standard implementation. A better solution IMO would be to provide a container with SSH access.

yamtaddle · on Oct 26, 2022

Just run them in Linux VMs with WASM, on the users' browsers. Make them all pay for it with higher utility bills and greater wear & tear on their hardware.

trollface.jpg

freeone3000 · on Oct 26, 2022

This is actually a good idea for this -- the user wants the education, they can pay for it with their own hardware. Keep your costs low!

cogman10 · on Oct 26, 2022

Probably a better experience for everyone. You just have to distribute the image (rather than running VMs) and the user gets instantaneous responses.

georgyo · on Oct 26, 2022

If it is hosted on AWS the bandwidth of distributing the images is likely more than the cost of the compute.

jacooper · on Oct 26, 2022

Cloudflare exists

hunter2_ · on Oct 27, 2022

I thought Cloudflare only allows heavy use of the free tier for "web"-ish responses, which doesn't even include .txt files. But I suppose this use case is several orders of magnitude away from EasyList's, at least in request rate.

jacooper · on Oct 27, 2022

I mean you could just pay and use R2 directly. I think it would still be much cheaper.

BossingAround · on Oct 26, 2022

Why not spin up containers instead of VMs? Seems to me containers would fit much better than VMs.

paulfurtado · on Oct 26, 2022

If the goal of the test is to debug a sad Linux server, containers are going to severely limit the ways the server can be sad, aren't they?

BossingAround · on Oct 27, 2022

Can you give me an example of some of the severe limitations you're mentioning?

mmh0000 · on Oct 27, 2022

I can give you a bunch of things that can't be simulated in a container:

* Boot problems, such as: GRUB config/install errors, kernel parameters, init startup errors, blocking processes

* Many network scenarios, such as: PXE issues, multipath, load balancing, anything requiring configuring network interface settings, firewall configuration.

* Resetting an unknown root password

* Booting directly to bash

* Filesystem mounts through fstab or systemd mounts

There's probably more I could think of, but I think that's a good list.

nijave · on Oct 27, 2022

I don't think the DNS exercise would behave the same, although that probably depends on how the container was set up. Docker usually controls /etc/resolv.conf. Another exercise is "try to figure out if you're in a container or VM", so that'd definitely be different.

BossingAround · on Oct 27, 2022

The question is not if the exercises would behave identically, but if you can test the objective in a container. For example, you can totally test, screw up, and fix DNS in a container. I would think that "try to figure out if you're in a container or VM" would be exactly the same as it is right now.
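For the "container or VM?" exercise, the usual tool is `systemd-detect-virt`. A rough Python sketch of the same idea, assuming a Linux guest; the marker files and cgroup keywords are common heuristics, not an exhaustive or authoritative list, and on cgroup v2 hosts `/proc/1/cgroup` is often just `0::/`, so that check can come up empty. `guess_environment` is a hypothetical helper.

```python
import os


def guess_environment() -> str:
    """Best-effort guess: 'container', 'vm', or 'unknown' (Linux only).

    Heuristics only; a container running inside a VM reports 'container'.
    """
    # Marker files created by Docker and Podman respectively.
    if os.path.exists("/.dockerenv") or os.path.exists("/run/.containerenv"):
        return "container"

    # On cgroup v1, PID 1's cgroup path usually names the runtime.
    try:
        with open("/proc/1/cgroup") as f:
            cgroups = f.read()
        if any(k in cgroups for k in ("docker", "kubepods", "containerd", "lxc")):
            return "container"
    except OSError:
        pass

    # x86 guests expose the CPUID "hypervisor" bit as a cpuinfo flag.
    try:
        with open("/proc/cpuinfo") as f:
            for line in f:
                if line.startswith("flags") and "hypervisor" in line.split():
                    return "vm"
    except OSError:
        pass

    return "unknown"


print(guess_environment())
```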

spiffytech · on Oct 26, 2022

Containers have a history of escape vulnerabilities, for reasons like sharing a kernel with the host and other containers.

VMs are designed from the ground up to isolate guests, rather than focusing on application deployment.

Firecracker is the modern container alternative in untrusted compute scenarios, with Fly.io even converting container images into Firecracker VMs.

NovemberWhiskey · on Oct 26, 2022

>Containers have a history of escape vulnerabilities

Generally agreed, but for this use-case do we care?

nneonneo · on Oct 27, 2022

I haven’t gotten any of the challenges to load, but if you’re going to simulate a sysadmin it would make sense to give you high privileges (or even root) on the box. The more privileged you are inside a container, the more attack surface you expose.

BossingAround · on Oct 27, 2022

Which is why you create a "dummy" host VM that hosts containers. Nobody's saying "host containers on your prod webserver." On the other hand, spinning up a VM for every user seems insane to me.

encryptluks2 · on Oct 26, 2022

User mapping is now a standard feature in Kubernetes, so escape vulnerabilities aren't so much an issue anymore. Additionally, you can use gVisor.

paulfurtado · on Oct 26, 2022

User namespaces have resulted in multiple new container breakout CVEs in the last year. Some guides actually recommend disabling user namespaces because they are still somewhat new and perilous.

cyphar · on Oct 27, 2022

You're talking about creating new user namespaces inside a container, not running a container in a user namespace. Running a container in a user namespace is strictly a security improvement over running it in the host user namespace.

Also, all container runtimes already block unshare(CLONE_NEWUSER) with seccomp by default (unless seccomp has been disabled, which I'm not sure whether Kubernetes still does).

encryptluks2 · on Oct 27, 2022

Which ones from the last year? User namespaces provide security benefits as well. I mean, you could say the Linux kernel is also dangerous, and the Windows kernel, and pretty much anything that has ever had a CVE. You can also limit them to specific users if that is a major concern.

cogman10 · on Oct 26, 2022

Bypassing container security is easier than bypassing VM security.

tamrix · on Oct 26, 2022

Then wouldn't that be the ultimate test ;)

temp0826 · on Oct 26, 2022

I haven't fully grokked this yet, but one trick I've used in the past to get around limits is AWS Organizations, creating a sub-account per property. A bit more setup but can keep things cleaner administratively.

icedchai · on Oct 26, 2022

AWS will raise limits if you ask. Increasing EC2 instance limits is usually a quick turnaround.

ericbarrett · on Oct 26, 2022

Yes, the default limits are there to prevent abuse and runaway misconfigurations. They won't turn down revenue if you confirm it's intentional.

andrewstuart2 · on Oct 26, 2022

At least for the tests I've done at a small startup recently, they've also implemented some automatic quota increases for EC2. I ran commands that would have eclipsed (or did eclipse) my quota, and a few minutes later got an email saying my quotas had been bumped.

fduran · on Oct 26, 2022

Yes thanks!

dugmartin · on Oct 26, 2022

I'd suggest integrating https://bellard.org/jslinux/ and running the VM in the browser if you can - then you can scale without running out of resources.

fduran · on Oct 26, 2022

Thanks, I've been looking at WASM, for example https://github.com/snaplet/postgres-wasm/tree/main/packages/... ; it would certainly simplify everything to "download a fat file".

jodrellblank · on Oct 26, 2022

Have you seen https://copy.sh/v86/ ? It doesn't run as fast as jslinux but is BSD Licensed, on Github, and supports resuming the VM from a snapshot.

https://github.com/copy/v86

fduran · on Oct 26, 2022

Didn't know about this, thanks!

m00dy · on Oct 26, 2022

Or the Linux kernel ported to WebAssembly.

grepLeigh · on Oct 26, 2022

Very cool! This reminds me of the ops challenge @ Slack. I'm not sure if they still do this, but the SRE/platform infra interview used to involve a VM running a malfunctioning LAMP stack.

You'd get SSH access to the VM, then submit a diagnostic report of what was broken (and how you fixed it).

Reminded me of how Red Hat used to run their certification test (RHCE). I probably still have the live CDs for my RHCE laying around somewhere.

stevekemp · on Oct 26, 2022

I've had interviews like that in the past, and really enjoyed them. Much better than "Draw an architecture diagram for how you'd handle a serverless IoT application", where you lose points, silently, because you didn't pick something the interviewer expected you to do.

Usually a simple combination of immutable files, SELinux policies, and typos in configuration files was enough for most of the challenges. Though now and again you'd find they'd given you a server with packages removed, or not yet installed.

fduran · on Oct 26, 2022

Oh that reminds me, I loved the original Stripe CTF, it's been 10 years already! https://twitter.com/fduran/status/240321390698442753

jer0me · on Oct 26, 2022

New challenge: Fix SadServers’ sad servers

vetkat · on Oct 27, 2022

And while we’re at it, we might as well write a wrapper around low-upvote Server Fault questions in the hope that they attract more attention when the problem is gamified.

vermon · on Oct 26, 2022

Seems like it's out of capacity:

    An error occurred (VcpuLimitExceeded) when calling the RunInstances operation: You have requested more vCPU capacity than your current vCPU limit of 64 allows for the instance bucket that the specified instance type belongs to. Please visit http://aws.amazon.com/contact-us/ec2-request to request an adjustment to this limit.

Maybe something like https://leaningtech.com/webvm-server-less-x86-virtual-machin... would be cheaper and more reliable for this kind of thing?

fduran · on Oct 26, 2022

Yes, HN effect lol-sob.

Mitigation: temporarily reducing server lifetime so more people can try.
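Why a shorter lifetime helps under a fixed quota follows from Little's law (concurrent servers = arrival rate × lifetime): with concurrency capped, the sustainable rate of new sessions is the cap divided by the lifetime. A minimal Python sketch, assuming the 64-vCPU limit from the error message above and a hypothetical 2 vCPUs per scenario server; the actual instance type and lifetimes aren't stated in the thread.

```python
def max_sessions_per_hour(vcpu_limit: int, vcpus_per_server: int,
                          lifetime_minutes: float) -> float:
    """Upper bound on sessions started per hour under a vCPU quota.

    Little's law: concurrent servers = arrival rate * lifetime, so the
    sustainable arrival rate is max_concurrent / lifetime.
    """
    max_concurrent = vcpu_limit // vcpus_per_server  # servers that fit in the quota
    return max_concurrent * 60.0 / lifetime_minutes


# Hypothetical: 64-vCPU quota, 2 vCPUs per server, a few candidate lifetimes.
for lifetime in (30, 15, 10):
    rate = max_sessions_per_hour(64, 2, lifetime)
    print(f"{lifetime:>2} min lifetime -> {rate:.0f} sessions/hour")
# 30 -> 64, 15 -> 128, 10 -> 192 sessions/hour
```

Halving the lifetime doubles throughput without touching the quota, at the cost of each user getting less time on the box.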

warent · on Oct 26, 2022

Usually I roll my eyes when someone posts their own website to HN and it crashes under load. But given the nature and complexity of yours I think there's room for understanding and patience :)

fduran · on Oct 26, 2022

Thanks. I did some stress testing and the infra is scalable enough, but I forgot about the AWS quotas, my bad. I've requested a quota increase and servers are being killed off, so hopefully "soon" the issue will go away.

Nextgrid · on Oct 26, 2022

Scaling this service without breaking the bank could become its own "sad server" scenario.

I'd start by moving the test VMs to bare-metal servers running libvirt. You can get a 128GB RAM server for ~110 EUR and that should be able to run around 120 concurrent VMs assuming 1GB of RAM to each (CPU isn't a major issue in this case).
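The arithmetic behind that estimate, as a tiny Python sketch. The 8 GB set aside for the host OS and hypervisor is an assumption, as is ignoring CPU; the comment only gives the 128 GB, ~110 EUR and 1 GB-per-VM figures, and doesn't say what billing period the price covers.

```python
def vm_capacity(host_ram_gb: float, vm_ram_gb: float, host_reserve_gb: float) -> int:
    """How many fixed-size VMs fit once some RAM is reserved for the host."""
    return int((host_ram_gb - host_reserve_gb) // vm_ram_gb)


def cost_per_vm(server_cost_eur: float, vm_count: int) -> float:
    """Server price spread evenly across the VMs it hosts."""
    return server_cost_eur / vm_count


# Hypothetical 8 GB host reserve; the other figures are from the comment.
vms = vm_capacity(128, 1, 8)
print(vms)                                        # 120
print(f"{cost_per_vm(110, vms):.2f} EUR per VM")  # 0.92 EUR per VM
```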

sylvainkalache · on Oct 27, 2022

Very cool project!

I was the founder of a school training software engineers, and we had an infrastructure track that got a lot of our students to land SRE positions. When we asked employers for feedback about our grads, one piece of feedback kept coming up: they lack experience when it comes to troubleshooting.

So I went on a quest to simulate infra debugging in an academic context.

I came up with the idea of giving students broken servers. I used Docker containers: I would set up a simple workload and then mess it up with classic issues.

Needless to say students generally did not like it :) debugging isn’t fun. But it did help a lot.

BossingAround · on Oct 26, 2022

I'd love to get the actual VM content offline, packaged as Vagrantfiles or Containerfiles. Love the idea though! Go to Pluralsight and pitch it to them :)

fduran · on Oct 26, 2022

A few people have suggested offering content offline as a Docker image etc, good idea, thanks.

Timja · on Oct 26, 2022

The idea is really cool, but all I see is "Waiting for server..." and nothing happens.

kiyundai · on Oct 26, 2022

That's the trick: you failed the first challenge, "Did you try to turn it off and on again?"

computershit · on Oct 26, 2022

I love this idea, I'll definitely try it out when provisioning for scenario machines is up again. Nice work.

PanosJee · on Oct 26, 2022

Hack The Box -> Fix The Box

lagrange77 · on Oct 26, 2022

Really cool idea.

After choosing a problem, the endpoint you poll at https://sadservers.com/celery-progress/xxxx repeatedly returns {pending: true, current: 0, total: 100, percent: 0} for me.

fduran · on Oct 26, 2022

Yes, good catch (I should block internet access to this endpoint). The poor queue is waiting for a VM to come up, but there's no quota left until other VMs are garbage-collected.

bm-rf · on Oct 26, 2022

I'm assuming you're spinning up an EC2 instance for each lab. What do you think about using pre-built Docker images for each challenge instead? That way they can spin up in just a couple of seconds. Might also be cheaper?

fduran · on Oct 26, 2022

I wanted to do full VMs rather than Docker images but yes I could do Docker images or dedicated big instances with VMs on top like somebody else is suggesting.

bravetraveler · on Oct 26, 2022

Not a bad idea, but something to consider: this limits the options for kernel-level things quite considerably.

clvx · on Oct 26, 2022

LXD would probably be better.

mewse-hn · on Oct 26, 2022

Completed the first challenge and it was a lot of fun - spoiler I've never had to use the 'lsof' command before.
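For anyone else meeting `lsof` for the first time: on Linux it answers "who has this file open?" largely by looking through `/proc/<pid>/fd`. A minimal Python sketch of just that one question; `pids_holding` is a hypothetical helper, it only sees processes you're allowed to inspect (run as root for the full picture), and real `lsof` does far more.

```python
import os


def pids_holding(path: str) -> list[int]:
    """Return PIDs with an open descriptor on `path` (Linux /proc only).

    Walks /proc/<pid>/fd and compares each symlink target with the file's
    resolved path. Processes we can't inspect are silently skipped.
    """
    target = os.path.realpath(path)
    pids = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue  # not a process directory
        fd_dir = f"/proc/{entry}/fd"
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited, or belongs to another user
        for fd in fds:
            try:
                if os.readlink(f"{fd_dir}/{fd}") == target:
                    pids.append(int(entry))
                    break
            except OSError:
                continue  # fd closed between listdir and readlink
    return pids
```

Usage: `pids_holding("/path/to/bad.log")`, with the path being whichever log is filling the disk; the PIDs it returns are the processes to inspect or stop.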

nobody9999 · on Oct 27, 2022

>Completed the first challenge and it was a lot of fun - spoiler I've never had to use the 'lsof' command before.

I've been waiting a while for the "sad server" to come up for me, and read the scenario (Saint John) whilst waiting.

lsof was the first thing that came to mind after reading the scenario.

I guess that once I actually get a "sad server" I'll make it "happy" quickly :)

yapril · on Oct 26, 2022

Can I download the images so I can run them on my own machine? I'd really appreciate it; I've got an interview very soon :)

bravetraveler · on Oct 26, 2022

Commenting so I remember to give this a try later. I've routinely been the person these kinds of gremlins get escalated to.

I've long wanted some sort of mock "things are broken, I want to see how you think" approach for sysadmins.

shagie · on Oct 26, 2022

In the "tricks of Hacker News" department:

     188 points by fduran 3 hours ago | unvote | flag | hide | past | favorite | 68 comments

If you click 'favorite' it will save the story to your favorites list. This is a publicly visible list (yours is https://news.ycombinator.com/favorites?id=bravetraveler and mine is https://news.ycombinator.com/favorites?id=shagie), which makes it easy to get bookmark-style functionality within HN.

As I tend to favorite less often than I comment, it makes it easier to find those things I want to find again.

bravetraveler · on Oct 26, 2022

Much appreciated! I'm woeful about not using features like this; it's a character fault at this point.

The HN interface too tends to just have my eyes filter out those links... but that's no defense.

Especially good to know that it's publicly viewable!

Not that I'm particularly worried of being outed by anything I favorite here, it's just good to be mindful of the data we make and where it goes.

yubiox · on Oct 26, 2022

Can't get to the first problem because of the HN hug of death, but anyway, there are fake ways to "solve" it, like renaming the log file (the check for whether it's solved is provided).

BossingAround · on Oct 26, 2022

This is a self-test, not a certification. The goal is not to defeat the verification check but to learn something. So yeah, it's perfectly acceptable that the tests aren't bulletproof.

Timja · on Oct 26, 2022

Depends on how the broken program writes to the log.

If it does

    while true; do echo hello >> bad.log; done

Then renaming bad.log will not solve the challenge.

teddyh · on Oct 26, 2022

Replace it with a symlink to /dev/null! Or /dev/full if we feel like it.

(Yes, these are bad solutions, since the instructions explicitly said to stop the process which is writing.)

loa_in_ · on Oct 27, 2022

It will still keep writing to an open inode

teddyh · on Oct 27, 2022

No, the “while” loop I was commenting on would not.
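Both sides of this exchange are right about different writers. A process that keeps one descriptor open keeps writing to the same inode after a rename, while the shell loop reopens `bad.log` by name on every `>>`, so a rename just makes it create a fresh file. A small Python sketch showing both cases in a temporary directory; `held.log` and the `.old` names are made up for the demo.

```python
import os
import tempfile


def append_by_path(path: str, text: str) -> None:
    """Like `echo text >> path`: open, append, close on every call."""
    with open(path, "a") as f:
        f.write(text + "\n")


def rename_demo(workdir: str) -> dict[str, list[str]]:
    """Rename a log under two kinds of writer and report where lines landed."""
    # Writer 1 reopens the log by name each time, like the shell loop above.
    loop_log = os.path.join(workdir, "bad.log")
    append_by_path(loop_log, "one")
    os.rename(loop_log, loop_log + ".old")
    append_by_path(loop_log, "two")  # the name is free again, so a new file appears

    # Writer 2 keeps one descriptor open, like a long-running daemon.
    daemon_log = os.path.join(workdir, "held.log")
    with open(daemon_log, "a") as held:
        held.write("one\n")
        held.flush()
        os.rename(daemon_log, daemon_log + ".old")
        held.write("two\n")  # the descriptor follows the inode, not the name
        held.flush()

    # Read back every file left in the directory.
    result = {}
    for name in sorted(os.listdir(workdir)):
        with open(os.path.join(workdir, name)) as f:
            result[name] = f.read().split()
    return result


with tempfile.TemporaryDirectory() as d:
    print(rename_demo(d))
    # {'bad.log': ['two'], 'bad.log.old': ['one'], 'held.log.old': ['one', 'two']}
```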

fduran · on Oct 26, 2022

There are ways to cheat but not so simple; there's a script that checks for the solution and a hash of the script is checked for modifications.
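The comment doesn't say which hash is used or where the expected value lives. A minimal Python sketch assuming SHA-256 and a digest recorded when the scenario image is built; `checker_unmodified` is a hypothetical name and this is not necessarily how SadServers does it. For it to resist tampering, the comparison has to run somewhere the user can't edit, since root on the box could otherwise change both the script and the stored digest.

```python
import hashlib


def sha256_of(path: str) -> str:
    """Hex SHA-256 of a file, read in chunks so large files are fine."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checker_unmodified(script_path: str, expected_hex: str) -> bool:
    """True if the check script still matches the digest recorded at build time."""
    return sha256_of(script_path) == expected_hex.lower()
```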

hotpotamus · on Oct 26, 2022

Are you familiar with Trueability? https://www.trueability.com/

It seems like this is a similar SaaS.

fduran · on Oct 26, 2022

Didn't know about this one. There are quite a few lab/sandbox SaaS offerings, but what I've seen so far is that they're more for training with a "follow the recipe" model (do this, do that to configure something) rather than "this (real) server is broken, fix it (possibly with different solutions)", which IMHO is more realistic and useful.

hotpotamus · on Oct 26, 2022

I believe the company was founded by some coworkers of mine way back when at Rackspace who often interviewed Linux admins with a lab VM and I assume they just automated the setup and spun it off as their own business. At least that's what happened as far as I can tell; I didn't know the parties involved.

fzyzcjy · on Oct 28, 2022

This looks interesting! But it keeps loading forever saying "Your server is being created" (hit VM limit again?)

10g1k · on Oct 26, 2022

"Have you turned it off and on again?"

N3Xxus_6 · on Oct 26, 2022

Well this sucks I wanted to try it lol. It's timing out for me or throws an error.

arwt · on Oct 27, 2022

Interesting idea! Looking forward to trying this once some VMs are available. :-)

ASalazarMX · on Oct 28, 2022

I only want to say that I love the name SadServers. Strongly memorable.

diffcheck · on Oct 27, 2022

The tasks keep loading infinitely; is that challenge zero?

b20000 · on Oct 26, 2022

did you read up on the problems with leetcode?

fduran · on Oct 26, 2022

Hi, not sure what the question means. I came up with the scenarios myself rather than copying them from LeetCode, if that's what you mean.

pxc · on Oct 26, 2022

I think they mean 'are you aware of the limitations of Leetcode-like tests and the downsides of their (over)use in hiring processes?'

(FWIW I think this is a very cool and fun educational project regardless of what usefulness it might or might not have in IT hiring decisions, and I'm looking forward to playing with it)

imwillofficial · on Oct 26, 2022

This is badass, just what I need!

DeathArrow · on Oct 26, 2022

>Practice for your next SRE/DevOps interview.

Are SREs and DevOps tasked with administration of operating systems?

KaiserPro · on Oct 26, 2022

> Are SREs and DevOps tasked with administration of operating systems?

yes, eventually.

you can dress it up in all the fancy terms that you like. but devops and SREs are sysadmins with better PR.

it's critical that SREs understand _how_ to debug a system, so that they can work out how to put in fixes and/or design better systems.

asmr · on Oct 26, 2022

Both SRE and DevOps are essentially evolved sysadmin roles. The DevOps philosophy is cross-functional and many sysadmins have adopted a DevOps approach. The latest edition of the classic sysadmin book "The Practice of System and Network Administration" is now centered around DevOps.

dsr_ · on Oct 26, 2022

If you have ops somewhere in your responsibilities, then yes.

jabroni_salad · on Oct 26, 2022

depends on what layer the issue is happening at. I know everyone thinks the OS has been abstracted away but my ticket queue says otherwise. "yaml engineering" is just a control surface, I still need to pop the hood often.

jen_h · on Oct 26, 2022

Yeah. Random data point: One of my most favorite SRE interviews ever (serious fun!) involved hands-on troubleshooting that eventually required gdb.

BossingAround · on Oct 26, 2022

How do you automate something you can't do manually?

andrewmcwatters · on Oct 26, 2022

My only feedback is that this is unrealistic because today developers wouldn’t try to debug something, they’d just destroy the instance, push a commit and hope it fixed something infra related then recreate it.

Why would you need to understand how something works? Just use containers. /s

vsareto · on Oct 26, 2022

Developers just need to understand everything because we need developers to do everything and meet all deadlines. We wouldn't dare consider a support role that could troubleshoot it because then there would be no point to having developers that can do everything! /s

cube00 · on Oct 26, 2022

Support doesn't deliver features, we need new features! /s

grepLeigh · on Oct 26, 2022

If most developers can't debug a VM, then anyone who can will be able to charge a premium. If you have a proficiency in ops, remember that the next time you negotiate a compensation package.

[Edited my compensation numbers to avoid down votes - yikes]

andrewmcwatters · on Oct 26, 2022

I feel like to do so you definitely have to target particular companies, and more specifically particular titles, with specific skills to offer.

My guess is trying to sell high end services as a "principal software engineer" isn't going to be enough to justify that cash comp to a lot of people hiring.

grepLeigh · on Oct 26, 2022

I wouldn't think of it as trying to sell yourself as a "principal software engineer" on an open market.

I'd make a list of the companies where hiring/scaling the ops team will make or break the business's value delivery, and filter by companies aware of this.

You can knock this out at the recruiting step, just by asking about open developer headcount vs. open SRE ops headcount. Ask which direction that ratio seems to be going, and if there's anyone you can talk to whose job it is to change that ratio (director or VP mandate).

The referral network from working at a hyperscaler co in ops is a great way to break into the space.

andrewmcwatters · on Oct 26, 2022

Thanks for the heads up!

sshd · on Oct 26, 2022

This is so sad but so true!

edmcnulty101 · on Oct 26, 2022

If it's dumb and it works, it's not dumb.

deeblering4 · on Oct 26, 2022

> It's also my not-so-secret hope that a sophisticated enough version of SadServers could be used by tech companies (or for companies that carry on job interviews on their behalf) to automate or facilitate the Linux troubleshooting interview section.

Yup, that's what I was afraid of.

lbotos · on Oct 26, 2022

Why are you afraid of this? My org has run a hands-on technical exam with a stack of linux admin basics (I won't enumerate them here because people do their research) but they are based on real problems we've had and the feedback is overwhelmingly "this was one of the best technical interviews I've ever had."

We ask the engineer who is proctoring the interview to think about the following question: Would you want to pair with that engineer again?

If that answer is no, then we probably won't go further because pairing with engineers to troubleshoot is what we do every day.

Some great resumes have died with not knowing how to see what's running on port 80.
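For the record, the usual answers are `ss -ltnp` or `lsof -i :80`. The same information is in `/proc`, which a short Python sketch can show: find LISTEN sockets on the port in `/proc/net/tcp` and `tcp6`, then match their inode numbers against `socket:[inode]` links under `/proc/<pid>/fd`. The helper names are hypothetical; Linux only, and without root you can only attribute sockets to your own processes.

```python
import os


def listening_inodes(port: int) -> set[str]:
    """Socket inodes in LISTEN state on `port`, from /proc/net/tcp and tcp6."""
    inodes = set()
    for table in ("/proc/net/tcp", "/proc/net/tcp6"):
        try:
            with open(table) as f:
                next(f)  # skip the header row
                for line in f:
                    fields = line.split()
                    local_port = int(fields[1].rsplit(":", 1)[1], 16)
                    # fields[3] is the state (0A = TCP_LISTEN); fields[9] is the inode.
                    if fields[3] == "0A" and local_port == port:
                        inodes.add(fields[9])
        except FileNotFoundError:
            continue  # e.g. IPv6 disabled
    return inodes


def pids_listening_on(port: int) -> list[int]:
    """PIDs whose open descriptors include a listening socket on `port`."""
    wanted = {f"socket:[{inode}]" for inode in listening_inodes(port)}
    if not wanted:
        return []
    pids = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        fd_dir = f"/proc/{entry}/fd"
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited or belongs to another user
        for fd in fds:
            try:
                if os.readlink(f"{fd_dir}/{fd}") in wanted:
                    pids.append(int(entry))
                    break
            except OSError:
                continue
    return pids
```

Usage: `pids_listening_on(80)`, then look up each PID in `/proc/<pid>/cmdline` or with `ps`.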

deathanatos · on Oct 26, 2022

Yeah, we did this at a previous employer.

One example: we had them SSH in, then download and extract a tarball (the Linux source, but the content doesn't matter). Sometimes they'd gunzip to stdout. The reaction tells you a lot: "lol whoopsie" followed by a quick fix means the person knows what they're doing; "uh… what is going on? did I break it?" followed by general cluelessness… maybe not.

That did occasionally break tmux, though.

Part of it was "what are the specs of this thing you're SSH'd into?", and we had one candidate who was adamant the numbers must be wrong: 2 GiB is too little RAM, no machine is that small! Yeah, we didn't spin up a 128 GiB VM for your interview…
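The "what are the specs?" question is usually answered with `nproc` and `free -h`, or `lscpu`. A minimal Python equivalent reads `/proc/meminfo` and `os.cpu_count()`, assuming Linux; `machine_specs` is a hypothetical helper. Inside a container both can report the host's totals rather than the container's cgroup limits, one more way containers and VMs differ for this kind of exercise.

```python
import os


def machine_specs() -> dict[str, float]:
    """CPU count and total RAM (GiB) as the kernel reports them (Linux)."""
    mem_kib = 0
    with open("/proc/meminfo") as f:
        for line in f:
            if line.startswith("MemTotal:"):
                mem_kib = int(line.split()[1])  # labelled kB, actually KiB
                break
    return {
        "cpus": os.cpu_count() or 0,
        "mem_gib": round(mem_kib / 1024 ** 2, 1),
    }


print(machine_specs())
```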

Volundr · on Oct 26, 2022

I never cease to be amazed at how few people really realize just how little hardware is often required for getting real work done. You'd be surprised just how much that 2GB vm with a couple cores can handle!

sorongopowa · on Oct 26, 2022

I started with a single 1xx MHz core and 16MB of RAM. And I'm sure some with even less, lol.

Supporting your point: Hardware is awesome if you use it wisely.

icedchai · on Oct 26, 2022

My first Linux box was a 20 MHz 386SX laptop with 3 megs of RAM (1 meg on the motherboard, 2 in an expansion). I could barely run Linux 0.99.x. The distro was SLS, and it came on 12 or so floppy disks. I quickly upgraded to a 486 with 8 megs of RAM, then 20... which seemed incredible at the time (1994-ish).

It's amazing how bloated today's software is...

joenot443 · on Oct 26, 2022

If you give the person you're interviewing access to the same tools they'd have in a regular day on the job (Google, manpages, etc.), I'd say that's a fair and probably relatively enjoyable interview.

Rejecting someone because they can't recall the correct netstat syntax doesn't seem like good hiring practice, but I assume in good faith that's not what you meant :)

yamtaddle · on Oct 26, 2022

Yeah, I google, tealdeer, "--help", and manpage anything I don't use at least once a week, every time. Usually I don't remember them otherwise, and if I think I do, I don't trust my memory that well. The only exception is if I remember enough to ctrl+r them out of shell history faster than I can do those things. And actually, some of those I do use often, but couldn't possibly tell you how, because I only run a couple of commands 99% of the time and always pull them out of history unless it's one of the rare exceptional cases. I couldn't rsync for a particular outcome without consulting a reference to save my life, even though I use it often.

And usually you only use a fairly small set of tools that often, in any job, and which set will depend on the employer, how things are set up, and what exactly you're doing.

Oh and somehow I get "-r" versus "-R" for "recursive" wrong almost every time, even for commands I type almost daily, unless I check first. It's weird. If tools could get on the same damn page about which means "recursive", that'd be great.

TL;DR I do have a pretty good idea what I'm doing, but look like an absolute idiot if anyone watches me do it. Much worse, even, if I know they're watching and we're not in some kind of relatively high-trust relationship (so, definitely not in an interview setting).

lbotos · on Oct 26, 2022

Exactly: man pages and Google are all fair game. We want to see how they think, not rote memorization.

Multicomp · on Oct 26, 2022

I love this point. Joke: are you hiring?

I'm quite happy to try to demonstrate how I think, but I hate hate hate leet code because A) it's not relevant to showing how one thinks and B) I've read so much dunking on it on HN that I'm now stopping interviews when they pull out the hackerrank or live code to say 'without using the library, reverse this linked list'.

joenot443 · on Oct 27, 2022

That sounds awesome! Wish I got the chance to do more hands-on interviews in the mobile dev space, most of my interviews just end up being run of the mill leetcoding.

mathverse · on Oct 26, 2022

People in higher up positions like yourself will rarely be subjected to testing with tools like this. You are basically trying to remove the human from equation and industrialize the whole process.

rednerrus · on Oct 26, 2022

What we're trying to do is respect people's time. We can learn more about someone's technical understanding in 30 minutes of hands-on exercises than we can in a full day of panel interviews. It's better for us, as we get a much better understanding of where you're at Linux-wise, and it's better for you because you only need to come to two hours of interviews, total. Seems like a win-win to me.

mike_d · on Oct 26, 2022

In my experience this type of interview (and coding interviews in general) usually fall into one of two categories: 1) "I learned this neat trick and want to show candidates how smart I am" or 2) "I have this bug in prod and I want to see if you can fix it for me."

If the interview was along the lines of upgrading the packages on the system, debugging why nginx was crashing, figuring out the specs of the system, etc., that is totally fine with me, and I believe it's respectful of a candidate's time. Unfortunately it always turns into something else when people need to come up with new "challenges" for candidates.

deeblering4 · on Oct 26, 2022

Framing a question like “a system has a high load average, what commands would you use to begin diagnosing that?” and taking that conversation as deep as the candidate can go is neither time consuming nor requires a panel of people.
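One way that conversation tends to start, sketched in Python for Linux: compare the load averages with the CPU count, then look at process states. On Linux the load average counts tasks that are runnable (`R`) plus those in uninterruptible sleep (`D`, often waiting on disk or NFS), so a high load with mostly `D` processes points at I/O rather than CPU. The helper names are hypothetical, and this counts processes, not threads, so it's only a first look before reaching for `top`, `vmstat` or `iostat`.

```python
import os


def load_per_cpu() -> tuple[float, ...]:
    """1/5/15-minute load averages divided by the number of CPUs."""
    cpus = os.cpu_count() or 1
    return tuple(avg / cpus for avg in os.getloadavg())


def process_states() -> dict[str, int]:
    """Count processes by state letter (R, S, D, Z, ...) from /proc/<pid>/stat."""
    counts: dict[str, int] = {}
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/stat") as f:
                stat = f.read()
        except OSError:
            continue  # process exited while we were scanning
        # The command name (field 2) is in parentheses and may contain spaces,
        # so the state is the first token after the last ')'.
        state = stat.rsplit(")", 1)[1].split()[0]
        counts[state] = counts.get(state, 0) + 1
    return counts


print(load_per_cpu())
print(process_states())
```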

deathanatos · on Oct 26, 2022

No, I'm trying to make sure the person who is interviewing for a job where they will deal with computers on a daily basis appears to have seen a computer at some prior point in their life.

I wouldn't feel the need to do this if so many candidates didn't fail rudimentary tests. A SWE candidate MUST be able to write the function min(), in the language and tooling of their choice. But in an interview, a sizable fraction cannot. (The actual bar is far higher than min(), ofc., but min() ought to be trivial.)
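For calibration, a sketch of that bar in Python without calling the built-in; `my_min` is a made-up name. The only real decision is what to do with empty input, and raising `ValueError` like the built-in does is one reasonable choice.

```python
from typing import Iterable, TypeVar

T = TypeVar("T")


def my_min(items: Iterable[T]) -> T:
    """Smallest element of `items`; raises ValueError if it is empty."""
    iterator = iter(items)
    try:
        smallest = next(iterator)
    except StopIteration:
        raise ValueError("my_min() arg is an empty iterable") from None
    for item in iterator:
        if item < smallest:
            smallest = item
    return smallest


print(my_min([3, 1, 2]))  # 1
```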

deeblering4 · on Oct 26, 2022

> Why are you afraid of this?

> My org has run a hands-on technical exam with a stack of linux admin basics ... they are based on real problems we've had and the feedback is overwhelmingly "this was one of the best technical interviews I've ever had."

You essentially answered your own question.

Putting thought into the interview process and working with candidates through real problems is valuable. I cannot say the same for outsourcing or "automating" this portion of an interview using 3rd party SaaS.

rednerrus · on Oct 26, 2022

We do this in our org as well. 30 minutes of troubleshooting Linux issues is a good way to evaluate a candidate's experience. We run it as a team exercise with the candidate, so we also get the added bonus of seeing how they work in a team setting, how they communicate, etc.

Nextgrid · on Oct 26, 2022

Is it bad though? The problem with Leetcode is that it's an extremely unrealistic test. This on the other hand seems like it actually tests real-world scenarios, and you can get there without grinding. I'm pretty sure I can pass all the tests they've currently got despite having no formal sysadmin experience, just using common developer knowledge, common sense and strategic Google-fu.

technofiend · on Oct 26, 2022

The Red Hat Certified System Administrator, Red Hat Certified Engineer and similar exams require practical, general hands-on skills to fix broken systems. The performance tuning and troubleshooting exams go into more detail and more complex scenarios. There's no internet access, but resources are available if you understand how to use them. I would never suggest people hire solely on those certs, but if someone takes the time to complete 7 hands-on exams for the Certified Architect certification, it's a strong indicator they have skills.

Even so, test taking can be stressful, but it's arguably less stressful than actual production support with people waiting on the result. Whether people really want to put candidates in a stressful situation is up to them. SadServers seems to be somewhere in the middle compared to some of the things I've seen. One job interview put me in a room with a boot CD and an ancient computer with a CD-ROM drive so slow you got exactly one chance to boot the media and recover the system within the time limit. But the job was at a trading company, so if you couldn't handle that, they didn't want you. It was a fun exercise, but would I do that to someone else? Probably not.

pvg · on Oct 26, 2022

Please don't post shallow dismissals, especially of other people's work.

[...] Please don't pick the most provocative thing in an article or post to complain about in the thread.

https://news.ycombinator.com/newsguidelines.html

Sebguer · on Oct 27, 2022

Already exists. I can't remember the name, but the infra company that I used to work for used one of these as part of their interview loop.

fduran · on Oct 26, 2022

That doesn't mean that I'd charge individual users :-)

Heck, I'm not even asking for an email (and I had to do extra session management coding for that).

KaiserPro · on Oct 26, 2022

but why? it's a real test that is repeatable, realistic and not _overly_ hard. Sure, for a junior software engineer it's a bad fit, but for a devops/SRE/sysadmin it's a great fit.

it's certainly better than some crappy whiteboarding session, or worse, a take-home test.

aliqot · on Oct 26, 2022

I knew this is where it was headed :/

Pr0ject217 · on Oct 26, 2022

Cool!