
When I worked at AWS as a support engineer, I was unfortunately dumped into the containers team (EKS, ECS, Fargate, etc.) despite having a "greybeard" background.

A customer wrote in trying to figure out why his Fargate application kept crashing. The app would hit 100% CPU usage, then eventually start failing health checks before getting bounced (restarted).

I relayed this back to the customer, who insisted the app shouldn't be spiking in CPU usage and wanted to know why. Of course, with a Fargate workload there are few ways to attach debugging tools. You can't just spawn htop on Fargate!

Doing due diligence, I fired off an email to the team that managed the infrastructure. They curtly replied:

"It failed healthchecks and got bounced"

"Okay but why?"

"It hit 100% CPU"

"Okay but why?"

"It failed healthchecks and got bounced"

At no point were they willing to interrogate, or even consider, the lower layers of the stack. The very existence of everything below the containerized app was seemingly irrelevant to them.

After going back and forth about this with the customer for nearly a month and a half, I asked my boss to add me to the "Linux" support Slack channel, reasoning that there had to be other greybeards in there who (frankly) knew what they were doing better than these kids.

I wrote up a multi-paragraph explanation of everything the customer and I had found, and moments before I hit SEND I got an email:

The customer's app was not releasing threads properly, which drove the system to thread exhaustion. It then spent more and more of its time context switching, a CPU-intensive process that eventually took so long that the health check probes breached their timeout, declared the app down, and restarted it.
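For anyone wondering how you'd spot this from inside the container when you can't run htop, here is a rough sketch in Python. It is not what the customer or AWS actually ran; the function names (`parse_proc_status`, `leak_threads`, `sample_self`) are made up for illustration, and it assumes a Linux container where `/proc/self/status` is readable. It simulates the bug (threads that are started and never released) and reads the kernel's own counters for the process. As far as I know, the context-switch counters in that file cover only the main thread, with per-thread values living under `/proc/<pid>/task/<tid>/status`, so the `Threads` line is the one to watch for a leak.

```python
import threading
from pathlib import Path

# Fields of /proc/<pid>/status that matter for a thread leak.
FIELDS = ("Threads", "voluntary_ctxt_switches", "nonvoluntary_ctxt_switches")


def parse_proc_status(text):
    """Extract thread count and context-switch counters from status text."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        key = key.strip()
        if sep and key in FIELDS:
            stats[key] = int(value.strip())
    return stats


def leak_threads(n, stop_event):
    """Simulate the bug: start n threads that block until stop_event is set."""
    threads = [
        threading.Thread(target=stop_event.wait, daemon=True) for _ in range(n)
    ]
    for t in threads:
        t.start()
    return threads


def sample_self():
    """Read this process's counters. Linux-only: depends on /proc."""
    return parse_proc_status(Path("/proc/self/status").read_text())
```

Logging `sample_self()` every few seconds from inside the app (or exposing it on a debug endpoint) should show `Threads` climbing steadily if threads are leaking, well before the health checks start timing out.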

Saying that "Linux knowledge is unnecessary" is, to put it bluntly, ignorant to the point of clownishness. Having a holistic understanding of how a system operates is invaluable.



