
There are some hilarious tidbits in here:

> Starting in the first week of 2024, the FreeBSD boot process suddenly got about 3x slower. I started bisecting commits, and tracked it down to... a commit which increased the root disk size from 5 GB to 6 GB. Why? Well, I reached out to some of my friends at Amazon, and it turned out that the answer was somewhere between "magic" and "you really don't want to know"; but the important part for me was that increasing the root disk size to 8 GB restored performance to earlier levels.




The original object size limit for S3 was 5 GB, as noted in my 2006 blog post:

https://aws.amazon.com/blogs/aws/amazon_s3/

I do not know if this has anything to do with the cliff that you saw.


Pretty sure that's not related. For one thing I don't think EBS snapshots are stored in S3 as 5 GB segments.


This is why I keep reading the comments here.

Deep deep greybeard wisdom from the founding fathers of modern computing.


Now I really want to know though.


My understanding is that EBS has some heuristics for deciding whether to keep data cached; an AMI which has a cached snapshot as its root disk will boot much faster than an AMI where all the data needs to be pulled from S3.


Some huge customer chunked their data into 5GB pieces so now there's an "if size == 5GB" in the cache code.


Maybe, but I don't think that would explain 8 GB also being fast while 6 GB is slow?


Yeah, I found that pretty unintuitive when I read it. How did you find 8GB worked? Trial and error?


Customer started using 8GB chunks /s


What's the smallest size for which those heuristics keep the snapshot cached?

(I'm currently using 1GB snapshots, because my actual disk image is a tiny fraction of that size. But if bumping that to 2GB or 4GB would make it faster, that's a small price to pay.)


I believe 1 GB is also fast.


Thanks, that helps to hear!

Do you have any other wisdom regarding mysterious reasons for fast or slow booting? EC2's boot process is deeply opaque, and any insight at all is better than nothing.


Nothing comes to mind, but if you want to drop me an email I can walk you through some benchmarking.


At a guess, powers of 2 are fast?


5 is not a power of 2. ;-)


Gotta admit it's pretty close though.


Yeah, I am constantly curious about how the sausage that is cloud services like AWS is made. It seems generally slick on the surface, but what’s holding it all together? I imagine it as a tangled ball of tools like Puppet, Chef, etc. and custom glue.


A lot of AWS services are built on other AWS services. Lambda, SQS, and other such "core services" are used by other services under the hood.


At Amazon's scale, almost everything is custom.

There's much less Puppet/Chef.


Yeah, I would imagine they maybe started with off-the-shelf tools that were then gradually replaced as the system grew and matured.


Kind of the opposite, I think. AWS was the first hyperscaler, so tooling did not exist for many of these problems back then.

For example, they have their own custom clustering software where you would probably use k8s if you were to rebuild things today.

Repeat this over a million different tools, etc.

This article is interesting if you want to take a peek behind the curtain: https://www.allthingsdistributed.com/2014/11/apollo-amazon-d...


I wonder how long it took to bisect an issue like this. Did you build an image every time and reboot a VM?


I can't remember exactly but it was a few hours. I already knew which week the issue arose (from comparing weekly snapshots) so that gave me a head start.

But yes, I built a lot of AMIs. And launched new EC2 instances for each of them -- it wasn't just a matter of rebooting since the first time an AMI launches there's different behaviour (both from FreeBSD, e.g. growing the root disk, and from EC2, e.g. disk caching).
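For anyone curious what that kind of bisection looks like in code, here is a rough sketch. It is not the actual tooling used; the commit names, the 20-second threshold, and `fake_boot_seconds` are all made up, with the latter standing in for "build an AMI at this commit, launch a fresh instance, and time the boot". It also assumes the regression stays present in every commit after the one that introduced it.

```python
from typing import Callable, Sequence


def first_bad(commits: Sequence[str], is_slow: Callable[[str], bool]) -> str:
    """Return the earliest commit for which is_slow() is True.

    Assumes commits[0] is known good, commits[-1] is known bad, and that
    every commit after the first bad one is also bad.
    """
    lo, hi = 0, len(commits) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_slow(commits[mid]):
            hi = mid  # regression is at mid or earlier
        else:
            lo = mid  # regression is after mid
    return commits[hi]


# Hypothetical stand-in for building an AMI and timing a first boot on a
# freshly launched instance. Here commit "c11" pretends to be the culprit.
def fake_boot_seconds(commit: str) -> float:
    return 30.0 if int(commit[1:]) >= 11 else 10.0


commits = [f"c{i}" for i in range(16)]
culprit = first_bad(commits, lambda c: fake_boot_seconds(c) > 20.0)
print(culprit)
```

With 16 candidate commits this needs only 4 build-and-boot cycles, which is why knowing the bad week up front helps so much.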


Thanks for the additional information. A few hours sounds great; I was expecting multiple days to narrow it down, given the lengthy feedback loop.



