Autoscaling is slow. If you're using an AWS Auto Scaling group, scaling decisions are based on metrics that are typically averaged over a period. If a decision increases the pool size, that change gets picked up by yet another event loop that runs periodically and actually launches instances. So there are multiple chained delays before an instance is running. In practice, even if your instances boot extremely fast and can begin draining the queue quickly, a job in the queue can wait 4+ minutes to get picked up in a scale-to-zero situation. You've also got things like cooldown periods to ensure you're not flapping.
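As a back-of-the-envelope illustration, the chained delays might add up like this (every number below is an assumption for the sketch, not an AWS-documented constant; actual latency depends on your alarm and group configuration):

```python
# Hypothetical worst-case delay budget for scale-from-zero via an
# Auto Scaling group driven by a CloudWatch-style alarm.
metric_period_s = 60       # metric aggregated over 1-minute periods
evaluation_periods = 2     # alarm must breach for 2 consecutive periods
asg_loop_interval_s = 60   # periodic loop that acts on the new desired size
instance_boot_s = 45       # fast-booting instance + app start-up

worst_case_s = (metric_period_s * evaluation_periods
                + asg_loop_interval_s
                + instance_boot_s)

print(f"worst case: ~{worst_case_s / 60:.1f} min before the job is picked up")
```

Even with generously fast assumptions at every step, the sum lands in the multi-minute range, which is where the "4+ minutes" experience comes from.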

With k8s you have more control over the knobs and switches, and you don't have an instance start-up delay, but the same kinds of metrics and event loops are used, particularly if you're feeding an external metric (e.g. SQS queue depth) into your calculations.
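The core replica calculation the Kubernetes HPA documents is simple ratio math; a minimal sketch of it (ignoring the HPA's additional tolerance band and stabilization windows, which damp scaling further):

```python
import math

def hpa_desired_replicas(current_replicas: int,
                         current_metric: float,
                         target_metric: float) -> int:
    """Core HPA formula:
    desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric).
    """
    return math.ceil(current_replicas * (current_metric / target_metric))

# e.g. 2 replicas, SQS depth averaging 150 messages per replica,
# target of 50 per replica -> scale to 6
print(hpa_desired_replicas(2, 150, 50))
```

The formula itself is instantaneous; the latency comes from everything around it: the external metrics adapter's polling interval, the HPA controller's sync period, and pod scheduling/start-up.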

Some type of predictive and/or scheduled scaling can reduce delays at the expense of potentially higher cost.
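The blunt version of that trade is a scheduled floor: keep minimum capacity above zero during the hours you expect work, so the first job never waits on a cold start. A rough cost sketch, with a made-up instance price and window:

```python
# Hypothetical cost of holding one warm instance through a 12-hour
# business window instead of scaling to zero (price is an assumption).
hourly_price = 0.0416          # e.g. a small on-demand instance
warm_hours_per_day = 12
days_per_month = 30

extra_cost_per_month = hourly_price * warm_hours_per_day * days_per_month
print(f"~${extra_cost_per_month:.2f}/month to never cold-start in that window")
```

Whether that's worth it depends entirely on how much a 4-minute wait on the first job actually costs you.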
