> It has automatic scaling, so it can scale up and down with traffic (all of which is configured in ECS)
Doesn't scaling take time, though? Doesn't pulling a new Docker image and starting the container take at least as long as initializing a new Lambda function?
Also, with Lambda there's no configuring to do for scaling. If anything, Lambda gives you tools to limit the concurrency.
Thanks for pointing that out. I should have clarified because I agree that "Automatic" is a relative term.
Lambda is entirely automatic, like you point out. You literally don't need to think about it. You upload your function and it scales to meet demand (within limits).
ECS, however, still requires configuration, but it is extremely simple to do. They actually call it "Service Auto Scaling". Within there you choose a scaling strategy and set a few parameters. That is it. After that, it really is "automatic".
Most of the time you will be selecting the "Target Tracking" strategy. You pick a CloudWatch metric and a target value, and ECS will launch and terminate Fargate instances (called "tasks" in the docs) to keep the metric near that target. A good example would be targeting an average CPU utilization of 70% (the CPUUtilization metric): if average CPU usage across your tasks climbs above the target, ECS deploys more of them automatically; if it falls well below, ECS terminates tasks until you are back near the target.

You get all this magic from a simple configuration in ECS, so that's what I mean by automatic. It's pretty easy. Depending on what you are doing, you can set scaling to other metrics too: bandwidth, request count, memory usage, etc. Some metrics require you to publish a custom CloudWatch metric first, but again it isn't bad.
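For a rough idea of what that configuration looks like outside the console, here's a minimal sketch using boto3 and the Application Auto Scaling API (the cluster/service names and capacity numbers are made up):

```python
import boto3

autoscaling = boto3.client("application-autoscaling")

# Register the ECS service's desired task count as a scalable target.
# Cluster/service names and capacities here are hypothetical.
autoscaling.register_scalable_target(
    ServiceNamespace="ecs",
    ResourceId="service/my-cluster/my-service",
    ScalableDimension="ecs:service:DesiredCount",
    MinCapacity=2,
    MaxCapacity=10,
)

# Target-tracking policy: keep average CPU utilization near 70%.
# ECS adds tasks when the metric runs above the target and removes
# them when it runs well below it.
autoscaling.put_scaling_policy(
    PolicyName="cpu-target-tracking",
    ServiceNamespace="ecs",
    ResourceId="service/my-cluster/my-service",
    ScalableDimension="ecs:service:DesiredCount",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 70.0,
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ECSServiceAverageCPUUtilization",
        },
        "ScaleOutCooldown": 60,
        "ScaleInCooldown": 60,
    },
)
```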
You can also scale according to other strategies, like scheduled scaling. So if you get lots of traffic during business hours, you can scale up during business hours and scale down during the night. Again, just set your schedule in ECS. It is pretty simple.
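Scheduled scaling goes through the same API; a hedged sketch (the cron expressions and capacities are placeholders):

```python
import boto3

autoscaling = boto3.client("application-autoscaling")

# Raise the capacity floor during business hours (times are UTC).
# Names, schedules, and capacities are hypothetical.
autoscaling.put_scheduled_action(
    ServiceNamespace="ecs",
    ResourceId="service/my-cluster/my-service",
    ScalableDimension="ecs:service:DesiredCount",
    ScheduledActionName="business-hours-scale-up",
    Schedule="cron(0 8 ? * MON-FRI *)",
    ScalableTargetAction={"MinCapacity": 5, "MaxCapacity": 20},
)

# A matching evening action drops the floor back down overnight.
autoscaling.put_scheduled_action(
    ServiceNamespace="ecs",
    ResourceId="service/my-cluster/my-service",
    ScalableDimension="ecs:service:DesiredCount",
    ScheduledActionName="after-hours-scale-down",
    Schedule="cron(0 18 ? * MON-FRI *)",
    ScalableTargetAction={"MinCapacity": 2, "MaxCapacity": 10},
)
```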
The difference in scaling is more subtle than that. The thing that makes Lambda so nice from a scalability point of view is that you don't need to worry about the scalability of your application. You don't need any awkward async stuff, or to tune application-server flags, or anything like that. Your only concern in Lambda code is to respond to one request as fast as possible. You can write something that burns 100% CPU in a busy loop per request in a Lambda if you want, and it'll scale all the same. In Fargate, making sure the application can handle some economical amount of concurrency is your responsibility, and in some cases that is a very non-trivial problem.
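To make that concrete, here's a toy Python handler (purely illustrative): it pegs a CPU for the whole request, and Lambda will still scale it out fine, because each concurrent invocation gets its own execution environment.

```python
# Toy Lambda handler that deliberately burns CPU on every request.
# Concurrency is handled by Lambda spinning up one execution
# environment per concurrent invocation, not by the application.
def handler(event, context):
    total = 0
    for i in range(50_000_000):  # busy loop, pegs one vCPU
        total += i
    return {"statusCode": 200, "body": str(total)}
```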
Scaling does take time, but you would normally scale based on resource utilization (like if CPU or RAM usage exceeded 70%). So unless you had a really large and abrupt spike in traffic, the new container would be up before it's actually needed.
It's definitely not apples to apples with Lambda though--if you do have a very bursty workload, the cold start would be slower with Fargate, and you'd probably drop some requests while scaling up, too.
If your app allows for it, a pattern I like is Fargate for the main server with a Lambda failover. That way you avoid cold starts under normal traffic patterns, and can also absorb a big spike if needed.
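There's more than one way to wire that up; one sketch is an ALB with weighted target groups, shifting some weight onto a Lambda target group during a spike. Everything below (ARNs, weights, and the idea of triggering the shift from a CloudWatch alarm) is hypothetical, not a recipe:

```python
import boto3

elbv2 = boto3.client("elbv2")

# Assumes the ALB listener and both target groups (one for the
# Fargate service, one for the Lambda function) already exist.
LISTENER_ARN = "arn:aws:elasticloadbalancing:...:listener/app/my-alb/..."
FARGATE_TG_ARN = "arn:aws:elasticloadbalancing:...:targetgroup/fargate-tg/..."
LAMBDA_TG_ARN = "arn:aws:elasticloadbalancing:...:targetgroup/lambda-tg/..."

def set_split(fargate_weight: int, lambda_weight: int) -> None:
    """Shift the traffic split between the Fargate and Lambda targets."""
    elbv2.modify_listener(
        ListenerArn=LISTENER_ARN,
        DefaultActions=[{
            "Type": "forward",
            "ForwardConfig": {
                "TargetGroups": [
                    {"TargetGroupArn": FARGATE_TG_ARN, "Weight": fargate_weight},
                    {"TargetGroupArn": LAMBDA_TG_ARN, "Weight": lambda_weight},
                ],
            },
        }],
    )

set_split(100, 0)   # normal operation: everything to Fargate
set_split(80, 20)   # during a spike, spill some traffic to Lambda
```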
I think it's just the trade-off between these two scenarios:
- Relatively poor amortized scale-out time with good guarantees in the worst case.
- Good amortized scale-out time with dropped requests / timeouts in the worst case.
With Lambda, it doesn't really matter how spiky the traffic is. Users will see the cold start latency, albeit more often. With Fargate, users won't run into the cold start latencies - until they do, and the whole request may time out waiting for that new server to spin up.
At least that seems to be the case to me. I have personally never run a Docker image in Fargate, but I'd be surprised if it could spin up, initialize, and serve a request within two seconds.
> With Fargate, users won't run into the cold start latencies - until they do, and the whole request may time out waiting for that new server to spin up.
In practice that sort of setup is not trivial to accomplish with Fargate; normally, while you are scaling up, requests get sent to the currently running tasks. There is no built-in ability with Fargate (+ELB) to queue requests so that they can be routed to a new task once it's up. This is especially problematic if your application doesn't handle overload very gracefully.
> Doesn't scaling take time, though? Doesn't pulling a new Docker image and starting the container take at least as long as initializing a new Lambda function?
Yes, especially because they still don't support caching the image locally for Fargate. If you start a new instance with autoscaling, or restart one, the full image has to be downloaded again. Depending on its size, start times can run to minutes...