
At my last job we built an entire API on top of serverless. One of the things we had to figure out was cold start time: if a user hit an endpoint whose function wasn't warm, the request took roughly 2x as long as it normally would. To combat this we wrote a "runWarm" function that kept the API alive at all times.
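
The gist was something like this (a minimal sketch, not our actual code; the payload shape and names are made up): a scheduled rule invokes the function every few minutes with a marker payload, and the handler short-circuits before doing any real work.

    import json

    def handler(event, context):
        # A scheduled EventBridge/CloudWatch rule invokes the function every
        # few minutes with {"runWarm": true} so an execution environment
        # stays resident between real requests.
        if event.get("runWarm"):
            return {"statusCode": 200, "body": "warm"}

        # ... normal request handling ...
        return {"statusCode": 200, "body": json.dumps({"ok": True})}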

Sure kind of defeats the purpose of serverless but hey, enterprise software.




"To combat this"

Did you actually need to?

That's one of the things that always threw me with complaints about cold starts - how many apps/sites do I use daily where I interact and there's a 10-second delay before something happens? The answer: quite a lot.

Yeah, we can do better. And in fact, with serverless, *most users will experience better latency*. It's only when load is increasing that you see those delays, and then it's still only a delay. Not shed load.

The fact that I can experience that delay easily in dev makes people think it's going to be a huge problem, but: (A) in real use it probably isn't as common as it is in dev (since you have way more traffic); (B) you can design to minimize it (different API endpoints can hit the same lambda and be routed to the right handler there, making it more likely to be hot; see the sketch below); (C) it forces you to test and plan for the worst case from the beginning (rather than at the end, where you've built something and now have to load test it).
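
To make (B) concrete, the routing can be as simple as a dispatch table inside one function. A rough sketch (handler names and routes are made up):

    import json

    # One Lambda behind several API Gateway routes; requests are dispatched
    # by method/path inside the function, so a single warm execution
    # environment serves all of the endpoints.
    def list_users(event):
        return {"statusCode": 200, "body": json.dumps([])}

    def create_user(event):
        return {"statusCode": 201, "body": json.dumps({"created": True})}

    ROUTES = {
        ("GET", "/users"): list_users,
        ("POST", "/users"): create_user,
    }

    def handler(event, context):
        # API Gateway (REST API, Lambda proxy integration) supplies
        # httpMethod and resource on the event.
        fn = ROUTES.get((event["httpMethod"], event["resource"]))
        if fn is None:
            return {"statusCode": 404, "body": "not found"}
        return fn(event)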

Not to say to use it all the time, of course; there are plenty of scenarios where the cost, the delay, etc. are non-starters. But there are also plenty of scenarios where an engineer's instinctive reaction would be "too slow" when in reality it's fine: your p95 is going to look great and only your p99 is going to look bad (on that note, a slow API response accompanied by a spinner is a very different thing from a UX perspective than a slow page load with no indication of progress), and even then it's predictable when it happens, and it forces function scale-out rather than tanking a service. Of course, it's often not obvious upfront which scenarios those would be until/unless you try it, and that's definitely a barrier.


There is actually a really awesome middle-ground that AWS offers that no one seems to talk about.

That is using ECS + Fargate. This gives you (IMHO) the best of both worlds between Lambda and EC2.

ECS is Elastic Container Service. Think docker/podman containers. You can even pull from Docker Hub or ECR (Elastic Container Registry - Amazon's version of Docker Hub). ECS can then deploy to either a traditional EC2 compute instance (giving you a standard containerized deployment) or to "Fargate".

Fargate is serverless compute for containers. It is like serverless EC2. You get the "serverless" benefits of Lambda, but it is always-on. It has automatic scaling, so it can scale up and down with traffic (all of which is configured in ECS). You don't need to manage security updates of the underlying compute instance or manage the system. You get high availability and fault tolerance "for free". But at the end of the day, it's basically an EC2 instance that you don't have to manage yourself. You can choose the RAM/CPU options that you need for your Fargate tasks just like any other compute instance. My recommendation is to go as small as possible and rely on horizontal scaling instead of vertical. This keeps costs as low as possible.

When I hear people trying to keep Lambdas running indefinitely, it really defeats the purpose of Lambda. Lambda has plenty of benefits, but it is best used for functions that run intermittently and are isolated. If you want the serverless benefits of Lambda, but want to have the benefits of a traditional server too, then you need to look at Fargate.

And of course there is a world where you combine the two. Maybe you have an authentication service that needs to run 24/7. Run it via ECS+Fargate. Maybe your primary API should also run on Fargate. But then when you need to boot up a bunch of batch processing at midnight each night to send out invoices, those can use Lambdas. They do their job and then go to sleep until the next day.

I should also add that the developer experience is far superior going the ECS+Fargate route over Lambda. I have built extensive APIs in Lambda; they are difficult to debug and you always feel like you are coding with one hand tied behind your back. But with ECS+Fargate you just build projects as you normally would, in your traditional environment. You can do live testing locally just like any other container project. Run docker or podman on your system using an Amazon Linux, Alpine Linux, or CentOS base image, and that same environment will match your Fargate deployment. It makes the developer experience much better.


>It has automatic scaling, so it can scale up and down with traffic (all of which is configured in ECS)

Doesn't scaling take time, though? Doesn't downloading a new docker image and starting the container take at least as long as initializing a new lambda function?

Also with lambda there's no configuring to do for scaling. If anything lambda gives you tools to limit the concurrency.


Thanks for pointing that out. I should have clarified because I agree that "Automatic" is a relative term.

Lambda is entirely automatic like you point out. You literally don't need to think about it. You upload your function and it scales to meet demand (within limits).

ECS however still requires configuration, but it is extremely simple to do. They actually call it "Service Auto-Scaling". Within there you choose a scaling strategy and set a few parameters. That is it. After that, it really is "automatic".

Most of the time you will be selecting the "Target Tracking" strategy. Then you select a CloudWatch metric and it will deploy and terminate Fargate instances (called "tasks" in the docs) to stay within your specified range. So a good example would be selecting the CPU utilization metric and keeping average CPU utilization between 40-70%. If average CPU usage starts to get above 70% across your tasks (Fargate instances), then ECS will deploy more automatically. If it falls below 40% then it will terminate them until you get within your desired range. You get all this magic from a simple configuration in ECS. So that's what I mean by automatic. It's pretty easy. Depending on what you are doing, you can set scaling on any other metric. It could be bandwidth, users, memory usage, etc. Some of these (like memory) require you to configure a custom metric, but again it isn't bad.
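
If it helps, the whole setup boils down to registering a scalable target plus one target-tracking policy. A rough sketch with boto3 (cluster/service names, bounds and thresholds are placeholders):

    import boto3

    aas = boto3.client("application-autoscaling")

    # Register the ECS service's desired task count as a scalable target.
    aas.register_scalable_target(
        ServiceNamespace="ecs",
        ResourceId="service/my-cluster/my-service",   # placeholder names
        ScalableDimension="ecs:service:DesiredCount",
        MinCapacity=2,
        MaxCapacity=20,
    )

    # Target tracking: keep average CPU near 70%; ECS adds/removes tasks.
    aas.put_scaling_policy(
        PolicyName="cpu-target-tracking",
        ServiceNamespace="ecs",
        ResourceId="service/my-cluster/my-service",
        ScalableDimension="ecs:service:DesiredCount",
        PolicyType="TargetTrackingScaling",
        TargetTrackingScalingPolicyConfiguration={
            "TargetValue": 70.0,
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "ECSServiceAverageCPUUtilization"
            },
            "ScaleOutCooldown": 60,
            "ScaleInCooldown": 120,
        },
    )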

You can also scale according to other strategies, like scheduled scaling. So if you get lots of traffic during business hours you can scale up during business hours and scale down during the night. Again, just set your schedule in ECS. It is pretty simple.


The difference in scaling is more subtle than that. The thing that makes lambda so nice from a scalability point of view is that you don't need to worry about the scalability of your application. You don't need any awkward async stuff, or to tune application server flags, or anything like that. Your only concern with lambda code is to respond to one request as fast as possible. You can write something that burns 100% CPU in a busyloop per request in a lambda if you want and it'll scale all the same. In Fargate, making sure that the application is able to handle some economical amount of concurrency is your responsibility, and it can in some cases be a very non-trivial problem.


Scaling does take time, but you would normally scale based on resource utilization (like if CPU or RAM usage exceeded 70%). So unless you had a really large and abrupt spike in traffic, the new container would be up before it's actually needed.

It's definitely not apples to apples with Lambda though--if you do have a very bursty workload, the cold start would be slower with Fargate, and you'd probably drop some requests too while scaling up.

If your app allows for it, a pattern I like is Fargate for the main server with a Lambda failover. That way you avoid cold starts with normal traffic patterns, and can also absorb a big spike if needed.


I think it's just the trade off between these two scenarios.

- Relatively poor amortized scale out time with good guarantees in the worst case.

- Good amortized scale out time with dropped requests / timeouts in the worst case.

With lambda, it doesn't really matter how spiky the traffic is. Users will see the cold start latency, albeit more often. With Fargate, users won't run into the cold start latencies - until they do, and the whole request may timeout waiting for that new server to spin up.

At least that seems to be the case to me. I have personally never run a docker image in Fargate, but I'd be surprised if it could spin up, initialize and serve a request in two seconds.


> With Fargate, users won't run into the cold start latencies - until they do, and the whole request may timeout waiting for that new server to spin up.

In practice that sort of setup is not trivial to accomplish with Fargate; normally while you are scaling up the requests get sent to the currently running tasks. There is no built-in ability to queue requests with Fargate(+ELB) so that they would then be routed to a new task. This is especially problematic if your application doesn't handle overloads very gracefully.


> Doesn't scaling take time, though? Doesn't downloading a new docker image and starting the container take at least as long as initializing a new lambda function?

Yes, especially because they still don't support caching the image locally for Fargate. If you start a new instance with autoscaling, or restart one, you have to download the full image again. Depending on its size, start times can be minutes...


The big issue with ECS+Fargate is the lack of CPU bursting capability. This means that if you want to run a small service that doesn't consume much, you have two options:

1. Use a 0.25 vCPU + 0.5 GB RAM configuration and accept that your responses are now 4 times slower, because the 25% CPU share is strictly enforced (no bursting).

2. Use a 1 vCPU + 2 GB RAM configuration (costing 4 times more) even though it is very under-utilized.

AWS is definitely in no rush to fix this, as they keep saying they are aware of the issue and are "thinking about it". No commitment or solution in sight though:

https://github.com/aws/containers-roadmap/issues/163


Agreed - I'm in the process of moving hundreds of Java Lambdas into a Spring application running in ECS. It costs more to run, but I get flexibility with the scaling parameters and I can more easily run my application locally too. I'm still stuck on AWS but less so than with Lambda.


We're hedging our bets by doing initial small scale work in Lambda with containers, explicitly to enable a shift to Fargate ECS when it's needed.


> To combat this we wrote a "runWarm" function that kept the API alive at all times.

This doesn't really work the way you'd expect and isn't recommended, as it only helps a particular use case. The reason is that a warm-up invocation only keeps a single instance of your function alive. That means if two requests come in at the same time, you'd still see a cold start on one of those invocations.

Instead, you want to look at something like provisioned concurrency.
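
For reference, provisioned concurrency is a single configuration call against a published version or alias. A sketch (the function name, alias, and count are placeholders):

    import boto3

    lam = boto3.client("lambda")

    # Keep 50 execution environments initialized for the "live" alias.
    # Requests beyond 50 concurrent still hit on-demand cold starts.
    lam.put_provisioned_concurrency_config(
        FunctionName="my-api-function",      # placeholder
        Qualifier="live",                    # version or alias
        ProvisionedConcurrentExecutions=50,
    )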


Provisioned concurrency is insanely expensive. If you have any kind of thundering-herd access pattern then Lambda is a complete non-starter because of the warm-up and scaling characteristics. We eventually just put an nginx/openresty server on a regular medium EC2 instance and got rid of Lambda from our stack completely; now we're paying about 1/300th of the cost we were previously and the performance is infinitely better.

I'm sure it has some use-cases in some kind of backoffice task queue scenario, but Lambda is nearly unusable in a web context unless you have a very trivial amount of traffic.


This is another example of AWS over marketing Lambda. Lambda is horrendously expensive when requests pass a certain level per second. You can graph it against ECS / EC2 to see the point at which it stops being economical.

Taking all of this into account, Lambda is then useful for a very small niche:

- Tasks that don't care about low P99 latency. These tend to be asynchronous processing workflows, as APIs in the customer request path tend to care about low P99 latency.

- Tasks that have a low requests-per-second rate. Again, these tend to be asynchronous processing workflows.

You talk to anyone on the AWS serverless team and the conversation eventually focuses on toil. If you can quantify engineering toil for your organization, and give it a number, the point at which Lambda stops being economical shifts right, but it doesn't change the overall shape of the graph.


> This is another example of AWS over marketing Lambda. Lambda is horrendously expensive when requests pass a certain level per second.

I feel this is a gross misrepresentation of AWS Lambdas.

AWS lambdas are primarily tailored for background processes, event handlers, and infrequent invocations. This is how they are sold, including in AWS' serverless tutorials.

Even though they can scale like crazy, and even though you can put together an API with API Gateway or even Application Load Balancer, it's widely known that if your API handles more traffic than a few requests per second then you're better off putting together your own service.

The rationale is that if you don't need to do much with a handler, or you don't expect to handle a lot of traffic on a small number of endpoints, AWS Lambdas offer a cheaper solution to develop and operate. In some cases (most happy path cases?), they are actually free to use. Beyond a certain threshold, you're better off getting your own service to run on EC2/Fargate/ECS/whatever, especially given that once you have a service up and running then adding a controller is trivial.


> I feel this is a gross misrepresentation of AWS Lambdas.

AWS Lambda is a serverless compute service that lets you run code without provisioning or managing servers, creating workload-aware cluster scaling logic, maintaining event integrations, or managing runtimes. With Lambda, you can run code for virtually any type of application or backend service - all with zero administration. Just upload your code as a ZIP file or container image, and Lambda automatically and precisely allocates compute execution power and runs your code based on the incoming request or event, for any scale of traffic. You can set up your code to automatically trigger from over 200 AWS services and SaaS applications or call it directly from any web or mobile app. You can write Lambda functions in your favorite language (Node.js, Python, Go, Java, and more) and use both serverless and container tools, such as AWS SAM or Docker CLI, to build, test, and deploy your functions.

https://aws.amazon.com/lambda/

Edit:

> it's widely known that if your API handles more traffic than a few requests per second then you're better off putting together your own service.

How is it widely known? Is it clearly stated in their documentation or in their marketing materials that you should use another AWS product instead?

That's what I mean by over marketing here. Requiring inside-baseball knowledge, because using it as described footguns your company at inflection points, isn't a great customer experience.


> AWS Lambda is a serverless compute service that lets you run code (...)

So? It can run your code the way you tell it to run, but you still need to have your head on your shoulders and know what you're doing, right?

> How is it widely known?

It's quite literally covered at the start of AWS's intro-to-serverless courses. Unless someone started hammering out code without spending a minute learning about the technology or doing any reading at all on the topic, this is immediately clear to everyone.

Let's put it differently: have you actually looked into AWS's docs on typical Lambda use cases, Lambda's pricing, and Lambda quotas?

> That's what I mean by over marketing here. Requiring inside-baseball knowledge (...)

This sort of stuff is covered quite literally in their marketing brochures. You would need to be completely detached from their marketing to not be aware of this. Let me be clear: you need to not have the faintest idea of what you are doing to be oblivious to this.

There's plenty to criticize AWS over, but I'm sorry, this requires complete ignorance and a complete lack of even the most cursory research to not be aware of.


You've been going on and on. I linked you the AWS marketing page on Lambda, which says it scales with no infrastructure and can be used for any use case.

You've had two chances to cite something from their vast marketing and documentation other than marketing brochures (are you serious?) and AWS-specific training, paid or otherwise.

You even quoted the wrong part of the marketing spiel.

> Just upload your code as a ZIP file or container image, and Lambda automatically and precisely allocates compute execution power and runs your code based on the incoming request or event, for any scale of traffic

ANY scale of traffic, requests or events. Just upload a ZIP or image and you're done. We know that isn't the case, don't we? Even without AWS sales people showing up personally to provide us marketing brochures they wouldn't put on their website.


I use Netlify serverless functions (which is just a wrapper around AWS Lambda) because it basically fits the criteria for me. I have a low but bursty access pattern that fits into the free tier, and there's a static SPA page that can serve up instantly while the XHR triggers to do the cold start fetch. I don't think I would use it for anything consumer facing though. This is just a backend where an extra 300ms isn't going to make a big difference to the admins.


In my experience cold starts don't affect the p99 if you have substantial traffic, because you have enough lambdas consistently running that the cold start rate is ~0.1%. p99.9 also matters though!


If you have substantial traffic, the cost savings of Lambda are gone, and you can just use ECS or something.


Insanely expensive is definitely a flexible term. I think numbers help here.

Provisioned Concurrency: $8.64 / GB / month

256 MB per Lambda (Assuming Python, Ruby, NodeJS, or Rust)

$2.16 per Lambda per month

A lot of organizations can probably make a good business case for keeping 100s or even 1000s of Lambdas warm. You also don't need to keep them warm 24x7, you can get an additional 12% discount using savings plans, and if you're a big guy you get your EDP discount.


I'm sure there are plenty of companies that are happy to throw $5,000/mo at a problem that can be solved better for $250/mo. Not mine though.


Does your $250/mo include all the ops cost of other solutions?


Yes it does


You should perhaps look at offering this as a service. 2,500 250MB lambdas for $250/month with all the AWS guarantees (i.e., Multi-AZ, permissioning on every call, etc.) would be pretty compelling, I think, for folks running intermediate lambda workloads (i.e., 5-10K lambdas at a time).


I'm not trying to offer it as a service. I'm trying to run my workload in a way that can scale from 0 -> 10,000 requests/second in an instant and doesn't cost my company $5,000/month to do so.

It's pretty easy if you know what you're doing (or care to figure it out).


If you can do $250/month with all ops costs and features of lambda for 5,000 or 10,000 requests per second - you would be silly not to offer a service.

There are plenty of us who can run a system that scales to 10k rps. That's relatively easy? I personally can't stand lambda and don't use it, FWIW. I like EC2, and I actually like Fargate a lot for all sorts of things, including lambda-like services without a separate lambda for each request.

But for folks with a payload that want the lambda-like experience - if you have a solution, all ops cost included (i.e., no well-paid developer or ops person needed on the customer's side), for $250/month at the scale we are talking about here (2,500 x 250MB = 625GB etc.), then you have an amazing solution going, especially if you can do the networking, IAM controls, etc. that AWS provides.

The problem I've seen is that when folks say Amazon is "insanely expensive" they are usually not actually comparing the AWS offering to a similar offering. If your cheap solution is not lambda-like, you need to compare it to EC2 or similar (with perhaps a good programmer doing something a bit more monolithic than the AWS approach).


You could make a fortune selling that by itself, unless your ops cost is just yolo provisioning and never doing backups/patching/etc.


I'll never understand how we got to this point of learned helplessness where people think hosted services like Lambda are the only ones capable of being secure and robust. It's madness..


That's not what I said or implied.


I'm not sure what you're trying to say then.

> your ops cost is just yolo provisioning and never doing backups/patching.

You think Amazon is the only one capable of doing backups and keeping software up to date?


No, but I think that it's super common to discount to $0 all the work that using lambda saves you from maintenance and operations. And if you can do any of that at scale for $250/mo you're lightyears ahead of nearly everyone.


1 GB of lambda provisioned concurrency is $10.95 USD a month in us-east-1.

That's what you pay for your lambda sitting there doing nothing.

You can get an on-demand EC2 instance with 1 GB of RAM for $9.13 USD a month, or $6.87 if you get a reserved instance.

You can fully utilize those instances the whole month.

Source: https://calculator.aws/#/estimate?id=7e1d1c2f32a2c63ba4ded19...


Yeah I used the ARM price. I should have pointed that out. It's definitely a tradeoff. Here's a few things I think are advantageous for Lambda,

* Integration with all the AWS Event Sources.

* Faster autoscaling.

* No need to setup a VPC (subnets, NAT gateway, security groups, VPC endpoints potentially, etc)

* No need to setup autoscaling, multi-AZ, etc.

* No need to support long lived instances.

I don't think one is always better than the other.


It ends up being cheaper overall if you have high utilization of your provisioned capacity, since the per-second fee while a function is running is lower. Using https://calculator.aws/#/createCalculator/Lambda, if you have a steady 1 request/s and each request takes 1 second, that's 2,592,000 seconds in a month. At 1024 MB, I get $36.52 for provisioned and $43.72 for on-demand. With autoscaling... you won't get 100% utilization, but it probably ends up being close enough to a wash.


> I'm sure it has some use-cases in some kind of backoffice task queue scenario, but Lambda is nearly unusable in a web context unless you have a very trivial amount of traffic.

This has been the outcome for me on several projects too. Just use loadbalanced EC2 (or EB, for simplification) and pay for a few instances running 24/7. It's actually cheaper than having a busy lambda in all my cases.

The only other case (other than occasional backoffice jobs) would be long-tail stuff: an API endpoint that is used in rare situations, for example "POST /datatakeout" or "DELETE /subscription/1337" or such. Things that might be heavy, require offbeat tools and so on. We've had them for building PDFs and .docx from reports; a feature used by <2% of the users, yet requiring all sorts of tools, from LaTeX to pandoc.


Yeah the caveats, gotchas, and workarounds you have to do to get something reasonable running on Lambda are just goofy.

At some point we just stopped and wondered why we were punishing ourselves with this stuff. We switched to a traditional webserver on regular EC2 instances and haven't looked back.


Have you run into issues with Lambda with complex tasks? I thought there was a 15 minute limit to tasks, plus a maximum storage size when importing large dependencies, etc?


The latex example did not run entirely on Lambda. Lambda would write a job into a queue (just Postgres), trigger a launch of a beefy ec2 instance, after which a worker on that ec2 picked up the job. Another lambda function would be called by the server itself to shut down the worker when all jobs were done.

Kludgy and slow. But it worked and did save some money, because the instance running this latex worker was big and chunky yet utilized maybe 10 hours a month.

Lambda was mostly acting as a kludgy load-balancer really.


I would guess this situation is maintained on purpose by AWS as the upsell reason for Fargate.


...or a big latency budget? Slow start is fine for a sudden burst for a lot of use cases.


Here is a little AWS doc describing what parent is talking about. Personally, I had confused "provisioned concurrency" with "concurrency limit" since I don't work with cloud stuff outside of hobbying.

https://aws.amazon.com/blogs/aws/new-provisioned-concurrency...


Have your runWarm sleep for 500ms and execute 50 of them concurrently. As long as none of the running invocations has finished when you start a new one, you get a new instance; at least that's what I think.

You can get 50 hot instances that way no?
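
Something along these lines, I suppose (an untested sketch; the function name is a placeholder, and whether overlapping synthetic invokes reliably pin 50 separate instances is exactly the open question):

    import json
    from concurrent.futures import ThreadPoolExecutor

    import boto3

    lam = boto3.client("lambda")

    def ping(_):
        # Synchronous invoke of a handler that sleeps ~500ms on a
        # {"runWarm": true} payload, so all 50 calls overlap in time and
        # should each land on a separate execution environment.
        return lam.invoke(
            FunctionName="my-api-function",            # placeholder
            InvocationType="RequestResponse",
            Payload=json.dumps({"runWarm": True}),
        )

    with ThreadPoolExecutor(max_workers=50) as pool:
        list(pool.map(ping, range(50)))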

I'd rather scale per connection. Have a lambda instance handle 50 concurrent requests. Something like https://fly.io but cheaper.


That reminds me of a custom Linux device driver that I worked with in the past. It implemented "mmap" so that a user application could map a ring buffer into userspace for zero-copy transfers.

It used lazy mapping in the sense that it relied on the page fault handler to get triggered to map each page in as they were touched.

This resulted in a latency increase for the very first accesses, but then it was fast after that since the pages stayed mapped in.

The solution?

Read the entire ring buffer one time during startup to force all pages to get mapped in.

I eventually changed the driver to just map them all in at once.


Same thing here, on a different cloud.

When we notice someone is using a form, we fire a no-op request to the function that will handle the data from the form so that it is less likely to be cold when the user is ready to proceed.

(We could get better results by switching to a different implementation language; but we have a body of code already working correctly aside from the extra second or two of cold start.)


> ...Sure kind of defeats the purpose of serverless but hey, enterprise software.

No, no, no. It's "hey, Amazon/Microsoft cloud engineering". They should be amazing with whiteboard interview exercises though.


What you and others are doing is attempting to predict your peak traffic when you take this approach. It may work for some companies, but more commonly in my experience, it hides P99+ tail latency from companies that may not instrument deeply (and they think the problem is solved).

The rate at which you execute `runWarm` is the peak traffic you're expecting. A request comes in over that threshold and you'll still experience cold start latency.

Provisioned concurrency doesn't change this, but it does move the complexity of `runWarm` to the Lambda team and gives you more control (give me a pool of 50 warmed Lambdas vs. me trying to run `runWarm` enough to keep 50 warmed myself). That's valuable in a lot of use cases. At the end of the day you're still in the game of predicting peak traffic and paying (a lot) for it.

We're almost always trying to predict peak traffic though! The difference is that using a coarse-grained computing platform, like EC2 for example, where a single box can handle hundreds++ of requests per second, gives you more room for error, and is cheaper.

There are a lot of other trade-offs to consider. My biggest issue is that this isn't enumerated clearly by AWS, and I run into way too many people who have footgunned themselves unnecessarily with Lambda.


If you build an API and are concerned about cold-starts you could look at Lambda@Edge and "CloudFront Functions".

I could imagine these performing better.

If you aren't married to AWS, then Cloudflare Workers could also be worth a shot.


Lambda@Edge helps with latency, definitely not with cold start times. You also can't buy provisioned Lambda@Edge, so for low traffic scenarios it's even worse than typical Lambda (where you can easily provision capacity, or keep on-demand capacity warm, which is not so cheap or easy when that must be done across every CloudFront cache region). For a low traffic environment, running e.g. 3-5 regular provisioned Lambda functions in different regions will produce a much more sensible latency distribution for end users than Edge would.

CloudFront Functions have no cold start, but their execution time is sorely restricted (1ms IIRC). You can't do much with them except origin selection, header tweaks or generating redirects, and there is no network or filesystem IO whatsoever.


> To combat this we wrote a "runWarm" function that kept the API alive at all times.

This sort of hack is not needed in AWS Lambdas, as they support provisioned concurrency.


Nor does it actually work. If you have a synthetic "runWarm" event, you'll trigger one concurrent lambda to stay warm. This helps if your cold start time is long and your average invoke time is short, but you're just levying the cold start tax on the second concurrent user.

There's no reasonable way to keep a concurrency > 1 warm with synthetic events without negatively impacting your cold start percentage for users.

Provisioned concurrency is the correct solution, and I'll remind everyone here that you can scale provisioned concurrency with Application Auto Scaling, since the comments here seem to be saying that keeping 100 lambdas warm is worse than a server that can handle 100 concurrent users (DUH!)
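
A sketch of that setup with Application Auto Scaling on an alias's provisioned concurrency (function name, alias, and bounds are placeholders):

    import boto3

    aas = boto3.client("application-autoscaling")

    # Make the alias's provisioned concurrency a scalable target.
    aas.register_scalable_target(
        ServiceNamespace="lambda",
        ResourceId="function:my-api-function:live",     # placeholder alias
        ScalableDimension="lambda:function:ProvisionedConcurrency",
        MinCapacity=10,
        MaxCapacity=200,
    )

    # Track utilization of the provisioned pool; AWS adds/removes
    # provisioned environments to keep it near the target.
    aas.put_scaling_policy(
        PolicyName="pc-utilization",
        ServiceNamespace="lambda",
        ResourceId="function:my-api-function:live",
        ScalableDimension="lambda:function:ProvisionedConcurrency",
        PolicyType="TargetTrackingScaling",
        TargetTrackingScalingPolicyConfiguration={
            "TargetValue": 0.7,
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "LambdaProvisionedConcurrencyUtilization"
            },
        },
    )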


To be fair, all you'd need to accomplish that, without adding more parts than necessary to production, is to ensure that the code path invoking the function is also hit by an external monitoring probe, with an adjustment to the SLA or SLO to account for the cold start time. Obviously not going to work for many systems, but it's easy to forget all the side effects of the observability plane when writing applications.


> To combat this we wrote a "runWarm" function that kept the API alive at all times.

maybe one should create 1 huge lambda instead of several hundred


Cold starts aren't just about going from 0 to 1; you hit them any time you scale. Putting everything in a single lambda likely means more scaling up.


Is there any guarantee of maximum startup time, or is there some upper timeout bound that you always have to anticipate in the worst case?


Not that I ever saw. They have made many improvements. But a cold start time of 2 minutes wasn't considered a bug or an issue before they fixed the VPC/Lambda interconnect.



