This threading model is actually similar to the approach Node.js takes with libuv's thread pool. There's a main thread listening for incoming requests that immediately offloads their handling to the next available thread in the pool. Of course, the handler thread can also offload some of its internal work to other pool threads.
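A toy version of that listener + pool arrangement in C (my own sketch of the pattern, not Node's actual code; error handling mostly omitted):

```c
/* Listener + worker-pool sketch: the main thread only accepts
 * connections and queues the fds; pool threads pop and handle them. */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <pthread.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#define POOL_SIZE 4
#define QUEUE_CAP 64

static int queue[QUEUE_CAP];
static int qhead, qtail, qlen;
static pthread_mutex_t qlock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t qready = PTHREAD_COND_INITIALIZER;

static void queue_push(int fd) {
    pthread_mutex_lock(&qlock);
    while (qlen == QUEUE_CAP)           /* block if the pool is saturated */
        pthread_cond_wait(&qready, &qlock);
    queue[qtail] = fd;
    qtail = (qtail + 1) % QUEUE_CAP;
    qlen++;
    pthread_cond_broadcast(&qready);
    pthread_mutex_unlock(&qlock);
}

static int queue_pop(void) {
    pthread_mutex_lock(&qlock);
    while (qlen == 0)                   /* block until work arrives */
        pthread_cond_wait(&qready, &qlock);
    int fd = queue[qhead];
    qhead = (qhead + 1) % QUEUE_CAP;
    qlen--;
    pthread_cond_broadcast(&qready);
    pthread_mutex_unlock(&qlock);
    return fd;
}

static void *worker(void *arg) {
    (void)arg;
    for (;;) {
        int fd = queue_pop();
        const char *resp = "HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nok";
        write(fd, resp, strlen(resp));  /* real handler work goes here */
        close(fd);
    }
    return NULL;
}

int main(void) {
    int lfd = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in addr = {0};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(8080);
    bind(lfd, (struct sockaddr *)&addr, sizeof addr);
    listen(lfd, 128);

    pthread_t tids[POOL_SIZE];
    for (int i = 0; i < POOL_SIZE; i++)
        pthread_create(&tids[i], NULL, worker, NULL);

    for (;;) {                          /* main thread only accepts */
        int cfd = accept(lfd, NULL, NULL);
        if (cfd >= 0)
            queue_push(cfd);
    }
}
```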
What other server thread models exist? I know php-fpm gives each request its own process, but I can't think of any other feasible strategies off the top of my head.
There's also the process worker pool model, in which a master process forks a number of worker processes to handle all requests. Unicorn (for Ruby) does things this way...in part because when it was released, Rails apps couldn't handle multithreading all that well. I think Ruby also had some trouble with it, so this was just the easiest/most predictable way of getting good performance out of your Rails app.
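For anyone curious, the pre-fork model fits in a few lines of C. This is a hedged sketch of the pattern, not Unicorn's actual implementation: the master forks N workers up front, and every worker blocks in accept() on the same inherited listening socket.

```c
/* Pre-fork worker pool: fork workers once at startup; each inherits
 * the listening socket and serves requests forever. */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/wait.h>
#include <unistd.h>

#define NUM_WORKERS 4

static void worker_loop(int lfd) {
    for (;;) {
        int cfd = accept(lfd, NULL, NULL);  /* kernel picks one waiting worker */
        if (cfd < 0) continue;
        const char *resp = "HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nok";
        write(cfd, resp, strlen(resp));
        close(cfd);
    }
}

int main(void) {
    int lfd = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in addr = { .sin_family = AF_INET,
                                .sin_port = htons(8080) };
    bind(lfd, (struct sockaddr *)&addr, sizeof addr);
    listen(lfd, 128);

    for (int i = 0; i < NUM_WORKERS; i++) {
        if (fork() == 0) {          /* child: inherits lfd, serves forever */
            worker_loop(lfd);
            _exit(0);
        }
    }
    for (;;) wait(NULL);            /* master just supervises the children */
}
```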
Originally, the way `php-fpm` works is pretty much how the entire web worked. Every request into a server would go through `inetd`, which would fork a handler process, and the response was done when that process exited. This became very difficult to handle at scale, mostly because of the overhead of forking a process, so the next logical step was to have a pool of worker processes ready to go, able to receive requests immediately. This is great for applications that don't need to be redeployed very often, but as we moved to more of a continuous-deployment workflow, it began to break down, since worker processes would need to be restarted, going back to that whole "overhead" thing. Unicorn suffers from this problem a bit: if you `SIGHUP` to reload application code while workers are still processing requests, Unicorn will wait until those workers have finished responding before restarting them and loading the new version. If a client is taking too long and hanging onto a worker, it's very possible that the worker will just never reload the code and you'll get weird errors every so often.
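The inetd-era version, fork-at-accept-time, looks roughly like this (a toy sketch of the pattern, not inetd itself); the `fork()` inside the accept loop is exactly the per-request overhead being described:

```c
/* Fork-per-request: one brand-new process per connection. */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    int lfd = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in a = { .sin_family = AF_INET, .sin_port = htons(8080) };
    bind(lfd, (struct sockaddr *)&a, sizeof a);
    listen(lfd, 128);

    for (;;) {
        int cfd = accept(lfd, NULL, NULL);
        if (cfd < 0) continue;
        if (fork() == 0) {          /* child handles exactly one request */
            const char *resp = "HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nok";
            write(cfd, resp, strlen(resp));
            close(cfd);
            _exit(0);               /* the response is done when the process exits */
        }
        close(cfd);                                 /* parent drops its copy */
        while (waitpid(-1, NULL, WNOHANG) > 0) {}   /* reap finished children */
    }
}
```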
A solution to this problem is to move that pool of worker processes into a pool of threads, which gives the server a bit more control over how application code is reloaded and avoids the overhead of forking processes. I believe that's how NGINX, Node.js, Puma, et al. work under the hood... there's a pool of worker threads and another thread that listens for requests. Everything is event-driven, so when a new event comes in, the listener thread just sends it off to the pool of worker threads. Basically the same idea, but using an event loop as a model for better concurrency support. (Puma is a bit different because it also allows for worker processes in addition to threads, but that isn't necessary; it just gives better performance on larger machines.)
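A minimal epoll loop shows the event-driven half of that (my own sketch; a real server of this shape would hand ready events to pool threads rather than handling them inline as this does):

```c
/* Single-threaded epoll event loop: register fds, block in epoll_wait,
 * react to whatever became ready. */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void) {
    int lfd = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in a = { .sin_family = AF_INET, .sin_port = htons(8080) };
    bind(lfd, (struct sockaddr *)&a, sizeof a);
    listen(lfd, 128);

    int ep = epoll_create1(0);
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = lfd };
    epoll_ctl(ep, EPOLL_CTL_ADD, lfd, &ev);

    struct epoll_event events[64];
    for (;;) {
        int n = epoll_wait(ep, events, 64, -1);  /* block until something is ready */
        for (int i = 0; i < n; i++) {
            if (events[i].data.fd == lfd) {      /* new connection */
                int cfd = accept(lfd, NULL, NULL);
                ev.events = EPOLLIN;
                ev.data.fd = cfd;
                epoll_ctl(ep, EPOLL_CTL_ADD, cfd, &ev);
            } else {                             /* a connection is readable */
                char buf[4096];
                ssize_t r = read(events[i].data.fd, buf, sizeof buf);
                if (r <= 0) { close(events[i].data.fd); continue; }
                /* a real server would parse the request here and either
                 * respond inline or dispatch to a worker thread */
                write(events[i].data.fd, buf, r);
            }
        }
    }
}
```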
I believe you can also have multiple threads accepting connections on the same fd. This lets the kernel do the scheduling, which removes the need for a coordinator thread. You can then choose to handle the connection on the same thread, pinned to a specific CPU (per-CPU), or put it on a cross-thread task queue that whatever CPU is idle can pull from (work stealing).
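Right, here's a sketch of that with plain pthreads: every thread blocks in accept() on one shared fd, and the kernel decides which one wakes up. (The SO_REUSEPORT one-listening-socket-per-thread variant is the usual per-CPU flavor.)

```c
/* Multiple threads accepting on the same listening fd; no coordinator. */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <pthread.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

static void *acceptor(void *arg) {
    int lfd = *(int *)arg;
    for (;;) {
        int cfd = accept(lfd, NULL, NULL);  /* kernel load-balances wakeups */
        if (cfd < 0) continue;
        const char *resp = "HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nok";
        write(cfd, resp, strlen(resp));     /* handled on this same thread */
        close(cfd);
    }
    return NULL;
}

int main(void) {
    int lfd = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in a = { .sin_family = AF_INET, .sin_port = htons(8080) };
    bind(lfd, (struct sockaddr *)&a, sizeof a);
    listen(lfd, 128);

    pthread_t t[4];
    for (int i = 0; i < 4; i++)
        pthread_create(&t[i], NULL, acceptor, &lfd);
    pthread_join(t[0], NULL);   /* never returns; keeps main alive */
}
```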
Varnish (HTTP caching only) also uses one thread per client. I believe worker threads are used to handle requests while a dedicated thread handles all the idle connections between requests using epoll(). Also, the per-thread stack size is lowered so thousands of threads don't occupy a massive amount of memory.
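The stack-size trick looks like this with pthreads (the 64 KB figure here is illustrative, not Varnish's actual default):

```c
/* Shrink each thread's stack so the per-thread memory cost stays small. */
#include <pthread.h>

static void *client_thread(void *arg) {
    /* per-client work would go here */
    return arg;
}

int main(void) {
    pthread_attr_t attr;
    pthread_attr_init(&attr);
    /* default stacks are often 8 MB of address space; at 64 KB (must be
     * at least PTHREAD_STACK_MIN), 10,000 threads cost ~640 MB instead
     * of ~80 GB of address space */
    pthread_attr_setstacksize(&attr, 64 * 1024);

    pthread_t t;
    pthread_create(&t, &attr, client_thread, NULL);
    pthread_join(t, NULL);
    pthread_attr_destroy(&attr);
    return 0;
}
```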
Single-threaded HTTP servers have their own issues. If the bottleneck is storage, then the lack of async open()/stat() and some other calls is problematic. We feel that when serving hundreds of millions of files (long-tail content) from slow storage using nginx. For that reason you can configure nginx to spawn multiple processes.
I thought nginx epolls file I/O too, alongside socket I/O. Or did you find that the first call to open() or stat() stalls, while reads/writes after that continue normally?
File IO isn’t epoll-able. The operations are always blocking.
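This is easy to demonstrate: on Linux, epoll rejects regular files at EPOLL_CTL_ADD with EPERM, because a file fd is always "ready" even though the read itself can still stall on the disk.

```c
/* Show that a regular file can't be registered with epoll. */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/epoll.h>
#include <unistd.h>

int main(void) {
    int ep = epoll_create1(0);
    int fd = open("/etc/hostname", O_RDONLY);   /* any regular file */
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = fd };
    if (epoll_ctl(ep, EPOLL_CTL_ADD, fd, &ev) < 0)
        printf("epoll_ctl: %s\n", strerror(errno));  /* "Operation not permitted" */
    close(fd);
    close(ep);
}
```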
Those could, however, be offloaded onto a thread pool, so the blocking doesn't affect other requests being processed by the same nginx worker. Nginx only partially does that: whole-file read and write operations are offloaded, but a whole bunch of other IO (stat, open, close) is executed on the main thread. I guess that's due to implementation challenges: one can't just make a single operation async, one also needs to make every operation that uses those methods async.
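The offload pattern itself is just "run the blocking call on another thread and deliver a completion event". A minimal sketch with a single helper thread (this mirrors the idea behind nginx's `aio threads` directive, not its actual implementation):

```c
/* Run a blocking stat() on a helper thread so the event loop never
 * blocks on disk. */
#include <pthread.h>
#include <stdio.h>
#include <sys/stat.h>

struct stat_req {
    const char *path;
    struct stat st;
    int err;
};

static void *do_stat(void *arg) {
    struct stat_req *req = arg;
    req->err = stat(req->path, &req->st);  /* may block on slow storage */
    return NULL;
}

int main(void) {
    struct stat_req req = { .path = "/etc/hostname" };
    pthread_t t;
    pthread_create(&t, NULL, do_stat, &req);  /* event loop would keep running here */
    pthread_join(t, NULL);                    /* real server: completion event instead */
    if (req.err == 0)
        printf("%s is %lld bytes\n", req.path, (long long)req.st.st_size);
}
```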
Also, in microservices land you could have a service doing its business logic but also needing to run a server for metrics and/or debugging. Just piling on; this is good to study :)
What kind of latency are you talking about? Async latency for completing a task isn’t deterministic, and there’s no guarantee data will be processed as soon as it becomes available. Async runtimes rely heavily on hints from their tasks as to when to poll next.
Generally, low latency means producing a result as soon as possible. Threads are ideal for that case, spending CPU time spinning so data gets processed the moment it's available.
The best description I’ve heard of async is concurrent waiting, vs concurrent processing for threads (from the excellent zero2prod book).