You don't have to kill/respawn the process after every request, only after the ones that ended poorly. Maybe you can get enough isolation through other means, but one OS process per request is probably the most isolation you can get while staying sensible and easy to use.
If you don't bind request lifetime to process lifetime, it's not process-per-request, it's a worker pool implemented with processes. That means you need a dispatcher, which tends to become the bottleneck. OS processes get you excellent isolation, I agree, and sometimes that's the most important thing to have and it makes sense to design around it. But process-per-request is just really hard to make fast. OS scheduling overhead is way less than it used to be, but it's still a lot.
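To make the distinction concrete, here's a minimal sketch of the worker-pool shape in Python (names and the doubling "work" are made up for illustration; `fork` start method, so POSIX only). The shared request queue plays the dispatcher role, and because every request funnels through it, it's the natural serialization point:

```python
import multiprocessing as mp

def worker(requests, responses):
    """Long-lived worker: loop over many requests instead of dying
    after one. This is the pool pattern, not process-per-request."""
    while True:
        req = requests.get()       # every worker contends on this one queue
        if req is None:            # sentinel: shut down
            break
        responses.put(req * 2)     # stand-in for real request handling

ctx = mp.get_context("fork")       # POSIX-only; keeps the sketch simple
requests, responses = ctx.Queue(), ctx.Queue()
pool = [ctx.Process(target=worker, args=(requests, responses)) for _ in range(4)]
for p in pool:
    p.start()

for i in range(8):                 # "dispatch" eight requests
    requests.put(i)
results = sorted(responses.get() for _ in range(8))

for _ in pool:                     # one sentinel per worker
    requests.put(None)
for p in pool:
    p.join()
print(results)
```

The bottleneck the comment describes lives in that single `requests` queue: add more workers and they just contend harder on it, which is why real designs shard the dispatcher (see the per-core idea below in the thread).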
Blocking accept(2) makes a decent-to-good dispatcher, depending on your OS, if you can stomach one request per socket. (If you can't, you'd need to pass the sockets back to a dispatcher between requests to wait for them to become ready. In the good old days you could use accept filters and not see incoming connections until they were ready, but that doesn't really work for TLS or modern HTTP with persistent connections.) You could make this pretty fast by running one dispatcher per core, aligned with the NIC queues, each dispatcher with its own pool of workers.
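The "blocking accept(2) as dispatcher" idea can be sketched in a few lines: pre-fork workers that all block in `accept()` on a shared listening socket, so the kernel picks which worker gets each connection and no userspace dispatcher exists at all. This is an illustrative toy (POSIX `fork`, loopback, echo "protocol"), not a real server:

```python
import os
import socket

def serve_one(listener):
    """Worker: block in accept(2), service exactly one request, then die.
    Binding request lifetime to process lifetime is what keeps the isolation."""
    conn, _addr = listener.accept()
    with conn:
        data = conn.recv(4096)            # read the "request"
        conn.sendall(b"echo:" + data)     # trivial "response"

def prefork(listener, n_workers):
    """Fork n workers that share the listening socket; the kernel's
    accept queue is the only dispatcher."""
    pids = []
    for _ in range(n_workers):
        pid = os.fork()
        if pid == 0:                      # child: become a worker
            serve_one(listener)
            os._exit(0)                   # one request per process lifetime
        pids.append(pid)
    return pids

listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))           # ephemeral port for the demo
listener.listen(16)
port = listener.getsockname()[1]
pids = prefork(listener, n_workers=2)

# Drive it from the parent, acting as a client: one request per worker.
responses = []
for _ in pids:
    with socket.create_connection(("127.0.0.1", port)) as c:
        c.sendall(b"hi")
        c.shutdown(socket.SHUT_WR)
        buf = b""
        while chunk := c.recv(4096):      # read until the worker closes
            buf += chunk
        responses.append(buf)
for pid in pids:
    os.waitpid(pid, 0)
print(responses)
```

The per-core variant the comment describes would run one such listener per core with `SO_REUSEPORT` (so each core's dispatcher has its own accept queue), which is beyond this sketch.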
If your work is mostly compute, you usually don't want more than one worker, or maybe a few, per core, and then OS scheduling is easy. If your work is mostly waiting on I/O, large concurrency makes more sense, but OS scheduling isn't going to be too hard there either, because it costs the OS almost nothing to leave a process blocked on I/O. You do need good timer scalability if you have a lot of processes, though, since they're all going to want to set and clear a timeout around most of their syscalls. io_uring etc. with a small number of OS processes/threads might be less work for the kernel, but certainly at the cost of isolation.
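The set-and-clear-a-timeout-per-syscall churn is easy to picture. A minimal sketch of the pattern (Python's `settimeout` stands in here for whatever timer mechanism the runtime actually uses; function name is made up):

```python
import socket

def recv_with_deadline(conn, nbytes, seconds):
    """Arm a timer for this one blocking call, then disarm it afterwards.
    With many worker processes, nearly every syscall each one makes is
    bracketed like this -- that's the timer load the comment describes."""
    conn.settimeout(seconds)      # arm: set a timeout for just this recv
    try:
        return conn.recv(nbytes)
    finally:
        conn.settimeout(None)     # disarm: back to plain blocking mode
```

Multiply this by one bracket per syscall per process and the kernel's timer bookkeeping has to scale with the total syscall rate, not just the process count.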
My experience is that basically all request-servicing work is I/O-bound. And AFAIK there is no request-servicing system in normal production use that does process-per-request. Even request-per-socket is basically outmoded; modern protocols multiplex logical requests over physical connections one way or another, e.g. HTTP/2.