There are other costs to regular context switching as opposed to goroutines/gree...

jashmatthews · on March 8, 2020

Green threads are much cheaper to switch than pthreads, yes. In real applications the difference is far smaller than it was 20 years ago when C10k was challenging. In 2020 you can just open 10k threads and forget about it.

With 100k threads and 100k Goroutines, each doing nothing but waiting on a mutex: pthreads in C takes ~20 microseconds per thread and in Go it’s about ~5 microseconds using Goroutines.

This difference disappears really easily. Parse some JSON and it’ll be gone.

Entering kernel code is the expensive part of context switching so syscalls are very nearly as expensive. Reading from a socket still needs a syscall, even with green threads or asynchronous IO.

The more different bits of IO you do, like in a real web app, the less advantage their is to green threads. This is one reason Rust dropped their M:N threading implementation.

fulafel · on March 8, 2020

There shouldn't be any 100k hard limit for threads at least in Linux, though you need enough memory for 100k stacks of course. You need to increase some default limits for it though (https://stackoverflow.com/a/26190804)

Assuming a generous(?) 20 kB per thread in stack and other corresponding OS bookkeeping inforation you could have 1k threads in 20 MB, or 1M threads in 20 GB.

Doing 100 Hz timer wakeups and IOs concurrently in 100k threads makes 10 M wakeups/second, that takes a chunk of CPU independent of green / native threads choice. Performance vs kernel threads will depend on the green threads implementation.

jashmatthews · on March 8, 2020

Yup. The Linux scheduler wakes threads based on IO events. You don’t end up just cycling through 100k threads all waking and sleeping again.

fulafel · on March 8, 2020

It's worth noting that the c10k writeup came out 20+ years ago, and those bottlenecks have been addressed both by fixing software bottlenecks and 20 years of semiconductor improvements.