Right; a process is just a thread (or set of threads) and some associated resources - like file descriptors and virtual memory allocations. As I understand it, the scheduler doesn’t really care if you’re running 1000 processes with 1 thread each or 1 process with 1000 threads.
But I suspect it’s faster to swap threads within a process than swap processes, because it avoids expensive TLB flushes. And of course, that way there’s no need for IPC.
All things being equal, you should get more performance out of a single process with a lot of threads than a lot of individual processes.
> a process is just a thread (or set of threads) and some associated resources - like file descriptors and virtual memory allocations
Or rather the reverse, in Linux terminology. Only processes exist, some just happen to share the same virtual address space.
> the scheduler doesn’t really care if you’re running 1000 processes with 1 thread each or 1 process with 1000 threads
Not just the scheduler, the whole kernel really. The concept of thread vs process is mainly a userspace detail for Linux. We arbitrarily decided that the set of clone() parameters from fork() creates a process, while the set of clone() parameters through pthread_create() creates a thread. If you start tweaking the clone() parameters yourself, then the two become indistinguishable.
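To make the difference concrete, here is a rough sketch of what the two usual flag sets look like from userspace. It does not call clone() directly; it assumes CPython on Linux, where `threading.Thread` goes through pthread_create() and `os.fork()` goes through fork(). The function names are made up for illustration.

```python
import os
import threading


def write_from_thread():
    """Mutate a value from a second thread and return what the caller sees."""
    box = [0]

    def worker():
        # Same address space as the caller, so this write is visible to it.
        box[0] = 42

    t = threading.Thread(target=worker)
    t.start()
    t.join()
    return box[0]


def write_from_fork():
    """Mutate a value in a forked child and return what the parent sees."""
    box = [0]
    pid = os.fork()
    if pid == 0:
        # Child: this write lands in the child's private copy of the memory.
        box[0] = 42
        os._exit(0)
    os.waitpid(pid, 0)
    return box[0]


if __name__ == "__main__":
    print("after thread:", write_from_thread())  # 42
    print("after fork:  ", write_from_fork())    # 0
```

The only thing that differs between the two cases is whether the new task shares the caller's memory, which is the kind of detail the clone() parameters control.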
> it’s faster to swap threads within a process than swap processes, because it avoids expensive TLB flushes
Right, though this is more of a theoretical concern than a practical one. If you are sensitive to a marginal TLB flush, then you may as well use "isolcpus" and set affinities to avoid any context switch at all.
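For the affinity half of that, a minimal sketch using Python's wrappers around sched_setaffinity() (Linux only). The isolcpus part is a kernel boot parameter and is not shown here; `pin_current_process` is a hypothetical helper name, and which CPU to pick is up to you.

```python
import os


def pin_current_process(cpu):
    """Restrict the calling process to one CPU; return the previous CPU set."""
    previous = os.sched_getaffinity(0)  # pid 0 means "the calling process"
    os.sched_setaffinity(0, {cpu})
    return previous


if __name__ == "__main__":
    allowed = os.sched_getaffinity(0)
    target = min(allowed)  # arbitrary choice: lowest CPU we may run on
    old = pin_current_process(target)
    print("pinned to", os.sched_getaffinity(0), "was", old)
    os.sched_setaffinity(0, old)  # restore the original mask
```

Pinning alone only stops the task from migrating. Keeping other tasks off that core is what the isolation side is for.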
> that way there’s no need for IPC
If you have your processes mmap a shared memory region, you effectively share address space between processes just like threads share their address space.
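A small sketch of that, assuming a Unix system where Python's `mmap.mmap(-1, length)` gives an anonymous mapping with the default MAP_SHARED flag. The mapping is created before fork(), so parent and child see the same pages. The function name and the value written are illustrative only.

```python
import mmap
import os
import struct


def shared_counter_demo():
    """Let a forked child write into a shared mapping; return what the parent reads."""
    # Anonymous mapping; MAP_SHARED is the default flag on Unix.
    buf = mmap.mmap(-1, mmap.PAGESIZE)
    buf[:8] = struct.pack("q", 0)

    pid = os.fork()
    if pid == 0:
        # Child: write straight into the shared pages, no pipe or socket needed.
        buf[:8] = struct.pack("q", 1234)
        os._exit(0)

    os.waitpid(pid, 0)
    (value,) = struct.unpack("q", buf[:8])
    buf.close()
    return value


if __name__ == "__main__":
    print(shared_counter_demo())  # 1234
```

Real code would still need some synchronization around the shared region, exactly as threads would.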
For most intents and purposes, really, I do find multiprocessing just better than multithreading. Both are pretty much indistinguishable, but separate processes give you the flexibility of being able to arbitrarily spawn new workers just like any other process, while with multithreading you need to bake in some form of pool manager and hope to get it right.
> The concept of thread vs process is mainly a userspace detail for Linux. We arbitrarily decided that the set of clone() parameters from fork() creates a process, while the set of clone() parameters through pthread_create() creates a thread. If you start tweaking the clone() parameters yourself, then the two become indistinguishable.
That's from Plan 9. There, you can fork, with various calls, sharing or not sharing code, data, stack, environment variables, and file descriptors.[1] Now that's in Linux. It leads to a model where programs are divided into connected processes with some shared memory. Android does things that way, I think.
> Or rather the reverse, in Linux terminology. Only processes exist, some just happen to share the same virtual address space.
Threads and processes are semantically quite different in standard computer science terminology. A thread has an execution state, i.e. its set of processor register values. A process, on the other hand, is a management and isolation unit for resources like memory and handles.
> But I suspect it’s faster to swap threads within a process than swap processes, because it avoids expensive TLB flushes. And of course, that way there’s no need for IPC.
> All things being equal, you should get more performance out of a single process with a lot of threads than a lot of individual processes.