I've seen this over and over. One of the main issues pointed out by TFA is that ...

cogman10 · 2024-12-03T23:53:06 1733269986

The rule of thumb also keeps you from doing a lot of task switching. It isn't free to enqueue and dequeue tasks. If you have a million things to do, it is better to group them into a smaller set of tasks, especially if the runtimes of those tasks are somewhat uniform.
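To make that concrete, here is a minimal sketch of what I mean, assuming plain `std::thread::scope` rather than rayon. The `chunked_sum` function and the sum workload are hypothetical and just for illustration: a million items become a handful of coarse tasks, one per worker, instead of a million tiny ones.

```rust
use std::thread;

/// Sums `data` by splitting it into at most `workers` contiguous chunks,
/// one coarse task per worker, instead of one task per element.
/// (Hypothetical helper for illustration; not taken from TFA or rayon.)
fn chunked_sum(data: &[u64], workers: usize) -> u64 {
    let workers = workers.max(1);
    // Ceiling division so every element lands in some chunk; clamp to 1
    // because `chunks` panics on a chunk size of 0.
    let chunk_len = ((data.len() + workers - 1) / workers).max(1);
    thread::scope(|s| {
        // Spawn every chunk first, then join, so the chunks run concurrently.
        let handles: Vec<_> = data
            .chunks(chunk_len)
            .map(|chunk| s.spawn(move || chunk.iter().sum::<u64>()))
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().unwrap())
            .sum::<u64>()
    })
}

fn main() {
    let data: Vec<u64> = (1..=1_000_000).collect();
    // Fall back to 4 workers if the parallelism cannot be queried.
    let workers = thread::available_parallelism()
        .map(|n| n.get())
        .unwrap_or(4);
    println!("sum = {}", chunked_sum(&data, workers));
}
```

With uniform per-item cost this only pays the spawn/join overhead once per worker. If the costs are skewed, one chunk per worker is probably too coarse and a few chunks per worker would balance better.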

mcronce · 2024-12-04T04:18:02 1733285882

For sure. Context switching tasks is certainly a lot cheaper than context switching threads, but it isn't free.

menaerus · 2024-12-04T09:19:52 1733303992

> One of the main issues pointed out by TFA is that there's too many small tasks allocated for parallel execution

Valid concern, but I don't think this was the OP's case?

From my understanding, the author gained the most benefit by dumbing down the generic rayon implementation to the same kind of design (a thread pool with task queues) but with a different work-stealing algorithm.

> Rayon is not going to magically distribute your work perfectly, though it very often does a decent job.

Work-stealing by definition kinda makes distributing the work "correctly" a difficult task, doesn't it?

jvanderbot · 2024-12-04T17:55:43 1733334943

Well, sure, in practice work stealing makes correct distribution difficult, but in theory, work stealing exists to repair an incorrect work distribution, right?

If every CPU is 100% utilized without needing a context switch (and you are running the right number of worker threads without switching those), then work stealing is not required.

But my comment is solidly a "rule of thumb". I claim no theoretical basis other than "giving fewer, longer tasks to fewer threads (still >= the number of worker threads) is better than giving more, shorter ones".
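A toy model of the "repair" point, with loud assumptions: no real threads, zero scheduling overhead, and "idle worker pulls the next task" as a crude stand-in for stealing. The function names and the cost numbers are made up for illustration. It just compares finish times for an up-front split versus pull-when-idle.

```rust
/// Finish time when tasks are split up front into contiguous,
/// equal-count blocks, one block per worker (no rebalancing).
fn static_makespan(costs: &[u64], workers: usize) -> u64 {
    let workers = workers.max(1);
    let block = ((costs.len() + workers - 1) / workers).max(1);
    costs
        .chunks(block)
        .map(|c| c.iter().sum::<u64>())
        .max()
        .unwrap_or(0)
}

/// Finish time when whichever worker frees up first takes the next task.
/// This is a simplified stand-in for work stealing, not rayon's algorithm.
fn dynamic_makespan(costs: &[u64], workers: usize) -> u64 {
    let mut busy_until = vec![0u64; workers.max(1)];
    for &cost in costs {
        // Index of the worker that becomes idle earliest.
        let i = (0..busy_until.len())
            .min_by_key(|&i| busy_until[i])
            .unwrap();
        busy_until[i] += cost;
    }
    busy_until.into_iter().max().unwrap_or(0)
}

fn main() {
    let uniform = [1u64; 8];
    let skewed = [8u64, 8, 1, 1, 1, 1, 1, 1];
    for (name, costs) in [("uniform", &uniform[..]), ("skewed", &skewed[..])] {
        println!(
            "{name}: static = {}, dynamic = {}",
            static_makespan(costs, 4),
            dynamic_makespan(costs, 4)
        );
    }
}
```

With uniform costs both strategies finish at 2, so there is nothing to repair. With the skewed costs the static split finishes at 16 and the pull-when-idle version at 8, which is the case where rebalancing earns its overhead.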

dboreham · 2024-12-04T01:12:41 1733274761

Larger grain size better.