Well, sure, in practice work stealing makes correct distribution difficult, but in theory, work stealing exists to repair an incorrect work distribution, right?

If every CPU is 100% utilized without needing a context switch (and is running the right number of worker threads, none of which get switched out), then work stealing isn't needed.

But my comment is solidly a "rule of thumb". I claim no theoretical basis other than: "Giving fewer, longer tasks to fewer threads (still >= the number of worker threads) is better than giving more, shorter ones."
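To make the "fewer, longer tasks" idea concrete, here is a minimal sketch of a purely static distribution: the input is cut into exactly one contiguous chunk per worker, so there is nothing left in a shared queue for anyone to steal. It assumes every item costs about the same to process, which is exactly the case where stealing buys nothing. The function names (`static_chunks`, `process_chunk`, `run_static`) are hypothetical, and because of CPython's GIL this only illustrates the partitioning, not real CPU parallelism.

```python
from concurrent.futures import ThreadPoolExecutor


def static_chunks(n_items, n_workers):
    """Split range(n_items) into n_workers contiguous chunks whose sizes differ by at most 1."""
    base, extra = divmod(n_items, n_workers)
    chunks = []
    start = 0
    for w in range(n_workers):
        # The first `extra` workers take one extra item so every item is assigned.
        size = base + (1 if w < extra else 0)
        chunks.append(range(start, start + size))
        start += size
    return chunks


def process_chunk(chunk):
    # Hypothetical stand-in for real per-item work; assumes uniform cost per item.
    return sum(i * i for i in chunk)


def run_static(n_items, n_workers):
    chunks = static_chunks(n_items, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # One long task per worker: no small tasks queued up, so no stealing.
        partials = list(pool.map(process_chunk, chunks))
    return sum(partials)


if __name__ == "__main__":
    print([len(c) for c in static_chunks(10, 4)])  # [3, 3, 2, 2]
    print(run_static(10, 4))  # 285
```

If per-item cost is uneven, one worker finishes its chunk early and sits idle while another is still busy; that imbalance is the "incorrect work distribution" that work stealing repairs.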