> If I have a choice of making single job faster vs just running 64 jobs in parallel on CPU I'd pick the second every time I can because code will be simpler and less buggy every single time.
I may be misunderstanding what you're trying to say, but don't you mean you'd pick the first one every time? That is, make the single job faster instead of introducing concurrency with 64 parallel jobs?
I think he is arguing in favor of multiple concurrent processes ("single thread each"), instead of messing around with low-level threads + synchronization primitives (which would achieve a single fast process that uses multiple CPU cores by itself).
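To make that distinction concrete, here is a minimal Python sketch of the "many independent single-threaded processes" approach. The job itself (summing the integers below `n`) and the names `JOB_SOURCE` and `run_jobs_in_parallel` are made up for illustration; nothing here comes from the original comment.

```python
import subprocess
import sys

# Hypothetical single-threaded job: sum the integers below n and print the result.
JOB_SOURCE = "import sys; n = int(sys.argv[1]); print(sum(range(n)))"


def run_jobs_in_parallel(inputs):
    """Start one independent OS process per input, then collect each output."""
    procs = [
        subprocess.Popen(
            [sys.executable, "-c", JOB_SOURCE, str(n)],
            stdout=subprocess.PIPE,
            text=True,
        )
        for n in inputs
    ]
    # No locks or shared memory: each job only reports back through its stdout.
    results = []
    for proc in procs:
        out, _ = proc.communicate()
        results.append(int(out))
    return results


if __name__ == "__main__":
    print(run_jobs_in_parallel([10, 100, 1000]))
```

Each job stays plain sequential code and the operating system spreads the processes across cores. The alternative (one process with threads sharing state) would need explicit synchronization to get the same parallelism, which is where the extra complexity and bugs tend to come from.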