A comment from someone who knows or knew the author, or was part of the project, sharing details that make readers feel like they've just been handed backstage passes.
The required computing resources double at every branch where you take both paths, so if you speculate ahead by 100+ instructions, with, let's say, up to 20 branches, it gets way out of hand.
I could see CPUs sometimes taking both paths for close, hard-to-predict branches. Does anyone have information on that?
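To put a number on the doubling argument, here is a minimal sketch in C. The helper name `live_paths` is made up for illustration, and it models nothing about a real pipeline beyond "every unresolved branch doubles the number of paths in flight".

```c
#include <stdint.h>

/* Number of execution paths in flight if a core followed BOTH sides of
 * every unresolved branch. Hypothetical helper, not a real CPU model. */
uint64_t live_paths(unsigned unresolved_branches)
{
    /* Each unresolved branch doubles the path count, giving 2^n.
     * A 64-bit shift is only defined for n < 64, so saturate above that. */
    if (unresolved_branches >= 64)
        return UINT64_MAX;
    return (uint64_t)1 << unresolved_branches;
}
```

With the 20 branches from the comment, `live_paths(20)` is 1,048,576 paths, compared with the single path a predictor follows when it commits to one guess per branch.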
I believe the current trend is that architectures might turn some short, hard-to-predict branches into predicated instructions instead (similar to x86 CMOV or some ARM conditional execution instructions). Outside of short branches, the overhead of loading up to 2 instructions for every 1 that actually gets executed can be too costly. From my understanding, predication is already how branches work on SIMD/SIMT hardware: GPUs, and masked operations in AVX2/AVX-512.
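For anyone unfamiliar with predication, the idea can be sketched in plain C. Both function names below are made up for illustration, and whether a compiler actually emits CMOV (or a conditional jump) for either version depends on the compiler, flags, and target, so treat this as a picture of the concept rather than a guarantee about generated code.

```c
#include <stdint.h>

/* Branchy form: control flow depends on cond, so the hardware may have
 * to predict which way it goes. */
uint32_t select_branchy(int cond, uint32_t a, uint32_t b)
{
    if (cond)
        return a;
    return b;
}

/* Predicated-style form: both inputs are available and a mask picks one,
 * so there is no control-flow decision to predict. */
uint32_t select_branchless(int cond, uint32_t a, uint32_t b)
{
    /* mask is all ones when cond is nonzero, all zeros otherwise. */
    uint32_t mask = (uint32_t)0 - (uint32_t)(cond != 0);
    return (a & mask) | (b & ~mask);
}
```

Both functions return the same value for any input. The trade-off is the one described above: the branchless form always pays for both operands, which is cheap for a single select but gets expensive once each side is more than a few instructions long.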
Elijah Sandiford at Linus Tech Tips asks Linus Torvalds what happens to Linux if Linus dies. Recorded around September 2025, aired in November. The plan was made at the 2025 Maintainers Summit.
Great episode:
https://www.youtube.com/watch?v=mfv0V1SxbNA&t=1860s