> And IIRC, each Intel SMT (hyper-thread) unit has its own instruction pointer and (non-SIMD?) register set.

I believe the SMT threads share the register rename storage (the physical register file), which I'd say is the real register set, more so than the architectural registers.

But I'd say the reason they're not real cores is that they don't increase the maximum instructions per clock. For many workloads they do increase the average instructions per clock, but they don't let you do any more work if you've got a fully tuned workload that already uses all the execution resources.



In reality you never have a finely tuned load that uses all the resources all of the time. Processors spend a lot of time waiting for memory, so hyperthreading allows for better utilization of compute. It’s rare to have a workload that benefits from turning it off, and in those cases it’s usually because the hyperthread is hurting the cache hit rate enough to offset the gains.
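
To make the "waiting for memory" point concrete, here's a minimal sketch (Linux + glibc, gcc -O2 -pthread) of a latency-bound pointer chase run on one hardware thread and then on two hyper-thread siblings. The sibling CPU IDs 0 and 1 are an assumption; check /sys/devices/system/cpu/cpu0/topology/thread_siblings_list on your machine. While one sibling's load is in flight the other can issue its own, so the two-thread run should finish in roughly the same wall time as the one-thread run, i.e. nearly double the throughput out of the "same" core:

  #define _GNU_SOURCE               /* for pthread_setaffinity_np (glibc) */
  #include <pthread.h>
  #include <sched.h>
  #include <stdio.h>
  #include <stdlib.h>
  #include <time.h>

  #define N (1u << 24)              /* 16M-entry chain, far bigger than any cache */
  #define STEPS (1L << 25)

  static size_t *chain;

  /* Dependent loads: each iteration waits on the previous one, so the
     hardware thread is stalled on memory almost the entire time. */
  static void *chase(void *arg) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(*(int *)arg, &set);
    pthread_setaffinity_np(pthread_self(), sizeof set, &set);
    size_t i = 0;
    for (long s = 0; s < STEPS; s++)
      i = chain[i];
    return (void *)i;               /* keep the loads from being optimized away */
  }

  static double run(int nthreads, int *cpus) {
    pthread_t th[2];
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int k = 0; k < nthreads; k++)
      pthread_create(&th[k], NULL, chase, &cpus[k]);
    for (int k = 0; k < nthreads; k++)
      pthread_join(th[k], NULL);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
  }

  int main(void) {
    chain = malloc(N * sizeof *chain);
    if (!chain) return 1;
    for (size_t i = 0; i < N; i++) chain[i] = i;
    for (size_t i = N - 1; i > 0; i--) {      /* Sattolo shuffle: one big cycle */
      size_t j = rand() % i;
      size_t t = chain[i]; chain[i] = chain[j]; chain[j] = t;
    }
    int cpus[2] = {0, 1};           /* ASSUMPTION: cpu0 and cpu1 are SMT siblings */
    printf("1 thread:   %.2fs\n", run(1, cpus));
    printf("2 siblings: %.2fs\n", run(2, cpus));
    return 0;
  }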


I recently checked whether an ugly hack I implemented years ago for 30% performance gains ("halve the default OpenMP thread count at start of computation and restore it afterwards" -> effectively, disable hyperthreading) was still necessary. It's now apparently a 3x performance gain... All because I'm saturating memory bandwidth. I don't know whether to be happy or sad...
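
For reference, the hack is roughly this (a minimal sketch, gcc -O2 -fopenmp; bandwidth_bound_kernel is a made-up stand-in, and it assumes the default OpenMP thread count is 2x the physical core count, i.e. SMT enabled):

  #include <omp.h>

  void bandwidth_bound_kernel(double *dst, const double *src, long n) {
    int full = omp_get_max_threads();
    omp_set_num_threads(full > 1 ? full / 2 : 1);  /* ~one thread per physical core */

    #pragma omp parallel for
    for (long i = 0; i < n; i++)
      dst[i] = 2.0 * src[i];        /* streaming access: memory-bandwidth bound */

    omp_set_num_threads(full);      /* restore the default afterwards */
  }

Once the physical cores are already saturating the memory controllers, the second hyper-thread per core adds contention and nothing else, which is presumably where the 30% (now 3x) gap comes from.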


Ouch, looks like they’re clobbering each other’s caches. Also, well done. It looks like you optimized the hell out of your code.


Both threads are waiting on the same memory pipeline. The actual performance has always lagged behind the theory.


If one thread has to hit main memory but the other can get what it needs from L1 cache, they aren't competing for the same resources.


But they're not waiting on the same memory operations. Unless your code fully saturates the memory bandwidth, which is rare, you get some gains here.


It is almost never about saturating the memory bandwidth; it's about waiting to load instructions and data from memory. That wait time, counted in CPU cycles, is huge: a main-memory access is roughly 100 ns, which at 3 GHz is on the order of 300 cycles.


Yes, that’s precisely why hyperthreading is such a good deal.


“Such” a good deal is 2-30% in benchmarks (and they don’t say whether that’s with the side-channel mitigations turned on or off). In previous generations it was more like −20% to +20%. If one thread is already having trouble fitting in L1 and L2 cache, splitting those caches evenly with a completely different workload isn’t going to help.

If cache contention weren’t a problem, and it were just a matter of jumping to previously unseen instructions and data (cold cache), you’d expect to see 50-300% numbers from hyperthreading, precisely because of how long the stalls are.


Many years ago, a customer on a single-core computer had problems with heavy video stutter in my product. It was multithreaded and ran, as I recall, 5 threads (GUI, video decoding, 3D pipeline, device control, and computation). Others did not have this problem. After investigation, it turned out that said customer had hyperthreading disabled in the BIOS. Enabling it fixed the problem instantly.

So "real" or not but from my experience HT does work to the benefit.


I don’t know how many hardware revisions Intel went through where every benchmark said hyperthreading was slower than turning it off, but it was a lot. It became Lucy’s football at some point.

And in these days of post-Dennard scaling, you have thermal throttling, so idle cycles aren’t actually idle: they’re letting the heat sink catch up with heat production.


I agree with this stance. It's the same as branch prediction, out-of-order execution, or any of the other tricks for computing efficiently.



