
Interesting... according to this paper

  http://www.gsd.inesc-id.pt/~salaa/wttm2015/html/abstracts/Hasenplaugh.pdf
it's the opposite. The maximum reliable transactional write capacity, when the writes are contiguous and well-aligned, is small on POWER (63 cache lines) and large with TSX (400 lines). TSX uses the L1 cache to buffer writes, so the number of concurrent transactions scales only with the number of cores, whereas POWER uses a per-thread buffer, so it can scale linearly with the number of hardware threads (8 threads per core, 80 in their test).

So if I'm understanding this correctly, TSX has better capacity but poor concurrency, and POWER has poor capacity but better concurrency.
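To put rough numbers on that trade-off, here is a small sketch that converts the line counts above into bytes. The cache line sizes are my assumption and not from the comment above (128 bytes on POWER8, 64 bytes on Intel), and the function names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed cache line sizes (not stated in the thread). */
enum { POWER_LINE_BYTES = 128, TSX_LINE_BYTES = 64 };

/* Bytes a single transaction can reliably write, given its capacity in lines. */
size_t write_capacity_bytes(size_t lines, size_t line_bytes) {
    return lines * line_bytes;
}

/* Upper bound on bytes buffered at once if every slot (hardware thread on
 * POWER, core on TSX) runs one maximum-size transaction concurrently. */
size_t aggregate_bytes(size_t slots, size_t lines, size_t line_bytes) {
    return slots * write_capacity_bytes(lines, line_bytes);
}
```

Under those assumptions, write_capacity_bytes(63, POWER_LINE_BYTES) is 8064 bytes per transaction against 25600 bytes for write_capacity_bytes(400, TSX_LINE_BYTES), while aggregate_bytes(80, 63, POWER_LINE_BYTES) comes to 645120 bytes across the 80 hardware threads. The core count of the TSX machine isn't given here, so I left that side as a parameter.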

That's very cool.




Quick aside, as the two commenters in this thread seem knowledgeable.

I assume xbegin is a memory barrier. But is it a full fence like mfence or lock prefix?

I see a lot of benchmarks using TSX for locking, but one of the nicer features of lock cmpxchg or lock xchg is that they carry an implicit mfence. This is nice because it forces reads/writes issued before the instruction to complete.
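For reference, this is the kind of lock being described, as a minimal test-and-set sketch in C11 atomics. The tas_lock type and function names are made up. On x86, compilers typically lower the sequentially consistent exchange to xchg with a memory operand, which is implicitly locked, but that is a property of the compiler and target, not something the C standard promises.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical test-and-set lock; 0 = free, 1 = held. */
typedef struct { atomic_int held; } tas_lock;

void tas_init(tas_lock *l) {
    atomic_init(&l->held, 0);
}

/* One attempt: returns true if we took the lock. */
bool tas_trylock(tas_lock *l) {
    /* seq_cst read-modify-write; on x86 this is usually an xchg. */
    return atomic_exchange_explicit(&l->held, 1, memory_order_seq_cst) == 0;
}

void tas_lock_acquire(tas_lock *l) {
    while (!tas_trylock(l)) {
        /* spin until the holder releases */
    }
}

void tas_unlock(tas_lock *l) {
    /* Release store: earlier writes become visible before the lock looks free. */
    atomic_store_explicit(&l->held, 0, memory_order_release);
}
```

The ordering you get for free here comes from the locked read-modify-write in tas_trylock, which is the property the question is asking about for xbegin/xend.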

I know xbegin/xend do _more_ than an mfence for reads/writes within the RTM region, but do they provide fencing for instructions _after_ their execution?


No idea and don't want to dig into Intel docs right now, but I would be surprised if they were full fences as I think xbegin/xend would only require acquire/release semantics.

xacquire/xrelease can be used as modifiers to existing lock prefixed RMW instructions which are already full barriers, giving them optimistic locking capabilities.
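A sketch of what that looks like from C, assuming GCC on x86, which exposes the prefixes as extra memory-order flags (__ATOMIC_HLE_ACQUIRE / __ATOMIC_HLE_RELEASE) on the __atomic builtins. The lock and counter names are made up. Where the flags don't exist, the macros below fall back to plain acquire/release, so it degrades to an ordinary spinlock; on x86 CPUs without HLE the prefixes are simply ignored.

```c
#include <assert.h>

/* Use the HLE hint flags where the compiler defines them (GCC on x86). */
#if defined(__ATOMIC_HLE_ACQUIRE) && defined(__ATOMIC_HLE_RELEASE)
#define LOCK_ACQ (__ATOMIC_ACQUIRE | __ATOMIC_HLE_ACQUIRE)
#define LOCK_REL (__ATOMIC_RELEASE | __ATOMIC_HLE_RELEASE)
#else
#define LOCK_ACQ __ATOMIC_ACQUIRE
#define LOCK_REL __ATOMIC_RELEASE
#endif

#if defined(__x86_64__) || defined(__i386__)
#define cpu_relax() __builtin_ia32_pause()
#else
#define cpu_relax() ((void)0)
#endif

int lockvar = 0;      /* hypothetical lock word: 0 = free, 1 = held */
long counter = 0;     /* hypothetical data protected by lockvar */

void hle_lock(int *lock) {
    /* With the HLE flag this is emitted as xacquire xchg: the write to the
     * lock is elided and the critical section runs speculatively. */
    while (__atomic_exchange_n(lock, 1, LOCK_ACQ) != 0)
        cpu_relax();
}

void hle_unlock(int *lock) {
    /* With the HLE flag this is an xrelease store that ends the elision. */
    __atomic_store_n(lock, 0, LOCK_REL);
}

void locked_increment(void) {
    hle_lock(&lockvar);
    counter++;
    hle_unlock(&lockvar);
}
```

The nice part is that the code is still a correct lock without any TSX support, which is the backward compatibility story for the prefixes.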


Okay, so I asked around on the Intel forums. They are fences for instruction ordering and for cache load/flush.

But not for loads/stores. If you load or store data that was touched inside an RTM region after _xend() has returned, you have no atomicity protection.
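Taking that at face value, here is a sketch of where the protection starts and stops. All names are made up, and the RTM path is only compiled when the compiler defines __RTM__ (e.g. with -mrtm); otherwise only the spinlock fallback is built. Real code should also check CPUID for RTM before executing xbegin, which I've left out.

```c
#include <assert.h>
#if defined(__RTM__)
#include <immintrin.h>   /* _xbegin, _xend, _xabort */
#endif

long acct_a = 100, acct_b = 100;  /* hypothetical shared data */
int fallback_lock = 0;            /* hypothetical fallback lock word */

void transfer(long amount) {
#if defined(__RTM__)
    if (_xbegin() == _XBEGIN_STARTED) {
        /* Read the lock word so a fallback-path holder aborts us. */
        if (__atomic_load_n(&fallback_lock, __ATOMIC_RELAXED) != 0)
            _xabort(0xff);
        acct_a -= amount;   /* both updates commit together at _xend() */
        acct_b += amount;
        _xend();
        return;
    }
#endif
    /* Fallback path: taken on abort, or always when RTM isn't compiled in. */
    while (__atomic_exchange_n(&fallback_lock, 1, __ATOMIC_ACQUIRE) != 0) { }
    acct_a -= amount;
    acct_b += amount;
    __atomic_store_n(&fallback_lock, 0, __ATOMIC_RELEASE);
}

/* Runs outside any transaction: each load is atomic on its own, but another
 * thread's transfer may commit between the two, so the pair can disagree. */
long total_outside_transaction(void) {
    long a = __atomic_load_n(&acct_a, __ATOMIC_RELAXED);
    long b = __atomic_load_n(&acct_b, __ATOMIC_RELAXED);
    return a + b;
}
```

Single-threaded, transfer(5) leaves 95 and 105 and the total stays 200. With concurrent transfers, total_outside_transaction() is the function that can return something other than 200, because nothing ties its two loads together once you are past _xend().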



