
> How is it a misinterpretation? To re-quote that last sentence:

I think we agree in our understanding, but condensing it down to "TSO isn't as much of a deal as claimed" is misleading:

* Efficient TSO emulation is crucial (both on Windows and elsewhere)

* The blog claims hardware TSO is non-ideal on Windows only (because Microsoft adapted the ecosystem to facilitate software-based TSO emulation). (Even then, it's unclear if the author quantified the concrete impact)

* Hardware TSO is still of tremendous value on systems that don't have ecosystem support

> [volatile metadata] doesn't help executables compiled with non-MSVC compilers like Clang, nor any JIT code, nor is there any documentation indicating how to support either of these cases.

That's funny, I hadn't considered third-party compilers. Those applications would still benefit from ARM64EC (i.e. native system libraries), but the actual application code would then be hit quite badly by the TSO impact, depending on how good the emulator's fallback heuristics are. (The same goes for older titles that were compiled before volatile metadata was added.)



Following up on that last part: I recompiled my x64 codebase with /volatileMetadata-, which reduced the volatile metadata by ~20K (the remainder is most likely from the statically linked CRT). The difference in profiling results was negligible, below the noise level between the two builds, and both came in about 15-30% below the native ARM64 build.

The interesting part is what happens when the compatibility settings for the executables are changed from the default multi-core setting, Fast, to Strict Multi-Core Operation. In that mode, the build without volatile metadata runs about 20% slower than the default build. That indicates that the x64 emulator may be taking some liberties with memory ordering by default. Note that while this application is multithreaded, the worker threads do little and it is heavily bottlenecked on a single thread.


20% is about the general order of magnitude we observed in FEX a while ago, though as you enable all TSO compatibility settings (including those rarely needed) it'll be even higher. As people elsewhere in the thread mentioned, it'd be interesting to see how FEX fares on Asahi with hardware TSO enabled vs. disabled (but with conservative TSO emulation, as set up by default), since it's less of a black box.


> Efficient TSO emulation is crucial (both on Windows and elsewhere)

Yes, but this is not in contention...? No one is disputing that TSO semantics in the emulated x86 code need to be preserved and that it needs to be done fast, we're talking about the tradeoffs of also having TSO support on the host platform.
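For anyone following along, here is a minimal sketch of the ordering guarantee at stake, assuming the classic message-passing pattern. It is not taken from Prism, Rosetta, or FEX; the names `producer`, `consumer`, and `run_message_passing` are made up for illustration. On x86, two plain stores become visible to other cores in program order, so the pattern works with ordinary loads and stores. On ARM64 without hardware TSO, an emulator has to add ordering to translated accesses to keep that guarantee, which is modelled here with release/acquire atomics.

```cpp
#include <atomic>
#include <thread>

// Hypothetical stand-ins for two guest memory locations.
std::atomic<int> data{0};
std::atomic<int> flag{0};

// Writer: publish the payload, then raise the flag.
// On x86 both would be plain stores. On a weakly ordered host the second
// store needs release semantics so it cannot become visible before the first.
void producer() {
    data.store(42, std::memory_order_relaxed);
    flag.store(1, std::memory_order_release);
}

// Reader: wait for the flag, then read the payload.
// The acquire load keeps the payload read from being satisfied early.
int consumer() {
    while (flag.load(std::memory_order_acquire) == 0) {
        // spin until the producer raises the flag
    }
    return data.load(std::memory_order_relaxed);
}

// Runs one producer and one consumer and returns what the consumer saw.
int run_message_passing() {
    data.store(0);
    flag.store(0);
    int seen = -1;
    std::thread writer(producer);
    std::thread reader([&seen] { seen = consumer(); });
    writer.join();
    reader.join();
    return seen;
}
```

With the release/acquire pair in place, `run_message_passing()` returns 42. If both were relaxed, a weakly ordered CPU would be allowed to let the reader observe the flag before the payload. That extra ordering on ordinary accesses is the cost being argued about here, whether it is paid in hardware or in software.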

> The blog claims hardware TSO is non-ideal on Windows only (because Microsoft adapted the ecosystem to facilitate software-based TSO emulation). (Even then, it's unclear if the author quantified the concrete impact)

> Hardware TSO is still of tremendous value on systems that don't have ecosystem support

That isn't what the author said. From the article:

> Another common misconception about Rosetta is that it is fast because the hardware enforces Intel memory ordering, something called Total Store Ordering. I will make the argument that TSO is the last thing you want, since I know from experience the emulator has to access its own private memory and none of those memory accesses needs to be ordered. In my opinion, TSO is a red herring that isn't really improving performance, but it sounds nice on paper.

That is a direct statement on Rosetta/macOS and does not mention Prism/Windows. How correct that assessment may be is another matter, but it is not talking about Windows only.

> Those applications would still benefit from ARM64EC (i.e. native system libraries), but the actual application code would be affected quite badly by the TSO impact then, depending on how good their fallback heuristics are.

I will have to check this, but I don't think it's that bad. JITted programs run much, much better on my Snapdragon X device than on the older Snapdragon 835, but there are a lot of variables there (CPU much faster/wider, Windows 11 Prism vs. Windows 10 emulator, x86 vs. x64 emulation). I have a program with native x64/ARM64 builds that runs about 25% slower in emulated x64 than in native ARM64, and I'm curious myself to see how it runs with volatile metadata disabled.



