I don't work on the Nvidia side of things, but it's likely to be the same. Shader replacement is only one of a whole host of things we can do to make games run faster. It's actually kind of rare for us to do it, since it bloats the size of the driver so much. A lot of our options do change how shaders work, though, like forcing a shader to use double-precision floats instead of the single precision it was compiled with.
> > A lot of our options do change how shaders work though, like forcing a shader to use double precision floats instead of the single it was compiled with.
That will break code sufficiently reliant on the behaviour of single precision, though.
If that does happen, then we don't apply that setting. Most of the changes we apply are extensively tested, and toggles like that are more often used for already-broken shaders.
Fair enough, I can't say anything I've done has ever caused an issue like that (a new ticket would have been made and sent to me), but I also can't say it has never happened, so I'm not really in a position to disagree. We do have a good QA team, though, and an "open" beta program that also catches a lot of issues before they become more widely public.
I will note that half of the customer-facing bugs I get are "works on Nvidia," only for it to turn out that the problem is with the game and not the driver. Nvidia allows you to ignore a lot of the spec, and it causes game devs to miss a lot of obvious bugs. A few examples:
1) Nvidia allows you to write to read-only textures, so game devs forget to transition them to a writable state, which shows up as corruption on other cards (there's a rough sketch of the missing transition below the list).
2) Nvidia automatically works with diverging texture reads, so devs forget to mark the index as nonuniform (NonUniformResourceIndex in HLSL), which shows up as corruption on other cards.
3) Floating-point calculations aren't IEEE compliant. One bug I fixed was x/width*width != x: on Nvidia this ends up a little higher and on our cards a little lower. The game this happened in ended up flooring that value and doing a texture read, which, as you can guess, showed up as corruption on our cards (there's a small CPU-side reproduction at the end of this comment).
1 and 2 are specifically required by the Microsoft DirectX 12 spec, but most game devs aren't reading that, so bugs creep in. 3 is a difference in how the ALU is designed, our cards being a little closer to IEEE compliant. A lot of these issues are related to how the hardware works, so they stay pretty consistent between the different GPUs of a manufacturer.
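To make #1 concrete, here's roughly what the missing app-side transition looks like. This is a minimal sketch with made-up function and variable names, not the driver-side workaround itself:

    #include <d3d12.h>

    // Minimal sketch: the transition the DX12 spec expects before a shader
    // writes to a texture that was previously bound as a read-only SRV.
    void TransitionToWritable(ID3D12GraphicsCommandList* cmdList, ID3D12Resource* texture)
    {
        D3D12_RESOURCE_BARRIER barrier = {};
        barrier.Type  = D3D12_RESOURCE_BARRIER_TYPE_TRANSITION;
        barrier.Flags = D3D12_RESOURCE_BARRIER_FLAG_NONE;
        barrier.Transition.pResource   = texture;
        barrier.Transition.Subresource = D3D12_RESOURCE_BARRIER_ALL_SUBRESOURCES;
        barrier.Transition.StateBefore = D3D12_RESOURCE_STATE_PIXEL_SHADER_RESOURCE;
        barrier.Transition.StateAfter  = D3D12_RESOURCE_STATE_UNORDERED_ACCESS;
        cmdList->ResourceBarrier(1, &barrier);
    }

Skip that barrier and it happens to keep working on Nvidia, which is exactly why the bug only shows up as corruption on other cards.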
Side note: I don't blame the devs for #3. The corruption was super minor, and the full calculation was spread across multiple functions (judging by reading the DXIL). The only reason it sticks out in my brain is that the game devs were legally unable to ever update the game again, so I had to fix it driver-side. That game was also Nvidia-sponsored, so it's likely our cards weren't tested until very late in development (I got the ticket a week before the game was due to release). That's all I'm willing to say on that; I don't want to get myself in trouble.
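The arithmetic side of #3 is easy to poke at on the CPU, though. A small sketch with made-up widths (not the game's actual values), checking whether x/width*width round-trips under strict per-operation IEEE rounding and whether flooring the result would land on a neighbouring texel:

    #include <cmath>
    #include <cstdio>

    int main() {
        const float widths[] = {640.0f, 720.0f, 1366.0f, 1920.0f};
        for (float width : widths) {
            int notEqual = 0, wrongTexel = 0;
            for (int i = 0; i < (int)width; ++i) {
                float x = (float)i;
                float y = x / width * width;  // ideally exactly x
                if (y != x) ++notEqual;
                if (std::floor(y) != std::floor(x)) ++wrongTexel;
            }
            std::printf("width=%4.0f: %d values don't round-trip, %d floor to the wrong texel\n",
                        width, notEqual, wrongTexel);
        }
    }

Whether the round trip lands a little high or a little low depends on how the hardware rounds the intermediates, which is the Nvidia-versus-us difference described above.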
> Floating point calculations aren't IEEE compliant
Too late to edit, but I want to half-retract this statement: they are IEEE compliant, but due to optimizations that driver developers can apply, they aren't guaranteed to be. This assumes the accuracy of a multiply and divide is specified in the IEEE floating-point spec; I'm seeing hints that it is, but I can't find anything concrete.
I'm just going off what I was told there. I had to make the fix since the game developers were no longer partnered with the company that owned the license to the content.
Good question. I'm assuming it's because the calculation happens across a memory barrier of some kind, or because of all the branches in between, so LLVM is probably avoiding the optimization. It was quite a while ago, so it's something I could re-investigate and actually try to fix; I'd have to wait for downtime with all the other tickets I'm getting, though. It's also something dxc itself should be doing, but I have no control over that.
Just about any. It's pretty difficult to write code that is broken by changing the rounding of the last couple of bits (which is what happens if you use wider types during the calculation) but not broken by other changes.
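To be concrete about the scale of that difference, here's a tiny CPU-side sketch (nothing shader-specific): accumulating the same values with a float accumulator versus a double accumulator, then comparing the rounded-back results, which typically differ only in the last bit or two.

    #include <cstdio>

    int main() {
        // Sum the same ten values with single-precision intermediates and with
        // double-precision intermediates, then compare the rounded-back results.
        float f = 0.0f;
        double d = 0.0;
        for (int i = 0; i < 10; ++i) {
            f += 0.1f;
            d += (double)0.1f;  // same inputs, wider accumulator
        }
        float fromDouble = (float)d;
        std::printf("float accumulator : %.9g\n", f);
        std::printf("double accumulator: %.9g\n", fromDouble);
        std::printf("difference        : %.9g\n", f - fromDouble);
    }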
Originally I said "the premise is that any difference breaks the code".
You replied with "Not any."
That is where the requirement comes from, your own words. This is your scenario, and you said not all differences would break the hypothetical code.
This is your choice. Are we talking about code where any change breaks it (like a seeded/reproducible RNG), or are we talking about code where some minor changes don't break it but using extra precision does? (I expect that category to be super duper rare.)
> In my opinion, floating point shaders should be treated as a land of approximations.
Fine, but that leaves you responsible for breaking the shader of an author who holds the opposite opinion, as they are entitled to do. Precision =/= accuracy.
Changes like that will break just as much code as adding extra precision will, because they change how things round and not much else, just like adding extra precision. They're both slightly disruptive, and they tend to disrupt the same kinds of things, unlike removing precision, which is very disruptive all over.
> A lot of our options do change how shaders work though, like forcing a shader to use double precision floats instead of the single it was compiled with.
What benefit would that give? Is double precision faster than single on modern hardware?
That's specifically because GPUs aren't IEEE compliant, and calculations will drift differently on different GPUs. Double precision can help avoid divide-by-zero errors in some shaders, because most don't guard against that, and NaNs propagate easily and show up as visual corruption.
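A rough CPU-side sketch of that failure mode (illustrative numbers, not from any specific shader): normalizing a very short vector in single precision can underflow the squared length to zero, and the resulting inf/NaN propagates downstream, while double-precision intermediates keep it finite.

    #include <cmath>
    #include <cstdio>

    int main() {
        // A very short direction vector, e.g. from two nearly identical positions.
        float vx = 1e-23f, vy = 0.0f, vz = 0.0f;

        // Single precision: vx*vx underflows to 0, so the length is 0 and the
        // divide produces inf; anything like inf * 0 downstream becomes NaN.
        float len_f = std::sqrt(vx * vx + vy * vy + vz * vz);
        float nx_f = vx / len_f;      // inf
        float shade_f = nx_f * 0.0f;  // NaN, and it propagates

        // Double precision: the squares don't underflow, so everything stays finite.
        double len_d = std::sqrt((double)vx * vx + (double)vy * vy + (double)vz * vz);
        double nx_d = vx / len_d;     // ~1.0

        std::printf("float:  len=%g n.x=%g shaded=%g\n", len_f, nx_f, shade_f);
        std::printf("double: len=%g n.x=%g\n", len_d, nx_d);
    }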
After a bunch of testing and looking around, I think I should actually change my statement: GPUs do offer IEEE floating-point compliance by default, but don't strictly adhere to it. Multiple optimizations that driver developers can apply can massively affect floating-point accuracy.
This is all kind of on the assumption that the accuracy of floating-point multiplication and division is in the IEEE spec; I was told before that it was, but searching now I can't seem to find it one way or the other.
I believe one of the optimizations Nvidia does is to drop f32 variables down to f16 in a shader, which would technically break the accuracy requirement (as before, if it exists). Sadly, I don't have anything I can offer as proof of that due to NDA. I will note that most of my testing and work is done in PIX for Windows, and most games don't have anti-cheat, so they're easy to capture.
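I can't show the real thing, but the size of the precision hit from an f32-to-f16 demotion is easy to mimic on the CPU. A rough sketch (made-up sample values) that keeps only the ~11 significant bits a half float has, ignoring f16's range limits, tie-breaking rules, and denormals:

    #include <cmath>
    #include <cstdio>

    // Approximate demoting an f32 value to f16 by rounding the significand to
    // 11 bits (1 implicit + 10 stored). Only meant to show the precision loss.
    float roundToHalfPrecision(float x) {
        if (x == 0.0f) return x;
        int exp;
        float m = std::frexp(x, &exp);           // x = m * 2^exp, 0.5 <= |m| < 1
        m = std::round(m * 2048.0f) / 2048.0f;   // keep 11 significant bits
        return std::ldexp(m, exp);
    }

    int main() {
        // Made-up values: a scaled texture coordinate and a small UV offset.
        float samples[] = {4096.37f, 1000.06f, 0.1f};
        for (float s : samples) {
            float h = roundToHalfPrecision(s);
            std::printf("%12.6f -> %12.6f (error %g)\n", s, h, s - h);
        }
    }

Near 4096, half floats are spaced 4 apart, so anything that scales up into that range loses whole-integer accuracy.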
What shaders (presumably GLSL & HLSL) do precision-wise isn’t an IEEE compliance issue; it’s either a DX/Vulkan spec issue OR a user compiler-settings issue. Dropping compliance is, and should be, allowed when the code asks for it. This is why GLSL has lowp, mediump, and highp settings. I think all GPUs are IEEE compliant and have been for a long time.
I agree on the "dropping compliance when asked for" aspect; the problem I'm referring to is the driver dropping compliance without the game asking for it. If the underlying system can randomly drop compliance whenever it thinks it's fine, without telling the user and without the user asking, I would not consider that compliant.
That is fair, if true. But I’m very skeptical any modern drivers are dropping compliance without being asked. The one possibility I could buy is that you ran into a case of someone having dropped precision intentionally in a specific game in order to “optimize” a shader. Otherwise, precision and IEEE compliance are the prerogative of the compiler. The compiler is sometimes in the driver, but it never decides on its own what precision to use; it uses either default or explicit precision settings. The only reason it would not produce IEEE-compliant code is if it was asked to.
Only more precision. But no, doubles are not faster. At best they’re the same instruction latency & throughput as singles, and that’s only on a few expensive pro/datacenter GPUs. Even if they are technically the same instruction speed, they’re still 2x the memory & register usage, which can compromise perf in other ways. Doubles on consumer GPUs are typically anywhere from 16 to 64 times slower than singles.
FWIW, I’ve never heard of shader replacement to force doubles. It’d be interesting to hear when that’s been used and why, and surprising to me if it was ever done for a popular game.