Newer editions of Computer Organization and Design: The Hardware Software Interface cover GPUs [1]
Multiflow still has some relevant ideas [2]
Programming on Parallel Machines: GPU, Multicore, Clusters and More gives you a look at some of the issues [3]
SPIRV-VM is a virtual machine for executing SPIR-V shaders [4]
NyuziRaster: Optimizing Rasterizer Performance and Energy in the Nyuzi Open Source GPU [5]
Ocelot is a modular dynamic compilation framework for heterogeneous systems, providing various backend targets for CUDA programs and analysis modules for the PTX virtual instruction set. [6]
glslang is the Khronos-reference front end for GLSL/ESSL, partial front end for HLSL, and a SPIR-V generator. [7]
Just some links I had lying around. Well, at least the ones that still work.
A major problem I had was register pressure, even with a decent-sized register file. With memory accesses in the hundreds-of-cycles range, it was very difficult to fill those 'free' cycles while waiting for data to load. Double buffering the data really just cut the effective size of the register file, making everything worse. That did lead to some interesting ways of optimising code to reduce temporary storage over the course of execution.
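A rough back-of-envelope model of that register-pressure tradeoff, as a Python sketch. Every number here is hypothetical (a 16384-entry register file, 24 working registers plus a 16-register load buffer per thread, 2 cycles of independent work between loads, 600 cycles of memory latency), and the scheduling model is deliberately crude: it only asks whether the other resident threads have enough independent work to cover one thread's stall.

```python
from dataclasses import dataclass


@dataclass
class Budget:
    register_file: int  # registers available per core (hypothetical figure)
    working_regs: int   # registers each thread needs for its own temporaries
    buffer_regs: int    # registers holding one buffer of loaded data
    buffers: int        # 1 = single buffering, 2 = double buffering

    @property
    def regs_per_thread(self) -> int:
        return self.working_regs + self.buffer_regs * self.buffers

    @property
    def resident_threads(self) -> int:
        # How many threads fit in the register file at the same time.
        return self.register_file // self.regs_per_thread


def hides_latency(b: Budget, work_cycles: int, latency: int) -> bool:
    # Crude model: while one thread waits on memory, each of the other
    # resident threads contributes `work_cycles` of independent work.
    return (b.resident_threads - 1) * work_cycles >= latency


single = Budget(register_file=16384, working_regs=24, buffer_regs=16, buffers=1)
double = Budget(register_file=16384, working_regs=24, buffer_regs=16, buffers=2)

for name, b in (("single", single), ("double", double)):
    print(name, b.regs_per_thread, b.resident_threads,
          hides_latency(b, work_cycles=2, latency=600))
```

The model ignores what double buffering is actually for (a thread overlapping its own compute with its next load), so it only captures the register-file side of the tradeoff: the extra buffer per thread means fewer threads fit (409 versus 292 here), and with these made-up numbers that is enough to stop covering the latency.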
Another problem was accessing scattered memory locations and getting poor cache utilisation, which left even larger gaps to try to fill with something useful.
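A small Python sketch of why scattered accesses hurt cache usage: it counts how many distinct cache lines a list of byte addresses touches, and what fraction of the fetched bytes is actually used. The 64-byte line size, 4-byte elements and 256-byte stride are assumptions for illustration, and the model ignores capacity, associativity and eviction entirely.

```python
LINE_SIZE = 64  # bytes per cache line (assumed)
ELEM_SIZE = 4   # bytes per element, e.g. a 32-bit float (assumed)
N = 256         # number of elements each access pattern reads


def lines_touched(addresses, line_size=LINE_SIZE):
    """Number of distinct cache lines covered by a list of byte addresses."""
    return len({addr // line_size for addr in addresses})


def utilisation(addresses, elem_size=ELEM_SIZE, line_size=LINE_SIZE):
    """Fraction of the bytes pulled into the cache that the accesses use."""
    useful = len(addresses) * elem_size
    fetched = lines_touched(addresses, line_size) * line_size
    return useful / fetched


contiguous = [i * ELEM_SIZE for i in range(N)]  # neighbouring elements
scattered = [i * 256 for i in range(N)]         # one element every 256 bytes

for name, addrs in (("contiguous", contiguous), ("scattered", scattered)):
    print(name, lines_touched(addrs), utilisation(addrs))
```

With these numbers the scattered pattern touches 256 lines instead of 16 for the same 1024 bytes of useful data, so only 1/16 of what is fetched gets used, and each extra miss is another long stall to hide.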
This was some time ago now, so the tradeoffs will be different: different types of RAM and different transistor budgets.
[1]: https://www.goodreads.com/book/show/83895.Computer_Organizat...
[2]: https://en.wikipedia.org/wiki/Multiflow
[3]: http://heather.cs.ucdavis.edu/parprocbook
[4]: https://github.com/dfranx/SPIRV-VM
[5]: https://www.cs.binghamton.edu/~millerti/nyuziraster.pdf
[6]: https://code.google.com/archive/p/gpuocelot/
[7]: https://github.com/KhronosGroup/glslang