
Newer editions of Computer Organization and Design: The Hardware/Software Interface cover GPUs [1]

Multiflow still has some relevant ideas [2]

Programming on Parallel Machines: GPU, Multicore, Clusters and More. It gives you a look at some of the issues [3]

SPIRV-VM is a virtual machine for executing SPIR-V shaders [4]

NyuziRaster: Optimizing Rasterizer Performance and Energy in the Nyuzi Open Source GPU [5]

Ocelot is a modular dynamic compilation framework for heterogeneous systems, providing various backend targets for CUDA programs and analysis modules for the PTX virtual instruction set. [6]

glslang is the Khronos reference front end for GLSL/ESSL, a partial front end for HLSL, and a SPIR-V generator. [7]

[1]: https://www.goodreads.com/book/show/83895.Computer_Organizat...

[2]: https://en.wikipedia.org/wiki/Multiflow

[3]: http://heather.cs.ucdavis.edu/parprocbook

[4]: https://github.com/dfranx/SPIRV-VM

[5]: https://www.cs.binghamton.edu/~millerti/nyuziraster.pdf

[6]: https://code.google.com/archive/p/gpuocelot/

[7]: https://github.com/KhronosGroup/glslang



And a few more:

Pixel Planes/Pixel Flow [1]

The Geometry Engine: A VLSI Geometry System for Graphics [2]

Tim Purcell's research [3]

BrookGPU [4]

GRAMPS: A Programming Model for Graphics Pipelines [5]

[1]: https://www.cs.unc.edu/~pxfl/

[2]: https://graphics.stanford.edu/courses/cs148-10-summer/docs/1...

[3]: http://graphics.stanford.edu/~tpurcell/

[4]: http://graphics.stanford.edu/projects/brookgpu/

[5]: http://graphics.stanford.edu/papers/gramps-tog/


Wow, thanks! Some super interesting resources, some of which I hadn't even heard of.


Just some links I had lying around. Well, at least the ones that still worked.

A major problem I had was register pressure, even with a decent-sized register file. With memory accesses in the hundreds-of-cycles range, it was very difficult to fill those 'free' cycles while waiting for data to load. Double buffering the data really just cut the effective size of the register file, making everything worse. That did lead to some interesting ways of optimising code to reduce temporary storage over the course of execution.
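
A rough back-of-envelope way to see the register pressure problem, as a sketch only: by Little's law, the data you need in flight to keep memory busy is latency times bandwidth, and in a design where loads land in registers, all of that has to fit in the register file. Double buffering (one buffer being filled while the other is consumed) doubles the storage. The function names and the 400-cycle / 16-bytes-per-cycle / 4-byte-register figures are hypothetical, not from any particular GPU.

```python
# Back-of-envelope estimate of why long memory latency causes register pressure.
# Little's law: data in flight = latency x bandwidth.
# All figures used here are hypothetical illustrations, not measurements.

def bytes_in_flight(latency_cycles: int, bytes_per_cycle: int) -> int:
    """Bytes that must be outstanding to keep memory busy for the full latency."""
    return latency_cycles * bytes_per_cycle


def registers_needed(latency_cycles: int, bytes_per_cycle: int,
                     register_bytes: int = 4, buffers: int = 1) -> int:
    """Registers needed to hold the in-flight data (assumes loads land in registers).

    buffers=2 models double buffering: one buffer is being filled while
    the other is consumed, so the storage requirement doubles.
    """
    total = bytes_in_flight(latency_cycles, bytes_per_cycle) * buffers
    return -(-total // register_bytes)  # ceiling division to whole registers


# Hypothetical: 400-cycle load latency, 16 bytes/cycle of load bandwidth.
print(registers_needed(400, 16))             # 1600
print(registers_needed(400, 16, buffers=2))  # 3200
```

Against a hypothetical register file of 2048 32-bit registers, the single-buffered case already eats most of it, and double buffering doesn't fit at all, before you've allocated anything for the actual computation.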

Another was accessing scattered memory locations and getting poor cache utilisation, which left even larger gaps to try to fill with something useful.
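
To put a number on the cache problem, here's a minimal sketch that counts how much of each fetched cache line a strided read pattern actually uses. It assumes 64-byte lines, 4-byte elements, and a cache big enough that no line is evicted mid-scan (so each distinct line is fetched exactly once); the function and its parameters are made up for illustration.

```python
# Cache-line utilisation for a strided scan starting at address 0.
# Assumptions (hypothetical, not from any specific chip): 64-byte lines,
# 4-byte elements, and every distinct line is fetched exactly once.

def cache_line_utilization(n_accesses: int, stride_bytes: int,
                           elem_bytes: int = 4, line_bytes: int = 64) -> float:
    """Fraction of fetched bytes that the program actually uses."""
    if n_accesses <= 0:
        raise ValueError("need at least one access")
    if stride_bytes < elem_bytes:
        raise ValueError("stride must be at least one element (no overlap)")
    lines = set()
    for i in range(n_accesses):
        start = i * stride_bytes
        first = start // line_bytes                    # line holding the first byte
        last = (start + elem_bytes - 1) // line_bytes  # line holding the last byte
        lines.update(range(first, last + 1))
    fetched = len(lines) * line_bytes
    used = n_accesses * elem_bytes
    return used / fetched


print(cache_line_utilization(16, 4))   # contiguous: 1.0
print(cache_line_utilization(16, 8))   # every other element: 0.5
print(cache_line_utilization(16, 64))  # one element per line: 0.0625
```

When only a small fraction of each line is used, the rest is wasted bandwidth, so you need more fetches per useful byte and there is more waiting to hide.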

This was some time ago now, so the tradeoffs will be different today, with different types of RAM and different transistor budgets.



