Metal is a beautiful API, and as a hobbyist GPU programmer I think it's Apple GPUs that are really under-appreciated. Enthusiasts usually dismiss them as "mobile parts" and turn a blind eye to the very interesting features they bring:
- TBDR with user-programmable persistent GPU caches allows you to do some really cool things smartly, drastically cutting down the amount of work done and memory fetched
- GPUs are trivially exposed as what they are: machines with very wide SIMD ALUs
- resource binding graphs with full support for pointers and indirection; resource bindings that can be created and populated on the GPU
- sparse resources that actually work and are performant (nobody uses them on mainstream GPUs because they are apparently slow as f** there)
- etc.