
In general: consumer cards with very bad FP64 performance have it fused off for product segmentation reasons, while datacenter GPUs with bad FP64 performance have it removed from the chip layout to specialize for low precision. In either case, the main concern shouldn't be FLOPS/W but the fact that you're paying for so much silicon that doesn't do anything useful for HPC.


This theory only makes sense if consumer cards are sharing dies with enterprise/datacenter cards. If the consumer card SKUs are on their own dies, they're not going to etch something into silicon only to then fuse it off after the fact.

Regardless, there are "tricks" you can use to sort of extend the precision of hardware floating point: using a pair of, e.g., FP32 numbers to implement something that's "almost" an FP64. This is well known among numerics practitioners.
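For anyone who hasn't seen the pair-of-floats trick, here is a minimal sketch of addition, assuming NumPy's float32 as a stand-in for hardware FP32. The function names (two_sum, split, df_add, to_float) are made up for illustration and don't come from any particular library. The "almost" is because two 24-bit significands give roughly 48 bits rather than FP64's 53, and the exponent range stays that of FP32.

```python
import numpy as np

f32 = np.float32  # stand-in for a hardware FP32 value


def two_sum(a, b):
    """Error-free addition (Knuth): returns (s, e) where s is the rounded
    FP32 sum and e is the rounding error, so a + b == s + e exactly."""
    a, b = f32(a), f32(b)
    s = f32(a + b)
    bb = f32(s - a)
    e = f32(f32(a - f32(s - bb)) + f32(b - bb))
    return s, e


def split(x):
    """Represent a Python float (FP64) as an unevaluated FP32 pair (hi, lo)."""
    hi = f32(x)
    lo = f32(x - float(hi))  # what FP32 could not hold in hi
    return hi, lo


def df_add(x, y):
    """Add two (hi, lo) pairs using only FP32 operations."""
    s, e = two_sum(x[0], y[0])      # exact sum of the high parts
    e = f32(e + f32(x[1] + y[1]))   # fold in the low parts
    hi = f32(s + e)                 # renormalize so that |lo| is tiny vs hi
    lo = f32(e - f32(hi - s))
    return hi, lo


def to_float(x):
    """Collapse a pair back to FP64, only for inspecting the result."""
    return float(x[0]) + float(x[1])


if __name__ == "__main__":
    a, b = split(1.0), split(1e-8)
    print("plain FP32  :", float(f32(1.0) + f32(1e-8)))
    print("FP32 pair   :", repr(to_float(df_add(a, b))))
    print("FP64        :", repr(1.0 + 1e-8))
```

Multiplication works along the same lines but needs either an FMA or a Dekker-style split to recover the rounding error of the product, so it costs noticeably more FP32 operations per emulated operation.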


Until recently, consumer, workstation, and datacenter GPUs would all share a single core design that was instantiated in varying quantities per die to create a product stack. The largest die would often have little to no presence in the consumer market, but fundamentally it was made from the same building blocks. Now, having an entirely separate or at least heavily specialized microarchitecture for data center parts is common (because the extra design costs are worth it), but most workstation cards are still using the same silicon as consumer cards with different binning and feature fusing.


Consumer cards don't share dies with datacenter cards, but they do share dies with workstation cards (the former Quadro line). For example, the GB202 die is used by both the RTX PRO 5000/6000 Blackwell and the RTX 5090.



