jasonjmcghee 64 days ago | on: Lossless LLM compression for efficient GPU inferen...

I read it similarly: this is a property specific to bfloat16, so the quants folks tend to run on local hardware don't have the same inefficiency to exploit.
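
For anyone who wants to poke at the bfloat16 side of this, here is a rough sketch. It assumes the inefficiency in question is that the 8-bit exponent field carries far fewer than 8 bits of information for typical weights. The Gaussian weights with standard deviation 0.02 are a made-up stand-in for real model weights, and the helper names are mine, not from the linked work.

```python
import math
import random
import struct
from collections import Counter


def bf16_exponent(x: float) -> int:
    """Return the 8-bit exponent field of x viewed as bfloat16."""
    # Pack as float32 and read the raw bits back as an unsigned int.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # bfloat16 is the top 16 bits of float32 (1 sign, 8 exponent, 7 mantissa),
    # so truncating to bfloat16 leaves the exponent field unchanged.
    return (bits >> 23) & 0xFF


def entropy_bits(symbols) -> float:
    """Shannon entropy, in bits per symbol, of an iterable of symbols."""
    counts = Counter(symbols)
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())


random.seed(0)
# Assumption: synthetic weights, Gaussian with a small standard deviation.
weights = [random.gauss(0.0, 0.02) for _ in range(100_000)]

exp_entropy = entropy_bits(bf16_exponent(w) for w in weights)
print(f"exponent field: 8 bits stored, ~{exp_entropy:.2f} bits of entropy")
```

The gap between the 8 stored bits and the measured entropy is the slack a lossless entropy coder could recover. This sketch says nothing about how much slack a given quantized format has left.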