
I'd say AES encryption/decryption (aka: every HTTPS connection out there) and SHA256 hashing are big. As is CRC32 (via the VPCLMULQDQ carry-less multiply instruction), and others.

There's.... a ton of applications of AVX512. I know that Linus loves his hot-takes, but he's pretty ignorant on this particular subject.

I'd say that most modern computers are probably reading from TLS 1.2 (aka: AES decryption), processing some JSON, and then writing back out to TLS 1.2 (aka: AES encryption), with probably some CRC32 checks in between.

--------

Aside from that, there's CPU signal filtering (aka: GIMP image processing, Photoshop, JPEG encoding/decoding, audio/music stuff). There's also raytracing with more than the 8GB to 16GB found in typical GPUs (i.e. modern CPUs support 128GB easily, and 2TB if you go server-class), and Moana back in 2016 was using 100+ GB per scene. So even if GPUs are faster, they still can't hold modern movie raytraced scenes in memory, so you're kinda forced to use CPUs right now.



> AES Encryption/Decryption (aka: every HTTPS connection out there),

That's already had dedicated hardware on most x86 CPUs for a good few years now. Fuck, I have some tiny ARM core with like 32kB of RAM somewhere that rocks AES acceleration...

> So even if GPUs are faster, they still can't hold modern movie raytraced scenes in memory, so you're kinda forced to use CPUs right now.

Can't GPUs just use system memory at a performance penalty?


> that already have dedicated hardware on most of the x86 CPUs for good few years now

Yeah, and that "dedicated hardware" is called AES-NI, which is implemented as SIMD instructions on the same vector registers that AVX uses.

With AVX512 (the VAES extension), those instructions now apply to 4 blocks at a time (512 bits wide = 128 bits x 4 parallel lanes). AES-NI getting widened by AVX512 is... well... a big, important update to AES-NI.

AES-NI's next-generation implementation _IS_ AVX512. And it works because AES-GCM is embarrassingly parallel (apologies to all who are stuck on the sequential-only AES-CBC)
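
To make the "4 blocks at a time" point concrete, here's a minimal sketch (illustration only, not production crypto) of an AES-256 round loop using the VAES intrinsics. It assumes a precomputed key schedule and four counter blocks packed into one 512-bit register, both of which come from elsewhere:

    /* Minimal sketch, assuming a precomputed AES-256 key schedule (15 round
     * keys) and four counter blocks already packed into one __m512i.
     * Build with something like: gcc -O2 -mavx512f -mvaes sketch.c */
    #include <immintrin.h>

    __m512i aes256_encrypt_4blocks(__m512i counters, const __m128i rk[15])
    {
        /* initial AddRoundKey on all four 128-bit lanes at once */
        __m512i state = _mm512_xor_si512(counters, _mm512_broadcast_i32x4(rk[0]));

        /* rounds 1..13: one VAESENC per round covers four blocks */
        for (int r = 1; r < 14; r++)
            state = _mm512_aesenc_epi128(state, _mm512_broadcast_i32x4(rk[r]));

        /* final round */
        return _mm512_aesenclast_epi128(state, _mm512_broadcast_i32x4(rk[14]));
    }

One VAESENC per round covers all four 128-bit lanes, which is exactly why the parallel modes (CTR/GCM) benefit and sequential CBC encryption doesn't.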


> Can't GPUs just use system memory at performance penalty ?

CPUs can access DDR4/DDR5 RAM at ~50 nanoseconds. GPUs accessing system DDR4/DDR5 RAM (over PCIe) are looking at ~5000 nanoseconds, 100x slower than the CPU. There's no hope for the GPU to keep up, especially since raytracing is _very_ heavy on RAM latency. Each ray "bounce" is basically a chain of dependent memory reads (traversing a BVH).
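
Roughly what that traversal looks like (a sketch only; the node layout, stack size, and hit test are made up for illustration, not taken from any real renderer). The key point: each iteration has to finish a load before it knows which node to load next, so the whole thing runs at memory latency, not bandwidth:

    #include <stdbool.h>
    #include <stdint.h>

    struct BVHNode {
        float   bounds[6];     /* AABB: min xyz, max xyz */
        int32_t left, right;   /* child indices; left < 0 marks a leaf */
    };

    /* standard slab test of a ray against the node's bounding box */
    static bool ray_hits_box(const struct BVHNode *n,
                             const float origin[3], const float inv_dir[3])
    {
        float tmin = 0.0f, tmax = 1e30f;
        for (int i = 0; i < 3; i++) {
            float t0 = (n->bounds[i]     - origin[i]) * inv_dir[i];
            float t1 = (n->bounds[i + 3] - origin[i]) * inv_dir[i];
            if (t0 > t1) { float tmp = t0; t0 = t1; t1 = tmp; }
            if (t0 > tmin) tmin = t0;
            if (t1 < tmax) tmax = t1;
        }
        return tmin <= tmax;
    }

    /* counts leaf boxes hit by one ray; latency-bound pointer chasing */
    int count_leaf_hits(const struct BVHNode *nodes,
                        const float origin[3], const float inv_dir[3])
    {
        int stack[64], top = 0, hits = 0;
        stack[top++] = 0;                                   /* root */
        while (top > 0) {
            const struct BVHNode *n = &nodes[stack[--top]]; /* dependent load */
            if (!ray_hits_box(n, origin, inv_dir))
                continue;
            if (n->left < 0) { hits++; continue; }          /* leaf node */
            stack[top++] = n->left;   /* which node loads next depends on */
            stack[top++] = n->right;  /* the data we just waited for      */
        }
        return hits;
    }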

It's just better to use a CPU if you end up using DDR4/DDR5 RAM to hold the data. There are algorithms that break a scene up into octrees whose chunks only hold, say, 8GB worth of data; the GPU can then calculate all the light bounces within a box (and write out the "bounces" that leave the box), etc. etc. But this is very advanced and under heavy research.

For now, it's easier to just use a CPU that can access all 100GB+ and render the scene without splitting it up. Maybe eventually these split-the-scene / process-on-GPU subproblem techniques will become better researched and better implemented, and GPUs will handle system RAM a bit better.

GPUs will be better eventually. But CPUs are still better at the task today.


I am confused: CPUs have dedicated instructions for AES encryption and CRC32. Are they slower than AVX512?


> I am confused: CPUs have dedicated instructions for AES encryption and CRC32. Are they slower than AVX512?

Those instructions live in the same SIMD units as AVX, and they've been _upgraded_ in AVX512 to be 512 bits wide now.

If you use the older 128-bit wide AES-NI instructions, rather than the AVX512 (VAES) versions, you're processing 4x fewer blocks per instruction. AVX512 widens these instructions to 512 bits (and mind you, AES-NI was stuck at 128 bits, so the jump to 512 bits is a huge upgrade in practice).
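
If it helps, here's what the width difference looks like in intrinsics (a sketch of one round only; real code interleaves many blocks and handles the key schedule, none of which is shown): the legacy path needs four AESENCs for four blocks, the VAES path needs one.

    #include <immintrin.h>

    /* Legacy AES-NI: one 128-bit block per AESENC (4 instructions for 4 blocks).
     * 'rk' is one expanded round key; key handling is omitted in this sketch. */
    void one_round_aesni(__m128i blocks[4], __m128i rk)
    {
        for (int i = 0; i < 4; i++)
            blocks[i] = _mm_aesenc_si128(blocks[i], rk);
    }

    /* AVX512 VAES: the same four blocks in one 512-bit register, one AESENC. */
    void one_round_vaes(__m512i *blocks, __m128i rk)
    {
        *blocks = _mm512_aesenc_epi128(*blocks, _mm512_broadcast_i32x4(rk));
    }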

-----

EDIT: fast CRC32 is implemented with the PCLMULQDQ (carry-less multiply) instruction, which has also been upgraded to a 512-bit form (VPCLMULQDQ) in AVX512.
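
For the curious, the building block looks like this (sketch only; a real CRC32 does a folding loop over the buffer with precomputed constants plus a final reduction, none of which is shown here):

    /* Build with -mpclmul (or -march=native). */
    #include <immintrin.h>
    #include <stdint.h>

    /* Carry-less (GF(2)) multiply of two 64-bit polynomials -> 128-bit product.
     * This single PCLMULQDQ is the primitive that CRC32 "folding" chains
     * together; the folding constants and final reduction are omitted. */
    __m128i clmul_64x64(uint64_t a, uint64_t b)
    {
        __m128i va = _mm_set_epi64x(0, (long long)a);
        __m128i vb = _mm_set_epi64x(0, (long long)b);
        return _mm_clmulepi64_si128(va, vb, 0x00);  /* low qword x low qword */
    }

AVX512's VPCLMULQDQ (the _mm512_clmulepi64_epi128 intrinsic) does four of these multiplies per instruction, which is where the extra throughput comes from.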



