
Web Server. Say... this web page.

The inspector says that this web page we're talking on is 48kB in size. That's small enough to stay in L1 or L2 cache. Every single connection to the Web Server needs its own AES key for encryption's sake. The 48kB of data (such as my posts above, or your posts) is hot in cache, but if 100 visitors to this webpage need to get it, HTTPS encryption needs to happen 100 different times with 100 different keys.

So this 48kB text message (consisting of all the comments of this page) is going to have to be encrypted with different, random AES keys to deliver it to you or me. AES operates on 16 bytes at a time, and AES-GCM is a newer mode of operation, built on counter mode, that allows all 48,000+ bytes to be processed in parallel.
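A minimal sketch of why the counter-mode part of GCM parallelizes: each keystream block depends only on the key, the IV, and the block index, never on a neighboring block. The names `counter_block`, `stand_in_block_cipher`, and `ctr_xor` are made up for illustration, and the block function is a keyed BLAKE2b stand-in, not AES, because the Python standard library has no AES. The authentication tag (GHASH) is left out entirely.

```python
import hashlib

BLOCK = 16  # AES block size in bytes


def counter_block(iv: bytes, i: int) -> bytes:
    """Counter block for data block i (0-based), GCM-style with a 96-bit IV.

    Counter value 1 is reserved for the tag, so data blocks start at 2.
    """
    assert len(iv) == 12
    return iv + ((i + 2) % 2**32).to_bytes(4, "big")


def stand_in_block_cipher(key: bytes, block: bytes) -> bytes:
    # ASSUMPTION: placeholder for AES. Keyed BLAKE2b truncated to 16 bytes,
    # used only so the sketch runs without third-party libraries.
    return hashlib.blake2b(block, key=key, digest_size=BLOCK).digest()


def ctr_xor(key: bytes, iv: bytes, data: bytes, order=None) -> bytes:
    """XOR data with the keystream, visiting blocks in any order."""
    n_blocks = (len(data) + BLOCK - 1) // BLOCK
    out = bytearray(len(data))
    for i in (order if order is not None else range(n_blocks)):
        keystream = stand_in_block_cipher(key, counter_block(iv, i))
        chunk = data[i * BLOCK:(i + 1) * BLOCK]
        out[i * BLOCK:i * BLOCK + len(chunk)] = bytes(
            a ^ b for a, b in zip(chunk, keystream)
        )
    return bytes(out)
```

Because the visiting order does not change the result, hardware is free to take the blocks four at a time, or in any other grouping.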

AVX-512 AES instructions are ideal for processing this data, are they not? And processing it 4x faster (since 4 instances of AES run in parallel: AVX-512 works on 64 bytes per tick, and each AES instance takes 16 bytes) is a lot better than just doing it 16 bytes at a time with the legacy AES-NI instructions.
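The 4x figure is just a block count. A back-of-the-envelope sketch, assuming the page is exactly 48,000 bytes and the key is AES-128 (10 rounds, one AES round instruction each); neither is stated in the thread, and `round_instructions` is a made-up helper. It counts instructions only and says nothing about latency, port pressure, or clock throttling.

```python
AES_BLOCK_BYTES = 16   # AES always works on 16-byte blocks
AES128_ROUNDS = 10     # ASSUMPTION: AES-128, one round instruction per round


def round_instructions(payload_bytes: int, vector_bytes: int) -> int:
    """Number of AES round instructions needed to cover the payload."""
    blocks = -(-payload_bytes // AES_BLOCK_BYTES)   # ceiling division
    lanes = vector_bytes // AES_BLOCK_BYTES         # AES blocks per register
    vector_steps = -(-blocks // lanes)
    return vector_steps * AES128_ROUNDS


if __name__ == "__main__":
    # 16 = legacy AES-NI (128-bit), 32 = 256-bit form, 64 = AVX-512 form
    for width in (16, 32, 64):
        print(width, round_instructions(48_000, width))
```

With those assumptions the counts come out to 30,000 round instructions at 128 bits wide, 15,000 at 256 bits, and 7,500 at 512 bits.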

----------

Despite being a parallel problem, this will never be worthwhile to send to the GPU. First, GPUs don't have AES-NI instructions. But even if they did, it would take longer to talk to the GPU than for the AVX-512 AES instructions to operate on the 48kB of data (again: ~40,000 clock ticks just to start talking with the GPU in practice). In that amount of time, you would have finished encrypting the payload and sent it off.
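The two numbers in that paragraph give a break-even point directly. A rough sketch, taking the ~40,000-tick GPU startup cost and the 48,000-byte payload as given; the cycles-per-byte figure for the CPU path is not stated in the thread, so it is left as a parameter, and `cpu_finishes_first` is a made-up helper.

```python
GPU_STARTUP_TICKS = 40_000   # from the comment: cost to start talking to the GPU
PAYLOAD_BYTES = 48_000       # the page, taken as exactly 48,000 bytes

# CPU cost per byte at which encrypting takes as long as reaching the GPU.
BREAK_EVEN_CYCLES_PER_BYTE = GPU_STARTUP_TICKS / PAYLOAD_BYTES


def cpu_finishes_first(cycles_per_byte: float) -> bool:
    """True if the CPU encrypts the whole payload before the GPU is reachable.

    ASSUMPTION: cycles_per_byte is whatever you measure on your own hardware.
    """
    return cycles_per_byte * PAYLOAD_BYTES < GPU_STARTUP_TICKS


if __name__ == "__main__":
    print(round(BREAK_EVEN_CYCLES_PER_BYTE, 3))
    print(cpu_finishes_first(0.5))   # illustrative input, not a measurement
```

So the claim holds whenever the CPU path runs at under roughly 0.83 cycles per byte, and the GPU still has to do the actual work after its startup cost.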



I've seen a lot about AVX-512 and didn't know those instructions existed until just now. They're not exactly generic vector instructions. And that's a nice improvement, but is AES-NI ever slow enough to matter? The numbers I found were inconsistent but all very fast.

Probably more important, there's a 256-bit version of that instruction. You can get half of that extreme throughput without AVX-512.


That Netflix guy who keeps optimizing their servers keeps coming back every year or so, talking about the latest optimizations he added.

http://nabstreamingsummit.com/wp-content/uploads/2022/05/202...

And a surprising amount of it was TLS optimization, in particular offloading TLS to the hardware (apparently Mellanox ConnectX Ethernet adapters can do AES offload now, so the CPU doesn't have to worry about it).

Since Mellanox ConnectX adapters are still trying to solve the AES problem, I have to imagine that it's a significant portion of a lot of servers' workloads. Intel / AMD are obviously interested enough in it to widen AES to 4x in the AVX-512 instruction set.

I can't say it's particularly useful in any of _my_ workloads. But it seems to come up often enough in those hyper-optimized web server presentations.



