
Because you need it outside the context of encryption/decryption.

https://news.ycombinator.com/item?id=43827342

Honestly, the classic "message routing" example most things give for AEAD is not very useful. Context binding is a much better primer for intuition.




Hm, I understand the use cases, but I don't understand this: The only way to get the AD is to decrypt the ciphertext, right? Otherwise the data is unauthenticated, so I assume it's a big no-no to access it. If you need to decrypt the ciphertext to access the AD, why do you care if it was encrypted or not?

Basically, I'm not sure why `encrypt(key, nonce, (data, associated data))` (ie adding the AD to your ciphertext, with the encryption framework being unaware of it) is that different from `encrypt(key, nonce, data, associated data)` (ie the AD being a first-class citizen).
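
To make the question concrete, here's roughly what I mean in Go, with AES-GCM standing in for the AEAD (just a sketch, error handling omitted):

    package example

    import (
        "crypto/aes"
        "crypto/cipher"
    )

    // Shape 1: the framework is unaware of the AD; I just fold it into
    // the plaintext before sealing, so it ends up inside the ciphertext.
    func sealTuple(key, nonce, data, ad []byte) []byte {
        block, _ := aes.NewCipher(key)
        gcm, _ := cipher.NewGCM(block)
        msg := append(append([]byte(nil), ad...), data...)
        return gcm.Seal(nil, nonce, msg, nil)
    }

    // Shape 2: the AD is a first-class argument. It's authenticated but
    // never encrypted and doesn't appear in the output at all; the
    // decryptor has to supply the same bytes again or Open fails.
    func sealWithAD(key, nonce, data, ad []byte) []byte {
        block, _ := aes.NewCipher(key)
        gcm, _ := cipher.NewGCM(block)
        return gcm.Seal(nil, nonce, data, ad)
    }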

EDIT: I saw your other message, and this makes it click for me:

> authenticated data can include data that doesn't even appear in the message, but rather is derived from the context at the time the message is encrypted and decrypted

So the AD can be an additional envelope-level thing at encryption/decryption time, that helps a lot, thanks!


This is why message routing headers are kind of a fucky example (you can make it make sense but it begs for this confusion).

Instead, just take the chunked large-file encryption use case I gave in that comment. The chunk offset isn't recorded anywhere in the ciphertext. It's derived contextually while you decrypt the file. The AD ensures that the decryption will explode if you try to cut and paste chunks of the file into different positions.
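
Sketched in Go, it's something like this (not the exact construction from that comment, just the shape of it; per-chunk nonce handling glossed over):

    package example

    import (
        "crypto/cipher"
        "encoding/binary"
    )

    // The chunk index is fed in as AD. It's never written to the file;
    // the reader re-derives it by counting chunks as it reads, and Open
    // fails if a sealed chunk has been moved to a different position.
    func sealChunk(aead cipher.AEAD, nonce, chunk []byte, index uint64) []byte {
        ad := make([]byte, 8)
        binary.BigEndian.PutUint64(ad, index)
        return aead.Seal(nil, nonce, chunk, ad)
    }

    func openChunk(aead cipher.AEAD, nonce, sealed []byte, index uint64) ([]byte, error) {
        ad := make([]byte, 8)
        binary.BigEndian.PutUint64(ad, index)
        return aead.Open(nil, nonce, sealed, ad) // errors if the index doesn't match
    }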


Yeah, you're right, I was thinking about it in a case where the implementation had the ciphertext being `(block data, chunk offset)` so it did make it part of the message, but it's more elegant for the associated data to be separate from the ciphertext.


Message headers have a tendency to want to mutate, which makes the problem more complicated to solve for. Decrypting chunks in the right order is a good example to grasp because it's about essential metadata that needs to stay in the clear so readers of the data know what to do with it. The AD is bound to the ciphertext.


It's a really good question, because, in order to verify the AD, you have to have the same key you need to decrypt the ciphertext.


Yep, that's the part that throws me. Is it fair to say that it's a more elegant way to include metadata in the ciphertext, without really messing with the plaintext itself? Ie it's basically "just" a way to distinguish the message from its metadata?


Does that make it like some kind of HMAC?


Yes, in fact, one construction of the AEAD primitive is to use AES-CTR with HMAC to "bolt on" authentication after the fact (AES-CTR on its own is an unauthenticated stream cipher).

You can find an implementation of AES-CTR-HMAC (at a high level where AES-CTR and HMAC are both given) here: https://github.com/tink-crypto/tink-go/blob/main/aead/aesctr...
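
In miniature, the encrypt-then-MAC idea looks like this (a toy sketch, not the Tink construction; a real scheme also MACs the lengths and splits keys via a KDF):

    package example

    import (
        "crypto/aes"
        "crypto/cipher"
        "crypto/hmac"
        "crypto/sha256"
    )

    // AES-CTR for confidentiality, HMAC-SHA256 over AD || IV || ciphertext
    // for authenticity. The AD is MACed but never encrypted.
    func seal(encKey, macKey, iv, plaintext, ad []byte) []byte {
        block, _ := aes.NewCipher(encKey) // error handling omitted
        ct := make([]byte, len(plaintext))
        cipher.NewCTR(block, iv).XORKeyStream(ct, plaintext)

        mac := hmac.New(sha256.New, macKey)
        mac.Write(ad)
        mac.Write(iv)
        mac.Write(ct)
        return append(ct, mac.Sum(nil)...) // ciphertext || tag
    }

Decryption recomputes the tag (including over the caller-supplied AD), compares it with hmac.Equal, and only then runs CTR in reverse.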


> The AD ensures that the decryption will explode if you try to cut and paste chunks of the file into different positions.

Ah, that's the key bit of perspective. Just talking about "context" is so abstract. That's a case where you don't even need to transmit the AD, right? Do you ever have cases where the AD is a mix of transmitted and locally/"contextually" derived data?


The strategy I used instead was to use HKDF to derive a different key for each chunk, with the offset as part of the info input. No AD needed.
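
Roughly this, assuming golang.org/x/crypto/hkdf (a sketch, not the actual code):

    package example

    import (
        "crypto/sha256"
        "encoding/binary"
        "io"

        "golang.org/x/crypto/hkdf"
    )

    // Derive a fresh 256-bit key per chunk, feeding the chunk offset
    // into HKDF's info parameter instead of binding it via AD.
    func chunkKey(masterKey []byte, offset uint64) []byte {
        info := make([]byte, 8)
        binary.BigEndian.PutUint64(info, offset)
        key := make([]byte, 32)
        r := hkdf.New(sha256.New, masterKey, nil /* salt */, info)
        io.ReadFull(r, key) // error handling omitted
        return key
    }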


Two calls to SHA-256 for each block would be very slow compared with a modern AEAD.


Hmm… it seemed to run at line speed on our machines. I’m also not sure where you’re getting two calls to SHA-256 from? Like one to derive the key (which is SHA-256 over a very small amount of data), and the second is?


HMAC requires two calls to the underlying hash function. In this case one is with a block-size input and the other is smaller (key size plus output of the first call). When called per-block this approach is much slower than any modern AEAD (which typically requires simple polynomial math on each block plus a single AES/ChaCha/whatever finalization call).

It might be “fast enough for line rate” in your situation but even then you could be saving CPU cycles for other work by using a more efficient construction.
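
To make the two calls concrete, here's HMAC-SHA256 spelled out by hand (toy code for illustration; real programs should just use crypto/hmac):

    package example

    import "crypto/sha256"

    // Assumes len(key) <= 64 (the SHA-256 block size); longer keys get
    // hashed down first in real HMAC.
    func hmacSHA256(key, msg []byte) [32]byte {
        const blockSize = 64
        k := make([]byte, blockSize)
        copy(k, key)
        ipad := make([]byte, blockSize)
        opad := make([]byte, blockSize)
        for i := range k {
            ipad[i] = k[i] ^ 0x36
            opad[i] = k[i] ^ 0x5c
        }
        inner := sha256.Sum256(append(ipad, msg...))    // hash call #1
        return sha256.Sum256(append(opad, inner[:]...)) // hash call #2
    }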


What does it mean for an AEAD construction to "run at line speed"? Over how many different sessions and with what message sizes and on what hardware?


As in, it's processing at the speed that data can be fed into the CPU. This particular use case was files coming in from the network on 10Gbps hardware, which was about the speed the AES hardware ran at in openssl perf tests. The number of sessions and the message sizes are irrelevant. The hardware was an AMD EPYC 7642.


It's irrelevant to your use case. It's not irrelevant to the broader CS question of why people don't just do what you're doing.


If there’s no HW that demonstrates a speed difference, then maybe the theoretical CS concerns aren’t properly modeled? Also, the approach I outlined has a big strength: there’s no nonce to mismanage.


You'd have to publish details of the construction you came up with for me to have anything to say about it not having or needing a nonce.


I dunno why you say it isn't useful. It is inherently plaintext, but still worth authenticating. If you just used an AEAD but didn't put e.g. the session identifier or connection ID or sequence number in the AD, that header data would be entirely unauthenticated, but the decryption of, say, the message body would still succeed.


I'm not saying it isn't useful, I'm saying it's not a useful example for getting people to understand the concept. Everyone runs aground on "but you need the decryption key to authenticate the plaintext anyways".



