I used to work at WhatsApp (until the end of 2019) on many things, including spe...

dgellow · on Sept 28, 2023

> WA chat is not HTTPS (or even TLS)

If you don’t mind, could you expand on this? Are there specific reasons to not be using TLS?

jedberg · on Sept 28, 2023

We didn't use TLS at Netflix either, and instead used our own encryption protocol that ran on top of HTTP. We could do this because we controlled the clients too.

The reason was trust store issues. Every device has its own built-in trust store, and especially on devices like TVs and DVD players, they couldn't be updated. After looking at all the devices we supported, there was no common certificate signer amongst all of them.

This meant that we would either have to get multiple SSL certs signed by different parties (some of which weren't all that secure) and present the right one depending on your device type, or we could just roll our own over HTTP. So we chose the latter.

toast0 · on Sept 28, 2023

Yeah, at WA we didn't have too much of a problem with trust store issues, although we did do extensive testing when we switched CAs. We did have to deal with the end of SHA-1 certs, though. I think we were able to get all of our clients to use SHA-2, but some of the platform browsers couldn't, so we had to fiddle with our TLS server to send SHA-2 certs to some clients and SHA-1 certs to others.

Of course, there's not really very useful client identification in the TLS Hello, so you have to kind of guess who needs what. If we had to use different CAs for different clients, it would have gotten a lot harder, because it's not like we could rely on clients filling out SNI either. So then you need to get more IPs for each service. I do recall needing to do that a little, but we only needed a single 'legacy' group that was useful for everything that couldn't manage the modern certs.
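
The guessing described above could look something like the following. This is a minimal sketch, not WhatsApp's actual logic: `ClientHelloInfo`, `pick_cert_chain`, and the chain labels are hypothetical names, and the only signal used is the `signature_algorithms` extension. RFC 5246 has servers assume SHA-1 when a client omits that extension, which is why absence maps to the legacy chain here.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

# TLS 1.2 (hash, signature) code points for SHA-256/384/512 with RSA and ECDSA.
SHA2_SCHEMES = {0x0401, 0x0501, 0x0601, 0x0403, 0x0503, 0x0603}


@dataclass(frozen=True)
class ClientHelloInfo:
    """Hypothetical summary of the fields a server might inspect."""
    signature_algorithms: Optional[Sequence[int]]  # None if the extension is absent
    server_name: Optional[str]                     # None if the client sent no SNI


def pick_cert_chain(hello: ClientHelloInfo) -> str:
    """Guess which certificate chain a client can validate.

    Returns a label, not a real certificate; wiring this into a TLS
    server is left out of the sketch.
    """
    if hello.signature_algorithms is None:
        # No extension: the client predates it, so assume SHA-1 only.
        return "legacy-sha1"
    if any(scheme in SHA2_SCHEMES for scheme in hello.signature_algorithms):
        return "modern-sha2"
    return "legacy-sha1"


if __name__ == "__main__":
    old_client = ClientHelloInfo(signature_algorithms=None, server_name=None)
    new_client = ClientHelloInfo(signature_algorithms=[0x0403, 0x0401, 0x0201],
                                 server_name="chat.example.com")
    print(pick_cert_chain(old_client))
    print(pick_cert_chain(new_client))
```

Note that the decision never looks at `server_name`, since the comment above says SNI could not be relied on.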

waiwai933 · on Sept 29, 2023

Our solution for the same problem was to just have different subdomains for each cert signer (and make sure we ship the right base URL for each manufacturer's app), so we didn't need to do any clever device-sniffing at the SSL termination point. I think rolling our own encryption sounds much scarier, but equally we weren't running at Netflix scale.
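
A minimal sketch of that per-manufacturer approach, assuming a build-time lookup table. The manufacturer keys, CA labels, and hostnames are all made up for illustration; the point is only that the choice of certificate is fixed by the base URL baked into each app build, so the TLS terminator never has to sniff the device.

```python
# Hypothetical mapping: each manufacturer's app build ships one base URL,
# and each subdomain serves a cert from a signer that manufacturer trusts.
BASE_URL_BY_MANUFACTURER = {
    "vendor-a": "https://ca1.api.example.com",
    "vendor-b": "https://ca2.api.example.com",
    "vendor-c": "https://ca1.api.example.com",  # shares a signer with vendor-a
}


def base_url_for(manufacturer: str) -> str:
    """Return the base URL to bake into a given manufacturer's build."""
    try:
        return BASE_URL_BY_MANUFACTURER[manufacturer]
    except KeyError:
        # Fail the build rather than ship an app pointed at a cert it can't verify.
        raise ValueError(f"no cert signer chosen for {manufacturer!r}") from None


if __name__ == "__main__":
    print(base_url_for("vendor-b"))
```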

eadmund · on Sept 29, 2023

This discussion is another great example of why HTTP without TLS can be just fine, even desirable.

Sohcahtoa82 · on Sept 28, 2023

> Every device has its own built in trust store, and especially on devices like TVs and DVD players, they couldn't be updated.

Was creating your own certificate authority and pinning it in the app not an option?

toast0 · on Sept 28, 2023

Bringing your own trust store to system HTTPS libraries is not often supported, especially when you get into the kinds of embedded environments Netflix supports. You also might not have the capability to bring your own TLS library either. If it's a limited environment, you might only get reasonable performance if you use the system ciphers, and they may not be exposed as primitives, and X.509 parsing takes up a lot of code space in the likely event that you've got limitations there too.

jedberg · on Sept 28, 2023

In most environments you have to use the built-in libraries for network connectivity, so you have to use their trust stores. Also, space is very limited for the client, so you can't just put everything into it.

toast0 · on Sept 28, 2023

I should probably refer you to the encryption whitepaper [1], but the basics are that Chat uses the Noise Protocol rather than TLS. All things being equal, the security properties are about equivalent; however, all things aren't equal. The Noise handshake is smaller than the TLS handshake, and Noise doesn't have extraneous features WhatsApp doesn't use. Additionally, at the time of Noise adoption, TLS lacked a means for 0-RTT data (now available with TLS 1.3 Early Data), which meant using TLS would have added at least one round trip; possibly two, depending on which TLS library was used. [2] You can use TLS without X.509, but it's not very common; avoiding X.509 was a definite plus.
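
To put rough numbers on the round-trip cost, here is a small sketch. The 600 ms round-trip time is a hypothetical figure for a high-latency mobile link, not one from the thread, and the function only counts handshake round trips that must complete before the first chat payload can be sent (it ignores TCP setup, which both options pay).

```python
def handshake_delay_ms(rtt_ms: float, round_trips_before_data: int) -> float:
    """Delay added before the first application byte can leave the client."""
    return rtt_ms * round_trips_before_data


if __name__ == "__main__":
    RTT_MS = 600.0  # assumed value, for illustration only
    scenarios = [
        ("payload sent with the first handshake message", 0),
        ("one extra round trip", 1),
        ("two extra round trips", 2),
    ]
    for label, trips in scenarios:
        print(f"{label}: {handshake_delay_ms(RTT_MS, trips):.0f} ms")
```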

I wasn't much involved in anything on the chat channel, and I didn't do any implementation work on Noise, but I did some later prototype work with it, and if I recall correctly, it had much simpler framing than TLS as well; although maybe that was mostly TLS options getting me down --- the SNI header has 9 bytes of overhead, 6 of which are lengths, and Noise didn't have anything like that as I recall. Do you really need two bytes of versioning on every application data packet, like TLS has? I'm not sure you really need a type indicator byte either; context says you're sending a handshake packet initially, and then application data after that, but I'm pretty rusty on this now, so maybe there's a justification.
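
The framing overhead can be counted directly. The sketch below builds a `server_name` extension following the RFC 6066 layout and a TLS record header (type, version, length), next to a 2-byte big-endian length prefix of the kind the Noise specification recommends when an explicit length field is needed. The function names are made up for illustration, and the length-prefixed Noise framing is an assumption, not necessarily what WhatsApp ships.

```python
import struct


def sni_extension(hostname: str) -> bytes:
    """Encode a TLS server_name extension for a single host name."""
    name = hostname.encode("ascii")
    # name_type (1 byte, 0 = host_name) + host name length (2 bytes)
    entry = struct.pack("!BH", 0, len(name)) + name
    # server_name_list length (2 bytes)
    name_list = struct.pack("!H", len(entry)) + entry
    # extension_type (2 bytes, 0 = server_name) + extension length (2 bytes)
    return struct.pack("!HH", 0, len(name_list)) + name_list


def tls_record(content_type: int, payload: bytes, version=(3, 3)) -> bytes:
    """Prefix a payload with the 5-byte TLS record header."""
    major, minor = version
    return struct.pack("!BBBH", content_type, major, minor, len(payload)) + payload


def length_prefixed(payload: bytes) -> bytes:
    """Assumed Noise-style framing: a 2-byte length and nothing else."""
    return struct.pack("!H", len(payload)) + payload


if __name__ == "__main__":
    host = "example.com"
    print("SNI overhead:", len(sni_extension(host)) - len(host))
    print("TLS record overhead:", len(tls_record(23, b"hi")) - 2)
    print("length-prefix overhead:", len(length_prefixed(b"hi")) - 2)
```

For `example.com` the extension is 20 bytes, 9 of them overhead: 2 for the extension type, 1 for the name type, and 6 spread across three length fields. The record header adds 5 bytes per record (1 type, 2 version, 2 length), against 2 for the bare length prefix.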

For users paying for internet by the byte, every byte counts. For users on networks with large delays, every round trip counts. For attachments, it's less critical (if your data access costs were high, you could configure attachments not to load) and that infrastructure was always built around http(s), so while there would have been an efficiency improvement to move that off https, it would be hard to justify the engineering time; especially post the move to FB infrastructure with its CDN that was easily configured for our attachments. OTOH, chat never ran on TLS, so adopting Noise vs adopting TLS was a choice we could consider, and we picked the best solution for us. Unfortunately, it's pretty easy to identify Noise vs TLS --- OTOH, the service IPs are already identifiable, so a little more blending on the protocol level wouldn't help much.

[1] https://www.whatsapp.com/security/WhatsApp-Security-Whitepap...

[2] Also using system TLS libraries is fraught with peril. It's fine, but not super great, for http, but using it for a custom binary protocol is going to be terrible. You'll need to debug all of the edge cases that the system https library doesn't hit, and will then have to craft workarounds that just work, even if you can't reliably identify the underlying versions because Android OEMs do weird stuff.

dgellow · on Sept 28, 2023

Thanks for the answer, I didn’t expect that much detail!

blapp · on Sept 28, 2023

It's based on the Noise Protocol Framework in the outermost layer, which encrypts a compressed XMPP stream. The end-to-end encryption is done within various XMPP message payloads using the Signal Protocol, which encrypts message data serialized using Protocol Buffers, with different formats depending on the message type (text, image, video, sticker, etc).
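
The nesting order described above can be sketched as follows. Nothing here is real cryptography: `placeholder_seal` only tags bytes so the layering is visible, the `<message>` stanza is a stand-in rather than WhatsApp's actual wire format, and using zlib for the compression step is an assumption.

```python
import zlib


def placeholder_seal(layer: str, data: bytes) -> bytes:
    """NOT encryption: wraps bytes in a tag so the nesting order is visible."""
    return layer.encode("ascii") + b"[" + data + b"]"


def placeholder_open(layer: str, data: bytes) -> bytes:
    """Undo placeholder_seal for the given layer."""
    prefix = layer.encode("ascii") + b"["
    if not (data.startswith(prefix) and data.endswith(b"]")):
        raise ValueError(f"not a {layer} layer")
    return data[len(prefix):-1]


def client_send(serialized_message: bytes) -> bytes:
    """Apply the layers inside-out: end-to-end, stanza, compression, transport."""
    e2e = placeholder_seal("signal", serialized_message)  # end-to-end layer
    stanza = b"<message>" + e2e + b"</message>"           # stand-in stanza
    compressed = zlib.compress(stanza)                    # assumed codec
    return placeholder_seal("noise", compressed)          # transport layer


def server_view(wire: bytes) -> bytes:
    """What the server recovers: the stanza, with the payload still sealed."""
    return zlib.decompress(placeholder_open("noise", wire))


if __name__ == "__main__":
    print(server_view(client_send(b"hello")))
```

The server can strip the outer layer and read the stanza envelope, but the message body stays wrapped in the inner layer, which only the recipient's device would open.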