Hacker News

What I usually do is use two hostnames. host1 serves HTTP requests that can be cached; that host is behind a CDN. host2 serves HTTP requests that can't be cached; that host points directly to my server.

Are you saying it would result in a better user experience if I only used host1, which is behind the CDN, and added no-cache headers to the requests that can't be cached?



It's complicated but typically yes. The simplest reason is that TCP+TLS handshakes require multiple round trips for a fresh connection. The CDN can maintain a persistent connection to the backend that is shared across users. It is also likely that the CDN to backend connection goes over a better connection than the user to backend connection would.


> The CDN can maintain a persistent connection to the backend that is shared across users

We considered using Cloudflare Workers as a reverse proxy, and I did extensive testing of this (very reasonable) assumption. Turns out that when calling back to the origin from the edge, CF Workers established a new connection almost every time, and so had to pay the penalty of the TCP and TLS handshake on every request. That killed any performance gains, and was a deal breaker for us. It’s rather difficult to predict or monitor network/routing behavior when running on the edge.


This didn't sound right to me so I did some investigation and I think I found a bug.

Keep in mind that Cloudflare is a complex stack of proxies. When a worker performs a fetch(), that request has to pass through a few machines on Cloudflare's network before it can actually go to origin. E.g. to implement caching we need to go to the appropriate cache machine, and then to try to reuse connections we need to go to the appropriate egress machine. Point is, the connection to origin isn't literally coming from the machine that called fetch().

So if you call fetch() twice in a row, to the same hostname, does it reuse a connection? If everything were on a single machine, you'd expect so, yes! But in this complex proxy stack, stuff has to happen correctly for those two requests to end up back on the same machine at the other end in order to use the same connection.

Well, it looks like the heuristics involved here aren't currently handling Workers requests the way they should. They are designed more around regular CDN requests (Workers shares the same egress path that regular non-Workers CDN requests use). In the standard CDN use case where you get a request from a user, possibly rewrite it in a Worker, then forward it to origin, you should be seeing connection reuse.

But, it looks like if you have a Worker that performs multiple fetch() requests to origin (e.g. not forwarding the user's requests, but making some API requests or something)... we're not hashing things correctly so that those fetches land on the same egress machine. So... you won't get connection reuse, unless of course you have enough traffic to light up all the egress machines.

I'm face-palming a bit here, and wondering why there hasn't been more noise about this. We'll fix it. Talk about low-hanging fruit...

(I'm the tech lead for Cloudflare Workers.)

(On a side note, enabling Argo Smart Routing will greatly increase the rate of connection reuse in general, even for traffic distributed around the world, as it causes requests to be routed within Cloudflare's network to the location closest to your origin. Also, even if the origin connections aren't reused, the RTT from Cloudflare to origin becomes much shorter, so connection setup becomes much less expensive. However, this is a paid feature.)


> So if you call fetch() twice in a row, to the same hostname, does it reuse a connection?

In my testing, the second fetch() call from a worker to the same origin ran over the same TCP connection 50% of the time and was much faster.

We want to use Workers as a reverse proxy: pick up all HTTP requests globally and route them to our backend. So our use case is mostly one fetch() call (to the origin) per incoming request. The issue is that incoming requests arrive at a ~random worker in the user's POP, and it looks like each Worker isolate has to re-establish its own TCP/TLS connection to our backend, which takes a long time (~90% of the time).

What I want is Hyperdrive for HTTPS connections. I tried connecting to the backend via CF Tunnel, but that didn't make any difference. Our backend is accessible via AWS Global Accelerator, so Argo won't help much. The only thing that made a difference was pinning the Worker close to our backend - connections to the backend became fast(er) because the TLS round trip was faster, but that's not a great solution.


> The issue is that incoming requests arrive to a ~random worker in the user's POP, and it looks like each Worker isolate has to re-establish its own TCP/TLS connection to our backend, which takes a long time (~90% of the time).

Again, origin connections are not owned by isolates -- there are proxies involved before we get to the origin connection. Requests from unrelated isolates can share a connection, if they are routed to the same egress point. The problem is that they apparently aren't being routed to the same point in your case. That could be for a number of reasons.

It sounds like the bug I found may not be the issue in your case (in fact it sounds like you explicitly aren't experiencing the bug, which is surprising, maybe I am misreading the code and there actually is no bug!).

But there are other challenges the heuristics are trying to solve for, so it's not quite as simple as "all requests to the same origin hostname should go through the same egress node"... like, many of our customers get way too much traffic for just one egress node (even per-colo), so we have to be smarter than that.

I pinged someone on the relevant team and it sounds like this is something they are actively improving.

> The only thing that made a difference was pinning the Worker close to our backend - connections to the backend became fast(er) because the TLS round trip was faster, but that's not a great solution.

Argo Smart Routing should have the same effect... it causes Cloudflare to make connections from a colo close to your backend, which means the TLS roundtrip is faster.


Thank you for looking into it in such detail based on an unrelated thread!

Cloudflare seems to consistently make all types of network improvements behind the scenes, so I'll continue to monitor for this "connection reuse" improvement. It might just show up unannounced.


Were you using a Cloudflare tunnel for your origin?


Yes, tried tunnels too. There is significant variability among individual requests, but when benchmarking at scale I found no meaningful difference in p50 and p90 between “Worker -> CF Tunnel -> EC2 -> backend app” and “Worker -> AWS Global Accelerator -> EC2 -> backend app”
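For reference, the p50/p90 comparison above boils down to a percentile computation over per-request latency samples from each path. A minimal sketch (the latency numbers below are invented for illustration):

```javascript
// Nearest-rank percentile: smallest sample covering p percent of the data.
function percentile(samples, p) {
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(rank - 1, 0)];
}

// Hypothetical per-request latencies in ms for each path (made-up numbers).
const tunnelMs = [42, 45, 44, 90, 41, 43, 120, 44, 46, 43];
const acceleratorMs = [40, 47, 43, 95, 42, 44, 115, 45, 44, 42];

for (const [name, samples] of [['tunnel', tunnelMs], ['accelerator', acceleratorMs]]) {
  console.log(name, 'p50:', percentile(samples, 50), 'p90:', percentile(samples, 90));
}
```

Comparing percentiles rather than means matters here because, as noted, individual requests vary a lot: a few cold-connection outliers dominate the mean but barely move p50.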


Very interesting, thanks. That would make my setup even simpler.


The "it's complicated" part assumes client-side requests, I guess.

None of this comment applies to a backend and DB in the same server or colo.


> Are you saying it would result in a better user experience when I only use host1 which is behind the CDN and add no-cache headers to the request that can't be cached?

Yes, because that way you can leverage the CDN to defend against DDoS issues, and you can firewall the origin server itself so only the CDN is allowed to communicate with it, but no one else.
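A sketch of that single-host variant: everything goes through the CDN-fronted hostname, and dynamic responses carry a header telling the CDN (and browser) not to store them. This uses the WHATWG Response API, available in Workers and Node 18+; the function name is just for illustration.

```javascript
// Hypothetical helper: mark a dynamic response so the CDN proxies it
// but never caches it.
function dynamicResponse(body) {
  return new Response(body, {
    headers: {
      // no-store forbids any cache (CDN or browser) from storing this response
      'Cache-Control': 'no-store',
    },
  });
}

const res = dynamicResponse('{"user":"example"}');
console.log(res.headers.get('Cache-Control')); // no-store
```

Cacheable responses would instead send something like `Cache-Control: public, max-age=3600`, so the CDN serves them without touching the origin at all.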


If your market is not the USA or Asia, you actually get DDoSed more often by using Cloudflare et al. (>1 incident) than by not using them (=0).



