Security and privacy risks of public JavaScript CDNs (httptoolkit.com)
131 points by fanf2 on Aug 11, 2024 | 34 comments


If you truly worry about the performance of your webpage there are many, many places to optimize before ever considering the problematic CDN option. I suggest optimizing everything else first. The number of pages I have seen that load 3 fonts with 8 variants each while using only one font with two variants is too damn high. Likewise the number of people who don't know how to scale and compress images.

I know this is not how most sites operate these days, but consider that your visitor wants to visit you and get your website. Whenever you embed stuff from other servers you not only give away your users' data and breach their trust, you also double your attack surface and lower the reliability of your site. And for what exactly?

My suspicion is that developers find it easier to paste a CDN include than to download the file and include it themselves. Because performance, my ass.

That's a bit like that cookie notice thing. Guess what: if you don't collect personal data and store it on your users' computers, you don't need to ask them for consent, and suddenly your site looks a lot cleaner and needs to deliver less data.


> I know this is not how most sites operate these days, but consider that your visitor wants to visit you and get your website. Whenever you embed stuff from other servers you not only give away your users' data and breach their trust, you also double your attack surface and lower the reliability of your site. And for what exactly?

The vast majority of people do not care nor even think about this. They might instead for example find it useful to see Twitter embeds right inline of the page rather than load the site externally.


> find it useful to see Twitter embeds right inline of the page rather than load the site externally.

I find it useful, but I find it even more useful when the tweet content is copied into the site instead of embedded, because it's still there when the tweet is deleted or Elon decides it's not profitable to allow embedding tweets.


Most people I know have complained at one point or another about convoluted consent popups and slow pages.

Sure, they don't know the details, but they know it sucks.


> The number of pages I have seen that load 3 fonts with 8 variants each while using only one font with two variants is too damn high.

If a font is not used at all it is not loaded; that's why subsetting and loading with unicode-range work as an optimization.
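For example (illustrative CSS; the family name and URL are made up), the browser only downloads a face like this if the page actually renders characters in its unicode-range:

    /* Latin-only subset: fetched lazily, skipped entirely if unused */
    @font-face {
      font-family: "Example Sans";
      src: url("/fonts/example-sans-latin.woff2") format("woff2");
      font-weight: 400;
      font-display: swap;
      unicode-range: U+0000-00FF;
    }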

I definitely agree that you should not use these public CDNs though.


> If you truly worry about the performance of your webpage there are many, many places to optimize before ever considering the problematic CDN option.

Well, no. Loading just one file from an external domain (CDN or not) means one additional DNS resolution (and SSL handshake), which means a lot of wasted time, often blocking.

Self-hosting scripts, CSS and fonts would be one of the very first things I'd do.


Many will obsess over JS sizes.

But one image can easily exceed it.


That one image is probably far more useful to the person loading the webpage than the hundreds of MBs of JS.

It's best to reduce the size of images, but even doing that everywhere won't make the bloat of mostly useless JS any better.


Regarding the alleged performance benefits of using a public CDN: if there ever was a cache hit because a user visited one site and then yours, both coincidentally using the same CDN and the same versioned URL (pro tip: never happened), and in a short enough time to not suffer a cache eviction, the user did not notice.

When the CDN was slow, every user noticed and thought your website was slow.

You gave away free analytics and made your website worse; there wasn't even a trade-off.


Because of Cache Partitioning it's technically impossible to have a cache hit in your example.


True right now in 2024 (and so long as you're using an up-to-date browser). But it wasn't always true.


There definitely was a long, long time on the web when everyone used a half-dozen versions of jQuery.

There are probably a dozen versions of React that make up a healthy 20% of the web.

> and in a short enough time to not suffer a cache eviction

These assets are cached by version. They can be basically immutable forever, with maximum expiration.
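Concretely, a versioned asset can be served with something like (example value):

    Cache-Control: public, max-age=31536000, immutable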

As commented elsewhere, you don't even have to give away free analytics. Just set a referrerPolicy on your <script> tags. And add subresource integrity to protect from the CDN being compromised.
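Something like this (a sketch; the URL and hash are placeholders):

    <script
      src="https://cdn.example.com/lib@1.2.3/lib.min.js"
      integrity="sha384-EXPECTED_HASH_OF_THE_FILE"
      crossorigin="anonymous"
      referrerpolicy="no-referrer"></script>

(crossorigin="anonymous" is required for the integrity check to work on a cross-origin script.)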

I get that the Storage Isolation needs forced us to blow up the real awesomeness of CDNs. I'm ok with it. But it feels wild to me to say that the cross-site caching wasn't worth anything. It was a super excellent path to fast loading, enormously helpful, especially for that era, when very few people had reliably low-latency, high-bandwidth broadband.


> These assets are cached by version. They can be basically immutable forever, with maximum expiration.

That doesn’t mean they don’t get evicted though, right? I imagine a browser has a maximum cache size and FIFOs the cache if it is filled.


> But it feels wild to me to say that the cross-site caching wasn't worth anything.

The vibe isn't the issue; the implemented solution just didn't work.


Exactly this… even before cache partitioning, the benefits of public CDNs were overstated due to the connection overhead and the range of library versions sites used.


> (pro tip: never happened)

It won't work today, but was this really so improbable in the past? There was a time when jQuery was so ubiquitous as to be almost a "de-facto standard", and it also had a "canonical URL" to load from (per version at least). Why would it have been so improbable that a user visited two or more sites that used the same jQuery version and also included it from the canonical URL?


There were multiple free public CDNs, multiple versions of jQuery, cache evictions happened much faster than you're thinking, and users had less RAM and disk in the past.

I can only imagine "naturally" producing this scenario by browsing HN in 2010, clicking every link on the front page, and hoping at least 2 were indie blogs or startups that didn't take security seriously. And even then, if they weren't consecutive, the websites between them would likely have evicted the cache.


The website is also slow if the browser requests a resource from an origin that is very far away. Nothing is free. The typical benefit of large-scale CDNs is TTFB latency reduction, which tends to be felt rather acutely by users, plus the cache improvements on follow-up requests. A big part of this, for example, is optimizing the TCP handshake (major providers have warmed backbones that skip the 3-way handshake across long paths, so users don't pay for 3xRTT across the Atlantic).

Nobody on this website ever mentions this fact because nobody here ever actually tests their sites under conditions like "the user has bad internet and is really far away from DigitalOcean NYC-3".

> if there ever was a cache hit because a user visited one site and then yours, both coincidentally using the same CDN and the same versioned URL

This is not how modern browsers work anymore, and the phrasing makes me think it isn't a rhetorical statement?

> the user did not notice.

... That's kind of the whole point; they didn't think the website was slow. They didn't think anything. The website opened and then they went on using it. You do understand this, right?


> This is not how modern browsers work anymore, and the phrasing makes me think it isn't a rhetorical statement?

It's in response to the "you should totally direct link your jQuery from this CDN" propaganda.

> That's kind of the whole point; they didn't think the website was slow. They didn't think anything. The website opened and then they went on using it. You do understand this, right?

The marketing pitch included "faster".


https://addons.mozilla.org/en-US/firefox/addon/localcdn-fork...

Supply chain attacks will cause catastrophic damage and massive internet problems one day, as they already have. A DDoS or outage hits the CDN JS/resource suppliers and websites come to a standstill. Why not host your own .js files? Lazy upkeep?

I don't have a solution for it all, but it's better to think about solutions now than when they're needed because a massive hack has happened.


Most web apps bundle and host their JS. The majority of CDN usage is there because that's how people outside engineering/IT, such as marketing, do it.


I’ve always wished a movie hacker would hack a CDN to take over every web page in the world. I don’t know what the real percentage would be, but it’d be more believable than a lot of the hacks in movies.


Hack google analytics and you've got hooks into a ton of sites.


This isn't news; this is exactly what we said when the "don't be evil" company was pushing them in the name of page-loading speed (because the faster they get to display an ad, the more time to sell another ad!)


Does anyone have tool suggestions for managing these dependencies for legacy applications that aren't set up for webpack and similar tools? Lots of legacy sites still use things like jQuery and jquery-ui, which don't work nicely in webpack without changing how your JS works. But I also don't want to be manually downloading the libraries and committing to the repo. Something like npm, but specifically for browser libraries, and it can just install the js resources straight into a "public assets" folder. Bonus if it can create a JSON file with the path to the libraries and the file hashes so we can reference them from server-side code. All my attempts at npm and webpack/parcel etc fall apart with things like jquery-ui.

Edit: years ago bower fit this requirement. I'm not sure it's fit for the current state of JS libraries now though, and they seem to recommend moving away from it.
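Roughly the kind of thing I mean, hand-rolled (just a sketch; the package paths are examples and may need adjusting):

    // vendor.js: copy dist files from node_modules into public/vendor
    // and write a manifest with paths + sha384 hashes for server-side use
    const fs = require('fs');
    const path = require('path');
    const crypto = require('crypto');

    const assets = [
      'jquery/dist/jquery.min.js',
      'jquery-ui/dist/jquery-ui.min.js',
    ];

    const outDir = path.join(__dirname, 'public', 'vendor');
    fs.mkdirSync(outDir, { recursive: true });

    const manifest = {};
    for (const asset of assets) {
      const src = path.join(__dirname, 'node_modules', asset);
      const dest = path.join(outDir, path.basename(asset));
      fs.copyFileSync(src, dest);
      const hash = crypto.createHash('sha384')
        .update(fs.readFileSync(src))
        .digest('base64');
      manifest[asset] = {
        path: '/vendor/' + path.basename(asset),
        integrity: 'sha384-' + hash,
      };
    }
    fs.writeFileSync(
      path.join(outDir, 'manifest.json'),
      JSON.stringify(manifest, null, 2)
    );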


> But I also don't want to be manually downloading the libraries and committing to the repo

Why not? This is the simplest and most robust technique.

Not everything needs to be part of some nerdtastic dependency resolution architecture. The web is messy. Grab the files you need, keep them safe and move on. There are so many other problems to solve.


> Something like npm

So an inferior version control system that works at odds with the main one that you're using to manage your repo (probably Git) and that works against anyone who's interested in what the version control system is supposed to be for?

Just commit the code to your repo. Orthogonal/overlay SCMs like NPM were a mistake, promulgated by people with serious misapprehensions not unlike those targeted by the article linked here, i.e. those who are (still) insistent on using JS CDNs for <reasons> they can't explain but are sure make it a good idea. I mean, doing it this way has got to be good for something, otherwise there wouldn't be so many people doing it this way, right?


Yeah, Storage Partitioning has made the upside of a shared cache not useful, as the article says. https://developers.google.com/privacy-sandbox/3pcd/storage-p... https://developer.mozilla.org/en-US/docs/Web/Privacy/State_P...

Still, I see the allure of having someone else CDN for you. They have a significantly better serving system than many of us do, and it's traffic we don't have to serve.

A naive usage of a CDN has both an information-leak problem and a security exposure, as described in the article. But.

You can basically eliminate the information-leak problem by using a restrictive referrerPolicy (which can be set on fetch or a <script> tag). This will quite effectively blind the CDN to where, specifically, the traffic is coming from.

You can eliminate the security risk by specifying a subresource integrity hash for your assets. This will prevent the CDN from serving a file different from what you expect.
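The same two knobs exist on fetch, e.g. (a sketch; URL and hash are placeholders):

    // inside an async function / ES module
    const res = await fetch("https://cdn.example.com/lib@1.2.3/lib.min.js", {
      integrity: "sha384-EXPECTED_HASH_OF_THE_FILE", // response is rejected if it doesn't match
      referrerPolicy: "no-referrer",                 // don't tell the CDN which page asked
      mode: "cors",
    });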


> it's traffic we don't have to serve

Call me old-fashioned, but if JavaScript is a large portion of your server's HTTP traffic, you're doing something wrong. (Unless the JS is only comparatively big because your website is just text, in which case, who cares? You can serve 100 requests per second of static text with a Raspberry Pi and a 50mbit connection. And with any more traffic than that, you'll want a CDN of your own anyways.)


Isn't the same true for popular cloud services?

One security flaw and thousands are affected.

Maybe going back to on-prem would be better.


It depends.

You can consume a cloud service from your backend, thus insulating it from your users' browsers.

Sometimes a 3rd-party service is much, much cheaper than on-prem (the cost of hardware, yearly maintenance, etc.).

On-prem isn't strictly better security-wise either. On-prem software can have vulnerabilities baked into the product which could be exploited much more quietly, since the blast radius is a single customer.


> Maybe going back to on-prem would be better.

You don't have any clue about the sizes of cloud customers' architectures. Think a Kubernetes cluster far, far beyond its advertised 5,000-node limit.

Maybe if you have a WordPress site or a tiny internal-only corp React website, then it might be feasible to go back on-prem. But the world is already run by a few cloud providers, whether you like it or not.


I doubt that most cloud customers have such big architectures.


Maybe not "most", but those big brands you rely on in your everyday life, they do.



