I used to let Google host jQuery for me. And then one day their CDN went down for a large chunk of the midwest (where I live), and I noticed the render time of my site (and a few others) jump up anywhere between 5x and 100x.
That was the moment I resolved never to leave critical, blocking elements of any site I run into the hands of others, no matter how well known or reliable they are. (FWIW, this also includes ad network invocation scripts and similar, which always seem to be notoriously slow to load).
One of the major problems involved with using third-party login systems, like Facebook Connect, is exactly that: you need to make sure it's not critical (so you need your own login system anyway) and you need to make sure it's not blocking (which involves iframe shenanigans).
I decided it's outside the scope of that post (which was already laboriously long-winded), but you can/should use a fallback technique like this to mitigate the potential for Google downtime: http://weblogs.asp.net/jgalloway/archive/2010/01/21/using-cd...
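For anyone who doesn't want to click through, the gist of that fallback is roughly this (a sketch; the local path is just a placeholder for whatever copy you host yourself):

    <script src="http://ajax.googleapis.com/ajax/libs/jquery/1.4.2/jquery.min.js"></script>
    <script>
      // If the CDN request failed or was blocked, jQuery won't be defined
      // by the time this inline block runs, so fall back to a local copy.
      if (typeof jQuery === 'undefined') {
        document.write('<script src="/js/jquery-1.4.2.min.js"><\/script>');
      }
    </script>

Note that it only kicks in once the first request fails outright or completes; it can't do anything about a CDN that's merely slow, which is what much of this thread is arguing about.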
This technique doesn't address the right problem, and it might rest on a misunderstanding of basic statistics.
That is, no matter how reliable the other source (Google etc.) is, its availability is still below 100%. If you host the JS on your own server, your site becomes slow only if your server hangs. If you host the JS somewhere else, your site becomes slow whenever your server or the other server hangs. The probability of the latter is always greater than the former, so you don't really gain anything. (For example, if your server is up 99.9% of the time and the CDN 99.99%, needing both up gives you about 99.89%, worse than your server alone.)
So this trick slightly improves the good cases but increases the likelihood of the bad cases. That kind of trade-off isn't desirable. Usually, people design trade-offs for the exact opposite: sacrificing speed in the normal case (which should be more than fast enough anyway) in order to decrease the probability of the worst case.
(BTW, this is true for almost all long-lived projects. The opposite strategy only makes sense in "car racing"-like situations where you either win fast or lose everything. However, hardly any website is designed to live for only a few weeks.)
Hosting jQuery on your own website increases the load on your website, thus increasing the probability that your website will fail.
The more sites that use the CDN, the more likely someone coming to your site already has jquery in their cache.
If you're just trying to minimize downtime regardless of cost, then I absolutely agree with your analysis. However, if you're trying to minimize something more complex involving both downtime and cost, then maybe there's a point where it makes sense to use the CDN. Something very high volume like Twitter, for example.
I've thought to myself, "I wish there was a timeout attribute on the <script> tag," about once every few months for the past 10 years. Is there any good reason you can't manually specify how long you want the browser to wait for an external file to load?
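In the absence of such an attribute, the closest approximation I know of is to inject the script dynamically and race it against a timer. A rough sketch (the URLs, the 2-second budget, and the fallback path are all just placeholders):

    function loadScriptWithTimeout(src, timeoutMs, onTimeout) {
      var settled = false;
      var script = document.createElement('script');
      script.src = src;
      script.onload = script.onreadystatechange = function () {
        if (!settled && (!this.readyState ||
            this.readyState === 'loaded' || this.readyState === 'complete')) {
          settled = true; // loaded in time
        }
      };
      document.getElementsByTagName('head')[0].appendChild(script);

      setTimeout(function () {
        if (!settled) {
          settled = true; // stop waiting on the slow copy
          onTimeout();
        }
      }, timeoutMs);
    }

    // Try the CDN, but give up after 2 seconds and use a self-hosted copy.
    loadScriptWithTimeout(
      'http://ajax.googleapis.com/ajax/libs/jquery/1.4.2/jquery.min.js',
      2000,
      function () {
        loadScriptWithTimeout('/js/jquery-1.4.2.min.js', 2000, function () {});
      }
    );

It's not a perfect substitute: the injected script doesn't block rendering the way a plain script tag does, the browser will still finish downloading (and executing) the slow copy in the background, and anything that depends on jQuery has to wait for a callback rather than assuming it's available on the next line.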
That fallback technique mitigates the majority of failure cases though (NoScript, overbearing firewall, blocked regions, etc). The CDN itself being slow or down is vanishingly rare.
"The CDN itself being slowly-down is vanishingly rare."
Based on...?
It happened to me once a few months ago. It was down for hours. The negative impact was very real and painful and in my opinion outweighed the other advantages of hosting using Google's CDN.
Pingdom tested Google, Microsoft, and Edgecast's jQuery CDNs every minute for a couple weeks and found all of them averaged between 100-150ms to download jQuery[1]. Google's was actually the slowest of those, averaging a turtle's pace of ~130ms from all of Pingdom's datacenters. They're all so close that the Google CDN's overwhelming caching advantage should be preferable though.
More anecdotally, I've been running a few Pingdom-type tests myself for a longer period, using uptime tools on a few of my servers and mon.itor.us. Except for that brief outage the morning of May 14, 2009[2], I haven't seen a net-wide outage or even a 250+ ms slowdown.
I'd be genuinely interested in any concrete data to the contrary.
I don't know, this is anecdotal, but in the majority of cases where I see pages taking a long time to load due to third-party JS and the like, it's waiting for actual data rather than anything else.
The OP said:
"and I noticed the render time of my site (and a few others) jump up anywhere between 5x and 100x."
Which would indeed suggest that it did load, but at significantly reduced speeds.
These highly-available, distributed CDNs hosting static content don't have the same failure characteristics as something like an advertising script or Twitter widget. Where the latter do often hang the page (frustratingly), the popular CDNs that host jQuery aren't prone to that under any but the rarest of circumstances.
Maybe unrelated, but you'd be surprised how often the "Waiting for domain.com..." in your browser's status bar is misleading. Interactions between externally referenced scripts, images, and scripts that use document.write can produce "interesting" results in most browsers.
The same thing can be said about other services and I doubt that Joe's hosting service availability is better than Google's. Though I admit that if Google fails and at the same time your site doesn't, it sucks.
It seems to me that unless the likelihood of a cache miss is fairly small, you need to balance the probability of a cache hit against the expense of an extra HTTP call, as opposed to bundling the jQuery library directly with your custom JS via some minification trickery (and two HTTP calls if you're using both jQuery and jQuery UI).
I have no doubt that the likelihood of a cache hit here is growing, but I wonder what the likelihood of an actual hit is. These data show that 4.7% of the top 1000 Alexa sites use some version of jQuery. What you'd need to consider is the likelihood that your visitor has (a) visited one of those 47 sites, (b) that the site used the same version of jQuery as you do, and (c) that it happened recently enough that the (relatively large) files are still locally cached. I suspect that for most sites that works out to much more than 4.7%, but is it more than 50%? If not, aren't half of your users getting a slower response as a result?
(Moreover, and I don't know if or how this affects the jQuery CDN, but doesn't it seem like many sites drag because of delays in loading the Google Analytics JavaScript files? Wouldn't this pose an even greater problem if you're using Google to serve jQuery, since your UI depends on it?)
I started off using Google CDN to host my jQuery file, then later ditched it because about 20% of the time there would be a noticeable delay in retrieving it (if I'd cleared my cache).
There's really no reason not to just host jQuery yourself. Use gzip, set a far-future Expires header, and name the jQuery file by version so that when you upgrade, the cached filename changes. That's all you need to do, really.
One last note: the benefit of putting script tags at the bottom of the body is very similar to having the scripts cached in the first place. Just in case you didn't know, putting script includes at the bottom of the page lets the browser render the page progressively as it retrieves the HTML text [generally very, very quickly]. Scripts in the HEAD block rendering, as the browser needs to load each script file sequentially, in case there are dependencies. [Note: not exactly true, it will grab several in parallel and execute them in order, but there's still a delay.]
Whether or not the scripts are cached, very fast page rendering will make the page appear to have loaded quickly, and the user likely won't need the JavaScript before the scripts finish loading anyway, even if they weren't already cached.
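To make that concrete, a self-hosted, bottom-of-body setup looks roughly like this (a sketch; the file names, paths, and version numbers are just examples):

    <html>
      <head>
        <title>Example</title>
        <!-- stylesheets in the head so the page can render progressively -->
        <link rel="stylesheet" href="/css/site.css">
      </head>
      <body>
        <p>Content here renders without waiting on any script downloads.</p>

        <!-- scripts last, self-hosted, served gzipped with a far-future
             Expires header; the version in the filename busts the cache
             when you upgrade -->
        <script src="/js/jquery-1.4.2.min.js"></script>
        <script src="/js/site-1.0.js"></script>
      </body>
    </html>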
Those are all good points, although as a different approach you can offset some of the load time of putting scripts in the head by flushing the head to the browser as soon as possible.
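A minimal sketch of what early flushing can look like, assuming a bare Node.js HTTP server (the markup and the artificial delay are purely illustrative, not the original commenter's code):

    var http = require('http');

    http.createServer(function (req, res) {
      res.writeHead(200, { 'Content-Type': 'text/html' });

      // Send the <head> (with its script/style references) immediately,
      // so the browser can start fetching those resources while the rest
      // of the page is still being generated.
      res.write('<html><head><title>Example</title>' +
                '<script src="/js/jquery-1.4.2.min.js"></script>' +
                '</head><body>');

      // Pretend the body takes a while to produce.
      setTimeout(function () {
        res.write('<p>Slow-to-generate content goes here.</p>');
        res.end('</body></html>');
      }, 500);
    }).listen(8080);

The same idea applies with PHP's flush() or any framework that lets you send the response in chunks.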
I'm not willing to hand over the security of my websites and privacy of my users to a third party, in exchange for my first page load to be fractionally shorter for a small number of my visitors.
Google's jQuery hosting is now a highly desirable target, and I don't want to be among the victims if it does get attacked. We learnt earlier this year that even Google can be hacked.
If Google's CDN were hacked (as unlikely as that is), it's almost certain that you'd find out about it far sooner than if your own server were hacked. There would be a huge controversy and then it would be quickly fixed, probably in the course of hours or minutes, just like with the Twitter CSRF issue this morning.
Conversely, the Internet is absolutely littered with compromised sites that have been modified to inject malicious scripts.
That's not a very high cost. They don't choose to hit you; they let their scripts and botnets look around for old and vulnerable software.
Have you looked at your raw httpd logs? When I look at mine, and grep away known-cookies, I see that I'm frequently scanned by hundreds of IPs looking for vulnerabilities in common software packages.
And that's just the stuff that shows up in logged HTTP queries. I don't want to think about how likely it is that tools like Nessus are constantly being run against IP ranges that I sit within.
Ok, sure, you can believe you're going to be more on top of keeping your site secure than a high-value target like Google. I don't know the target value of your site, but I doubt it's as high as that of the server hosting the jQuery file you're afraid of pulling remotely. And you can bet that Google knows it has high-target-value, externally facing assets, and is watching them harder, and with more eyes, than you would.
The thing we're discussing here is whether jquery.js is stored on my server with the rest of my website, or some other third party server. I'm not sure how the things you've said above apply to this discussion?
You were critiquing the security cost of hosting on your own server versus that other server. It was pointed out to you that the admins of that other server would likely learn of (and react to) a breach on their end with lower latency than you would for your server.
You implied that the security cost for hosting on your server was actually lower, because you weren't as much of a target. My reply was an attempt to point out to you at a technical level why that was a specious argument; your servers are likely being scanned by the same botnets that are scanning mine with automated exploit attempts against old and vulnerable software, and common errors in securing a server.
It's going to be far easier and cheaper for them to take a shotgun-scanner approach against a large class of average systems than to apply manual, concerted effort against a small set of high-value targets like CDN nodes.
The cost to the attacker to attack your system with automated tools is near nil. They'll attack, and if they get in, that's gravy. Using "we're not a target" as a security model makes about as much sense as putting an unpatched Windows box in your home router's DMZ.
I think the part I may have misunderstood was where you said, "With Googles CDN, they have to hack either my website, or Googles CDN.", and I interpreted that as an exclusive condition rather than an inclusive one. It was probably the "either" that did that.
With that misunderstanding corrected, I believe you're generally correct on the security argument. There's still some plausible variation in terms of server security policy and the implementation of things like intrusion detection (is it safer to keep all your money in your home, or to keep most of it in a safe deposit box at a bank?), but that's not the key problem I thought I noticed in your argument, and not one worth devoting energy to.
One thing that doesn't seem to have numbers, though I think the data would be sufficient to give them, is what the caching probability is like after taking the fragmentation into account. If I reference the Google CDN URL for jQuery 1.4.2, how many of the top 200,000 sites reference that? I assume it's rather less than the 6,953 that reference any version, but how much less?
The split is about 50/50 right now. I've run the crawler three times in the last ~5 months and observed the transition from 1.3.2 to 1.4.2 moving along quite nicely though. After my first run, 1.4 adoption was so anemic that I was worried 1.3.2 was going to be jQuery's IE6. At this rate, it looks like 1.3.x should be a small minority by the time 1.5 rolls around.
HTTP errors – About 10% of the URLs I requested were unresolvable, unreachable, or otherwise refused my connection. A big part of that is due to Alexa basing its rankings on domains, not specific hosts. Even if a site only responds to www.domain.com, Alexa lists it as domain.com and my request to domain.com went unanswered.
At first, that may seem like an awful lot of potential error. However, the one thing all of these inaccuracies have in common is that none of them favor the case for using a public CDN.
I would have to disagree with the last paragraph there. I think that if someone is so careless that their page isn't available without the "www.", there's a very strong chance they haven't heard of a CDN either. So domains that don't work without the "www." are, in my opinion, favouring the non-CDN side.
I've started to see some best-practice-if-you-ignore-user-expectations guides out there which say that letting the domain work without the www is Not A Good Idea. I don't really know why this is the case, though.
To be clear, I did not adjust any of my numbers to include an extrapolated extra 10%. Any numbers you see in my post are based on direct observation of a script tag's src attribute.
The one reason we don't use Google's CDN for our public website: Some of our business users block sites by domain or IP, so they allow our site but block google's CDN. It's a PITA to get a rule added to a client's security setup.
I don't know their institutional reasoning, but I believe they are on a whitelist based system. They're not so much blocking Google as they are allowing us through.
That's a scenario that the local-fallback technique handles well. The CDN reference will immediately fail for those overly-firewalled users, jQuery will be undefined in the next script block, and the fallback can detect that and inject a script element referencing a local copy instead.
I was playing around with the Google Maps API one day when it started throwing very strange errors. Upon further investigation, I found the library URL was returning an HTML captcha page, which is not very useful to a browser expecting a JavaScript file.
Even Google screws up simple stuff sometimes. So I think I'll pass on using their CDN for something as small as the jQuery library. You're optimizing the wrong thing if you're worried about this.
Another option for using a CDN that gives you better control and outage visibility is to sign up for a pay-as-you-go CDN account yourself. GoGrid and Speedyrails resell the Edgecast CDN, and Softlayer resells the Internap CDN; both perform very well and support both origin-pull and POP-pull models. The cost for these services would only be about $0.57 per 100k jQuery hits (assuming the 24KB minified version is used, that's roughly 2.3GB of transfer).
One of the biggest underlying benefits of using a shared, public CDN for this is that you can take advantage of cross-site caching. As more and more people use it, the potential for that is greater and greater.
However, it only works if sites reference exactly the same URL; just referencing the same file is unfortunately not good enough. So private CDNs like those don't confer quite the same benefit (though they're a great idea for hosting site-specific assets, of course).
I'm interested in the technology behind your crawler; you could potentially use it to discover many more things, like which sites use popular APIs, etc. What language is it written in? What do you use for the backend/DB? How fast is it?
I'm working on a project involving similar large scale crawling and I would love to know more.
It's a C# console application that logs to a SQL Server Express database. It's fairly primitive and the source isn't anything I'd want to advertise. I'd be happy to share it with you if you don't mind C# and want to give it a whirl yourself though.
That site is nice, but they also want to charge close to $2000 for a list of sites using a single technology. I could definitely do this myself for much less.
To be honest, I didn't dig into that site deeply enough to notice that. I pulled this out of an email I was sent just yesterday.
Sure, I'd be interested in collaborating if there's not one out there already.
The spider part is easy, you just need a web client (e.g. Ruby's or Python's Mechanize, Java's HTTPClient, even just wget or curl) coupled with an HTML parser (Hpricot, Nokogiri, Tidy, etc., or even some basic regular expressions). One can readily hack something rough together in an hour or two. Gabriel might have a lot of the data and certainly the code in order to produce DuckDuckGo, but he may have good reasons to keep that private.
The harder part, and the part that I wonder if builtwith is doing correctly, is to do the technology detection. Things like JavaScript libraries or CSS frameworks might be fairly easy to detect, but it is not trivial to reliably detect some of the server side technologies. I recently put together a script to survey the operating system and web server in use at a large number of domains from Alexa's top million list (similar to what Netcraft does) and there are plenty of servers that make that difficult, let alone determining whether a site is built with Ruby, Java or PHP. There are HTTP headers that could tell you, but not everyone uses them. There are certain signatures that give a pretty good clue, but those aren't always present and can be downright misleading. (I've seen sites that migrated from ASP to Java Servlets, for example, that kept .aspx URLs to avoid breaking links.)
If I remember correctly someone posted a JavaScript framework survey based on a similar spidering approach on HN a while back, you might be able to find it at searchyc.