The methodology is questionable on several fronts:
- you can't use the load event to compare advert-heavy vs. advert-free load times. The load event does not fire when the page is usable by the viewer; it fires when every last request is done, including requests fired from the various advert iframes. However, those requests may not impact page usability at all. The load event may even never fire - any offscreen 1px spacer gif that times out will cause that! I don't think there is a built-in event for "done-enough", but the load event is certainly misleading. (For how far apart DOMContentLoaded and load can land in practice, see the sketch after this list.)
- opening the developer tools causes various side-effects; those can cause the page to load more slowly than usual. You shouldn't benchmark with developer tools open (unless you're explicitly targeting usage with devtools open...)
- comparing load times across browsers as reported by the browsers themselves may be valid, but it's not obvious. You definitely want to check that carefully.
- Measuring "peak" CPU usage is almost meaningless without considering how long the CPU is used.
- Measuring the chrome extension process CPU usage and memory usage isn't very helpful, because running this kind of extension causes CPU usage and memory usage in every content tab. Both of these statistics for chrome in this use-case are meaningless. You'd need to measure the memory usage and CPU-time of the entire chrome process tree to get meaningful results. Even in Firefox without e10s it's not valid to measure just the main process CPU and memory usage because plugins are in separate processes (and things like flash or h264 decoding can definitely use CPU and memory).
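For what it's worth, if you still want event-based numbers, it's easy to record both DOMContentLoaded and load in one pass and see how far apart they are on a given page. A minimal sketch using the long-supported performance.timing interface (nothing here is specific to the benchmark in question):

```ts
// Minimal sketch: record how far apart DOMContentLoaded and load fire,
// via the performance.timing interface.
window.addEventListener("load", () => {
  // loadEventEnd is still 0 while the load handler itself runs, so defer a tick.
  setTimeout(() => {
    const t = performance.timing;
    const dcl = t.domContentLoadedEventEnd - t.navigationStart;
    const load = t.loadEventEnd - t.navigationStart;
    console.log(`DOMContentLoaded: ${dcl} ms, load: ${load} ms, gap: ${load - dcl} ms`);
  }, 0);
});
```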
The only thing this page really makes a decent case for is that Chrome loads pages faster than Firefox - but even there, it's not clear we're dealing with an apples-to-apples comparison.
> you can't use the load event to compare advert-heavy vs. advert-free load times
I don't see why not. It's exactly what they want to show: how much quicker pages can load if you don't have to load the ads. And personally, I am not satisfied with a page that has merely loaded to the extent of being usable. Either a site loads fast or it doesn't.
> opening the developer tools causes various side-effects
As long as it's done for all the tests, it should not matter. Whatever "side-effects" are introduced apply to all tests equally. Of course this means the values themselves can't be compared against other tests, but the goal here was to compare the different ad blockers against each other anyway, not to generate values for general statistics.
Some pages become usable before the DOMContentLoaded event fires. Some become usable shortly after DOMContentLoaded fires. Some become usable when load fires. Some become usable several seconds after that. Some never become usable. (Cynical aside: the ones with excessive ads might fit in that last category.) There is no event that measures readiness of a page, nor can there be; there can only be a collection of reasonable heuristics.
However, we don't need a perfect measurement, just a reasonable one.
For example, you might try to wait for all stylesheets to have loaded, and then run requestAnimationFrame until the latency is below some threshold (say, 100ms) to represent a somewhat-loaded-and-not-too-laggy page.
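To make that concrete, here's a rough sketch of such a heuristic. The 100ms threshold is the arbitrary cutoff mentioned above, and the stylesheet check just tests that every `<link rel="stylesheet">` has produced a sheet; neither is any kind of standard readiness signal:

```ts
// Sketch of a "somewhat loaded and not too laggy" heuristic.
const LAG_THRESHOLD_MS = 100; // arbitrary "not too laggy" cutoff

// Every <link rel="stylesheet"> exposes a non-null .sheet once it has loaded.
function stylesheetsLoaded(): boolean {
  return Array.from(
    document.querySelectorAll<HTMLLinkElement>('link[rel="stylesheet"]')
  ).every((link) => link.sheet !== null);
}

// Resolve once stylesheets are in and two consecutive animation frames
// arrive within the threshold, i.e. the main thread is no longer starving
// the rendering loop.
function whenDoneEnough(): Promise<number> {
  return new Promise((resolve) => {
    let last = performance.now();
    const frame = (now: number) => {
      if (stylesheetsLoaded() && now - last < LAG_THRESHOLD_MS) {
        resolve(now); // ms since navigation start
      } else {
        last = now;
        requestAnimationFrame(frame);
      }
    };
    requestAnimationFrame(frame);
  });
}

whenDoneEnough().then((t) =>
  console.log(`"done enough" at ~${t.toFixed(0)} ms`)
);
```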
The point is, you want some measure that plausibly excludes ad-related resources since clearly you don't care how long it takes to load those. It'd also be polite to be transparent about the limitations of your measure.
This is where a human with a stopwatch may come in handy. The clock starts when the page request is sent. Stop the clock when, in your opinion, the page is ready to read. Alternatively, if you feel the page is loading too slowly and you have reached the limit of your patience, press this button instead.
The signal does not have to come from within the vanilla browser, free of human intervention. It could come from a plugin you create to collect human feedback on load times.
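Such a plugin would need very little. A hypothetical content-script sketch (the buttons, labels, and positions are all invented for illustration):

```ts
// Hypothetical content script for a "human stopwatch" extension: inject two
// buttons and log how long after navigation start the human judged the page
// readable, or gave up waiting. performance.now() is ms since navigation
// start in the page's timeline.
function addVerdictButton(label: string, verdict: string, offsetPx: number): void {
  const btn = document.createElement("button");
  btn.textContent = label;
  btn.style.cssText = `position:fixed;bottom:8px;right:${offsetPx}px;z-index:2147483647;`;
  btn.addEventListener("click", () => {
    console.log(`${verdict}: ${performance.now().toFixed(0)} ms`);
    btn.remove();
  });
  document.documentElement.appendChild(btn);
}

addVerdictButton("Readable", "ready-to-read", 8);
addVerdictButton("Gave up", "gave-up", 120);
```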
Of course, the human element will significantly decrease the number of trials you can do within a fixed time period.
> Stop the clock when, in your opinion, you think the page is ready to read.
Even that is not enough. A number of sites (especially ad-heavy sites on mobile devices) look ready to read, but as soon as you try to scroll, are jerky and semi-unresponsive. It's particularly a problem on ad-laden sites of small news publications. Blocking ads sometimes mitigates this, though not always, as some sites insist on using Javascript to implement "fancy" scrolling which inevitably breaks.
(Semi-related sidenote: please, for the love of God, don't use Javascript to hijack browser scrolling. It's almost certain you will not get it perfectly right and some users will suffer. It's almost never worth it. This is the #1 reason I end up having to switch to Reader Mode in Firefox).
Then maybe you stop the clock when the user navigates away from the page. If it takes 30s to read the content without ad-loading and 40s to read it with ad-loading, then in subjective user time the page is 10s slower. That may be because the ads take 10s to load, or because the distraction causes the user to read at 75% of normal speed, but it's still a measurable result from a single variable change.
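If you wanted to automate that particular metric, logging dwell time when the user leaves is about all it takes. A sketch (the /collect endpoint is hypothetical):

```ts
// Sketch: record "subjective user time" as dwell time from navigation start
// until the user leaves the page.
window.addEventListener("pagehide", () => {
  const dwellMs = performance.now();
  // sendBeacon survives page teardown, unlike an ordinary XHR.
  navigator.sendBeacon(
    "/collect", // hypothetical collection endpoint
    JSON.stringify({ url: location.href, dwellMs })
  );
});
```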
I'd still question the usefulness of this rather arbitrary number.
The only case I can think of where I'd be interested in how quickly people can navigate a site before all the content has loaded is when optimizing a site for people with really slow or unreliable internet connections. But that seems like a whole other topic, with barely any influence on the goal of this comparison.
You don’t need perfect measurements to make decisions.
Comparing times between browser-defined events in the same browser opening the same site on the same day is about as apples-to-apples as you can get.
I'm not sure any of that matters. As long as the same thing was measured in the same way each time, I think the results are relevant.
For your first point, the measurement wasn't "usability", it was how long it took for the load event to fire.
Second, having dev tools open should affect all results the same way. The absolute number isn't important, it's the relative ordering, which should be the same whether dev tools are open or not.
The rest of your points are just nitpicking. Maybe he didn't use the absolute best way to measure CPU or memory usage, but I'm willing to accept they're good enough proxy measures that the results are relevant.
It's not a good enough measurement - it's a ridiculously poor measurement. You're trying to measure how fast an adblocker is by measuring how long it takes... to load the ads.
How is that in any way helpful? If anything, the proper, comparable load time for the adblocked version is infinite - they never complete loading the ads.
If you're going to run this comparison, you need to at least compare the same set of resources being loaded - after all, if you're willing to omit ads entirely, you clearly don't care how long it takes for them to load.
We don't need a perfect measurement, but this measurement is extremely biased. DOMContentLoaded would be better, but still far from good - you want to measure the load time of the resources the adblocker would not have blocked.
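One way to approximate that with the Resource Timing API: take the unblocked run, drop every request whose host matches your filter list, and look at when the last remaining resource finished. A sketch; the AD_HOSTS list is a toy stand-in for real filter-list matching:

```ts
// Sketch: when did the last non-ad resource finish loading? EasyList
// matching is far more involved than a hostname suffix check.
const AD_HOSTS = ["doubleclick.net", "googlesyndication.com"]; // illustrative only

function looksLikeAd(url: string): boolean {
  try {
    const host = new URL(url).hostname;
    return AD_HOSTS.some((h) => host === h || host.endsWith("." + h));
  } catch {
    return false;
  }
}

// Note: the resource timing buffer holds 150 entries by default; call
// performance.setResourceTimingBufferSize() early on busy pages.
const resources = performance.getEntriesByType("resource") as PerformanceResourceTiming[];
const lastNonAd = Math.max(
  0,
  ...resources.filter((r) => !looksLikeAd(r.name)).map((r) => r.responseEnd)
);
console.log(`last non-ad resource finished at ~${lastNonAd.toFixed(0)} ms`);
```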
The time saved by removing the ads is a useful number to know, IMO. I'm willing to give the benchmarker the benefit of the doubt that he used the same block lists for each blocker, although spelling that out would have been a good idea.
This is exactly the kind of thing ETW (Event Tracing for Windows) was designed to help you measure.
Running WPRUI with First Level Triage will measure CPU along with a bunch of other stuff. Enabling ReferenceSet under Memory Resource Analysis will measure memory usage.
Chrome has an ETW provider, I poked around a bit and couldn't find any for FF (it looks like they register one for the JS engine?)
I'm not familiar with how to solve the "page loaded enough but not done" problem, but IMO if you measure end-to-end times it's probably a close-enough proxy. Maybe measure several times and report on those statistics.
(Edit: Taking a trace with/without the adblocker enabled should allow you to exactly count # of sampled cycles spent in the adblocker - easy to compare in WPA, as long as there are events emitted indicating page load start/stop).
(Edit2: Using UIforETW along with Chrome 46+ will let you see ETW events.)
Also keep in mind that they tested all extensions with their default filter-list configuration, even though it sounds like EasyList and EasyPrivacy alone would have been sufficient. Since applying filter rules is basically the main work these extensions do, and most of them can use the same lists, they should have configured all of them with the same lists where possible.