
Edge platform team employee here. A couple of thoughts on methodology, focusing on our scripted test (under "stay productive longer" at https://blogs.windows.com/windowsexperience/2016/06/20/more-...), which is the most comparable to Opera's test (nobody seems to challenge our Netflix results):

* We did not enable ad blockers, because we were testing the efficiency of the network stack, browser rendering pipeline, etc.; if you cut off ads, you're effectively skipping half the assignment. This is a valid way to improve the user experience, but not a valid way to measure browser efficiency, especially because it will disproportionately impact some sites (like the news sites that Opera's test focused on) and have no impact on other sites (like Netflix, where Edge demolishes the other browsers). It's basically skipping a lap of the race and then bragging that you finished faster.

* It's also worth noting that ad blockers are often detected by news sites, which will then refuse to load the main page content at all. Not sure if Opera's test accounted for this.

* Our test was designed to mimic real-world behavior: watching YouTube (foreground and background), shopping on Amazon, browsing the Facebook news feed, searching on Google, opening email in Gmail, and reading Wikipedia. To reduce variability, we used WebDriver (supported by all four browsers tested) to instrument the tests, and we ran each task for a fixed amount of time rather than as a loop of consecutive tasks. A loop would advantage or disadvantage browsers based on factors like page-load performance or network conditions, in a way that doesn't reflect the user experience, since users tend to linger on a page once it's loaded. We then used the Maxim 34407 power instrumentation built into the Surface Book (which is why we chose the Surface Book for this test) to measure actual instantaneous power consumption at the hardware, sampled once per second and averaged across the duration of the test. We feel strongly that this is a highly scientific and defensible test setup which mimics typical user behavior and, significantly, measures the same markup for the same duration, on the same hardware, in every browser.
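To make the setup concrete, here's a minimal sketch of that kind of harness (not our actual script; read_power_watts() is a stand-in for the Maxim 34407 readout, and the task length and URL are placeholders):

    # Minimal sketch: fixed-duration WebDriver task with 1 Hz power sampling.
    import time
    from selenium import webdriver

    TASK_SECONDS = 60  # every browser gets the same wall-clock time per task

    def read_power_watts():
        # Stand-in for the Surface Book's Maxim 34407 instantaneous power readout.
        return 0.0

    def run_task(driver, url, samples):
        driver.get(url)
        deadline = time.time() + TASK_SECONDS
        while time.time() < deadline:
            samples.append(read_power_watts())  # instantaneous power at the hardware
            time.sleep(1)                       # sample once per second

    samples = []
    driver = webdriver.Edge()  # same script runs against Chrome/Firefox/Opera drivers
    run_task(driver, "https://en.wikipedia.org/wiki/Special:Random", samples)
    driver.quit()
    print("average power (W):", sum(samples) / len(samples))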



How did you control for the specific ads the browsers were served? Ads are highly dynamic. Unless you can account for those differences in your methodology, you had better eliminate that influence.


Ding, ding, ding, and to be fair this is the problem in both methodologies. The only way to accurately perform this test is to spider a bunch of sites, save the contents to a locally hosted HTTPD, ensure all third-party JS calls resolve locally, and test every browser against the exact same sites as they were spidered at a point in time. You simply cannot account for changes to the markup, ad network beacons, ads, metrics code, or even network routing, all of which could influence the test one way or the other if it isn't run from an identical static cache in a controlled environment.

If I were doing this test I'd then skip the whole scripting automation part and simply add a meta refresh to every page in the cache to take the browser through the content sequentially, giving each page something like 10 seconds to load and render. Simple, simple, and far more accurate.
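As a rough illustration, the meta-refresh chaining could be a few lines of Python (assuming the mirror was saved as cache/page_000.html, cache/page_001.html, and so on; those names are just an example):

    # Chain the locally spidered pages together with a 10-second meta refresh each,
    # so the browser walks the whole cache with no WebDriver scripting at all.
    import glob, os

    pages = sorted(glob.glob("cache/page_*.html"))
    for current, nxt in zip(pages, pages[1:] + pages[:1]):  # last page loops back to the first
        with open(current, encoding="utf-8", errors="ignore") as f:
            html = f.read()
        tag = '<meta http-equiv="refresh" content="10;url=%s">' % os.path.basename(nxt)
        with open(current, "w", encoding="utf-8") as f:
            # Naive injection right after <head>; a real harness would parse the HTML.
            f.write(html.replace("<head>", "<head>" + tag, 1))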


You don't need to go to such complicated lengths. Just perform enough tests (a statistically significant number of runs) and a distribution will form. That also captures the variability of real-world network effects.
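For example (hypothetical runtimes, just to illustrate the shape of the analysis):

    # With enough repeated runs per browser, ad and network variance shows up as
    # spread rather than bias; compare means with an error bar, not single runs.
    import statistics

    runs_minutes = [412, 398, 425, 407, 391, 418, 403, 410]  # hypothetical battery runtimes
    mean = statistics.mean(runs_minutes)
    # Rough 95% confidence interval, assuming roughly normal run-to-run variation.
    half_width = 1.96 * statistics.stdev(runs_minutes) / len(runs_minutes) ** 0.5
    print("%.0f +/- %.0f minutes" % (mean, half_width))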


What about different ads served to different browsers? Someone running, say, Opera will have a different ad profile than a Chrome user even when completely blank cookie-wise.


Maybe have them give the same user agent?
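If the tests are driven through WebDriver anyway, that's easy enough to force; a sketch below, with a made-up UA string:

    # Force an identical User-Agent string in Chrome and Firefox so ad networks
    # can't differentiate the browsers by UA alone.
    from selenium import webdriver

    UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) BatteryTest/1.0"  # example string

    chrome_opts = webdriver.ChromeOptions()
    chrome_opts.add_argument("--user-agent=" + UA)
    chrome = webdriver.Chrome(options=chrome_opts)

    firefox_opts = webdriver.FirefoxOptions()
    firefox_opts.set_preference("general.useragent.override", UA)
    firefox = webdriver.Firefox(options=firefox_opts)

Of course, ad networks also fingerprint browsers well beyond the UA, so this only removes one variable.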


It's hardly complicated. I've put such tests together in an afternoon. In fact, whatever is added in complexity is offset by the fact that fewer tests are necessary. Via this mechanism you can also remove any questions about compression, use of HTTP/2, etc., which could affect the tests based on server-side choices about how data is served to each platform. Equal always equals better.


But those metrics are important: if servers serve more optimized pages to Edge users for some reason, that's a freaking important fact to know. This is about real-world data, real experiences, and how it affects actual users. You can normalize the tests to the point where there is absolutely zero difference between the browsers, of that I'm sure, but that will not reflect any actual cases that real users experience.


That would be a whole different test then. Not about efficiency of the browser itself but about what the browser gets served.


The test was which browser gives you the better battery life while browsing the internet like a normal user.

Why is a different question, and it's not that relevant: I often don't care why something I USE works better, I just care that it does.

If it's something I BUILD then I would care much more, but again, that is a whole different issue.


Within those specifications, the objection about the ad blocker becomes irrelevant. If the browser just works better, then users don't care and can simply enjoy more battery time.

The case for more normalized tests is to find out which browser is factually better designed/written.


But this is not repeatable.


Ads are not that much of a problem; they will even themselves out, and if for some reason MSFT Edge users receive fewer ads, or ads that are less resource intensive, that's also an important metric.

I don't see anything that would somehow create a bias in favor of a specific browser as far as ad networks go; if anything, the stigma/stereotyping of IE/Edge users would probably mean that ad networks are more incentivized to send the baity ads towards those browsers.

As for the network part, well, again that's an important metric: if certain browsers perform better under adverse network conditions, that's an important factor to know. You do not want to give them the best-case scenario every time.

Giving a page a fixed number of seconds to load is also completely the wrong approach: you want to see how browsers behave when they can't load a page properly or when it takes more time than usual. Maybe some browsers expend more resources by resubmitting the entire request, maybe some browsers do not parse the DOM tree from scratch when some of the requests stall, maybe some browsers have less resource-intensive placeholders for DOM elements, maybe some browsers are better at adjusting the DOM preprocessor for network congestion than others.
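(For what it's worth, if you wanted to reproduce adverse conditions deliberately rather than wait for the live network to misbehave, Chrome's WebDriver exposes throttling; it's Chrome-only, and the numbers below are arbitrary examples:)

    # Simulate a congested network for the Chrome run; other browsers would need
    # an OS-level traffic shaper to get comparable conditions.
    from selenium import webdriver

    driver = webdriver.Chrome()
    driver.set_network_conditions(
        offline=False,
        latency=300,                     # extra round-trip latency in ms
        download_throughput=256 * 1024,  # bytes per second
        upload_throughput=128 * 1024,    # bytes per second
    )
    driver.get("https://en.wikipedia.org/")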

So no, I can't really see how your approach would be any better. The approach that MSFT took was quite good: Netflix, Wikipedia, YouTube, Facebook, etc., with what seems to be realistic user behaviour. What you want to do is put in a test that produces fair results for fairness's sake, and that's not how you evaluate anything, because it would not yield any real-world data.


That would be as far from real world as you can get.

What you need here is a big enough sample.


You run enough tests to even things out, and if you still have a bias towards certain browsers then it's a statistically important metric on its own.


You'd probably need to run multiple tests to account for the variance of ads served.


The way I read the blog is that ad blocking was enabled to show that even _with_ an expensive extension enabled, Opera still beats Edge; but that was, in my opinion, not the gist of the post.

The post was about Microsoft not being transparent about the methodology/setup/scripts/target websites used. A third party should be able to verify Microsoft's claims.

Is it possible for the setup to be published to your github.com/microsoft so that it may be executed automatically? Heck, go the extra mile: put it in CI and publish the data on a regular basis :)


Adblock isn't an "expensive extension". Ads are the expensive thing, so blocking them saves power.


It's not quite that simple [0][1], or at least it wasn't always. AdBlock Plus in particular used to rely on a massive style sheet that was injected into all tabs and frames, which degraded performance really quickly on pages with a lot of iframes (which, as it happens, tend to be pages with lots of ads).

I believe Firefox partly resolved the problem on their end later on, although I can't tell you the exact status of things. Nevertheless, there's definitely some precedent for claiming that ad blocking is an expensive operation. Intuitively, you'd think that network-based blocking would be enough, but it won't work against same-domain ad sources (a typical example being Facebook ads), while CSS selectors are able to capture a bit more depth.

[0] http://www.extremetech.com/computing/182428-ironic-iframes-a...

[1] https://blog.mozilla.org/nnethercote/2014/05/14/adblock-plus...


No, it really is that simple[1].

The only reason I can think of for someone not using uBlock Origin is that they've never heard of it.

[1] https://github.com/gorhill/uBlock#performance


I got rid of it because it made using the web nearly impossible and I got annoyed with constantly having to manage exceptions just to do things like view my Twitter analytics page. Downloading ads is less annoying than uBlock.


What filter lists were you using? uBlock Origin's behavior is identical to Adblock Plus's, given that the same filter lists are used for both. uBlock enables a few more filter lists by default, although I still think it's on the conservative end of the spectrum in terms of what gets blocked.

I've only ever had to whitelist two things in uBlock in all my time of using it, which is pretty good considering that I use basically all of the lists except the language-specific ones, the anti-anti-adblock one (which requires a user script), and the merged (ultimate) lists.

But yes, my browsing habits are clearly different from yours, and what works for me may not work for you. I understand that, but what uBlock does is not any different from any of the other ad blockers out there; it just happens to do it more efficiently and with a better UI than the rest of them. The only difference in behavior you might encounter is likely to be related to uBlock Origin's strict blocking[1]. In that case, exception filters are very easy to create, since you literally just have to press "disable permanently" and it will forever be disabled for that site.

I've said this before: the best way to use uBlock Origin, if you've never used something like NoScript/uMatrix before (or couldn't be bothered with the whitelisting approach), is to try what it calls "medium mode"[2]. Using its dynamic filtering in this way should net you the largest gain with the least amount of effort. If you're looking for more control, then you may want to look into uMatrix, since I think its interface is nicer for that sort of thing.

[1] https://github.com/gorhill/uBlock/wiki/Strict-blocking

[2] https://github.com/gorhill/uBlock/wiki/Blocking-mode:-medium...


Click ublock icon, click power icon, done.


yeah, so why have it installed at all?


So you can browse the web on your own terms.

The power button only disables it for one site at a time (or even just a single page, if you Ctrl-click it).


Clicking the power button only disables it for that specific site. Every other site would still have blocking enabled so that sounds like a pretty good reason to have it installed.


So that it blocks the 99.9% of unwanted crap.


Try dynamic filtering (in medium mode)[1] and raise that number even higher.

[1] https://github.com/gorhill/uBlock/wiki/Blocking-mode:-medium...
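(For reference, my understanding is that medium mode amounts to keeping the default filter lists plus two global dynamic rules in the "My rules" pane; see the wiki page above for the exact steps:)

    * * 3p-frame block
    * * 3p-script block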


Also Opera has built-in blocking. It's not a JavaScript-heavy extension to begin with. ;)


Until we know exactly what websites/videos etc. are loaded (also in the Opera benchmark!), you, me, and all consumers cannot know whether this is true :)


> nobody seems to challenge our Netflix results:

Of course, this is Lotus vs Office all over again. You're using the OS advantage (Windows 10 and its DRM platform) to show that you can do things more efficiently.

You chose the technology platform (I'm guessing PlayReady), you forced GPU vendors to comply, and now you give competitors a bad reputation for daring to use their own (or a competitor's) technology.


Or, to put it another way: they have built a well-integrated technology stack and are now demonstrating that it has superior performance on a metric we all care about.

Apple has a different approach (own the entire stack) but they're going for the same thing and it has made them a lot of cash.


Netflix is a rather specific example — don't they use Silverlight (as in, "the only site anyone in the world cares about that uses Silverlight")?

It's like Apple stressing how well their browser works on -- oh wait, Apple doesn't produce a proprietary plugin. Well, let's say the iTunes page, if it were specifically tuned to work well in Safari.


Edge and Safari run Netflix under HTML5's protected media path. Silverlight is only used for backwards compatibility.


Doesn't the Netflix test really test GPU decoding vs software decoding?

I'm not 100% sure on this, so any clarification is welcome, but I thought that Netflix's content contracts' DRM requirements allowed GPU decoding via PlayReady but not via some others like Widevine. Would the Netflix test still show such an advantage for Edge if there weren't any DRM?


It seems to me that if you're running on the same device, the GPU should not be a variable.


IIRC everything except Edge and Safari are forced to use Silverlight for Netflix streaming. I'm not exactly sure why this is, but it probably has something to do with the mess that is video codec licensing.


Chrome uses Widevine (DRM) for Netflix, which uses HTML5. Really, the more significant difference is that Chrome gets capped to 720p. Edge actually gets the 1080p streams (as do IE11 on Windows 8+ and the Netflix Windows Store app).


Seems arbitrary, what's the reasoning for that? Isn't this exactly what everyone warned about with DRM in html5?


Your guess is as good as mine. Blame Netflix, Microsoft or the people pushing DRM. Maybe it's some combination of all of the above or none of them. I'm not sure anyone really knows why we have this behavior.

It's not necessarily DRM in HTML5 which is the issue here although I suppose the fractured nature of it may be partially to blame. It's the fact that it's seemingly arbitrary that Chrome gets capped to 720p and no one knows why. Did Microsoft pay a big chunk of change to Netflix for exclusivity or something along those lines? Do the content creators prefer Microsoft's DRM implementation over Widevine's?


Netflix plays fine in Chrome (maybe Firefox too) using HTML5; it was opt-in at one point, not sure about now. I watch on Linux all the time, no Silverlight there.


Starting with Firefox 47, it works fine as well (macOS here), but you need to switch your user agent to Chrome (before you go to Netflix). I'm guessing Netflix hasn't updated their compatibility checker.


It's for DRM purposes


Given that basically everyone should be blocking ads (if for no reason other than that they are a vector for malware), native ad blocking is really an energy saving mode, a user experience improvement, and a crucial security measure all rolled up into one feature.

If the focus of this test is meant to be on the user experience, as the tone of the Microsoft blog post seems to suggest, then said native ad blocker is a killer feature that really might mean Opera users can "stay productive longer".

If not, then sure, Edge's underlying implementation is probably faster. It would be interesting to see a comparable benchmark of Edge with the best available ad blocking turned on.


Wow, this is all fascinating! If it is not too much trouble, would you mind releasing the script that you used in your testing so that independent parties may verify your results?


It would be interesting to see this test run again with ad blockers enabled on each browser (the latest version of Edge has support for extensions).


It may not be fair to enable adblocking if you want to compare other engineering components, but these results still mean users in the real world will see more battery time with Opera than with Edge.


This is

* not based on observed user behavior

* not reproducible by you (recorded/replayed network where possible)

* not reproducible by peers

* in stark contrast to reproducible tests provided by others

You describe it as "highly scientific" and jump straight to a marketing campaign. Honestly, who's skipping a lap of the race?

It's pretty clear the data generated is useful from an engineering point of view. It can help identify problem areas. It's also absolutely clear that the conclusions being marketed are blatant misrepresentations of the work.


Could you disclose/open source your methodology?


So where is your script? Or are we just supposed to trust you?


You don't have to do anything.



