Windows 7 patch for Meltdown enabled arbitrary reads and writes in kernel memory

lmilcin · on March 27, 2018

This is what happens when devs are presented with a very complicated problem, extremely short deadline and enormous amount of pressure.

gruez · on March 27, 2018

>extremely short deadline

they had 6 months.

maltalex · on March 27, 2018

> they had 6 months.

You say that like 6 months is automatically a lot of time. "She had 6 months to give birth". Yeah, only it takes 9, so 6 is short.

Consider the scope and depth of the issue and the fact that they probably couldn't involve too many people on this effort.

zeth___ · on March 27, 2018

Did she try having the pregnancy in parallel?

Doesn't sound like she was trying at all.

jahewson · on March 27, 2018

Twins - double the bandwidth but the latency stays the same.

johnhenry · on March 28, 2018

0.22 vs 0.11 bpm is actually a big improvement despite the latency.

TeMPOraL · on March 28, 2018

"Never underestimate the bandwidth of a station wagon full of babies hurtling down the highway"?

bayangan · on March 28, 2018

Truly a quote for ages

efdee · on March 28, 2018

It isn't if you need the baby in 6 months.

avereveard · on March 28, 2018

just get one from any outsurcing firm then

johnhenry · on March 28, 2018

Pre-caching.

tiuPapa · on March 28, 2018

She should have used Rust for a fearless pregnancy.

walterbell · on March 27, 2018

Other operating system maintainers had only days/weeks in Jan 2018.

loufe · on March 27, 2018

We're all throwing darts in the dark here with regards to the resource they gave the problem and its difficulty. I think the real takeaway is just that it could be something other than just stupidity through and through.

mannykannot · on March 27, 2018

Unless you can be sure their response solved their problem without introducing others, it's not evidence of sufficient time.

rtpg · on March 28, 2018

So I agree with the principle that a lot of time is a lot of time.

Inversely, though, I'd argue that Meltdown is a relatively small problem! It's strictly around memory usage, cache, and calling patterns. There's not a lot of systems at play, though there's the hard "figure out which order of instructions gets the state machine in a dangerous state" problem. There's a lot less coordination involved than, say, a system call bug that would subtly return the wrong answer half the time and you know that programs sometimes rely on this and others crash because of it.

Some things are hard, other things are hard but at least they're basically math, and math has a bit more determinism involved. Imagine if UX design or debugging strategies could always be broken down into state machines!

avip · on March 27, 2018

Great analogy. As an ex-manager always said: "3 women don't deliver in 3 months".

[E I see walrus01 already got that]

pizza234 · on March 28, 2018

This is Scott Adam's (Dilbert) hilarious take on it: http://www.dilbert.com/strip/2007-09-03

walrus01 · on March 27, 2018

corollary adage: nine women cannot gestate and give birth to a baby in one month.

DrScump · on March 28, 2018

See Brooks' little-known sequel: The Mythical Woman-Month

friedButter · on March 28, 2018

But 9 women can give birth to one baby per month on average

ptaipale · on March 28, 2018

Not really for a period longer than 9 months. Of course, you can have even 9 women deliver 9 babies in one month, but none in the following 10-12 months.

stagbeetle · on March 28, 2018

Retort: That's where you're wrong! If we hooked up nine mothers to one single faetus, we could get the job done in 9 months.[0] The same way, if we hooked up our dev teams to a lead that could delegate the work properly, we could pump out a Meltdown patch in around a month and a half.

http://www.pnas.org/content/early/2012/08/28/1205282109?sid=...

vageli · on March 28, 2018

> Retort: That's where you're wrong! If we hooked up nine mothers to one single faetus, we could get the job done in 9 months.[0] The same way, if we hooked up our dev teams to a lead that could delegate the work properly, we could pump out a Meltdown patch in around a month and a half.

> http://www.pnas.org/content/early/2012/08/28/1205282109?sid=....

Except this whole train of thought falls apart once you consider the difficulty of "hooking up 9 mothers to a single fetus". In the same way you down play the difficulty of coordinating multiple teams for a solution around breaking research. Show me a working solution of the former and I'll accept the corollary.

mygo · on March 28, 2018

you're looking at the problem all wrong, maltalex. Just hire a developer who is already 3 months pregnant.

chris_wot · on March 27, 2018

The Linux kernel developers came up with a decent solution, how can it be that they can do this and the Microsoft developers cannot?

rxhernandez · on March 27, 2018

Is Windows written the exact same way as Linux? Never underestimate the amount of technical debt that can be holding a team down.

acct1771 · on March 28, 2018

Ding-ding-ding, this is the non-bs answer.

tinus_hn · on March 28, 2018

Is that an excuse though?

chris_wot · on March 28, 2018

It appears not, if such a bad bug can get through all the way to release.

Raymonf · on March 28, 2018

"Linux" had a bug in which you could log into a system by pressing backspace 28 times a few years ago. And by Linux, I meant GRUB[1], and in turn, (many) Linux systems.

We're comparing Linux and Windows, an operating system that contains 3.5 million files[2] (of course, not just the kernel in this case). That isn't really fair. Code is as perfect as humans can make it, and it certainly does not help that there's so much to take into account.

[1] http://hmarco.org/bugs/CVE-2015-8370-Grub2-authentication-by...

[2] https://arstechnica.com/gadgets/2018/03/building-windows-4-m...

pecg · on March 28, 2018

This GRUB bug you are talking about, is not a kernel problem though; on a side note, I'm going to read on the links you provided as I want to see if encrypted root partitions could also be compromised, I suspect no.

chris_wot · on March 29, 2018

That's not quite on par with this Windows bug, but I take your point.

nl · on March 28, 2018

The Linux kernel developers hate their solution, and they only used it because they can't think of a better one. It causes enormous increases in complexity and kills performance in many cases.

They revived previous work on this as part of the KAISER work in November 2017, and still had major bugs with it in February 2018 (ie, 4 months later). That's pretty similar to the 6 month timeline mentioned here.

https://lwn.net/Articles/738975/

https://arstechnica.com/gadgets/2018/01/whats-behind-the-int...

amluto · on March 28, 2018

Linux kernel developer here. I don’t know know how MS’s Meltdown solution differs from Linux’s, let alone whether I should hate it.

MS (I think) uses IBRS to help with Spectre, and IBRS is not so great. Retpolines have a more fun name at the very least :)

sewer_bird · on March 28, 2018

I think he means Linux kernel developers hate (their own) solution for Meltdown.

caf · on March 29, 2018

The essential element of the solution for Meltdown is the same in every x86-64 OS: unmapping the kernel when in usermode. This is widely hated because it makes kernel entries and exits much slower, and blows away the TLB if your hardware doesn't have PCID support.

nl · on March 29, 2018

Yes, this.

It sucks, but what else can one do?

cesarb · on March 28, 2018

The Linux developers had a head start in the form of the KAISER (later KPTI) patch set, development of which had AFAIK started before Meltdown was discovered and reported in private to Intel.

gruez · on March 28, 2018

>AFAIK started before Meltdown was discovered

source?

cesarb · on March 28, 2018

According to https://googleprojectzero.blogspot.com.br/2018/01/reading-pr... Spectre was initially reported to Intel on 2017-06-01, and Meltdown a bit later.

After a quick web search, I found https://patchwork.kernel.org/patch/9712001/ which records the initial submission of the KAISER patch set at 2017-05-04. The repository at https://github.com/IAIK/KAISER has an older version of the patch set dated 2017-02-24, indicating that work on it had started even earlier.

Finally, the timeline at https://plus.google.com/+jwildeboer/posts/jj6a9JUaovP mentions a presentation from the authors of the patch set at the 33C3 on late 2016. Note that this page puts the submission of the KAISER patch set at 2017-06-24, but I believe that to be wrong; searching the web for "[RFC] x86_64: KAISER - do not map kernel in user mode" finds several mail archives with that message, and they all agree that the date was on May, not June.

That is, even if Microsoft had been immediately warned by Intel (or by Google), the Linux kernel developers would still have had a few extra months of head start, by basing their work on the KAISER patch set. Was it luck, or a side effect of the Linux kernel being used for academic research?

Cookiesaurusbex · on March 28, 2018

They said discovered AND reported, not just discovered. It is entirely possible someone discovered it much earlier and didn't report, but we won't ever know if evidence is never found.

xucheng · on March 29, 2018

KAISER is developed to mitigate another less severe vulnerability.

From the meltdown paper:

> We show that the KAISER defense mechanism for KASLR [8] has the important (but inadvertent) side effect of impeding Meltdown. We stress that KAISER must be deployed immediately to prevent large-scale exploitation of this severe information leakage.

freehunter · on March 27, 2018

Because Linux is a completely different OS with completely different code and a completely different set of problems.

chris_wot · on March 28, 2018

That's entirely weak.

freehunter · on March 28, 2018

I think the WINE people are probably looking for your help, for some reason they still seem to think there's a few differences between the two. They'll be happy to know they've been wasting their time.

chris_wot · on March 28, 2018

[flagged]

stagbeetle · on March 28, 2018

[flagged]

sctb · on March 29, 2018

We ban accounts that attack other users like this. Please stop.

https://news.ycombinator.com/newsguidelines.html

michaelmrose · on March 28, 2018

This is not appropriate discourse here.

cesarb · on March 28, 2018

Another relevant factor I just thought of: the Windows kernel has more constraints, due to binary-only drivers which have to keep working. The Linux kernel could fix any incompatible driver at the same time, since they're all in the same git tree (out-of-tree drivers are not expected to be compatible with newer kernels).

lmilcin · on March 28, 2018

I agree. The philosophy is different. Linux is focused on having the right thing working, at the cost of compatibility (sometimes). Windows is (or at least was) focused on extreme compatibility and the actual features of operating system seem to be slapped onto the features of the previous version of the OS.

This seemed to work well for Windows audience in the past, also for Linux audience, due to the fact that they have different uses and audiences.

People seem to have segregated into those users that just want stuff working and those that want powerful operating system that allows them to do whatever they want.

At least that was until sometime the Windows 10 came...

rbobby · on March 28, 2018

> only it takes 9

What if we outsourced the QA to India?

yorby · on March 28, 2018

that's a really bad analogy... Note: I think that those devs should be up for death row /s

dmix · on March 27, 2018

A single person with a fixed 9 month biological timeframe is the analogy you chose to compare 6 months of a billion dollar company's software development time by potentially hundreds of developers (for better or worse) but importantly for an extremely critical class of bugs and therefore that's "just how it is"?

Come on, software is hard but when you fix a vulnerability and expose a far far worse one, and had months to plan, execute, and test it, then you are most certainly justified in being criticized.

It's not like we're saying the code is shoddy and needs work, which is entirely excusable in a short timeframe. It's that they've left users far worse off in the end then from where they started.

amluto · on March 27, 2018

If MS had allocated 1000 devs to fix this issue quickly, the result would have been an utter disaster.

dmix · on March 27, 2018

Of course 1000+ developers working on one single solution waterfall style in a short timeframe is a terrible idea. That's not how software works... and we all know that. You know that.

Jumping on the next worst thing does not excuse them either. Nor is taking another analogy to the other extreme helpful at all in this discussion.

A solid pool of talent with complete flexibility resource-wise and a strong critical-level mandate is nothing like a single person with a fixed biological timeframe, with relatively limited resources, no matter which way you'd like to spin it.

justinjlynn · on March 27, 2018

If they had allocated 1k devs into n teams to develop different approaches and review and test each other's code and approach. No, the result would've been a better patch and probably not that piece of hot garbage.

rhizome · on March 27, 2018

I suggest reading The Mythical Man Month sometime, you're not accounting for the complexity of running such an "n-teams" scheme.

justinjlynn · on March 28, 2018

I have read it and yes, I have. Assuming they had that many developers qualified to work on the problem, they'd also likely already have been employed on other projects. Therefore the management infrastructure would already be in place. The state would need to change, but yeah, the government would already be there and qualified.

hinkley · on March 28, 2018

My own personal dogma is that your CI/CD system hasn't achieved its goal until everyone on the team can spool up a given build of the code and try to reproduce an error for themselves without interrupting anyone else to do it.

The person who discovers the bug may not come up with the best repro case. The person best equipped at fixing the bug may not be best person to track it. Being able to spool up new people on a problem for cheap keeps the whole experience lower stress and generally improves your consistency with regards to success.

If the cost of someone trying a crazy theory is linear in man-hours and O(1) or even O(log n) for wall clock hours you're going to look like a bunch of professionals instead of a bunch children with pointy sticks.

From what I understand, Microsoft has never gotten there. They got too big to fail a long time ago. And certainly wouldn't have for Windows 7.

justinjlynn · on March 28, 2018

Not only that but the teams would be working independently by design. 9 people can't make a human in one month, but 9 people can make 9 children in 9 months. You can then choose amongst them. So, yeah, I have no idea why you're bringing in mythical man month stuff here.

stagbeetle · on March 28, 2018

Anything sufficiently complex can be broken down into simpler pieces. This includes most developer generalists.

TeMPOraL · on March 28, 2018

Anything sufficiently complex can be broken down into simpler pieces plus the glue holding those pieces together.

In human organizations, that glue itself gets incredibly complex and expensive, as number of pieces grow.

stagbeetle · on March 28, 2018

I disagree that glue is expensive and complex. When you build a ply-wood tower in school to see who's holds up the most to compression, you don't douse your entire structure in glue. You get points off, because it adds so much to weight!

People are the plywood, fragile, finicky, and useless if left to their own devices. Management is the middle school kid who needs to take the wood he's been given and make something that will hold up to all the weight that'll be put on top it. In order to do this, he's been given a hot glue gun and enough glue to mummify the entire thing if he so chooses. Most of the kids will rush bullheadedly (or should I say uncaringly) into gluing the sticks together into something that "looks like it should work." They use too much glue, the structure isn't optimized for load handling, and when the day of truth comes, it crumbles down when the bucket that's supposed to hold the weight, destroys it!

What is glue? Whatever management wants it to be. It can be a team leader or a hastily configured IRC channel. In my experience (this includes organizing, delegating, and making sure that 40 devs-et-al get what's needed done), if you choose your sticks right, taking the time to make sure they're not hiding any structural faults, you can make the job 65% easier. If you lament that choosing sticks if difficult, I reply with "it's just practice."

The main issue I've seen, has been the all too common "there are no good managers." Especially in technology. The remedies for this? There's no bandaid. Each manager has to realize his personal shortcomings and fix them. But, to throw up his hands and say "the more people working on a project, the slower it'll get done," is a nice way to say "I can't handle all these people, but I'll excuse that away by saying it's inevitable. It's even industry 'common sense!'"

amluto · on March 27, 2018

All but a very small number of those teams would have spent quite a while reading manuals, reading code, and learning how the kernel entry code and pagetable handling code worked. Then they'd come up with something, but there would be a severe shortage of reviewers.

Not to mention that the whole problem would most likely leak once that many people knew about it.

thatfrenchguy · on March 27, 2018

There's probably no 1000 good VM engineers out there, in the world.

dmix · on March 27, 2018

Yet it didn't take 1000 of them to fix it on other OSes. Why are we even debating 1000 devs anyway? That is hardly the point and throwing more and more bodies at a programming problem is hardly ever a solution, nor one I proposed in my original comment.

It's ultimately a matter of talent, resources, and proper management. Which is hardly an insurmountable problem for a major tech company which decades of experience solving world-is-ending bugs.

chris_wot · on March 28, 2018

I put it down to lack of openness to a wider review than just Microsoft engineers.

golergka · on March 27, 2018

https://en.wikipedia.org/wiki/Brooks%27s_law

Nine month analogy is widely known in software development.

bitL · on March 27, 2018

Software is all about capturing as many income streams with as fewest people as possible.

exikyut · on March 27, 2018

While also generating the most number of jobs possible.

sova · on March 28, 2018

What kind of jobs? Architects developing biotecture, or janitors cleaning up vomit and firefighters putting out dog shit that's burning?

sddfd · on March 27, 2018

Are you sure that that particular team inside Microsoft had full six months?

Intel had 6 months.

gerdesj · on March 27, 2018

I'm pretty sure they had around six months give or take a day or so of oh shit in Intel. Of course, Intel may have actually simply broken the glass on a dusty old plan of action "In the event of ..."

zer00eyz · on March 27, 2018

Disaster plans are funny things.

I have had the misfortune of having to pull them out twice in my career - in both cases they offered little in the way of guidance for the particular situation that came up.

The set of unknown unknowns that are typically missed make most of them unless in all but the most narrow of cases, because many companies write them, and then forget them. Especialy if they are as large as intel.

gerdesj · on March 27, 2018

Very true, although I'm glad to say I have not had to break out one of my own yet for real. My first experience of a full on DR test was pretty humbling - NetWare servers backed up by the Unix troops via Legato. It turned out that the backups were good but restored at a pathetically slow speed (no reflection on the Unix systems but I suspect the Novell TSAs were a bit shag at the time). We updated "time to restore" estimations and moved on, after adding one or two other results of lessons learned.

Do test your plans (this is not aimed at you personally zer00eyz - you probably know better than most).

There are a lot of unknowns but the basic model of a real DR plan is pretty sound these days, if you can afford it or wing it in some way. An example:

Another site, a suitable distance away. On that site there is enough infra to run the basics - wifi, a few ethernet ports, telephony etc. There should also be enough hypervisor and storage for that. Some backups are delivered there as well as on site. Hypervisor replicas are created from the backups (or directly) depending on RPO requirements and bandwidth available. The only thing that should be able to routinely access the backup files is the backup system (certainly not "Domain Admins" or other such nonsense". Ensure that what is written is verified.

Now test it 8)

.... regularly

zer00eyz · on March 28, 2018

Ok now I have to share a story...

The company in question had a rather large on site server room (raised floor, fire suppression) and a massive generator to deal with any power issues as well as redundant connectivity. This room was literally the backup incase their "real" data center went off line.

The problem is that the room was "convenient" so there were plenty of things that lived ONLY there (mistake one) -

When the substation for the office went, and the generator started everything looked fine. The problem was that no one had ever run the generator for that long... after a few hours it simply crapped out (over heated, problem two).

A quick trip to home depot got them generators and extension cords that let them get the few boxes that were critical back up - however one box decided to not only fault, but to take it's data with it.

This is when I got a rather frantic call "did I still have the code from the project I did?" - they offered to cut me a check for $2000 if I would go home right then and simply LOOK for it.

Lucky for them I had it - and the continuity portion of the DR plan got revisited.

In hind sight after I said I had the code, I probably could have asked them to put another zero on the end of the check and they would have done it just to be a functioning business come 6am.

gerdesj · on March 28, 2018

I didn't even have to show some leg to get you to recant the dit.

Thank you - I'm happy to listen to (nearly) everything.

"I probably could have asked them to put another zero" - ahem that's not the IT Consultant's Way exactly. We have far more polite ways of extracting loot. We are not lawyers and should have morals.

TeMPOraL · on March 28, 2018

Reminds me of that The Expanse quote:

"I have a file with 900 pages of analysis and contingency plans for war with Mars, including fourteen different scenarios about what to do if they develop an unexpected new technology. My file for what to do if an advanced alien species comes calling is three pages long, and it begins with 'Step 1: Find God'."

campuscodi · on March 27, 2018

Microsoft had 2-3 months, tops.

gerdesj · on March 28, 2018

Source, proof of assertion?

bonzini · on March 28, 2018

As far as I know he's right. The news was given first to Amazon and Microsoft sometime in August. Consider one month for testing and preparing for release, that gives three months to build a solution for all supported operating systems. Two months to do it for the most recent version and one month for backporting to the older ones sounds about right. Maybe a few weeks more, but that's it.

campuscodi · on April 1, 2018

Intel's own press releases. I'm not gonna go digging into old links just because some random dude on the Internet can't use Google.

Aloha · on March 27, 2018

If the problem is complex enough, six months may be a short deadline.

azinman2 · on March 27, 2018

Especially since it comes as a surprise, and there’s already an existing train on its way to the next station with its own timetable.

cakes · on March 27, 2018

that surprise could include "management says we don't have to do anything :/" <5.99 months pass>, management: "we have to patch this and it needs to be done yesterday."

TeMPOraL · on March 28, 2018

That would be extreme, but I can entirely imagine it taking many weeks for the true importance of this problem to correctly propagate across all management levels.

yashap · on March 28, 2018

I wonder how long the actual devs fixing it had? From what I hear from friends who work there, Microsoft is a sprawling bureaucracy with many layers of management, where decisions are far from quick. I'd imagine that after Intel/whoever let Microsoft know about the exploits, it went through many levels of prioritization, negotiation about which team would work on it, not being brought into sprints because of other features already being worked on, etc. Most likely there were people with minimal knowledge of the relevant tech making all these prioritization decisions.

Wouldn't shock me at all if there was very little actual dev work done for the first few months, and then it was all super rushed at the end. Quite possibly the devs with the required knowledge didn't even know this was in the pipeline for months. That's par for the course at every decently large company I've worked at (i.e. 100+ devs), and at a beast like Microsoft I imagine it'd be way worse.

lmilcin · on March 28, 2018

I remember that Microsoft was able to deliver critical fixes practically overnight. This assumed that once you see the problem the fix is pretty straightforward.

Unfortunately Spectre and Meltdown aren't straightforward and go to the very heart of how the OS works. It's not at all easy to fix this when you have enormous amount of software working on top of it depending on every little quirk your solution provides.

yuhong · on March 28, 2018

Yea, it is probably the biggest change to the Windows kernel in a security update.

Tobba_ · on March 28, 2018

This is what happens when you don't have a QA department.

bonzini · on March 28, 2018

This is something that you find through code review, not testing. Apart from regression testing, but that presupposes that you encountered the issue before.

efdee · on March 28, 2018

Are you implying that Microsoft doesn't do QA?

Tobba_ · on March 28, 2018

If they do, whatever issues they're occupied with finding would call for an exorcism.

gwbas1c · on March 28, 2018

These kinds of things should be part of an automated test suite. Specifically, the kind of tests that were written years ago.

Honestly, Microsoft is really big into automated testing. I'm surprised this slipped through.

astrange · on March 28, 2018

I don't think there are any OS kernels that practice test-driven development - most of them don't even have code coverage working. It's also very hard to test for a problem you haven't thought of yet.

caf · on March 27, 2018

A really simple test you can compile with cygwin - if it doesn't crash, the bug is present:

  #include <stdio.h>
  
  int main()
  {
          volatile unsigned long *ptr = (volatile unsigned long *)0xFFFFF6FB7DBED000;
  
          printf("%lx\n", *ptr);
          return 0;
  }

mgerdts · on March 28, 2018

I seem to be unable to find a patch that will make it so that this doesn't run. Windows Update says that I have all required patches. I first tried KB4088875. That didn't cause this program to fail. Then I tried "2018-03 Preview of Monthly Quality Rollup for Windows 7 for x64-based Systems (KB4088881)", which was only a recommended update. That didn't help either.

mysterypie · on March 28, 2018

Same for me. I tested a Windows 7 x64 system which has all security patches, but caf's "really simple test" above still runs, which seems to indicate that the bug still exists. Same as you, I applied KB4088881, which was the only pending update, but it made no difference.

Also, I tried the command from the orginal article:

pcileech.exe dump -out memorydump.raw -device totalmeltdown -v -force

This creates 5GB file which does look like a raw memory dump. I'm not sure how to interpret this; I don't know what the behavior should be with or without the bug.

mgerdts · on April 6, 2018

In the off chance that anyone stumbles across this in the future, KB100480 fixes this.

https://portal.msrc.microsoft.com/en-us/security-guidance/ad...

CVSS 3.0 base score of 7.8.

BrianG61UK · on March 30, 2018

I finally found the fix. It's KB4100480. It makes the little test crash as it should.

BrianG61UK · on March 30, 2018

So there is no patch that fixes it?

BrianG61UK · on March 29, 2018

Same here.

I'm worried.

testplzignore · on March 27, 2018

I tested this on a Win7 x64 system with the 2018-01 (KB4056897) and 2018-02 (KB4074587) patches. It segfaulted. Hmmm.

testplzignore · on March 28, 2018

Ahh, I was using a 32-bit gcc. 64-bit gcc shows it :)

  $ x86_64-w64-mingw32-gcc meltdown.c -o meltdown.exe
  $ ./meltdown.exe
  1371183207

BrianG61UK · on March 29, 2018

But can you find a March patch that makes the correct 64 bit version segfault? I can't :-(

zalooopa · on March 28, 2018

123123123

keir-rex · on March 28, 2018

Grammar police warning: The comma in “if it doesn’t crash, the bug is present” actually makes the intention more difficult to understand.

kazinator · on March 28, 2018

The comma placement "if clause1, clause2" is extremely common. In the above sentence, there is no other place it can go, other than nowhere at all.

"if, it doesn't crash ..." nope

"if it, doesn't ..." nope

"if it doesn't, crash ... " nope

"if it doesn't crash, the " yep!

"if it doesn't crash the, bug ..." nope

"if it doesn't crash the bug, is ..." nope

"if it doesn't crash the bug is, present" nope.

When it is present, it does help to separate the if and then, particularly in the absence of the word "then".

Without the comma, the prefix "if it doesn't crash the bug" can be scanned as a viable clause, only to find that the suffix becomes a fragment.

sova · on March 28, 2018

You brute-forced comma placement. I tip, to you, my hat.

ralphie02 · on March 28, 2018

Thank you for this and everyone who has downvoted an incorrect and misleading statement

CiPHPerCoder · on March 28, 2018

"If x, y" is a shorthand for "If x then y" in spoken language.

caseymarquis · on March 28, 2018

That's the grammatically correct place to put the comma. Fairly sure you're actually grammatically required to have a comma there.

johnday · on March 28, 2018

well_done · on March 27, 2018

Well, let's take a breath and be grateful that even MS can mess something like this up.

One dev or 1000, who cares, whatever they chose did not work particularly well. Are "they" to blame? Yes. Are the engineers to blame? Probably not. Is management the culprit? We don't know.

What's left? Next time your customers bug you about some random downtime caused by an overworked datacenter intern, don't feel stressed. Take the time to remember that even if you would've had billions of dollars, years of experience and thousands of employees, you could've messed up, just like MS did :)

gerdesj · on March 27, 2018

"Take the time to remember that even if you would've [sic] had billions of dollars, years of experience and thousands of employees, you could've messed up, just like MS did :)"

When you are the direct, contracted, IT support for a company then you do have responsibilities. You might be considered responsible for timely delivery of patches - a fair argument in court I think. Mitigations might involve helpdesk logs as well as contracts.

well_done: Your tone comes across as BOFH. I'm possibly a simple PHB who owns an electric cattle prod that is wired up to the mains (three phase) but I prefer to get sign off for a project via work committed and not threat done.

freehunter · on March 27, 2018

>Your tone comes across as BOFH

I didn't read it like that, and I'm usually the first to read things negatively. I read it as motivational: don't feel bad about your own mistakes, even the big guys with tons of money and a lot of really smart people mess up sometimes. So cut yourself some slack and just do the best you can.

gerdesj · on March 28, 2018

"Next time your customers bug you about some random downtime caused by an overworked datacenter intern, don't feel stressed. Take the time to remember that even if you would've had billions of dollars, years of experience and thousands of employees, you could've messed up, just like MS did :)"

I missed the :) which might sound a bit naff now but was probably intended to deflect comments like mine. Hit taken. However I did invoke BOFH which is (I hope) normally seen as an indication that a comment is not to be taken too seriously.

EDIT: BOFH => Negative - nope, not here.

freehunter · on March 28, 2018

You're correct, I read "BOFH" as a negative. I apologize since you did not mean it in that way. I never thought that BOFH would ever be considered the good guy in the story :)

tonysdg · on March 28, 2018

> an electric cattle prod that is wired up to the mains (three phase)

"Have you seen the boss's new toy?" "Yeah. Coincidentally, I'll be working remotely moving forward. Good luck!"

well_done · on March 28, 2018

Coming across as BOFH was not my intention at all, maybe I should refrain from using emojis to convey meaning ;)

I just thought that if you're working in a high pressure environment (and this applies to virtually every coding shop I've ever known) and get trouble from all sides all the time it feels reassuring to know that in fact, not you are the problem, neither is your employer, stuff like this just happens even to the best.

amluto · on March 27, 2018

Yeesh. I didn’t know that Windows still uses the self-referential page table trick. This makes me very nervous, especially since they seem to keep it mapped in the user page tables. This seems likely to poke a big hole in ASLR if nothing else. It’s a huge target for write-what-where exploits.

ryuuchin · on March 27, 2018

They changed it in Windows 10 (RS1 IIRC)[1].

[1] http://www.alex-ionescu.com/?p=323

caf · on March 28, 2018

The tl;dr is that they're still using the self-referential page table trick, however the PTE_BASE is now randomised at runtime with dynamic fixups.

amluto · on March 28, 2018

If an attacker can’t find it by probing the smallish number of choices using one of many MMU layout fixes, I’d be quite surprised.

caf · on March 28, 2018

Does this bug also imply they're not using SMAP, since otherwise any kernel access to the page tables should have tripped it?

caf · on March 29, 2018

Replying to myself, I guess not, because the kernel doesn't modify the user pagetables using the user pagetables' own self-referencing entry. Rather it has the kernel %cr3 loaded and modifies them using a mapping of the user pagetables in those kernel pagetables.

caf · on April 3, 2018

..which, on further reflection, raises the question: if those page tables aren't loaded when in kernel mode, why do they even need a self-referential entry at all?

staticassertion · on March 27, 2018

> This seems likely to poke a big hole in ASLR if nothing else. It’s a huge target for write-what-where exploits.

How? Is it mapped to a static location?

monocasa · on March 27, 2018

It is in Windows 7.

staticassertion · on March 27, 2018

Good to know, thanks.

nialv7 · on March 28, 2018

What's wrong with using self-referential page table? I thought everyone uses that?

nikital · on March 27, 2018

According to the article, the self-mapping PTE is randomized in the latest Windows 10.

tambourine_man · on March 27, 2018

Security is hard. I don't think I'd have the stomach to work on kernel or encryption code.

The worst I can do here in userland is crash or delete data. And that's pretty bad already

HankB99 · on March 27, 2018

I dunno. I think a program that produces subtly wrong results is the worst. It reminds me of a project I once did. It involved producing reports from tens of thousands of records including summing some of the fields. I was constrained to work on Windows so I put the date into an SQL server database and used MS Access to produce very nice looking reports. I reviewed the reports and everything "looked OK" so I handed them to the users for approval. They users were accountants. They added up the partial sums and pointed out that the results were only approximately correct. It turns out that MS Access is not so good at arithmetic. I restructured the reports to perform the arithmetic in the SQL queries and just use MS Access to format it for a pretty page. I also checked the arithmetic before handing the next revision over.

Better testing (as in more than a superficial glance) would have caught this before review but there always exists the possibility that subtle bugs can sneak past even well thought tests.

Just my own experience and opinion.

NamTaf · on March 28, 2018

Agreed. I was doing an derailment investigation a number of years ago which involved digging through event recorder logfiles. The event recorder is such that it writes an entry every second but only updates GPS 20 seconds, so it writes GPS coordinates against every 20th entry.

What I discovered was that the event recorders on certain locomotives updated GPS at the 20th second rather than the 0th second. This meant that the GPS entry next to each line was in fact offset by 20 seconds - i.e. the entry for 02:11:40 was in fact what was sampled at 02:11:20. I think they must've held the GPS coordinate in memory somewhere but updated it AFTER writing that second's entry, so they wrote 02:11:20 whilst holding 02:11:00's GPS, then updated it, but then written that update at 02:11:40, etc. This was a fault with the design of the event recorders, not just one loco, as it occurred on each of that type that I looked at.

This confused me so much because it looked right - it was in the right general location, it was updating, etc. - but for a solid few days I did a bunch of analysis thinking the train was in a different location to where it really was. I eventaully picked up on it when subtle things kept not adding up and verified it by watching another loco come to a stop but then see the GPS keep moving for a bit afterwards until it settled.

I agree with you, subtly wrong results are the worst.

amluto · on March 27, 2018

I think there is a spectrum of styles of people who work on this stuff. At one end, you just write some code and see if it works. At the other end, you read specs very carefully, think about what you need to do, and do it. No matter which approach you take, you still get blindsided every now and then.

bzbarsky · on March 28, 2018

Yep. The most fun is when the spec has built-in security bugs. Especially when it only manifests in the interaction with a different spec.

make3 · on March 27, 2018

you likely write a lot more code than security/kernel-hardening professionals though. they just likely spend more of their time reviewing and researching than coding

blinkingled · on March 27, 2018

I guess upgrade to Windows 10 is the message Microsoft is trying to get out here?

Laying off all those QA and testing people will have some downside - MS seems to be letting older versions take the hit.

Silhouette · on March 27, 2018

Perhaps. However, Microsoft has published very clear guidance on how long previous versions of Windows would receive support for, specifically including the period for security updates. Doing what you're describing is effectively reneging on that deal, and that sends a very different kind of message.

blinkingled · on March 28, 2018

Sure, I have a hard time believing MS would purposefully screw their Win 7/8 enterprise customers - but the issue at hand suggests it's at the very least a byproduct of their new strategy of focus on Win 10 and the decision to do with less QA/Testers by involving more end users to participate in testing. Thus Win 7 users end up with slower, less tested patches and no hardware support backports. To be fair only the less tested patches sound terrible.

contravariant · on March 27, 2018

That or switch to unix.

Bad coding on one product isn't exactly the most convincing strategy to get people to try a different product.

flukus · on March 27, 2018

What sort of QA would be doing memory access tests? Most I've worked with can't even do automation scripts.

acdha · on March 27, 2018

That’s like saying most developers are WordPress install monkeys. It may be locally true but it’s not globally so. Good QA people – and I’m sure Microsoft and similar major players have plenty of them – are just as skilled as the developers but working at different goals. In addition to security, they’ll be working on scalability, detailed correctness tests, fuzz testing and other automation techniques, etc.

If they can be replaced with a script, your employer has a management failure and is wasting a lot of money on short-term savings.

computator · on March 28, 2018

Has anyone actually confirmed this bug? The author seems to be an expert in low-level DMA security, but it would be nice to see independent confirmation. Reading through the comments so far, it doesn't seem so. The closest anyone comes is this: https://news.ycombinator.com/item?id=16693599

I was hoping to see someone who said:

(1) I tested a Windows 7 X64 machine without the Meltdown patch (pre-December 2017) and couldn't read arbitrary memory.

(2) Next I tested with Microsoft's Meltdown patch KBnnnnnnn (Jan or Feb 2018) and could read arbitrary memory. The system is insecure.

(3) I then tested with Microsoft patch KBnnnnnnn (March 2018) and can no longer read arbitrary memory. They fixed it.

BrianG61UK · on March 30, 2018

I finally found the fix. It's KB4100480. It makes the little test crash as it should. Phew!

caf · on March 28, 2018

I did use a modified version of my short test program to actually test modifying the page tables to read a chosen physical address, which worked just fine. The bug is real.

BrianG61UK · on March 30, 2018

And KBnnnnnn that fixes it is which KB?

lifeisstillgood · on March 27, 2018

My take on this is a little bit dumb, but, once upon a time, many moons ago, I thought I understood the CPU I acted upon, I could peek and poke and look up what was where. I mostly wanted faster, but what i got was more complex.

Is there a way to just get faster without the complexity. What would a new cpu architecture and OS look like if we started again? is there room for open hardware to save us all?

exikyut · on March 28, 2018

Superscalar processing was unfortunately a major step forward in terms of performance.

The answer in https://stackoverflow.com/questions/8389648/how-do-i-achieve... is interesting; I'm particularly fascinated by the temperature warning (the answer author's CPU got to 76C in testing).

My CPU's sitting at 30C right now. It maybe climbs to 48C if Chrome's being stupid, and 50+C if I'm doing something mildly taxing. I've never made it go beyond 60C IIRC.

So, modern CPUs are so efficient that they're simply just never hitting their maximum throughput. I think that's pretty incredible.

The sad thing about CPUs that don't use modern (superscalar, multi-stage, microarched, etc) design is that they just can't keep up.

And people's OCD about speed and (more frequently) parallelization nowadays drives what they'll buy. Something had better have a killer feature if it isn't fast or highly parallel.

So it's possible, but a huge headache. Whatever you built would likely be highly purpose-specific.

notriddle · on March 28, 2018

https://millcomputing.com/ ?

rwallace · on March 28, 2018

https://riscv.org/

userbinator · on March 28, 2018

TempleOS.

avhon1 · on March 27, 2018

The article says that this vulnerability was patched in March 2018, so at least there's that.

0x0 · on March 27, 2018

Is the March 2018 update even out still? I thought they pulled it because it reconfigured all the NICs in the system, losing static IP configuration in the process...?

gerdesj · on March 27, 2018

I'm still working on a bloody huge list of customer updatathons for Meltdown and Speccy. Now I have to go back around a load of them that I have already patched and find the Win 7s and 2008r2s and update those before I continue.

Oh well, it gives me something to do of an evening 8)

Someone1234 · on March 27, 2018

Why are you manually patching workstations? WSUS allows central management (inc. zone deployment), but even in Windows 7's default state it should apply these updates without intervention.

2008R2 I can see doing it by hand, but Windows 7 clients is odd. Particularly as it seems to be taking you two months to apply urgent patches.

gerdesj · on March 27, 2018

Some of my customer VMs are Windows 7 - Veeam proxies for example. I also take backups quite seriously. Yes this is all a bit manual in some cases.

(Nearly) All of them are on the end of an IPSEC VPN that I can get at from home via the office web proxy and another VPN or via magic. Some of them have 192.168.0/24 or 192.168.1.0/24 - those are on the end of OpenVPN. I wrote this: https://doc.pfsense.org/index.php/OpenVPN_NAT_subnets_with_s... . You have no idea what networking is about until you've had to do that sort of nonsense a few times 8)

I don't have the luxury of one WSUS to manage, we have loads of the bloody things. Some customers have pretty skilled local IT depts, some have somewhat vocal users that accuse you of resetting their passwords after spending hours doing way more than they have paid for and would not understand what you are on about in the first place. I love them all equally as any parent would ...

When I get bored of watching Windows Update I run apt update && apt upgrade && reboot on a few machines and keep a weather eye on the monitoring system. When I get really bored, I run up yaourt on my laptop or my office PC. When I've got a newly installed Win system or two to patch, I fire up a few emerge -Uvh --deep --newuse --keep-going @world sessions (I'm not really joking here) or run up genkernel.

Yes, there is the default state designed by .... bbzzzzrrt .... soz, lost it, and then there is reality. Could I also remind you that there is rather more to patching than WSUS:

* Firmware - Dell, HPE and Co have had to do rather a lot of work here and had to start again in Jan when Intel dropped the ball * Hypervisors - I generally see VMware - that's a lot of patching and don't forget that some of them were buggered, so need excluding. * VM vHardware versions - yep, all those little lovelies have their own hardware types to worry about * "My fooking factory runs 24x7 - what are you going to do about it?" .... "Yes but you didn't go for the full cluster version sign I'll see what I can do" ...

You think I'm odd! No mate, my little company are well aware of automation and use it where we can but we are pragmatic and have to deal with a lot of reality.

We could of course bind our customers to our iron will and enforce our policy and stuff. They would not work on weekends or other odd hours. They would not insist on doing things their way and they absolutely would pay us on time - they generally do 8)

0x0 · on March 27, 2018

Wow, that's crazy. About as bad as it gets for local privesc!

pishpash · on March 27, 2018

So why does this only affect Windows 7?

muricula · on March 27, 2018

Parts of the memory management code were rewritten for windows 10 to do fun things like randomize the location of the page tables.

This fairly significant change wasn't backported to Windows 7.

Then when they went to backport the meltdown fix to windows 10 they set a 'this page is user accessible' bit in the page tables by accident.

cptskippy · on March 27, 2018

Probably because after Windows 7 they started a kernel rewrite , known as MinWin, to extricate the Win32 Userland tendrils that had crept into the Kernel since the NT days.

https://en.wikipedia.org/wiki/MinWin

container · on March 28, 2018

The article doesn't seem to indicate that MinWin started after Windows 7. If there has been a fundamental kernel change effort after Win7 (I'm not aware of one), maybe it has a different name?

cptskippy · on March 28, 2018

You're right, for some reason my brain said Vista came after Windows 7. Given that 7 came after Vista my theory makes no sense.

rocqua · on March 27, 2018

Any indication of whether this was actually exploited? I really don't want to do a full key-rotation routine.

Also, does windows 7 map the entire address space into kernel memory? That is, would this have enabled direct memory access to other processors.

MarkSweep · on March 27, 2018

My understanding of the article is that the page table itself was writable. So you an attacking process could map in the entire memory of the computer and read everything, regardless of what was in the kernel's version page table.

caf · on March 27, 2018

The attacking process could also put whatever code it wanted into the kernel, and so give itself full access to everything on disk as well.

daveheq · on March 28, 2018

This sounds contradictory:

"Only Windows 7 x64 systems patched with the 2018-01 or 2018-02 patches are vulnerable. If your system isn't patched since December 2017 or if it's patched with the 2018-03 patches or later it will be secure."

"I discovered this vulnerability just after it had been patched in the 2018-03 Patch Tuesday. I have not been able to correlate the vulnerability to known CVEs or other known issues."

yuhong · on March 27, 2018

The fun thing is that even MS admitted it break non-PAE kernels and pre-SSE2 processors (in the most recent one). I have been fighting a similar bug in one of the Jet 4.0 patches for a while now.

kerng · on March 28, 2018

Sounds like Microsoft found and patched it independently already. Afterwards someone else (blog author) found it too, maybe by reversing the patch from patch Tuesday.

ams6110 · on March 27, 2018

Predictable. Fixing old bugs introduces new bugs.

Arwill · on March 27, 2018

I wonder for how long Windows as a software can continue to grow. I looked at the list of services, and its crazy. So much exotic functionality, and so many of what i don't ever need. Then the file system, there are even hidden folders managed by windows itself, that just grow and take up space. All that adds to complexity, and increases the probability for bugs. I wish there was a version of the OS that just shed all that unnecessary functionality and returned to basics. Something like a minimalist Linux distro, but able to run all games and office.

ksk · on March 27, 2018

Hmm, but it appears that windows has fewer security bugs than Linux. Is there any data showing otherwise?

(TBH, this is already unfair, comparing the kernel with an entire OS)

https://www.cvedetails.com/top-50-products.php

https://www.cvedetails.com/product/47/Linux-Linux-Kernel.htm...

https://www.cvedetails.com/product/32238/Microsoft-Windows-1...

https://www.cvedetails.com/product/17153/Microsoft-Windows-7...

https://www.cvedetails.com/product/22318/Microsoft-Windows-8...

Erlich_Bachman · on March 27, 2018

Even before that it's already unfair to compare a closed-source product to an open-source system. Bugs are much easier to find in an open-source system. It doesn't even by itself mean that there are more of them. If you look at the big picture, it's not like Windows is known for it's security.

ksk · on March 27, 2018

>Even before that it's already unfair to compare a closed-source product to an open-source system. Bugs are much easier to find in an open-source system

Ironically, wouldn't that make it even more unfair for Windows? Shouldn't all the 'millions of eyeballs' looking at the linux code be making it more secure?

>If you look at the big picture, it's not like Windows is known for it's security.

True, but security bugs are easier to reason about, than feelings.

whatshisface · on March 28, 2018

We can't measure the number of security bugs, we're measuring how many get fixed. Fewer eyeballs on Windows would imply fewer discoveries, and fewer bugfixes as a result.

ksk · on March 28, 2018

>We can't measure the number of security bugs, we're measuring how many get fixed.

The number of bugs found should be trending towards zero since millions of people have the opportunity to improve the source code and prevent the bugs from being introduced in the first place. There are ofcourse other advantages to having the source be open, but if there is no security advantage to open source, that's going to put a dent in some of its marketing.

>Fewer eyeballs on Windows would imply fewer discoveries, and fewer bugfixes as a result.

Why would fewer people be looking at Windows compared to Linux? Security Researchers don't really discriminate. Or did you mean just the MS developers? Hmmm, I don't know how many windows bugs were found through external sources vs internal. Perhaps someone has already done that analysis..

Erlich_Bachman · on March 28, 2018

> Why would fewer people be looking at Windows compared to Linux?

As I wrote in the original post, this is because Linux is open-source. There are few people looking at Windows, simply because there is no source to look at, and as a result there are 10 times less people in the world who potentially even can look at it and check for bugs. That's why. With Linux you need basic systems programming skill and ability to code simple exploits. With Windows you either need to be working there (and be assigned to this task) - or reverse-engineer, which is a much rarer and complicated skill.

ksk · on March 28, 2018

Yes, in theory all that is correct.

shakna · on March 28, 2018

> The number of bugs found should be trending towards zero

Only if no new features are ever introduced.

Erlich_Bachman · on March 28, 2018

> Shouldn't all the 'millions of eyeballs' looking at the linux code be making it more secure?

Yes, this is exactly what happens, from my experience.

-> more people look at code

-> they find (and fix) more bugs

-> the system is more secure, because all bugs are found and fixed, instead of being kept inside the code and being sold on hacker forums and agency surveillance projects.

You also know that Linux is not just one codebase from 20 years ago, it constantly changes and adds new features? Of course there will be new bugs (like any other recent OS).

ksk · on March 28, 2018

>-> they find (and fix) more bugs

Where is the evidence that this happens? Do you have data (Open vs closed) showing more security bugs were found through developers, versus external sources?

>-> the system is more secure, because all bugs are found and fixed, instead of being kept inside the code and being sold on hacker forums and agency surveillance projects.

Why would a hacker fix a linux bug for free, but chose to sell a windows bug? That doesn't make sense to me.

sseth · on March 28, 2018

Might be worth reading this article :

https://media.blackhat.com/us-13/US-13-Martin-Buying-Into-Th...

It is really pointless using the count of CVEs as a measure of how vulnerable a product is.

ksk · on March 28, 2018

AFAIK, every single form of aggregation that reduces variance, biases your data set.

>It is really pointless using the count of CVEs as a measure of how vulnerable a product is.

I read the article, and that is certainly the opinion of the author here.

Security is a large field. You can reduce it to number of bugs. You can reduce it to the development process used to create the product. You can reduce it to methods of defending against future vulnerabilities. You can reduce it to methods of tackling bugs. You can reduce it in along any axis. I don't think using CVEs as a measure is pointless. I find them to be useful.

gerdesj · on March 27, 2018

"Hmm, but it appears that windows has fewer security bugs than Linux. Is there any data showing otherwise?"

Yes, I think you are being a bit of a noddy comparing a kernel with an entire OS. That said, all software has bugs. Blimey, how on earth can you compare the paltry 3000000000000 odd source files of Windows tucked up in GIT with the gazzilions of source files that comprises a modern Linux based system (let alone the BSDs etc).

I will simply mention here that when I update an LTS Ubuntu or Debian box I run "apt update && apt upgrade && reboot" (or use a GUI if I'm bored) and it takes a few seconds to minutes to update the entire system. Everything. That includes Java, Flash, Office suites, graphics drivers, USB drivers, printer drivers, CAD suites, database servers, web servers, PHP, Python, Perl, Rust, Go, ... need I go on. Everything. The same happens when I use pacman or yourt, or emerge, or yum, or rpm or whatever.

I'm personally CREST accredited, so have a fair idea about security and prefer to spend my time doing stuff and not waiting for updates to install (if I can even find them) - you?

exikyut · on March 27, 2018

FWIW WinXP is officially quoted as 45 million lines of code (https://www.facebook.com/windows/posts/155741344475532), everyone's decided Win10 is 5-10 (some say 15-20) million more.

I've been meaning to SLOCcount Linux sometime, actually!

Having said that, I don't think it'll be 45M LOC. The kernel is 20M LOC (https://www.linuxcounter.net/statistics/kernel). Chrome is 18M (excluding blank lines/comments) (https://www.openhub.net/p/chrome/analyses/latest/languages_s...). LibreOffice is 9M LOC (https://www.openhub.net/p/libreoffice).

And then I found out that KDE is 60M LOC!! (https://www.openhub.net/p/kde)

GNOME is 9M (https://www.openhub.net/p/gnome).

But I'm guessing those two stats are comparing just the base desktop environment in GNOME's case with all the productivity apps (including KWrite et al) and system libraries (including QtWebKit et al). This must be kept in mind.

TL;DR, an incredibly basic system with just a word processor and web browser, and maybe a minimal windowmanager on top, would be 47M. Adding KDE in makes it 107M - but you're almost never going to use all of it (whereas with Chrome and LibreOffice some large proportion of that 18M and 9M is loaded into RAM and potentially targetable).

gerdesj · on March 28, 2018

Mate, the sheer amount of LoC in any modern system is nearly uncountable. I have been a serious Gentoo aficionado for many years. My lap has been burnt for hours simply compiling Firefox or LO. They are both massive and they are only two apps.

If you want to SLOC Linux then download it https://www.kernel.org/ and help yourself. Why not look here as well https://www.freebsd.org and others - those are my mates, and good ones.

ksk · on March 27, 2018

Sorry, did you have a point? I honestly have no idea what you are saying.

gerdesj · on March 27, 2018

Sorry, I did guild the lily somewhat, this was my essential point: "Yes, I think you are being a bit of a noddy comparing a kernel with an entire OS."

chris_wot · on March 28, 2018

You have no idea of the complexity of things until you delve into Windows Side-by-Side...

kevindqc · on March 27, 2018

99 little bugs in the code, 99 little bugs.

Take one down, patch it around...

127 little bugs in the code.

stefan_ · on March 27, 2018

I guess we know now that from the 3000000 files or what it was they boasted about, not a lot of them are some sort of unit test..

egeozcan · on March 27, 2018

How can you test against an unknown bug?

IncRnd · on March 27, 2018

It wasn't an unknown bug. This was a fix for a known bug. The point is that there was no test in place for memory isolation between users or processes.

FYI, security testers call this type of testing negative testing, which is different from functional testing that is done to test an app works "properly". However, for an OS this test is not negative testing but functional testing if the OS is designed to enforce user and process isolation.

asdsa5325 · on March 27, 2018

Testing to make sure you can't read from somewhere you are not supposed to read from seems like a pretty obvious test for an OS.

tedunangst · on March 28, 2018

There are lots of places you're not supposed to read from. Does any operating system have 100% test coverage of addresses that aren't supposed to be readable?

PeterisP · on March 28, 2018

IDK, there's a quite narrow whitelisted known range of addresses that should be readable by your process; you could (and should) certainly have an simple automated test that simply tries to read everything with the expectation that it should succeed only in known cases.

stefan_ · on March 27, 2018

This one apparently meant that something intentionally mapped for kernel access only was now accessible from userland. That is exactly the tedious, boring issue someone manually testing a bunch of applications will never find, but unit tests are designed to.