Long-Term Consequences of Spectre and Its Mitigations

mwcampbell · on Jan 17, 2018

> I think it would be a grave mistake to simply give up on mixing code with different trust labels in the same address space. Apart from having to redesign lot of software, that would set a hard lower bound on the cost of transitioning between trust zones. It would be much better if hardware mitigations can be designed to be usable within a single address space.

I wonder what software redesigns he has in mind. As far as I can tell, best practices are already trending toward only one trust zone per address space. Some might argue that that's the whole point of multiple address spaces. I suspect that Spectre will accelerate this trend.

I do know how difficult this kind of change can be. The example I have in mind started before Spectre, and is unique to one platform. On Windows, developers of third-party screen readers for the blind are going through a painful transition where they can no longer inject code into application processes in order to make numerous accessibility API calls with low overhead. This change particularly impacts the way screen readers have been making web pages accessible since 1999. For the curious, here's a blog post on this subject: https://www.marcozehe.de/2017/09/29/rethinking-web-accessibi...

candiodari · on Jan 17, 2018

According to a blind friend of mine, the web, despite the constant touting of it as being great for accessibility, has been a total disaster for accessibility. As you point out, windows applications have much better accessibility (and Microsoft still cares) than most webpages.

mwcampbell · on Jan 17, 2018

I'm not surprised by your friend's observations, but that's rather beside the point of my comment. I was simply giving one example I know of a case where redesigning a system to work efficiently across address space boundaries is difficult. I'm curious about other examples, particularly in mainstream applications.

nils-m-holm · on Jan 17, 2018

I have slightly impaired vision, so I need a 30pt font to be able to read comfortably, and the web is already a disaster in terms of accessibility. There are lots of sites that I cannot use, because of overlapping content, unreadable text, hidden buttons, etc.

hollerith · on Jan 17, 2018

Same here. The web of 1996 was very accommodating to people (like me) who can read from screens, but need or strongly prefer a larger-than-normal text size, the web of 2018 vastly less so. Some developments that made things harder for me were:

Mobile Safari's decision not to reflow the paragraphs when the user does the inverse of the pinch gesture ("zoom"?), which seems to have encouraged the makers of the other browsers to give less priority to reflow even though they did not completely give up on it; at large text sizes, that decision required the user to scroll horizontally left and right for every line of text;

Google's purchase of Blogger followed by its changing Blogger so it will not show the reader any text unless Javascript is enabled, which seems to have encouraged other sites to do the same; this made it impossible for me to continue to visit those sites in Lynx (a browser that ran inside a terminal window, but did not support Javascript);

HTML5 in general was a disaster for me. For example, at large text sizes, the elements that remain at a fixed position relative to the window regularly end up taking up most of the window's real estate, leaving little room for the text I want to read.

Clarification: those have been my frustrations when trying to read what is or easily could have been static content; this comment is not about web apps.

paulie_a · on Jan 17, 2018

While I doing my solution would work for 30pt fonts, for years I have disabled all fonts and sizes outside of my preferred font and narrow size range

Fnoord · on Jan 17, 2018

Does Firefox "Reader View" feature help you in any kind of way? Or is it that reading isn't the issue, but browsing around is?

nils-m-holm · on Jan 18, 2018

Reader view often excludes figures that are important for understanding a text, so this unfortunately is not an option for me.

I have set my font size to 32pt and do not allow web sites to use smaller fonts or typefaces other than my preferred one. Also, I have set text color to green on black, which is easiest to read for me. You would not believe how many web sites do change background color, but do not set text color, leaving me with bright-green on white text. :/

Then, a large font causes components to overlap, rendering text and buttons inaccessible. Disabling style sheets does not always help, either (and turns every modern web site into a complete mess). Semantic web? LOL.

Javascript, as others mentioned, is a huge problem, because it somehow seems to be able to bypass my font and color settings. I have it turned off all the time now and just do not visit sites that require it. Well, more time for more interesting things! A silver lining in every cloud! :)

dotancohen · on Jan 17, 2018

The Web is great for accessibility in the traditional sense of the word. Information is accessible. The HCI sense of accessibility is completely dependent on the Web page authors. The hoards and hoards of them, who are using a technology that was specifically designed to be used by people with no training in the medium.

dcow · on Jan 17, 2018

Unfortunately when differently-abled people are simply another user segment there is absolutely no way to get business resoures to address them because they sadly aren't your target market. That's been my experience amyway. IMO it's one of the biggest issues with agile: MVPs aren't accessible (among many other things).

flukus · on Jan 17, 2018

I think MVPs are another casualty, having a server spit out html is surely the minimum viable approach?

roca · on Jan 17, 2018

The kernel BPF filters come to mind. Any case where people are currently using interpreters to execute untrusted code ... Truetype hinting programs, DWARF debug info ...

ithkuil · on Jan 17, 2018

It's easy to overlook the fact that the act of parsing structured input is equivalent to executing code in a VM, and in many cases it can lead to the same class of issues that running code can, especially side channel attacks.

infogulch · on Jan 17, 2018

I have never considered parsing to be equivalent to executing a VM. Does this mean that if you found the right gadgets in the implementation of JSON.parse you could attack a browser just with that?

Xylakant · on Jan 17, 2018

I'm not aware of an instance where browsers were attacked via that vector, but for example Ruby On Rails was vulnerable to attacks via the JSON Parser: https://www.exploit-db.com/exploits/24434/ (many other frameworks too, I'm just picking this one because I'm familiar with it)

userbinator · on Jan 17, 2018

On Windows, developers of third-party screen readers for the blind are going through a painful transition where they can no longer inject code into application processes in order to make numerous accessibility API calls with low overhead.

All these isolation changes are ostensibly for "security", but I suspect DRM is at least part of the motivation; corporations want to be able to silo content more and restrict the free flow thereof. To a user, a screenreader is a benevolent helper; to them, it's a malicious "attack", a way of extracting and consuming content that they may not want.

dblohm7 · on Jan 17, 2018

No, we're doing it because of all the problems associated with injected DLLs.

Source: I'm the dev who built the foundation for this transition in Gecko.

roca · on Jan 17, 2018

This is absurd. Browsers offer many ways to scrape DOM content that are far more effective than using a screenreader. No doubt there are people who wish DOM content was harder to scrape, but "make screenreaders suck" would be completely ineffective for that, and accusing browser developers of deliberately doing it is slanderous.

taeric · on Jan 17, 2018

Many of the screens people interested with prior to web 2.0 were much more flat in abstraction. You can simply look at early HTML to see how much simpler a document first format looks. The raw number of embedded div elements greatly complicates most screen reading.

paulie_a · on Jan 17, 2018

Except pre 2.0 web was design by tables.

taeric · on Jan 17, 2018

Not really. It was certainly a way to do things. But it was less used than the current plethora of divs. Probably easier to understand programmatically, oddly.

paulie_a · on Jan 17, 2018

I built websites in the 90s and tables nested 5-10 deep was a common site

taeric · on Jan 17, 2018

I did too, but I don't remember it nesting to 10 that often. I grant that I likely just don't remember it. I recall most were like HN, embedding a couple of tables.

My argument, though, is that 10 is basically just getting started with how nested most div based layouts are. The new "grid" world is a bit of an improvement. Probably a large one, honestly.

andreiw · on Jan 17, 2018

One thing curiously missing from this article is ARM’s laudable in-depth analysis - https://developer.arm.com/support/security-update, and their efforts (https://developer.arm.com/support/security-update/compiler-s...) to bring in architecture-neutral compiler intrinsics to address variant 1.

roca · on Jan 17, 2018

Perhaps I probably should have mentioned that, but I think the array index masking approaches are going to prevail.

andreiw · on Jan 17, 2018

That’s assuming the only thing you want to prevent is speculative bounds overrun. Even with masking, you can still leak the secret in the array from the path not taken? Do you see evidence of gcc or clang gravitating to the MS approach?

In many ways, spectre is one more kind of attack on code that doesn’t properly separate validating untrusted input from acting on that input, except unlike overruns and TOCTOU races, this is microarchitectural.

Animats · on Jan 17, 2018

The article is by someone with no involvement in the CPU business. We need to hear from CPU architects and manufacturers. This is a fundamental CPU design defect and needs to be fixed in silicon.

fyi1183 · on Jan 17, 2018

I'm not a CPU architect, and I'd agree with you that Spectre variant 2 should be fixed by CPU designs, simply because software is helpless against it. Luckily, fixing it shouldn't be too expensive, it just requires tagging the BTB with the trust zone.

But Spectre variant 1 is really a consequence of the CPU working correctly. For a large number of branches, perhaps most, we want loads to proceed during speculative execution. This is because the code accesess the same or closely related data on both sides of the branch, so priming the caches during speculation is very valuable even when the branch is mispredicted.

I remember reading a study of different binary search implementations which is probably the clearest example of this: when the data is laid out in a heap layout (with child nodes next to each other in an array) the branchy variant of the code performs better than the branchless variant due to this cache priming effect.

What CPU designers could and should probably help with is providing instructions to cheaply mark the (comparatively few!) cases where this speculative execution behaviour leaks secret information.

cesarb · on Jan 17, 2018

> What CPU designers could and should probably help with is providing instructions to cheaply mark the (comparatively few!) cases where this speculative execution behaviour leaks secret information.

How can we, as software developers, find these cases in our multi-megabyte code bases, and how can we be sure we haven't missed any?

fyi1183 · on Jan 17, 2018

You could ask the same question about any class of security bug, so unsurprisingly I'd answer more or less in the same way.

For example, if you're paranoid, make your compiler be conservative, in the same way that you might address buffer overflows by using a language/compiler that inserts bounds checks everywhere.

If you're less paranoid and/or more worried about performance, invest in static analysis tools or languages with augmented type systems. After all, you only have to worry about Spectre variant 1 when handling attacker-controlled data. Tracking type info like this is already done by existing static analysis tools.

Finally, if you're not handling attacker-controlled data at all - which is the case for a lot of performance-sensitive code - you really don't want to (and don't have to) do anything about Spectre variant 1.

By the way, this is really the big difference between the two Spectre variants, and why it's a shame that they fall under the same name. Variant 2 affects all code with indirect jumps/calls, even code that doesn't ever touch attacker-controlled data. That's a huge difference between the variants.

Anyway, the bottom line is that you shouldn't punish the performance of all code over a class of security bugs that a lot of code isn't affected by. Buffer overflows haven't stopped us and shouldn't stop us from writing performance sensitive but security uncritical code in unsafe languages either.

Animats · on Jan 17, 2018

You could ask the same question about any class of security bug, so unsurprisingly I'd answer more or less in the same way.

No. the problem here is that the code isn't wrong. The CPU is wrong. Whether or not the CPU will leak data depends on the make and model of CPU. Most MIPS CPUs and many ARM CPUs don't have this problem. Some AMD x86-type CPUs may not. It has to be fixed on the CPU side.

This could introduce Intel to a world auto manufacturers know well - recalls. Intel has been there before, with the floating point bug.

fyi1183 · on Jan 18, 2018

You don't actually have an argument though. Why is the CPU wrong? Because you say so? And btw, you're wrong about this not affecting ARM or AMD. It affects everyone with speculative execution (we're only talking about Spectre variant 1 here - if you're confused about that, please go back to my first comment in this thread).

Look: When other side channel leaks were found, e.g. people recovering RSA or AES keys from plain cache timing without speculative execution, maybe there were people similarly arguing that it's the CPU's fault. They lost that fight, too. Today, the uncontested consensus is that cache timing leaks are the code's fault, for good reason.

Because what are you going to do, stop building caches? Obviously not, they exist for very good reasons. The same is true for speculative execution. What do you expect CPU people to do? Rip that out entirely? Be real. (Please, seriously think about that: what is it that you actually want CPU people to do? Don't just handwave!)

This kind of discussion is why Linus Torvalds regularly flames security people.

roca · on Jan 17, 2018

Intel has spoken, and for Spectre variant 1 they have said we need to fix it in software (they recommend inserting fences "in appropriate places"). https://newsroom.intel.com/wp-content/uploads/sites/11/2018/...

cm2187 · on Jan 17, 2018

Fixing it in silicon will do nothing to the hundreds of millions of machines that are already deployed and many for as long as ten years (particularly servers).

flukus · on Jan 17, 2018

All you'll get from manufacturers is PR speak, most qualified CPU architects would also not be able to speak publicly due to corporate media policies.

phkahler · on Jan 17, 2018

>> browsers are trying to keep the problem manageable by making it difficult for JS to extract information from the timing channel (by limiting timer resolution and disabling features like SharedArrayBuffer that can be used to implement high-resolution timers), but this unfortunately limits the power of Web applications compared to native applications.

I don't see a problem with that. "Web applications" are inherently untrusted code. If it were not for untrusted code these attacks would not be an issue, so it doesn't seem unfair for a mitigation to negatively affect them.

tomp · on Jan 17, 2018

I consider any computer platform that cannot run an "untrusted" application in a manner that doesn't endanger its user (within certain limits - e.g. it's practically impossible to limit what kind of internet traffic the application can do, or what kind of scams it can make the user click through), a failed computer platform.

In particular, browsers could always run JS in a separate process that's appropriately virtualized (i.e. has limited access to host information and resources).

taeric · on Jan 17, 2018

This leaves a big hole. Many malicious packages will solicit trust from the user.

That is, we seem to be plagued by misplaced trust moreso than untrusted applications.

The analogy to civil engineering is we trust building makers. Few of us enter buildings we don't trust to stay up around us.

koheripbal · on Jan 17, 2018

What happens when these issues are addressed at the hardware level. Are users on new chips going to continue to live with performance nerfs to protect those who haven't upgraded, or will patches and fixes detect some CPU feature that IDs it as a "fixed" CPU... of course spoofing that will have its own security implications.

moyix · on Jan 17, 2018

It's interesting to pair this with Adrian Sampson's (an academic who works on hardware architecture) thoughts, particularly his musings about other vectors:

> The second thing is that it’s not just about speculation. We now live in a world with side channels in microarchitectures that leave no real trace in the machine’s architectural state. There is already work on leaks through prefetching, where someone learns about your activity by observing how it affected a reverse-engineered prefetcher. You can imagine similar attacks on TLB state, store buffer coalescing, coherence protocols, or even replacement policies. Suddenly, the SMT side channel doesn’t look so bad.

http://www.cs.cornell.edu/~asampson/blog/spectacular.html

dataflow · on Jan 17, 2018

Could someone please explain to me why there is so much focus on Spectre vulnerabilities in Javascript and not really any on HTML/CSS, when it seems that a server could also be able to cause the client to perform speculative execution via pure HTML? Or is it not possible for some reason? The focus on Javascript as though it's somehow special is rather baffling to me, making me wonder whether I'm really understanding the fundamental issues. (?)

_m7bj · on Jan 17, 2018

>The focus on Javascript as though it's somehow special is rather baffling to me

One of the most common ways major ad networks get compromised to the extent that they serve malware to hundreds of thousands of web users (this happens at least once a year) is that they hotlink to JS libraries, that hotlink to JS libraries, that hotlink to more JS libraries.

If you use a script blocker, it's not that uncommon to see that once you get down far enough, scripts are being loaded from bare IP addresses rather than domain names. Every now and again, someone compromises one of these deep-nested hotlinked JS files and maliciously modifies the javascript, and random sites all over the web dutifully serve the malware.

It's not that I don't trust the first-party website owners, more like I don't trust their friends friends friend.

dhimes · on Jan 17, 2018

This is so annoyingly true. So when you start to allows scripts because you need the website to work, you reload and then see a bunch of new scripts were loaded that you didn't see before. It's a total shitshow.

EDIT: I would love a list of minimum required scripts for certain sites. It's painful to fight through what I need- and I really resent it when I am a PAYING FUCKING CUSTOMER.

gaius · on Jan 17, 2018

Because JS is JIT compiled in a way that HTML isn’t - you can guess what machine instructions a+1 will compile into, its not so clear how a table layout say, will actually execute

IshKebab · on Jan 17, 2018

How on earth would you exploit spectre using pure HTML?

em3rgent0rdr · on Jan 17, 2018

HTML is code too. Just write the right sequence of HTML that will train the branch predictor to speculative jump to some address in the HTML representing malicious machine code.

krapp · on Jan 17, 2018

This may be a dumb question but how would one get branch prediction from HTML if HTML doesn't have conditional statements? There shouldn't be any branches to predict.

em3rgent0rdr · on Jan 17, 2018

The browser's html engine will still do conditional jumps based on the HTML. One attack [1]:

"Decompressors, especially in HTTP(S) stacks. Data decompression inherently involves a large number of steps of "look up X in a table to get the length of a symbol, then adjust pointers and perform more memory accesses" — exactly the sort of behaviour which can leak information via cache side channels if a branch mispredict results in X being speculatively looked up in the wrong table. Add attacker-controlled inputs to HTTP stacks and the fact that services speaking HTTP are often required to perform request authentication and/or include TLS stacks, and you have all the conditions needed for sensitive information to be leaked."

[1] http://www.daemonology.net/blog/2018-01-17-some-thoughts-on-...

coldacid · on Jan 17, 2018

There are plenty of conditionals in the rendering engine of a browser, especially with regards to managing layout.

AnimalMuppet · on Jan 17, 2018

That would be a browser-specific exploit, though. In fact, it might be specific to the exact build of the browser.

krapp · on Jan 17, 2018

Fair enough.

IshKebab · on Jan 18, 2018

Even if that were feasible (I doubt it is in practice), how do you do the timing analysis and send the results somewhere?

faragon · on Jan 17, 2018

In my opinion, the worst long-term consequence will be that even having newer CPUs with the issues fixed in hardware, we'll have a performance impact because of code compiled to work with both old and new CPUs. Just like the case of having a new CPU with fancy features unused because of code compiled to be backwards compatible.

josefx · on Jan 17, 2018

Intels C compiler could generate code that detects CPU features at runtime years ago, I think the current GCC can do the same. Binaries only have to become a bit more bloated to store both versions of the compiled code.

faragon · on Jan 17, 2018

Runtime checks cost CPU cycles as well.

em3rgent0rdr · on Jan 17, 2018

The runtime check only needs to be done one during program execution.

faragon · on Jan 17, 2018

So you mean having two executables in one? A la Apple "fat executables"?

jabl · on Jan 17, 2018

No, it's on a per-function basis. On program startup it does the necessary checks (CPUID etc.) and sets up the function pointers appropriately (see the IFUNC mechanism in the linker).

See e.g. https://lwn.net/Articles/691932/

faragon · on Jan 17, 2018

That's OK for code e.g. you know it could benefit from SIMD usage. However you can not tag every function of user code for safe/unsafe mode. Also, optimizations would increase the mess (inlining, unrolling, etc.). Generated code would be a "Frankenstein".

davidcuddeback · on Jan 17, 2018

Thanks! I was trying to find some information on this, but didn't know the right search terms.

davidcuddeback · on Jan 17, 2018

Probably something like a table of function pointers for "hot" code that gets setup at program start. But compiler writers are way more clever than I am at this sort of thing, so I'm actually curious what solutions they came up with.

brndnmtthws · on Jan 17, 2018

I doubt Intel will be lowering their prices, or refunding anyone a portion of the price of their previously purchased CPUs, that's for sure.

Look what happened after the VW diesel scandal ('dieselgate'): VW had to pay for repairs, and pay buyers (my friend bought one of the cars and got about $6k IIRC). Some people even went to jail.

Intel (or any other CPU maker) will probably not suffer similar fates. This situation is a bit different, because they may not have known about the problem. Still, everyone who bought a CPU is going to get a 10-30% performance haircut because they made a mistake. And Intel isn't going to have to pay for it.

acranox · on Jan 17, 2018

Volkswagen deliberately engineered their cars to falsify government emission tests. What intel did was negligent. Volkswagen was malicious. These are very different. I don’t see them in remotely the same boat.

AnimalMuppet · on Jan 17, 2018

"Negligent" is even too strong.

Per dictionary.com, the legal definition of negligence is "the failure to exercise that degree of care that, in the circumstances, the law requires for the protection of other persons or those interests of other persons that may be injuriously affected by the want of such care. "

What Intel did was not recognize that a specific attack possibility existed. Nobody else recognized it either, for a decade. That's not negligence. That's failure to be omniscient.

jacobush · on Jan 17, 2018

But haven't there been references thrown around that show they knew at least a couple of years and could also have known for 10 years, if there weren't busy not understanding what their bottom line depended upon them not understanding.

leoc · on Jan 17, 2018

Obligatory: https://millcomputing.com/topic/meltdown-and-spectre/

fulafel · on Jan 17, 2018

Does anyone know how things are going in GPU land? Don't they support concurrent separate protection domains these days too?

deepnotderp · on Jan 17, 2018

No OoO speculation though.