
What’s wrong with P curves?


Names are critical to enable discussion.

The "marketing" page is where documentation is. Summaries that don't require reading a whole academic papers are a good thing, and they are the place where all the different links are collected. Same reason software has READMEs.

Logos... are cute and take 10-60 minutes? If you spend months on some research might as well take the satisfaction of giving it a cute logo, why not.


Speaking as a cryptography implementer, yes, these drive us up the wall.

However, crypto coprocessors would be a tremendously disruptive solution: we'd need to build mountains of scaffolding to allow switching to and off these cores, and to share memory with them, etc.

Even more critically, you can't just move the RSA multiplication to those cores and call it a day. The key is probably parsed from somewhere, right? Does the parser need to run on a crypto core? What if it comes over the network? And if you even manage to protect all the keys, what if a CPU side channel leaks the message you encrypted? Are you ok with it just because it's not a key? The only reason we don't see these attacks against non-crypto code is that finding targets is very application specific, while in crypto libraries everyone can agree leaking a key is bad.

No, processor designers "just" need to stop violating assumptions, or at least talk to us before doing it.
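
To make "constant time" concrete, here is a toy sketch (illustrative only, not how any particular library is written): the naive comparison below leaks, through its timing, how many leading bytes of a secret MAC tag a guess matched, while crypto/subtle's version takes the same time for any equal-length inputs.

    package main

    import (
        "crypto/subtle"
        "fmt"
    )

    // leakyEqual returns at the first mismatching byte, so its running time
    // depends on how much of the secret a guess got right.
    func leakyEqual(a, b []byte) bool {
        if len(a) != len(b) {
            return false
        }
        for i := range a {
            if a[i] != b[i] {
                return false
            }
        }
        return true
    }

    func main() {
        tag := []byte("expected-mac-value")
        guess := []byte("expected-mac-wrong")
        fmt.Println(leakyEqual(tag, guess))                       // time depends on the data
        fmt.Println(subtle.ConstantTimeCompare(tag, guess) == 1)  // same time for any equal-length inputs
    }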


Processor designers are very unlikely to do that for you, because everyone not working on constant time crypto gives them a whole lot of money to keep doing this. The best you might get is a mode where the set of assumptions they violate is reduced.


> No, processor designers "just" need to stop violating assumptions, or at least talk to us before doing it.

No, you don't get to say processor designers need to stop violating your assumptions. You need to stop making assumptions about behaviour if that behaviour is important (for cryptographic or other reasons). Your assumptions being faulty is not a valid justification, because by that logic no one could ever have added any caches or predictors at any point, since that would be "violating your assumptions". Also, let's be real here: even if "not violating your assumptions" were a reasonable position to take, it is not reasonable in any way to assume that modern processors (<30 years old) don't cache, predict, buffer, or speculate anything.

If you care about constant time behaviour you should either write your code such that it is timing agnostic, or read the platform documentation rather than making assumptions. The Apple documentation tells you how to actually get constant time behavior, rather than making assumptions.


> you should either be writing your code such that it is timing agnostic, or you could read the platform documentation rather than making assumptions

Have you even read the paper? Especially the part where the attack applies to everyone’s previous idea of “timing agnostic” code, and the part where Apple does not respect the (new) DIT flag on M1/M2?


No, the paper targets "constant time" operations, not timing agnostic.

The paper even mentions that blinding works, and that to me is the canonical "separate the time and power use of the operation from the key material" solution. The paper's complaint about this approach is that it would be specific to these prefetchers, but this type of prefetcher seems increasingly prevalent across multiple CPUs and architectures, so it is unlikely to stay Apple specific for long. The paper even mentions that new Intel processors have these prefetchers, and so necessarily provide functionality to disable them there too. This is all before we get to the numerous prior articles showing that key extraction via side channels is already possible with these constant time algorithms (a la last month's (I think?) "get the secrets from the power LED" paper). The solution is to use either specialized hardware (as done for AES) or timing agnostic code.

Trying to create side channel free code by clever construction based on assumptions about power and performance of all hardware based on a simple model of how CPUs behave is going to just change the side channels, not remove them. If it's a real attack vector that you are really concerned about you should probably just do best effort and monitor for repeated key reuse or the like, and then start blinding at some threshold.
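
For what it's worth, the blinding idea is easy to sketch. Below is a rough illustration with toy RSA numbers (math/big is not constant time, key handling is elided, and crypto/rsa already blinds internally; this only shows the algebra of decrypting c*r^e instead of c and then stripping r):

    package main

    import (
        "crypto/rand"
        "fmt"
        "math/big"
    )

    // blindedDecrypt computes c^d mod n without ever exponentiating the
    // attacker-chosen ciphertext directly: it exponentiates c*r^e for a
    // fresh random r, then multiplies by r^-1 to recover the plaintext.
    func blindedDecrypt(c, d, e, n *big.Int) (*big.Int, error) {
        var r, rInv *big.Int
        for {
            var err error
            r, err = rand.Int(rand.Reader, n)
            if err != nil {
                return nil, err
            }
            if r.Sign() == 0 {
                continue
            }
            if rInv = new(big.Int).ModInverse(r, n); rInv != nil {
                break // r is invertible mod n
            }
        }
        blinded := new(big.Int).Exp(r, e, n)    // r^e mod n
        blinded.Mul(blinded, c).Mod(blinded, n) // c * r^e mod n
        m := new(big.Int).Exp(blinded, d, n)    // (c * r^e)^d = m * r mod n
        m.Mul(m, rInv).Mod(m, n)                // strip the blinding factor
        return m, nil
    }

    func main() {
        // Toy textbook RSA: p=61, q=53, n=3233, e=17, d=2753.
        n, e, d := big.NewInt(3233), big.NewInt(17), big.NewInt(2753)
        msg := big.NewInt(65)
        c := new(big.Int).Exp(msg, e, n) // "encrypt"
        m, _ := blindedDecrypt(c, d, e, n)
        fmt.Println(m) // 65
    }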


> processor designers "just" need to stop violating assumptions

"Security" rarely (almost never) seems to be part of any commercially-significant spec.

Almost as if by design...


Wouldn't that "just" allow someone to see if a key was present (and any information that informs) but dramatically help prevent secret key extraction?


I don’t think the security community is also going to become experts in chip design; these are two full skill sets that are already very difficult to obtain.

We must stop running untrustworthy code on modern full-performance chips.

The feedback loop that powers everything is: faster chips allow better engineering and science, creating faster chips. We’re not inserting the security community into that loop and slowing things down just so people can download random programs onto their computers and run them at random. That’s just a stupid thing to do, there’s no way to make it safe, and there never will be.

I mean, we’re talking about prefetching. If there were a way to give RAM cache-like latencies, why wouldn’t the hardware folks have done it already?


I almost gave you an upvote until your third paragraph, but now I have to give a hard disagree. We're running more untrusted code than ever, and we absolutely should trust it less than ever and have hardware and software designed with security in mind. Security should be priority #1 from here on out. We are absolutely awash in performance and memory capacity but keep getting surprised by bad security outcomes because it's been second fiddle for too long.

Software is now critical infrastructure in modern society, akin to the power grid and telephone lines. Neglecting security is a strategic vulnerability, and addressing it must happen at all levels of the software and hardware stack. By that I mean an adversary trying to crash an enemy's entire society by bricking all of its computers and sending it back to the dark ages in milliseconds. I fundamentally don't understand the mindset of people who want to take that kind of risk for a 10% boost in their games' FPS[1].

Part of that is paying back the debt that decades of cutting corners has yielded us.

In reality, the vast majority of the 1000x increase in performance and memory capacity over the past four decades has come from shrinking transistors and increasing clockspeeds and memory density--the 1 or 5 or 10% gains from turning off bounds checks or prefetching aren't the lion's share. And for the record, turning off bounds checks is monumentally stupid, and people should be jailed for it.

[1] I'm exaggerating to make a point here. What we trade for a little desktop or server performance is an enormous, pervasive risk. Not just melting down in a cyberwar, but the constant barrage of intrusion and leaks that costs the economy billions upon billions of dollars per year. We're paying for security, just at the wrong end.


Turning off bounds checks is like a 5% performance penalty. Turning off prefetching is like using a computer from twenty years ago.


Turning off prefetching while running crypto code would be a net performance gain compared to implementing the algorithms safely with even more expensive and fragile software mitigations. Just give me the option of configuring parts of the caches (at least data + instructions + TLBs) as scratchpad, and a "run without timing side-channels pretty please" bit with a clearly defined API contract, accessible (by default) to unprivileged userspace code. Lots of cryptographic algorithms have such small working sets that they would profit from a constant-time accessible scratchpad in the L1d cache if they get to use data dependent addresses into it again.


Happily, there are mechanisms to do just that, specifically for the purpose of implementing cryptography (I commented at the top level and don't want to just spam the URL).


I agree that hardware/software codesign is critical to solving things like this, but features like prefetching, speculation, and prediction are absolutely critical to modern pipelines and, broadly speaking, are what enable what we think of as "modern computer performance." This has been true for over 20 years now. In terms of "overhead" it's not in the same ballpark -- or even the same sport, frankly -- as something like bounds checking or even garbage collection. Hell, if the difference were within even one order of magnitude, they'd have done it already.


> I fundamentally don't understand the mindset of people who want to take that kind of risk for a 10% boost in their games' FPS[1]

Me either. But lots of engineers are out there just writing single-threaded MATLAB and Python code with lots of data dependencies, hoping the system manages to do a good job (for those operations that can’t be offloaded to BLAS). So I'm glad gamer dollars subsidize the development of fast single-threaded chips that handle branchy code well.

> In reality, the vast majority of the 1000x increase in performance and memory capacity over the past four decades has come from shrinking transistors and increasing clockspeeds and memory density

I disagree; modern designs include deep pipelines, lots of speculation, and complex caches because that’s the only way to spend that higher transistor budget to reach higher clocks and compensate for the fact that memory latencies haven’t kept up.

> Part of that is paying back the debt that decades of cutting corners has yielded us.

It will be tough, but yeah, server and mainframe users need to roll back the decision to repurpose consumer-focused chips like the x86 and ARM families. RISC-V is looking good though, and seems open enough that maybe they can pick and choose which features they take.

> I almost gave you up an upvote until your third paragraph, but I have to now give a hard disagree.

I’m not too worried about votes on this post; this site has lots of web devs and cloud users, and pointing out that the ecosystem they rely on is impossible to secure is destined to get lots of downvotes-to-disagree.


How is RISC-V going to solve anything here?


It isn’t a sure thing. Just, since it is a more open ecosystem, maybe the designers of chips that need to be able to safely run untrusted code can still borrow some features from the general population.

I think it is basically impossible to run untrusted code safely or to build sand-proof sandboxes, but I thought the rest of my post was too pessimistic.


It is significantly less complex without compromising anything. This means a larger portion of a chip's design effort can be put elsewhere, such as into preventing side channel attacks.


I don't really see how the design of RISC-V avoids the need to have a DMP


>I don't really see how the design of RISC-V avoids the need to have a DMP

Because it does not. I also do not see where, if at all, such a claim was made.


Perhaps you should explain how this design effort on preventing side-channel attacks is spent, then?


Anything specific?


> download random programs onto their computers and run them at random

To be clear that includes what we're all doing by downloading and running Javascript to read HN.

Maybe I can say "don't run adversarial code on my same CPU" and only care about over-the-network CPU side-channels (of which there are still some), because I write Go crypto, but it doesn't sound like something my colleagues writing browser code can do.


Speak for yourself; I've got JavaScript disabled on news.ycombinator.com and it works just fine.


Is this exploitable through JavaScript?

In general from what I've seen, most of these JS-based CPU exploits didn't strike me as all that practical in real world conditions. I mean, it is a problem, but not really all that worrying.


> Is this exploitable through JavaScript?

Why wouldn't it be?


How is JavaScript going to run a chosen-input attack against one of your cores for an hour?


If you leave a tab open that's running that JS..


Because JS/HTML provides APIs to perform cryptography (I can't recall whether the cryptography specs are part of ES or HTML/DOM). If you try to implement constant time cryptography in JS you will run into a world of hurt, because the entire concept of "fast JS" depends on heavy speculation, with lots of exciting variations in the timing of even "primitive" operations.


No, the attack would be implemented in JS, not the victim code (though, that too, but that's not what's interesting here).


Ah, you’re concerned about person using js to execute the side channel portion of the attack, not the bit creating the side channel :)


FYI malicious JS executing in victim users' browsers is a huge concern. All sorts of vulnerabilities can be exploited via JS in this way -- every local side-channel like Spectre/Meltdown, worse things like Row Hammer, etc.


Unfortunately somebody has tricked users into leaving JavaScript on for every site; it is a really bad situation.


Security and utility are often in opposition. The safest possible computer is one buried far underground, without any cables, in a Faraday cage. Not very useful.

> We’re not inserting the security community into that loop and slowing things down just so people can download random programs onto their computers and run them at random. That’s just a stupid thing to do, there’s no way to make it safe, and there never will be.

Setting aside JavaScript, you can see this today with cloud computing, which has largely displaced private clouds. It runs untrusted code on shared computers. Fundamentally that’s what it is doing, because that’s what you need for economies of scale, durability, availability, etc. So figuring out a way to run untrusted code on another machine safely is fundamentally a desirable goal. That’s why people are trying to do homomorphic encryption - so that the “safely” part can go both ways and both the HW owner and the “untrusted” SW don’t need to trust each other to execute said code.


> The feedback loop that powers everything is: faster chips allow better engineering and science, creating faster chips. We’re not inserting the security community into that loop and slowing things down just so people can download random programs onto their computers and run them at random. That’s just a stupid thing to do, there’s no way to make it safe, and there never will be.

Note that in the vast majority of cases, crypto-related code isn't what we spend compute cycles on. If there was a straightforward, cross-architecture mechanism to say, "run this code on a single physical core with no branch prediction, no shared caches, and using in-order execution" then the real-world performance impact would be minimal, but the security benefits would be huge.
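
Today the closest you can get is partial and platform specific. For example, on Linux you can at least dedicate a physical core to the sensitive work (a sketch assuming golang.org/x/sys/unix; it does nothing about prefetchers, speculation, shared caches, or SMT siblings, which is exactly the missing piece):

    package main

    import (
        "fmt"
        "runtime"

        "golang.org/x/sys/unix"
    )

    func main() {
        // Keep this goroutine on one OS thread, then pin that thread to CPU 0
        // (Linux-only; sched_setaffinity with pid 0 affects the calling thread).
        runtime.LockOSThread()
        defer runtime.UnlockOSThread()

        var set unix.CPUSet
        set.Zero()
        set.Set(0)
        if err := unix.SchedSetaffinity(0, &set); err != nil {
            fmt.Println("pinning failed:", err)
            return
        }

        // ... run the sensitive crypto here ...
        fmt.Println("running pinned to CPU 0")
    }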


I’m in favor of adding to chips some horrible in-order, no-speculation, no-prefetching, 5-stage-pipeline, Architecture 101 core that can be completely verified and made bulletproof.

But the presence of this bulletproof core would not solve the problem of running bad code on modern hardware, unless all untrusted code is run on it.


We considered this design and rejected it because it requires an ultimately unwinnable cat-and-mouse game of fingerprint resistance.

Things the log can use to partition the users include their Tor version, HTTP library behavior, contact periodicity... the list goes on and on, and keeps restricting the use cases where it can be deployed securely even just in theory.

Witness cosigning is secure even if the way you fetch the proofs is completely attacker-controlled, which is closer to user expectations of regular digital signatures.


For my application of history trees (a transparency log) there are additional measures that help to eventually ensure a globally consistent view. The authority would need to publish the final tally at the end of the vote, which would also contain a tree root. The tree root would be distributed in print and in official announcements, so any user can check that the tree root shown on the client corresponds to the one announced with the tally, and the authority would not know who does the check. Thus it would be unreasonably hard for an adversary to deploy a deception attack at such scale.

An additional measure is that Tor integration, with the help of the Arti project, is deployed with the client. That ensures that every client makes its requests in the same way. It is surely important not to disclose any local data to the server or make identity-revealing requests within the same session, like giving away the client's local commitment index before the server has shown its current commitment. Using an anonymous channel to ensure global consistency is certainly not universal, but for some applications it is doable if the problem is approached holistically, particularly in situations where an anonymous channel is already needed within the protocol.

> Witness cosigning is secure even if the way you fetch the proofs is completely attacker-controlled

Opting for this approach makes sense if the protocol doesn't initially require an anonymous communication channel. However, if the protocol already uses it, introducing an additional assumption for trusted witnesses adds complexity.


Purely coincidental! ;-)


We're working on it! RFC 6962 specifies inclusion and consistency proofs, but indeed it's missing test vectors. Keep an eye on https://c2sp.org and https://c2sp.org/CCTV.
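
In the meantime, the inclusion proof verification algorithm itself (RFC 6962, restated in RFC 9162 §2.1.3.2) is short enough to sketch. This is an illustrative reimplementation, not the CCTV test vectors and not a substitute for a maintained library:

    package main

    import (
        "bytes"
        "crypto/sha256"
        "fmt"
    )

    // leafHash computes SHA-256(0x00 || leaf), the RFC 6962 leaf hash.
    func leafHash(leaf []byte) []byte {
        h := sha256.New()
        h.Write([]byte{0x00})
        h.Write(leaf)
        return h.Sum(nil)
    }

    // nodeHash computes SHA-256(0x01 || left || right), the interior node hash.
    func nodeHash(left, right []byte) []byte {
        h := sha256.New()
        h.Write([]byte{0x01})
        h.Write(left)
        h.Write(right)
        return h.Sum(nil)
    }

    // verifyInclusion checks an inclusion proof for the leaf at leafIndex in a
    // tree of treeSize leaves, following RFC 9162 §2.1.3.2.
    func verifyInclusion(leafIndex, treeSize uint64, leaf []byte, proof [][]byte, root []byte) bool {
        if leafIndex >= treeSize {
            return false
        }
        fn, sn := leafIndex, treeSize-1
        r := leafHash(leaf)
        for _, p := range proof {
            if sn == 0 {
                return false
            }
            if fn%2 == 1 || fn == sn {
                r = nodeHash(p, r)
                if fn%2 == 0 {
                    for fn%2 == 0 && fn != 0 {
                        fn >>= 1
                        sn >>= 1
                    }
                }
            } else {
                r = nodeHash(r, p)
            }
            fn >>= 1
            sn >>= 1
        }
        return sn == 0 && bytes.Equal(r, root)
    }

    func main() {
        // Three-leaf tree: root = nodeHash(nodeHash(h(a), h(b)), h(c)).
        a, b, c := []byte("a"), []byte("b"), []byte("c")
        root := nodeHash(nodeHash(leafHash(a), leafHash(b)), leafHash(c))
        // Proof for leaf "b" (index 1): its sibling h(a), then the right subtree h(c).
        proof := [][]byte{leafHash(a), leafHash(c)}
        fmt.Println(verifyInclusion(1, 3, b, proof, root)) // true
    }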


This is one of the projects I've been most excited about in the last few years. It let me backport to Certificate Transparency some of the modern design ideas that came after it.

Beyond the Let's Encrypt announcement and the ct-policy thread (which includes a technical and advantages summary), here are a few resources that might be interesting.

- Design document, including architecture and tradeoffs: https://filippo.io/a-different-CT-log

- Implementation: https://github.com/FiloSottile/sunlight

- API specification: https://c2sp.org/sunlight

- Website, including test logs and feedback channels: https://sunlight.dev/

If you’re thinking “oh we could use something similar” please reach out! Sunlight is retrofitting some of the modern tlog designs on a legacy system. With a greenfield deployment you can do even better! I’m working with the Sigsum project on specs, tooling, and a support ecosystem to make deploying tlogs easier and safer.


Cool!

tlog = transparency log, but not necessarily for X.509 certificates?


Exactly! It's a growing ecosystem including things like https://transparency.dev, the Go Checksum Database, https://www.sigsum.org, SigStore, and even key transparency solutions like WhatsApp's.

One thing you end up needing to deploy tlogs is a way to reassure clients the tree is not forked, and for that you mostly need witness cosigning, where a quorum of third parties attest that a signed tree head is consistent with all the other ones they've seen. I've worked with the Sigsum project and the Google TrustFabric team on an interoperable specification for witnessing (which Sunlight interoperates with), and I am now working to develop a public, reliable ecosystem of witnesses.

Once you have witnessing, running a log can be as easy as hosting a few files in a GitHub repo or S3 bucket, updated with a batch script. I am very excited to make it possible for any project to get better-than-CT accountability for ~free.

(You might want to catch my RWC 2024 talk about this once it comes out!)
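
To give a flavor of what the client-side check looks like, here is a toy quorum verification (the types and the signed message are made-up stand-ins, not the actual checkpoint and cosignature formats):

    package main

    import (
        "crypto/ed25519"
        "fmt"
    )

    // Cosignature pairs a witness's public key with its signature over the
    // serialized tree head.
    type Cosignature struct {
        Witness   ed25519.PublicKey
        Signature []byte
    }

    // quorumReached reports whether at least threshold of the trusted witnesses
    // produced a valid cosignature over treeHead.
    func quorumReached(treeHead []byte, trusted []ed25519.PublicKey, cosigs []Cosignature, threshold int) bool {
        valid := 0
        for _, t := range trusted {
            for _, c := range cosigs {
                if t.Equal(c.Witness) && ed25519.Verify(t, treeHead, c.Signature) {
                    valid++
                    break // count each trusted witness at most once
                }
            }
        }
        return valid >= threshold
    }

    func main() {
        // Toy setup with three witnesses and a threshold of two.
        treeHead := []byte("example.org/log tree 42 <root hash>")
        var trusted []ed25519.PublicKey
        var cosigs []Cosignature
        for i := 0; i < 3; i++ {
            pub, priv, _ := ed25519.GenerateKey(nil)
            trusted = append(trusted, pub)
            if i < 2 { // only two witnesses cosign
                cosigs = append(cosigs, Cosignature{pub, ed25519.Sign(priv, treeHead)})
            }
        }
        fmt.Println(quorumReached(treeHead, trusted, cosigs, 2)) // true
    }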


Do you know of any cryptography implementation that sets the Data Independent Timing flag? We've been trying to figure out what others are doing about it, because as far as I can tell nobody is.

Anyway, not sure why relying on C/C++ would have helped us here.


The point is not about relying on C/C++, it's about using existing implementations instead of re-inventing the wheel all the time. This is a cultural thing when it comes to Go, and it has bitten them multiple times, like when they tried not to use the system's libc on macOS, or when they had issues dealing with memory pages on Linux.

Good to know there's someone in charge specifically of the cryptographic stuff for Go at Google though.


Go has good reasons not to bring C/C++ into every build, starting from the ability to cleanly cross-compile.

I can't comment on the rest, but the security track record of the crypto libraries is stellar compared to pretty much any other library (and it already was before my tenure).

(BTW, I am not at Google anymore, although I still maintain specifically the crypto libraries.)


Not really, a number of our crypto implementations are pure Go. In fact, we always have a pure Go fallback that you can select with "-tags purego". As of Go 1.23 we will be systematically testing it, too, because it enables other compilers like TinyGo. They might be slower, but with the notable exception of AES (because implementing AES in constant time without AES-NI is hell) the pure Go implementations are just as secure.

Moreover, some of the assembly cores are a couple dozen lines for the hottest loops. I guess you could call the whole Go package a safe wrapper around that unsafe code, but I tend to think of a wrapper as not being the place where the substantial logic lives.

It's also meaningfully different from AWS-LC, discussed here, which has the entire cryptographic operation (like a signature or encryption API) implemented in C. (It's still great progress to move the TLS and X.509 implementations to a safe language, as that's where most memory safety bugs are!)
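
For the curious, the purego selection is plain build tags: each assembly-backed file is constrained to the architectures it supports (and to !purego), and a portable file carries the inverse constraint, so "-tags purego" or an unsupported GOARCH picks the pure Go code. Roughly this shape, with illustrative file and function names rather than the actual package layout:

    // sum_generic.go: portable fallback.

    //go:build !amd64 || purego

    package sum

    // sumBlocks is the pure Go implementation. A sibling file constrained
    // with "amd64 && !purego" declares the same function and backs it with
    // assembly, so callers never notice which one they got.
    func sumBlocks(state *[8]uint32, data []byte) {
        for _, b := range data {
            state[0] += uint32(b) // placeholder logic, not a real hash
        }
    }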


Sorry, just to be clear, I don’t know anything about Go’s crypto implementations, I was purely responding to the parent who claimed they were wrappers around asm.

I think we’re making two different points. I am talking, at a very high level, about how when people say “yeah it’s safe but there’s unsafe under there”, that is always the case at some point in the stack. Even a pure Go or pure Rust program ends up needing to interact with the underlying system, whose hardware isn’t safe. There is still some code that has to reach outside the language’s ability to check conformance with its abstract machine in order to do things at that level.

I don’t disagree that minimizing the amount of unsafety is a good general goal. Nor am I saying that because there’s unsafe inside, the code is not overall safe. Quite the opposite! I’m saying that not only is it possible, but that it’s an inherent part of how we build safe abstractions in the first place.

(Oh and to be honest, I wish Rust had gotten the same level of investment around cryptography that Go has had. Big fan. Sucks it never happened for us. I’m glad to see continued improvements in the space, but like, I am not trying to say Go is bad here in any way.)


There is also the compiler. A language may claim to be implemented without any unsafe code in the standard library, but all that has happened is that the unsafe code got hidden in the compiler: in how it generates code, in intrinsics, etc.


They have different use cases: with PAKEs you encrypt a connection, not a file. You can’t use PAKEs to encrypt backups. Or, rather, you can but then the two sides just have to store the key, making it not fit for e2ee use cases. It’s password authenticated key exchange, not password derived keys.

(Well, the WhatsApp solution actually uses a PAKE to talk to the HSM, but the HSM is still necessary.)
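
For contrast, a password-derived key for encrypting a backup blob looks roughly like this sketch (parameters and libraries are illustrative; note this is exactly the construction that invites offline guessing of weak passwords, which is why designs like WhatsApp's put an HSM in front):

    package main

    import (
        "crypto/rand"
        "fmt"

        "golang.org/x/crypto/chacha20poly1305"
        "golang.org/x/crypto/scrypt"
    )

    // sealBackup derives a key from the password with scrypt and encrypts the
    // plaintext with ChaCha20-Poly1305. The salt and nonce must be stored
    // alongside the ciphertext so the backup can be decrypted later.
    func sealBackup(password, plaintext []byte) (salt, nonce, ciphertext []byte, err error) {
        salt = make([]byte, 16)
        if _, err = rand.Read(salt); err != nil {
            return nil, nil, nil, err
        }
        key, err := scrypt.Key(password, salt, 1<<15, 8, 1, chacha20poly1305.KeySize)
        if err != nil {
            return nil, nil, nil, err
        }
        aead, err := chacha20poly1305.New(key)
        if err != nil {
            return nil, nil, nil, err
        }
        nonce = make([]byte, aead.NonceSize())
        if _, err = rand.Read(nonce); err != nil {
            return nil, nil, nil, err
        }
        return salt, nonce, aead.Seal(nil, nonce, plaintext, nil), nil
    }

    func main() {
        salt, nonce, ct, err := sealBackup([]byte("correct horse"), []byte("backup contents"))
        if err != nil {
            panic(err)
        }
        fmt.Printf("salt=%x nonce=%x ct=%x\n", salt, nonce, ct)
    }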


Exactly!

