It's one of the possible definitions. You could just as well define next(x) = min(y such that y > x) and define 2 as next(next(0)) (0 can be defined either as the x such that y + x = y for all y, or as the x such that x ≤ y for all y, depending on what set you are working with). That 2 = 1+1 would then be a theorem to prove. Of course, that works only for integers :)
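To make the "2 is a theorem, not a definition" point concrete, here is a minimal sketch in Lean 4 (MyNat and the other names are purely illustrative, not anything from the discussion above):

    -- Peano-style naturals: zero and a successor, nothing else.
    inductive MyNat where
      | zero : MyNat
      | succ : MyNat → MyNat

    -- Addition defined by recursion on the second argument.
    def add : MyNat → MyNat → MyNat
      | n, .zero   => n
      | n, .succ m => .succ (add n m)

    def one : MyNat := .succ .zero
    def two : MyNat := .succ (.succ .zero)

    -- "2 = 1 + 1" is then a (trivial) theorem rather than a definition.
    theorem two_eq_one_add_one : two = add one one := rfl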
It depends on where you are coming from. If your context is ordinal numbers, then you are right, and a typical definition of 0 is {} and of s(x) is x∪{x}. But if you are working with finite fields, for example, then you only have an addition operation. "Successor" does not make much sense there, since 1+1+…+1 = 0 for the right number of additions (you are calculating modulo a prime). Since 0+1 := 1 is trivial, you usually start with 1 and define 2 as 1+1, 3 as 1+1+1, …
You don't usually define 2 at all when talking about abstract structures (it makes little sense to call the constant polynomial 2 "the 2" of the field of rational functions, for example). 2 is something that exists only in N, and talking about it in other structures makes sense only when you are referring to a ring homomorphism Z->F or something similar.
That is definitely a cleaner way to look at the matter, yes. Nonetheless, I've seen the definition I stated above a few times, and its merit is that you do not have to go through an implicit indirection every time you state something like "2≠0". And since the ring homomorphism ℤ->R is already completely determined for any ring R with an identity element, and agrees with the definition "2 := 1+1, …" on every positive whole number, it is really just a different way of formulating the same idea.
My point is that when talking about the definition of 2, it is enough to restrict ourselves to the natural numbers, since that is where 2 comes from. If we then want to extend this symbol to other places, the meaning of such an extension will be given by a map, not by a different definition. That is, there is only one 2; everything else is ψ(2) for the homomorphism ψ:Z->R.
What bodyfour is proposing is not just getting a write-notification -- it's in tandem with being able to specify write ordering. If you need to be able to specify write ordering, you also want to be able to have a write notification.
fsync does not work because it returns when everything (in the same thread? or whatever) has been written to disk, and doesn't let you wait for a particular block to have been written.
> fsync does not work because it returns when everything (in the same thread? or whatever) has been written to disk, and doesn't let you wait for a particular block to have been written.
I believe fsync is per-file (or, really, per-file descriptor). I wasn't familiar with this specific problem, but I believe the issue is that for fsync to work, it has to issue commands to the hardware to flush hardware buffers to disk. Apparently that isn't well targeted in Linux, with the effect that calling fsync on a file can slow down other threads and processes that happen to be writing to the same disk, even if they're writing to a different file.
> What bodyfour is proposing is not just getting a write-notification -- it's in tandem with being able to specify write ordering. If you need to be able to specify write ordering, you also want to be able to have a write notification.
It's still not enough: with write ordering and notification but no instruction to actually write the data soon, the kernel can buffer it indefinitely.
If you want the data on disk, use fsync(). If you don't care, don't. If the problem is that you can't afford the latency imposed by the multiple fsync() calls required to ensure correct data ordering for your application, fine. But that's not the problem the OP talks about. That was about fsync() hammering the I/O subsystem. You can solve that problem by fixing the fsync() implementation.
> fsync does not work because it returns when everything (in the same thread? or whatever) has been written to disk, and doesn't let you wait for a particular block to have been written.
If you really want that, could you mmap(2) the file and use msync(2)?
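Something like this, perhaps (a rough, untested sketch; write_block_sync is just an illustrative name, and the offset has to be page-aligned):

    /* Flush one specific block of an existing file, rather than
     * fsync'ing the whole descriptor. */
    #include <fcntl.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int write_block_sync(const char *path, off_t block_off, size_t block_len,
                         const void *data)
    {
        int fd = open(path, O_RDWR);
        if (fd < 0)
            return -1;

        /* block_off must be a multiple of the page size. */
        void *map = mmap(NULL, block_len, PROT_READ | PROT_WRITE,
                         MAP_SHARED, fd, block_off);
        if (map == MAP_FAILED) {
            close(fd);
            return -1;
        }

        memcpy(map, data, block_len);

        /* MS_SYNC blocks until this mapped range (only) is written out. */
        int rc = msync(map, block_len, MS_SYNC);

        munmap(map, block_len);
        close(fd);
        return rc;
    }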
For this case, as well as the others described (e.g., wanting write ordering), I don't know what the intended use case actually is, but is it possible that there's another way to organize the data that's still correct and performs well without changing the POSIX interface? That seems likely, given the number of different programs out there that manage to get by with it, and there's a rather significant cost to adding a new interface.
(One option is to write everything you need into a temporary file, fsync() it, then rename it over the real file to "commit" it and fsync the containing directory. That still requires two fsyncs, but never more than that. You can generalize this to multiple files using a temporary directory.)
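A minimal sketch of that pattern (commit_file is an illustrative name and the error handling is abbreviated):

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    /* Write buf to tmp_path, then atomically rename it to final_path.
     * Two fsyncs: one for the file data, one for the directory entry. */
    int commit_file(const char *dir, const char *tmp_path,
                    const char *final_path, const void *buf, size_t len)
    {
        int fd = open(tmp_path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0)
            return -1;
        if (write(fd, buf, len) != (ssize_t)len || fsync(fd) != 0) {
            close(fd);
            return -1;
        }
        close(fd);

        if (rename(tmp_path, final_path) != 0)    /* atomic on POSIX */
            return -1;

        int dfd = open(dir, O_RDONLY | O_DIRECTORY);
        if (dfd < 0)
            return -1;
        int rc = fsync(dfd);                      /* persist the rename */
        close(dfd);
        return rc;
    }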
FWIW, I typically work on illumos systems. On fsync(), ZFS records only an intent log record. That alone helps, since it's not stopping the world to write out everything that's been buffered. For particularly latency-sensitive applications, we use a separate intent log device on an SSD. (Regardless of write ordering and filesystem optimizations, an SSD is necessary in order to guarantee something is on stable storage with latency better than spindles can provide.) This configuration works very well.
> I don't know what the intended use case actually is,
Generally speaking, if you've got a database system of some sort and want to write data to a file.
> It's still not enough: with write ordering and notification but no instruction to actually write the data soon, the kernel can buffer it indefinitely.
That's not really the problem -- being able to say "do this write operation after this other write operation" would let you pump modifications into a file at a faster rate than if you had to wait for every fsync. Suppose you have a modification that needs to be made. Right now you write a batch of blocks, wait for them to complete, and then write another block elsewhere (a new "superblock", or whatever terminology you prefer). You'd rather send all the blocks at once and say "the superblock write should happen _after_ these other blocks'". (Another option is to checksum the new blocks referred to by the superblock, but that requires pulling them up to the CPU and checksumming them.) (And there are other, more complicated options with other trade-offs -- it would be nice if you could just send multiple blocks to write, with a partial ordering specified.)
So even if you had no fsync at all, you'd be able to pump modifications into a database file faster than before. Without some kind of fsync you couldn't confirm they'd ever been written, but with a fine-grained fsync or a "flush and notify on a per-block basis" call, you can confirm that a certain subset of changes has been written. Generally speaking, it's nice to be able to send in a bunch of changes without flushing: when the disk has multiple noncontiguous block writes to choose from, it can schedule them and get them onto the platter with better throughput.
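To make that concrete, here's a rough sketch of the pattern as it has to be written today (commit_txn and the offsets are illustrative): the first fsync exists purely as an ordering barrier, and an ordering primitive would let you issue both writes up front instead of stalling between them.

    #include <unistd.h>

    /* Data blocks first, fsync as a barrier, then the superblock. */
    int commit_txn(int fd,
                   const void *blocks, size_t blocks_len, off_t blocks_off,
                   const void *super,  size_t super_len,  off_t super_off)
    {
        if (pwrite(fd, blocks, blocks_len, blocks_off) != (ssize_t)blocks_len)
            return -1;
        if (fsync(fd) != 0)        /* wait: blocks must be durable first */
            return -1;
        if (pwrite(fd, super, super_len, super_off) != (ssize_t)super_len)
            return -1;
        return fsync(fd);          /* wait again: superblock durable */
    }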
> If you really want that, could you mmap(2) the file and use msync(2)?
>> I don't know what the intended use case actually is,
>Generally speaking, if you've got a database system of some sort and want to write data to a file.
But database systems have been around for years without such an interface, and can't they basically saturate a storage subsystem?
You can always saturate a storage subsystem -- add more clients (assuming you don't saturate the CPU, the CPU's memory bandwidth, or the network interface first -- any of which can happen if you put a high-end storage device on otherwise typical hardware). But what you get is latency higher than the minimum possible.
For example, suppose you send a bunch of write operations to the disk and then send an fsync. If those writes complete one after another (figuratively speaking -- because there are a bunch of them), the average write actually finishes in about half the time that every one of them ends up waiting for the fsync to return. (With ten 1 ms writes, the average write is done after roughly 5 ms, but everything waits the full 10 ms.)
Now suppose you have the ability to do fine-grained fsyncs on particular write operations, efficiently. It would still be useful -- and improve latency -- if the disk or OS knew that getting block A on disk didn't matter to the process until block B was also on disk, and took advantage of that fact. And it would be extra useful if the disk or OS knew that block B had to be written after block A, because then you could save a round trip, or save the CPU work of marking or checksumming blocks well enough that you can determine on startup whether they were completely and correctly written.
> an SSD is necessary in order to guarantee something is on stable storage with latency better than spindles can provide
Interestingly enough, if we ignore current hard drive firmware, I don't agree with this. In the context of sending arbitrary random sequences of writes to blocks, sure. But in the context of a database or filesystem that wants low-latency writes? My guess is that you could accomplish this if you tracked the location of the drive head and spindle. The last time I tried anything like this, though (talking to /dev/sdb, a new 7200 RPM WD Black laptop drive, from userland), I could only get about 1.5 ms per block write + fsync (iirc -- the numbers 1.3 ms and 2.0 ms ring a bell too). I didn't try writing near the middle of the disk, though, so it could have been drifting the drive head off the track each time for some reason. There's just no hope in general with current rotational drives, when one takes ~250 microseconds just to read a 4KB buffer from memory. They just don't care.
If you actually did take advantage of physical information to hold down write latency, garbage collection and keeping startup times low would be a pain (but hey, SSDs have GC worries too), and there would definitely be throughput and capacity trade-offs.
No. OpenSSL RAND_bytes and Java SecureRandom aren't simply libraries that "call /dev/urandom"; they are full-fledged CSPRNG designs, and must themselves avoid all the possible bugs a CSPRNG can have, in addition to their usual reliance on urandom not itself being vulnerable.
The NativePRNG algorithm for SecureRandom XORs /dev/urandom with a SHA1PRNG seeded from /dev/urandom, so as long as the XOR is correct, it should be no less secure than reading from /dev/urandom directly.
I would definitely avoid Java's SecureRandom. It means too many different things depending on what platform you're on. Meanwhile, XORing SHA1PRNG against /dev/urandom seems cryptographically nonsensical.
> XORing SHA1PRNG against /dev/urandom seems cryptographically nonsensical.
Untrue! Assuming that the two (/dev/urandom and SHA1PRNG) are not correlated, the resulting output will be at least as secure as the most secure of the two. This means that (for example) if SHA1PRNG is found to be breakable, SecureRandom will still be at least as secure as /dev/urandom, and vice versa.
This is a rabbit hole I don't want to go down and so I will concede the point about NativePRNG, while sticking to my guns on "avoid the Java SecureRandom interface".
You can't on the one hand say that OpenJDK's Unix SecureRandom uses urandom so it's OK, and on the other hand say that SecureRandom is preferable because it works on platforms without urandom. That's the problem with SecureRandom: it's hard to know exactly what it's doing, as the Android team discovered last year.
(On Windows, I'd use CryptGenRandom, although it inspires even less confidence than Linux /dev/random).
I don't see any contradiction. SecureRandom uses urandom where it's available, and the next best alternative where it's not. Again, I'd like to know what your suggested solution is. The way I see it, you can either:
1) Don't write cross-platform code
2) Use the language implementation
3) Write your own cross-platform code
Assuming (1) isn't an option, it comes down to using SecureRandom or writing your own version of SecureRandom, which strikes me as plain crazy.
If SecureRandom was simply "pull bytes from urandom" on Linux and "pull bytes from CryptGenRandom" on Windows, I wouldn't care enough to argue. But it's not. It's not even "pull bytes from urandom" on Linux; depending on the specific details of your platform, it can be dramatically different than that.
I'm absolutely not recommending that people write their own version of SecureRandom; I'm advising the opposite. Avoid userspace CSPRNGs. Use the system CSPRNG.
So I should write my own cross-platform interface to the system PRNG? Doesn't that strike you as more prone to error than relying on SecureRandom, which at least has the advantage of having many eyes on it?
And urandom is not cross-platform, so if I were going to write a cross-platform library, how would you suggest doing it? By writing an interface to urandom, then an interface to CryptGenRandom (doesn't that require an FFI?), and then manually going through all of the platforms Java can potentially execute on until I can be sure I've covered all my bases?
I'm pretty sure that's going to be more than 5-10 lines of code.
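For what it's worth, a rough C sketch of just the two most common backends (os_random is an illustrative name; a Java library would still need JNI/FFI plumbing on top of this, plus whatever other platforms it cares about) already runs well past that:

    #include <stddef.h>

    #ifdef _WIN32
    #include <windows.h>
    #include <wincrypt.h>

    /* Fill buf with len bytes from the Windows system CSPRNG. */
    int os_random(void *buf, size_t len)
    {
        HCRYPTPROV prov;
        if (!CryptAcquireContext(&prov, NULL, NULL, PROV_RSA_FULL,
                                 CRYPT_VERIFYCONTEXT | CRYPT_SILENT))
            return -1;
        BOOL ok = CryptGenRandom(prov, (DWORD)len, (BYTE *)buf);
        CryptReleaseContext(prov, 0);
        return ok ? 0 : -1;
    }

    #else
    #include <fcntl.h>
    #include <unistd.h>

    /* Fill buf with len bytes from /dev/urandom, handling short reads. */
    int os_random(void *buf, size_t len)
    {
        int fd = open("/dev/urandom", O_RDONLY);
        if (fd < 0)
            return -1;
        unsigned char *p = buf;
        while (len > 0) {
            ssize_t n = read(fd, p, len);
            if (n <= 0) {
                close(fd);
                return -1;
            }
            p += n;
            len -= (size_t)n;
        }
        close(fd);
        return 0;
    }
    #endif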
Two immediate questions: forward secrecy and CryptGenRandom's state relative to other WinAPI processes. As in, I'm very clear how the Unix security model protects /dev/random, and less clear about Windows. Some of my C.G.R. thoughts are probably dated. But it's the system CSPRNG, and I think, just use the system CSPRNG.
The forward secrecy issues have been fixed for a long time (since Vista, I believe): they now use SP800-90A's CTR_DRBG. But the generator continues to run unprotected in user mode, seeded from the kernel.
That makes more sense. So which of the following is true:
1) I shouldn't be using OpenSSL to generate keys without somehow injecting bytes directly from /dev/urandom
2) The article is wrong, using OpenSSL's CSPRNG is fine.
3) I still don't get it.
The article doesn't care how you use the OpenSSL commands; it's concerned with code you write that might need a CSPRNG. If you're writing code, don't use OpenSSL's CSPRNG.
So code that I write that generates keys using OpenSSL isn't indirectly depending on OpenSSL's CSPRNG?
Sorry for all the questions. I just want to make sure I'm doing it right and I suspect I'm not the only one that is confused by the article's assertions.
The article (I'm its author) is about programming; it doesn't have strong opinions about how you e.g. configure nginx.
As for keys: it depends on the kinds of keys you're generating. If you're building on OpenSSL's primitives --- which, don't --- it'll be hard to get an RSA key without invoking the OpenSSL CSPRNG. But it's not at all hard to avoid OpenSSL's CSPRNG for AES.
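For example (a minimal sketch; the helper name is illustrative, error handling is abbreviated, and an AES-256 key is assumed):

    #include <fcntl.h>
    #include <unistd.h>

    /* An AES-256 key is just 32 unpredictable bytes, so pull them straight
     * from the system CSPRNG instead of OpenSSL's userspace generator. */
    int make_aes256_key(unsigned char key[32])
    {
        int fd = open("/dev/urandom", O_RDONLY);
        if (fd < 0)
            return -1;
        ssize_t n = read(fd, key, 32);   /* a robust version loops on short reads */
        close(fd);
        return n == 32 ? 0 : -1;
    }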
My project depends on bitcoin-ruby, which uses OpenSSL's EC_KEY_generate_key to generate keys. EC_KEY_generate_key, as far as I can tell, uses OpenSSL's internal PRNG. If I understand you correctly, this is unsafe and it would be better to derive a key from urandom.
Reliance on OpenSSL's CSPRNG isn't a hair-on-fire problem; if it was, your hair would literally be on fire right now, because lots of things do. I just don't think it's a great idea for new code to perpetuate the habit.