There are a lot of very interesting issues there. For example, surely you have to look at the entirety of both keys at least once; but you don't have to do it on every comparison: in mergesort you can store the longest common prefix between pairs of adjacent keys and avoid comparing them byte by byte most of the time, or even storing them. The algorithm is tricky both to implement and to analyze, and I don't think anyone has done it, because there are better options for long keys; but I conjecture that it actually gets mergesort down to a real O(N log N) instead of the O(N log N log N) of the usual mergesorts.
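
To make the idea concrete, here is a rough Python sketch of that LCP-caching mergesort (the names lcp_merge and lcp_mergesort are mine, and this is an illustration rather than a tuned implementation): each sorted run carries an array of longest-common-prefix lengths between adjacent keys, and the merge uses those hints to decide most comparisons without touching the key bytes at all, only falling back to a character scan that starts past the prefix already known to be common.

    def lcp(x, y, k=0):
        """Length of the longest common prefix of x and y, scanning from
        offset k (the caller already knows x[:k] == y[:k])."""
        n = min(len(x), len(y))
        while k < n and x[k] == y[k]:
            k += 1
        return k

    def lcp_merge(a, ha, b, hb):
        """Merge sorted runs a and b, where ha[i] == lcp(a[i-1], a[i])
        (ha[0] == 0) and hb likewise.  Returns the merged run and its
        own LCP array."""
        out, hout = [], []
        i = j = 0
        h_a = h_b = 0        # LCPs of a[i] and b[j] with the last key written out
        while i < len(a) and j < len(b):
            if h_a > h_b:    # a[i] shares more with the last output => a[i] < b[j]
                out.append(a[i]); hout.append(h_a)
                i += 1
                h_a = ha[i] if i < len(a) else 0
            elif h_b > h_a:
                out.append(b[j]); hout.append(h_b)
                j += 1
                h_b = hb[j] if j < len(b) else 0
            else:            # equal hints: scan, but only past the known common prefix
                h = lcp(a[i], b[j], h_a)
                if a[i][h:h+1] <= b[j][h:h+1]:      # a[i] <= b[j]
                    out.append(a[i]); hout.append(h_a)
                    i += 1
                    h_a = ha[i] if i < len(a) else 0
                    h_b = h  # b[j] shares exactly h chars with the new last output
                else:
                    out.append(b[j]); hout.append(h_b)
                    j += 1
                    h_b = hb[j] if j < len(b) else 0
                    h_a = h
        while i < len(a):
            out.append(a[i]); hout.append(h_a)
            i += 1
            h_a = ha[i] if i < len(a) else 0
        while j < len(b):
            out.append(b[j]); hout.append(h_b)
            j += 1
            h_b = hb[j] if j < len(b) else 0
        return out, hout

    def lcp_mergesort(keys):
        """Sort a list of strings, returning (sorted_keys, lcp_array)."""
        if len(keys) <= 1:
            return list(keys), [0] * len(keys)
        mid = len(keys) // 2
        a, ha = lcp_mergesort(keys[:mid])
        b, hb = lcp_mergesort(keys[mid:])
        return lcp_merge(a, ha, b, hb)

    words = ["banana", "band", "ban", "apple", "applesauce", "band"]
    assert lcp_mergesort(words)[0] == sorted(words)

Each merge level only pays character-scanning work for bytes not already covered by a stored LCP, which is where the conjectured savings over plain byte-by-byte mergesort would come from.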

I suspect that isn't the reason this kind of analysis isn't usually done, though. If you measure the running time of a standard comparison sorting algorithm on real data on almost any general-purpose computer from 1955 to 1985, you will find that, above very small sizes, it is quite precisely proportional to N log N, where N is the number of records. The extra factor of log N doesn't show up. Why is that?

It's because almost all of those computers, even bit-serial designs like the RCA 1802 and the LGP-30, aren't bit-addressable; the smallest amount of data an instruction can manipulate is a character of typically 6-9 bits, or often a full word of 32 or more bits. And almost none of them could manage more than 4 gigabytes of data, because that would have been too expensive. So in practice the average time to compare two separately stored keys does not vary by a factor of 100 or so over the data sets the machine could process, as you would expect from looking at the keys' unique prefix lengths. It might vary by a factor of 2 to 4, but more often it was actually constant. MIX in particular had 5-character memory words, but only 4000 words of memory, so its comparison time for internal sorting is quite precisely constant for any realistic data.
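
To illustrate the mechanism in modern terms (a sketch only, nothing specific to those historical machines): once a key fits in one machine word, a comparison is a single fixed-width operation whose cost doesn't depend on where the keys first differ. Modelling the "word" as a Python integer packed from eight key bytes:

    import os

    def pack_key(key, width=8):
        """Pack a fixed-width key into one integer 'word'; big-endian, so
        integer order matches lexicographic order for equal-length keys."""
        assert len(key) == width
        return int.from_bytes(key, "big")

    keys = [os.urandom(8) for _ in range(10_000)]     # random 8-byte records
    packed = [pack_key(k) for k in keys]

    # Sorting the packed words gives the same order as sorting the raw keys,
    # but each comparison is one word-sized operation instead of a scan of
    # up to 8 separate characters.
    assert sorted(keys) == [w.to_bytes(8, "big") for w in sorted(packed)]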

After 1985, caches and massive parallelism complicate the picture a bit, initially for largish machines (though Stretch too had a cache, such monsters were exceptions) and eventually for almost all types of computers, though perhaps not yet the numerical majority of computers, which are probably mostly 8051s and Cortex-Ms and things like that.

Anyway, back to the issue at hand: assembly language exposes all the computational costs by default, which is what you want if you're going to prove theorems about them, while high-level languages obscure them by default, so more of your theorems will be incorrect. And that's Knuth's primary concern.

That level of single-minded pursuit of excellence is the reason we can learn things from books Knuth wrote in the 1960s that are still useful for programming computers that are literally a billion times bigger than the machines Knuth used as his model. It's also the reason he's not done yet, 55 years later, with what he thought would be a single book, done in a year or two.



>I conjecture that it actually gets mergesort down to a real O(N log N) instead of the O(N log N log N) of the usual mergesorts.

I don't think it does in the worst case. I'd bet money you could construct an adversarial input that would force you to look at more of each key than you'd have to in order to get down to "real" O(n log n).
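
One simple family that captures that worst-case worry (purely illustrative; the helper name is mine): give every key a long shared prefix. Any correct sort has to read each key at least up to the position that distinguishes it from the others, so the total number of characters inspected is at least roughly n times the prefix length, which swamps n log n as soon as the prefix is much longer than log n; LCP caching saves the re-reads but not the first read.

    import random

    def adversarial_keys(n, prefix_len):
        """n distinct keys that all agree on their first prefix_len characters."""
        prefix = "a" * prefix_len
        suffixes = random.sample(range(10 * n), n)    # distinct numeric suffixes
        return [f"{prefix}{s:08d}" for s in suffixes]

    keys = adversarial_keys(n=1000, prefix_len=10_000)
    # Total input is ~10^7 characters while n*log2(n) is ~10^4, so no amount of
    # comparison caching lets a sort finish after reading only O(n log n) bytes.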

>MIX in particular had 5-character memory words, but only 4000 words of memory, so its comparison time for internal sorting is quite precisely constant for any realistic data.

If we're being precise here, then it is certainly bounded above by a constant. But this is also the same sense in which there is literally no such thing as a Turing machine (all physical machines, even one the size of the universe, are at best linear bounded automata).

> Anyway, back to the issue at hand: assembly language exposes all the computational costs by default, which is what you want if you're going to prove theorems about them, while high-level languages obscure them by default, so more of your theorems will be incorrect. And that's Knuth's primary concern.

Except, no, you don't. Both you and I just got through explaining why.

> That level of single-minded pursuit of excellence is the reason we can learn things from books Knuth wrote in the 1960s that are still useful for programming computers that are literally a billion times bigger than the machines Knuth used as his model. It's also the reason he's not done yet, 55 years later, with what he thought would be a single book, done in a year or two.

No, it most certainly is not. I'm assuming you've read at least some of TAOCP. Knuth certainly introduced some mathematics and techniques for the analysis of algorithms in TAOCP, but literally none of the algorithms themselves were first published there. This is not a research monograph. It's a reference. The algorithms themselves were published and proven, including runtimes, before they ever made it into those pages.

And, yes, there is plenty that almost anyone can learn from these books and the subsequent fascicles, but it has nothing to do with the computational model. Not one result in TAOCP contradicts the previously published work. We can learn from it because there is simply so much there to learn from. There's a reason that when Steve Jobs told Don Knuth "I've read all your books," Knuth responded that Jobs was "full of shit." [0]

---

[0]:https://www.folklore.org/StoryView.py?project=Macintosh&stor...



