Knuth's Challenge: Analyze everything your computer does in one second

khedoros · on Sept 30, 2016

The code running on the CPU isn't the only thing the computer is doing in that one second, the X86{,_64} opcodes we could capture aren't necessarily exactly what the CPU is doing, and code being run by extra controllers and processors isn't likely to be accessible to anyone but the manufacturer.

In 1989, we'd've also been looking at code running on a single core with a single-task or cooperative-multitasking OS (for most home computers, anyhow), with simpler hardware that an individual could completely understand, and it would run at a speed where analyzing a second of output wouldn't be completely beyond the pale.

I've analyzed CPU logs from DOS-era programs and NES games. I certainly haven't analyzed a full second of the code's execution; I'm usually focused on understanding some particular set of operations.

drvdevd · on Sept 30, 2016

I think this could only be done from an outside perspective for any given unit of time (meaning e.g. somehow monitoring the actual physical state of every atom of a modern system) and could only be performed by other machines.

I think we've reached a sort of "chicken and egg" point in computing history where we can only understand any given device with the help of other machines, though not necessarily with AI or Machine Learning.

Maybe that sounds obvious, given that VLSI tools have been around for a long time, but I think the point is that there is no such thing is as full knowledge in this realm and we are already totally dependent on our computing devices to understand our computing devices.

It's an interesting thought - how would one implement this? I think the closest thing that comes to mind so far is the DARPA Cyber Grand Challenge work, some of which is open source (https://github.com/mechaphish) and a good starting point at the software level anyway, where the question was similar in some ways: describe what's happening in this code path at time t, but also take it a step further and generate a patch to fix a bug(s).

posterboy · on Oct 1, 2016

>I think this could only be done from an outside perspective

>only be performed by other machines

JTAG allows in-device debugging. Of course you need another machine, by some definition, to view the results or you would indefinitely apply recursion.

gaxun · on Sept 30, 2016

> simpler hardware that an individual could completely understand

Our hardware now is so much more complex, what has it gained us? The quick answer is performance, but is it true? What about correctness? Hard to prove either way, but my guess is we've gained a little bit on performance and lost on correctness.

rwallace · on Oct 1, 2016

I could say 'gained a little bit on performance' is a wild understatement, because the actual gain is several orders of magnitude, but that in itself would be a wild understatement. It could be taken to suggest that a second of work on a modern PC could be duplicated on a 1989 PC if you were willing to wait all day, but in reality it couldn't; very little of what I use computers for today could be done on the machines I had in 1989 at all no matter how long I was prepared to wait.

khedoros · on Sept 30, 2016

My computer certainly does more than the one 27 years ago did, in the senses of operations it performs in a second, capacity of data storage, increased intercommunication options, increased selection of devices that I can interface with it, and so on. Some of these are just a change in magnitude, and the changes in the software are just as important as changes in the hardware.

There are more parts of the computer (again, both hardware and software) that are undocumented. Taken as a whole, the system is more capable, but closed hardware and software makes me wonder about capabilities in my computer that serve someone that isn't me.

drvdevd · on Sept 30, 2016

Personally, I like to occasionally remember that there's not a single computing device in my life I really own. It can be either a comforting piece of knowledge or a frightening one depending on your perspective. I like to take the former perspective.

27182818284 · on Sept 30, 2016

Ehhh. In a sense sure, but lots of people consider they've owned a loaf of bread without knowing Wonder's recipe.

drvdevd · on Sept 30, 2016

I see what you mean, but also no one ever owned a loaf of bread that could be cut, toasted and buttered remotely by a totally unknown 3rd party either...

24gttghh · on Sept 30, 2016

Unless you go to the local diner for breakfast and don't know the line-cook...

drvdevd · on Sept 30, 2016

Good point. You walk into the diner implicitly trusting the line cook and the bread baker and so on. Very similar to the chain of trust implicit in the tech we use. Only I believe the chain of trust in the tech to be much longer and proven to have been repeatedly broken in recent decades. Also, line cooks have the luxury of tossing out loafs of bread that have moulded over.

24gttghh · on Oct 1, 2016

:) They can even cut off the mouldy bits, preserving the rest of the breakfast.

chriswarbo · on Oct 1, 2016

My first computer was an 14MHz Amiga 1200 in the early 90s. In some ways the performance difference isn't too noticable, e.g. the GUI was often more responsive than those I use today.

In other ways it's clearly different; e.g. waiting minutes for a JPEG to decode, as the scanlines slowly appeared one after another.

kpil · on Oct 1, 2016

This is something I have thought about.

I think that the expectations have changed though - my first computer was a C64, and loading and starting a game was a several minutes long wait. I don't remember that I was bothered that much by that, but when I tried playing an old game even 15 years ago, I couldn't believe how much time it took.

The same for my Amiga, starting it took a really long time, but I don't think I did mind. The Workbench was never slow though. Unfortunately both my A1200 and C64 have died so I can't test my patience anymore. I remember that on the A500, flood fill in Deluxe Paint was a visible process though :-)

I used the A1200 with a MC68030 expansions while going to the university up to about 1995 and it wasn't much slower to work with than the DEC Alphas that we had there - except for things requiring raw CPU power. Most of the time waiting was for I/O, and my crappy small hard drive was probably faster than the NFS mounts anyway. The Alphas had 384 MB if I remember correctly though, which was just crazy.

The Amiga wasn't fast enough to play 16 bit mp3-files in stereo even with the 68030 cpu though.

In 1995 I replaced the Amiga with a PC with Linux on it, and computing was still amazingly fast. Installing slackware was a two week project however, mostly because I had to download everything in the university and use floppies and partly because the floppies were reused and flaky so I had to go back and redownload many disks.

The internet was crazy-slow outside the university until about 1998 when I was lucky enough to live in a block that got fiber for some reason. It was still slow at most workplaces for a another decade.

At around 1998 I got a job and a laptop for work. I installed Linux and Window Maker (or was it Afterstep?) and it was totally fine to work on. It might have had a Pentium with 32 or possibly 64 MB memory. All in all it was really fast, once it had booted, and I mainly used emacs and gcc. I remember that booting Windows on that machine was much slower.

As far as I can remember it was also possible to use a browser without having 1 GB or RAM at that time.

A full compile of our product took 6 hours though. It wasn't always necessary but it had to be done occasionally. A few years later it took 30 minutes to compile.

Today, I get irritated if I have to wait more than 30 seconds before I can test a line of code.

monocasa · on Sept 30, 2016

The reason that modern cores are in the billions of transistors is to get performance with correctness. Tons of speculative work that can be rolled back if some invariants don't hold to be true.

GFK_of_xmaspast · on Oct 1, 2016

> a little bit on performance

I don't know how anybody who used a computer in the 80s, 90s, or 00s could say this.

SapphireSun · on Sept 30, 2016

GPUs are definitely in the performance at the expense of correctness bucket. It's common for errors to occur in the LSB which isn't a big deal for colors, but might be a big deal for data.

gpderetta · on Sept 30, 2016

Aren't the last few generations of GPUs fully IEEE compliant?

wyldfire · on Sept 30, 2016

I assume that SapphireSun refers (primarily) to individual manufactured device faults not a problem where the design fails to comply with the floating point standard. EDIT: although the faults I'm referring to would be just as likely in the MSB so maybe not.

SapphireSun, these are mitigated in large part by ECC. It was offered by both major manufacturers and now at least one of them still offers it.

moyix · on Oct 1, 2016

This is totally doable. PANDA [1], e.g., takes recordings of full-system execution by recording non-deterministic hardware inputs in a modified QEMU. This is much more compact than a full instruction trace, but you can then replay the recording to produce instruction traces, memory traces, whatever.

We recently added some machinery [2] for using debug symbols to map each instruction back to its source line. So, assuming you can get debug symbols installed for every userspace program, every library, and the kernel, I think you could come very close to tying every instruction back to a source line.

There are caveats, though – the overhead of QEMU and the recording infrastructure mean you only end up getting around 200 million instructions / second, which is nothing compared to modern bare metal. You could capture a longer trace, though, and do the same thing to get up to the same amount of code executed as on real hardware.

If someone wants to try this I'd be very interested to see the results and happy to help answer any questions that come up!

[1] https://github.com/moyix/panda

[2] https://github.com/moyix/panda/blob/master/qemu/panda_plugin...

pm215 · on Oct 1, 2016

This is getting perilously close to "attempting to extract performance information from a program running under a simulator/CPU model", which is really tricky to do in a way that gives you results that apply to actual hardware. In particular, the performance characteristics of QEMU are wildly different from real hardware in several significant ways: (1) the emulated FPU is incredibly slow, so the results will overweight anything that's FP-intensive (2) there is no modelling of CPU caches, TLBs or branch predictors, so these major influences on real-hardware performance won't be visible (3) because the emulated CPU is very slow, interrupt handler and similar code which runs triggered by timers or other external events will take up more time in the trace than it would on hardware.

You'd probably find something interesting in the general sense from looking at what's going on in a QEMU emulation for a fixed time period, but you'd need to be rather wary about how applicable what you saw might be to a real hardware run. At minimum you'd want to cross-check against what perf on real hardware revealed.

roel_v · on Oct 1, 2016

This line stood out to me:

"It might be shocking if an erroneous program was discovered, but it could certainly happen."

Is anyone actually under the impression that the thousands or tens of thousands of components that interact with each other at any given time on a typical desktop computer would be "correct", even for generous definitions of that word (well, that are less generous than "most of the time generally do what the designers set out to, generally, accomplish")?

It seems to me that all components in computers of at least the last decade, not to mention their interactions, are so complex that they almost certainly are full of small and not so small errors; they are deployed as soon as the most obvious and obnoxious errors have been removed but there must be heaps of things going on that most people would agree on are "errors". I'm continually amazed that we manage to build (or rather, 'assemble') systems that most of the time work at all.

EdwardCoffin · on Sept 30, 2016

I think perhaps some alterations would really be necessary to make this analysis tractable. He wrote this in 1989, so how many doublings in performance have we had since then? The duration should be adjusted accordingly, so analyze everything the computer does in say a thousandth of a second, or even less.

Another thing is that he was surely envisioning a program written in C or the like - a straightforward translation from the program to executable with some optimization, and all of the program optimized to the same level. With JITs, one would have to take a few steps back to determine whether the code had been optimized because the JIT had determined it would be a good idea, not optimized because it just wasn't important, or possibly not optimized _yet_ simply because the JIT had not seen enough of the program to decide whether to bother.

There's also the idea of memory hierarchies influencing how things are done in ways not necessarily obvious to someone focusing on the code being executed at the moment. I think any memory hierarchies (I'm thinking here of L1, L2, L3 caches in modern processors) have a much greater impact on how optimal code is written now than it was back then. Perhaps the code one examined could be done better for itself, but was done less optimally in order to not have detrimental effects on other code in the program that was more important (like perhaps displacing stuff that other code had cached).

I'm not really sure that this exercise would be worth it today, except in special cases, trying to wring every last bit of performance out of a program after less tedious avenues had been exhausted. I can't say that I have had a whole lot of need for such performance.

This idea of close analysis of a part of one's program does remind me of something else though: the idea that one should run one's program in a source-level debugger and step through every line of code one can get to, trying to get 100% code coverage, and contemplate the state of the program at every step. I think this would uncover many latent bugs, hidden assumptions and the like in most programs. I guess what I am trying to say here is that correctness is more important than performance, and perhaps easier to do in today's world.

gaxun · on Sept 30, 2016

> I think perhaps some alterations would really be necessary to make this analysis tractable.

In this post, I presented the idea as if it was "easy", but Knuth seemed to be proposing it as a rather large undertaking. I skipped some parts of his original prompt for brevity, but since you bring this up, I can summarize a bit more here. I also found a copy of the address in PDF form online [0], if you want to read the whole thing. This is from the last few pages.

He compared this task to researchers who documented every square meter of a large tract of land to see how different factors affected plant growth. He also mentioned a study of 250,000 individual trees in a rain forest. It's not supposed to be easy.

Yes, we've doubled many times since then, but our power to analyze large piles of data has also improved dramatically.

> I'm not really sure that this exercise would be worth it today

I think it really depends on what kind of system you are going to analyze. He was probably thinking of big systems running a school or business back then. These days there are just so many more types of machines. Most are probably not interesting at all. Maybe some kind of life-or-death devices, though?

> correctness is more important than performance

One neat thing about this kind of lowest-level analysis is that you can probably check on both at the same time.

[0] http://www.sciencedirect.com/science/article/pii/03043975919...

EdwardCoffin · on Sept 30, 2016

You might find the following interesting.

In Carl De Marcken's Inside Orbitz email [1] he has the following item:

> 10. ... We disassemble most every Lisp function looking for inefficiencies and have had both CMUCL and Franz enhanced to compile our code better.

In 2001 there was a series of three panel discussions on dynamic languages [2] that are an absolute goldmine: about six hours worth of listening, with various luminaries discussing deep ideas and fielding questions from the audience. Knuth is cited several times on different topics. This is also where I learned about the idea of stepping through every line of code you can get to (Scott McKay brought this up in the panel on runtime [3]. You ought to be able to find the other two panels (compilation and language design) from that one. Anyway, they discuss a lot of idea behind performance, for example

a) code that is locally bad but globally good

b) optimizing for locality and predictability of memory access (David Moon, in the Compilation panel, I think)

c) speculation that performance improvement could be gained via having an efficient interpreter residing in cache, over optimized compiled code (Scott McKay again, in the panel on runtime - incidentally I think this idea is proven in Kdb+ - at least, I understand that is their secret to performance, or one of them)

[1] http://www.paulgraham.com/carl.html

[2] http://www.ai.mit.edu/projects/dynlangs/wizards-panels.html

[3] https://www.youtube.com/watch?v=4LG-RtcSYUQ

Edit: cleaned up formatting Edit 2: and more clean up

rbanffy · on Sept 30, 2016

Since Apple IIs were manufactured until 1993 or so, I guess I could use one for the challenge. It's perfectly doable.

grkvlt · on Oct 2, 2016

According to Wikipedia [1] they were discontinued on October 1, 2006; only 10 years ago!

[1] https://en.wikipedia.org/wiki/Apple_II

EDIT - Sorry, Wikipedia was wildly incorrect; I have updated the page

rbanffy · on Oct 2, 2016

It's so incorrect, in fact, it's worth banning whoever did it from ever editing anything else...

Someone · on Sept 30, 2016

"The computer will execute several hundred thousand instructions during that second; I'd like you to study them all."

That "several hundred thousand" has grown by about four orders of magnitude since then, even more if we consider multi-threading, GPU, CPUs in the network controller, etc.

Because of that, that task has become a lot more work. It is doable for 10^5-10^6 instructions (certainly for someone with Knuth's work ethic), but for 10^9-10^10 instructions, I guess even he would need to write tooling to do it.

A problem with tooling is that it may hide interesting behavior that the programmer writing the tooling isn't aware of in the 'misc' category of code executed. I fear that may defeat the purpose of this exercise.

Klathmon · on Sept 30, 2016

Couldn't you adjust the amount of time that you look at to preserve the "purpose" of the exercise?

Make it 1/10th of a second, or 1/1000th if needed.

hderms · on Oct 1, 2016

Interesting question as I'm guessing the ratios of instructions would probably be different owing at the very least to the vast difference in CPU power to IO that has grown over time.

ideonexus · on Sept 30, 2016

Fascinating question, reminds me of this one that made HN almost two years ago: “What happens when you type Google.com into your browser and press enter?”

https://github.com/alex/what-happens-when

Would love to see Knuth's Challenge setup in a repo for collaboration.

HN Discussion on the above link:

https://news.ycombinator.com/item?id=8902105

contingencies · on Sept 30, 2016

Sad that it has 57 issues and 27 pull requests, and begins with When you just press "g" the browser receives the event and the entire auto-complete machinery kicks into high gear... ... not exactly a low level explanation.

I was expecting non-keyboard input methods, keyboard scan codes, keymaps, input modes, fonts, glyph selection, screen resolution and rendering. Scan codes are mentioned in the next paragraph, but the rest are ignored (OK, they finally mention DOM rendering, but leave out the earlier screen updates). Two seconds later they are discussing ARP... really? What about virtualization, sandboxing, firewalling, link layer selection, use of multiple concurrent link layers for a given network-layer address range, non-ARP link layers, existing transport-layer sessions to the relevant hosts, VPNs, proxies, offline mode, caches and load balancers (possibly multiple layers), non DNS-based name resolution, differences between available IPv4 and IPv6 address classes (mobile IP), layer 2 transparent proxying (eg. proxy ARP/heartbeat), switch ARP caches, etc.

I also think this is an excellent interview question for any internet-related engineering role, as it really gives people a chance to show their degree of comprehension of many layers.

drchickensalad · on Oct 1, 2016

Know anything else that actually lists all that? I'd love to research each thing on its own independently but it's so hard to come up with the list without already knowing

contingencies · on Oct 1, 2016

I'd love to research each thing on its own independently but it's so hard to come up with the list without already knowing

I would recommend examining from different perspectives: network activity or options at each layer (wireshark), browser activity (firebug, debugger, or read the code), OS activity (debugger, or read the code), program activity (debugger, or read the code, or learn various system monitoring tools like filesystem monitors, kernel or library-based tracers, etc), virtualization systems (their hardware and network emulation), network systems (proxies, load balancers and caches of all configurations) security systems of all kinds. You are right that there is probably no holistic resource, because the question is sort of ridiculously specific.

zaptheimpaler · on Oct 1, 2016

You should add this to the repo!

contingencies · on Oct 1, 2016

The only option is to fork, as apparently it's unmaintained. I have enough projects right now.

gaxun · on Sept 30, 2016

You've just made my day. I've had some ideas brewing on this topic for years. It is now at the top of my reading list....along with the other suggestions in this topic.

> Would love to see Knuth's Challenge setup in a repo for collaboration.

Go for it!

drdrey · on Sept 30, 2016

TEMU is a tool based on QEMU that is (was?) aimed at doing exactly that, a dynamic analysis that captures instructions executed in every process (including the kernel): http://bitblaze.cs.berkeley.edu/temu.html

The problem of course is that this is not analyzing the physical machine but rather the behavior of programs in a virtual environment.

wyldfire · on Sept 30, 2016

> There are definitely people who profile their applications ... the entire system, though? Is there a utility I can use to log every instruction performed in one second for an operating system running on bare metal?

Kinda. oprofile and later perf use the hardware's performance counters. So, it's sampled and therefore it's not "every instruction performed" but the scope is indeed the entire system. If your hardware doesn't support it, then the kernel has a sampling feature on a timer. It's IMO representative and probably the sanest way to accept the challenge.

But Knuth questions make me wonder: how could we possibly reason about correctness from the samples? We'd be better off following the trail of breadcrumbs back to the source unless he means something terribly specific beyond "correct or erroneous".

> Can it be done with an emulator like QEMU?

Yes, that's probably an alternative if sampled isn't sufficient.

no_protocol · on Sept 30, 2016

> We'd be better off following the trail of breadcrumbs back to the source

I'm thinking this is what was intended. Did something get lost along the way with all the layers of translation from "human problem" to machine language? Here's another quote from the source passage, linked elsewhere in this thread:

  The sequence of operations will be too difficult
  to decipher unless you have access to the source
  code from which the instructions were compiled.
  University researchers who wish to carry out such
  an experiment would probably have to sign
  nondisclosure agreements in order to get a look
  at the relevant source code.

shauncrampton · on Sept 30, 2016

I like the idea of making a trace through a complex app (such as a browser) and the kernel and listing off all the open source developers whose code it passes through. Just how many people contributed to my 1s of Youtube cat video watching pleasure?

CalChris · on Sept 30, 2016

Scalar CPU clock rates were maybe 25MHz in 1989.

OOO speculative superscalar hyperthreaded multicore CPUs now are 2.5GHz.

ultramancool · on Sept 30, 2016

This just makes me miss SoftIce. Just hit ctrl+d and be immediately dropped into exactly what your machine is doing at any time.

Of course, just doing this would be illegal on most proprietary operating systems.

je42 · on Sept 30, 2016

This and similar questions is a very good question for a job interview (as a theoretical thought experiment in the interview)

citrin_ru · on Sept 30, 2016

Analyze all CPU instructions even for one second is very hard. Good place to start is Dtrace FBT provider - capture and analyze all called functions for one second.

Perhaps it will be some useless work, which should not be done at all. But CPU time is cheap and programmer's time is expensive.

DonHopkins · on Sept 30, 2016

Reminds me of how Stanisław Lem went virtual and wrote fictitious reviews of a non-existent books that were far too vast to actually exist in the real world, including one called "One Human Minute" [1]:

"One Minute", for a faux book by J. Johnson and S. Johnson: One human minute, Moon Publishers, London - Mare Imbrium - New York 1985. The book is alleged to be a collection of statistical tables, a compilation that includes everything that happens to human life on the planet within any given 60 second period.

Here are some real reviews of a real book of fictitious reviews of fictitious books [2]:

KristenR rated it really liked it. Shelves: science-fiction, short-stories, male-author.

This volume had 3 essays, each with an interesting concept.

One Human Minute: Lem has styled this piece as a book review...of a book that hasn't been written. One Human Minute is apparently a Guinness Book of World Records that is completely mundane, yet also amped up on steroids. Imagine a book that is full of tables upon tables and graphs and charts about everything that happens on earth per minute. How many babies are born, how many people get struck by lightning, how many people are tortured by electricity, how many orgasms are achieved per minute...

Definitely a philosophical piece, but seemed to be musing about the depravity of the human race. I'm not sure if I missed the point.

The Upside-Down Revolution: The evolution of military and warfare...written under the premise that the author has a history book from the future and publishes it in the present as science fiction. I lost interest partway through this one.

The World as Cataclysm: I have a fascination with astrophysics. I am fully aware that the bulk of it goes over my head and I have near zero retention, but that doesn't stop me from reading/watching anything on the subject that is remotely geared towards the layman. Simply Fascinating. This piece goes into the probabilities of extraterrestrial life. I don't know what Stanislaw Lem's qualifications are, but as I was reading this I was nodding...uh huh, that makes sense...hmmm, I sense a little research project on Lem.

Rich Meyer rated it: really liked it. Shelves: read-in-2014.

An interesting trio of science fiction-tinged essays by one of the great science fiction writers. The title refers to one about a book that covers what happens on the planet every minute, and how the book is updated and computerized until it becomes a power unto itself. The other essays follow the search for intelligent life and the chances for finding it, and a look at the history of warfare from the point-of-view of a book from 2150, and manages to make some pretty accurate predictions.

[1] https://en.wikipedia.org/wiki/Stanis%C5%82aw_Lem%27s_fictiti...

[2] http://www.goodreads.com/book/show/28771.One_Human_Minute

joakleaf · on Oct 1, 2016

... I wonder what the human brain and body does in one second.

betimsl · on Sept 30, 2016

How about use an FPGA for this? Use an architecture and ALU and implement a software which would monitor and output op code and ALU for any given time.

dmfdmf · on Sept 30, 2016

I'd like to see this applied to Windows 7 and find out why it has become mysteriously slow, even on fast computers, after Windows 10 came out.