Is EC2 really the right way to do these tests? I would have thought it would introduce unwanted variables related to network congestion and server load caused by other EC2 users.
You might get a better number on bare metal, and you certainly won't get a worse one, so in that sense EC2 isn't ideal.
On the other hand, a big chunk of the web is run from AWS. Demonstrating performance there implies performance in many places. It's also an easily reproducible environment, which is helpful for benchmarking.
Fair question. We had some reservations about using EC2 for the reasons you mention. But we wanted to avoid any perception that we'd somehow cooked the books, so we decided to run the tests on "neutral" gear.
If you look at the report details, you'll see some of the lumpiness your question suggests, particularly related to the network. The Node instances were beginning to starve when database server CPU utilization was still pretty low.
We would certainly have gotten better numbers if we had run these benchmarks on bare metal or a closely tended cloud infrastructure. But we're a pretty low-BS group, so we'll stand behind what EC2 gave us.