> We show that this variation is further magnified under a hardware-enforced power constraint, potentially due to the increase in number of cores, inconsistencies in the chip manufacturing process and their combined impact on processor’s energy management functionality
and recommendations:
> • Characterizing node performance based on averaged performance distorts the true impact of manufacturing variation on processors on the node and therefore should be avoided
Well, yes, they do speculate in the abstract; note the "potentially" there. It looks like the authors do believe the variation is caused by manufacturing variation, but nothing in the paper actually shows it. There's no attempt to determine the cause of the observed performance variation. It's an empirical survey.
Interestingly, that recommendation could be interpreted as saying that "empirical studies based on averaged node performance" give a very distorted view of "the true impact of manufacturing variation on processors".
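To make the averaging point concrete, here's a toy sketch. The node names and per-socket runtimes are made up for illustration and are not taken from the paper; it only shows that two nodes can report the same average while the spread between their sockets differs a lot.

```python
from statistics import mean

# Hypothetical per-socket runtimes in seconds (illustrative numbers only,
# not measurements from the paper).
nodes = {
    "node_a": [100.0, 100.0],  # two near-identical sockets
    "node_b": [90.0, 110.0],   # one fast socket, one slow socket
}


def summarize(runtimes):
    """Return (mean runtime, max-min spread) for one node's sockets."""
    return mean(runtimes), max(runtimes) - min(runtimes)


summary = {name: summarize(r) for name, r in nodes.items()}

for name, (avg, spread) in summary.items():
    print(f"{name}: mean={avg:.1f}s spread={spread:.1f}s")
```

Both nodes look identical if you only keep the mean, which is the distortion the recommendation warns about. Note that this says nothing about *why* the sockets differ.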
The entire study seems to presuppose that most of what is measured is processor manufacturing variation. There are further recommendations about removing the variation with processor binning, etc.
It's an interesting set of measurements, but the assumed source of the variation is dubious, and it's not clear what actions, if any, it really supports.
I wonder how much of this is due to temperature measurement error. Core frequency and voltage control are governed by some suspiciously round numbers like Tj(max) == 90C. But when the controller thinks Tj == 90C, what's the measurement error?