From the abstract: > We show that this variation is further magnified under a ha...

dchichkov · on Jan 6, 2020

Well, yes, they do speculate in the abstract; note the "potentially" there. It looks like the authors do believe the variation is caused by manufacturing variation, but nothing in the paper actually shows it. There's no attempt in the paper to determine the cause of the observed performance variation. It's an empirical survey.

Interestingly, that recommendation could be interpreted as saying that "empirical studies based on averaged node performance" give a very distorted view of "the true impact of manufacturing variation on processors".

mlyle · on Jan 6, 2020

The entire study seems to presuppose that most of what is measured is processor manufacturing variation. There are further recommendations about removing the variation with processor binning, etc.

It's an interesting set of measurements, but the assumed source of the variation is dubious, and it's not clear what actions, if any, it really supports.

dchichkov · on Jan 6, 2020

Yeah. Maybe the authors know something we don't. Or maybe they were simply trying to get the paper accepted into a silicon-related conference.

rwem · on Jan 6, 2020

I wonder how much of this is due to temperature measurement error. Core frequency and voltage control are governed by some suspiciously round numbers like Tj(max) == 90C. But when the controller thinks Tj == 90C, what's the measurement error?

mlyle · on Jan 6, 2020

A moderate amount. Some of the frequency curve involves the thermal diode, but much less so when TDP capping is used as in the paper.

But the temperature of the silicon itself has a lot to do with the performance you get at a given power level.

rwem · on Jan 6, 2020

Power limits have the same problem, don't they? Except it's maybe worse because you get the product of the error terms for Icc and Vcc?

mlyle · on Jan 6, 2020

On-die thermal diodes have a fair amount of error.

Voltages and currents are easy to measure relatively precisely.
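
As a rough illustration of how the error terms for Icc and Vcc combine in a power reading P = Vcc * Icc, here is a minimal sketch. The 1% and 2% relative errors are made-up placeholders, not figures from the paper or from any datasheet, and `power_relative_error` is a hypothetical helper name. For small errors the relative errors roughly add in the worst case, or combine in quadrature if they are independent.

```python
import math

def power_relative_error(rel_v, rel_i):
    """Relative error of P = V * I given relative errors of V and I.

    Returns (worst_case, rss):
      worst_case: both readings are off in the same direction
      rss: root-sum-square, assuming the two errors are independent
    """
    worst_case = (1 + rel_v) * (1 + rel_i) - 1
    rss = math.sqrt(rel_v ** 2 + rel_i ** 2)
    return worst_case, rss

# Hypothetical inputs: 1% voltage error, 2% current error.
worst, rss = power_relative_error(0.01, 0.02)
print(f"worst case: {worst:.2%}, independent (RSS): {rss:.2%}")
```

Under these assumed inputs the combined error stays close to the sum of the two inputs, so precise voltage and current readings give a precise power reading.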