> the scalability of the technique was limited because it used high-order rationals requiring 80 bits of precision. We address the sources of numerical difficulty and allow the same computation to be performed using 64 bits. Crucially, this enables computation on the GPU, and computing a 10-billion-triangle model now takes 17 days instead of 10 years.
While improving numerical stability is interesting and useful work, a GPU isn't strictly limited to 64-bit math. You can glue two 64-bit floats together to act like a much more precise float. It will be a big hit to efficiency, but nowhere near a 200x hit!
How do you propose gluing together two floating-point numbers? Concatenating the exponents and mantissas sounds good, but that isn't implementable with the existing floating-point hardware, because the operations become intrinsically coupled between the two primitive numbers.
The other posters are right: double-doubles and quad-doubles.
The way to think of them: for a real number r, the double d(r) is the closest value representable in a double, but there may be some error between the real value and that floating-point approximation.
Store that error in another double, dd(r) = r - d(r); it has a much smaller exponent, and, most importantly, the pair together gives you roughly twice as many bits of precision.
Then carefully implement the +, -, *, / operations on these pairs, and you're off to the races.
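For the curious, here's a minimal sketch of the idea in C++ (the names `dd`, `two_sum`, `two_prod`, etc. are just illustrative, not any particular library's API; real double-double libraries such as QD or campary are considerably more careful about edge cases and error bounds):

```cpp
#include <cmath>
#include <cstdio>

// A "double-double": the value is the unevaluated sum hi + lo, where lo
// holds the rounding error that hi alone cannot represent.
struct dd { double hi, lo; };

// Knuth's TwoSum: s = fl(a + b), e = exact rounding error, so a + b == s + e.
static dd two_sum(double a, double b) {
    double s = a + b;
    double v = s - a;
    double e = (a - (s - v)) + (b - v);
    return {s, e};
}

// Error-free product via fused multiply-add: a * b == p + e exactly.
static dd two_prod(double a, double b) {
    double p = a * b;
    double e = std::fma(a, b, -p);
    return {p, e};
}

// Simplified ("sloppy") double-double addition; full accuracy needs more care.
static dd dd_add(dd a, dd b) {
    dd s = two_sum(a.hi, b.hi);
    s.lo += a.lo + b.lo;
    return two_sum(s.hi, s.lo);  // renormalize so lo is tiny relative to hi
}

// Double-double multiplication: exact high product plus the cross terms.
static dd dd_mul(dd a, dd b) {
    dd p = two_prod(a.hi, b.hi);
    p.lo += a.hi * b.lo + a.lo * b.hi;
    return two_sum(p.hi, p.lo);
}

int main() {
    // Adding 1 and 1e-17 in a plain double loses the small term entirely;
    // the double-double keeps it in the lo component.
    dd a = {1.0, 0.0};
    dd b = {1e-17, 0.0};
    dd s = dd_add(a, b);
    std::printf("hi = %.17g  lo = %.17g\n", s.hi, s.lo);  // lo ~ 1e-17
}
```

The same pattern ports directly to CUDA, since the error-free transformations only need correctly rounded +, *, and fma, all of which GPU hardware provides.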
I've implemented them from scratch on many platforms over the years, often to do deep Mandelbrot runs, since they offer really good performance in the gap between plain doubles and arbitrary-precision libraries.
But it's always the same idea: one double as normal, and another double to represent the difference between the value you care about and the double that holds the higher-order bits.
Double-double and quad-double arithmetic have even been implemented for the GPU as research prototype libraries (gpuprec [0,1] and campary [2,3], among others).