Really feels like an exercise in making a bad solution fly (generating a huge triangle mesh before rendering it). If you use a distance function (SDF) for your Julia set, then with raymarching you can compute the surface directly without taking the detour via a huge mesh. I don't know exactly which Julia set they are using here, but other sets have been rendered directly in realtime for the last 10 years (adding pathtracing would probably set the performance back a tad, but it would still be faster than this detour).
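For anyone unfamiliar with the technique: sphere tracing just steps along a ray by whatever distance the SDF reports, so no mesh is ever built. A minimal C sketch with a placeholder sphere SDF standing in for a Julia-set distance estimator (all names here are illustrative, not from the paper):

```c
/* Minimal sphere-tracing sketch: march along the ray, stepping by the
   distance the SDF reports, until we get close enough to the surface.
   The sphere SDF below is a stand-in for a Julia-set distance estimator. */
#include <math.h>
#include <stdio.h>

typedef struct { double x, y, z; } vec3;

/* Placeholder SDF: a unit sphere at the origin. */
static double sdf(vec3 p) {
    return sqrt(p.x*p.x + p.y*p.y + p.z*p.z) - 1.0;
}

/* Returns the distance along the ray (origin ro, direction rd) to the
   surface, or -1.0 on a miss. */
static double raymarch(vec3 ro, vec3 rd) {
    double t = 0.0;
    for (int i = 0; i < 256 && t < 100.0; i++) {
        vec3 p = { ro.x + t*rd.x, ro.y + t*rd.y, ro.z + t*rd.z };
        double d = sdf(p);
        if (d < 1e-6) return t;   /* close enough: treat as a hit */
        t += d;                   /* safe step: nothing is closer than d */
    }
    return -1.0;
}

int main(void) {
    vec3 ro = { 0.0, 0.0, -3.0 }, rd = { 0.0, 0.0, 1.0 };
    printf("hit at t = %f\n", raymarch(ro, rd));
    return 0;
}
```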
Seems the point of the paper is an improvement upon a technique for computing a quaternion Julia set from 2015. They found a way to run the meshing algorithm within the limits of 64-bit precision, enabling massively multi-threaded computation. I admit it is a lot of pure math for a casual read, but I think the intent was to render the set as a demonstration of why their finding is significant, not necessarily to claim they found a better way to render the set.
They say in the article that while there are distance estimation functions for normal quadratic Julia sets, such functions are not known for these higher-order rational Julia-like functions.
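For context, the classic estimator for the quadratic map f(z) = z^2 + c iterates the derivative alongside z and uses roughly d ≈ |z| ln|z| / |z'| once z escapes. A rough sketch of that known formula (my own illustration, not code from the article), i.e. exactly the kind of thing that is missing for the higher-order rational maps:

```c
/* Sketch of the well-known quadratic Julia distance estimate:
   iterate z -> z*z + c together with its derivative dz -> 2*z*dz,
   then use d ~ 0.5 * |z| * ln|z| / |dz| once |z| has escaped.
   Constant-factor conventions vary; 0.5 is a commonly used safe bound.
   Assumes the sample point actually escapes; interior points need
   separate handling. */
#include <complex.h>
#include <math.h>
#include <stdio.h>

static double julia_de(double complex z, double complex c) {
    double complex dz = 1.0;             /* derivative w.r.t. z0 */
    for (int i = 0; i < 200; i++) {
        dz = 2.0 * z * dz;
        z  = z * z + c;
        if (cabs(z) > 1e10) break;       /* escaped far enough to estimate */
    }
    double r = cabs(z);
    return 0.5 * r * log(r) / cabs(dz);
}

int main(void) {
    /* Distance from a sample point to the Julia set of c = -0.8 + 0.156i. */
    printf("%g\n", julia_de(0.3 + 0.2*I, -0.8 + 0.156*I));
    return 0;
}
```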
So the day after writing this I noticed that Inigo went ahead and made a raymarched rendering of the bunny on the _CPU_ in a _MINUTE_; it seems there were problems with the GPU variant, which otherwise would probably have been interactive.
> the scalability of the technique was limited because it used high-order rationals requiring 80 bits of precision. We address the sources of numerical difficulty and allow the same computation to be performed using 64 bits. Crucially, this enables computation on the GPU, and computing a 10-billion-triangle model now takes 17 days instead of 10 years.
While improving numerical stability is interesting and useful work, a GPU isn't strictly limited to 64-bit math. You can glue two 64-bit floats together to act like a much more precise float. It's a big hit to efficiency, but nowhere near a 200x hit!
How do you propose gluing together two floating-point numbers? Concatenating the exponents and mantissas sounds good, but that isn't implementable using the existing floating-point hardware, because the operations become intrinsically coupled between the two primitive numbers.
The other posters are right: double-doubles and quad-doubles.
The way to think of them: for a real number r, the double d(r) is the closest value storable in a double, but there may be some error between the real value and that floating-point approximation.
Store that error in another double, dd(r) = r - d(r). It will have a much smaller exponent, but most importantly it gives you roughly twice as many bits of precision.
Then carefully implement the +, -, *, / operations on these pairs, and you're off to the races.
I've implemented them from scratch on many platforms over the years, often to do deep Mandelbrot runs, since they hit a really good performance point between plain doubles and arbitrary-precision libraries.
But it's always the same idea: one double as normal, and another double to represent the difference between the value you care about and the double that holds the higher-order bits.
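As a concrete illustration of the idea, here is a minimal double-double addition built on Knuth's two-sum trick (a sketch of the general technique, not taken from any particular library; the names are mine):

```c
/* Minimal double-double sketch: hi holds the usual double approximation,
   lo holds the leftover rounding error, giving roughly 2x the precision. */
#include <stdio.h>

typedef struct { double hi, lo; } dd;

/* Knuth's two-sum: returns a+b as a rounded sum plus its exact error term. */
static dd two_sum(double a, double b) {
    double s   = a + b;
    double bb  = s - a;
    double err = (a - (s - bb)) + (b - bb);
    return (dd){ s, err };
}

/* Simplified double-double addition: add the high parts, fold the error
   terms back in, then renormalize so |lo| stays tiny relative to hi. */
static dd dd_add(dd x, dd y) {
    dd s = two_sum(x.hi, y.hi);
    double lo = s.lo + x.lo + y.lo;
    return two_sum(s.hi, lo);
}

int main(void) {
    /* 0.1 + 0.2 already rounds in plain doubles; the lo part carries the
       residual error instead of discarding it. */
    dd a = { 0.1, 0.0 }, b = { 0.2, 0.0 };
    dd c = dd_add(a, b);
    printf("hi = %.17g, lo = %.17g\n", c.hi, c.lo);
    return 0;
}
```

Multiplication works the same way, but uses an FMA (or Dekker splitting) to recover the rounding error of the product.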
Double-double and quad-double arithmetic have even been implemented for the GPU as research prototype libraries (gpuprec [0,1] and campary [2,3], among others).
"Such an algorithm is useful, because ever since "A Bug’s Life" in 1998, there is usually at least one scene in each Pixar movie that exceeds the capacity of even the most memory-rich node in the render farm. The scene usually appears well after R&D has concluded, so ad hoc solutions must then be employed, e.g., manually trimming the scene until it barely fits in-core."
https://en.wikipedia.org/wiki/Duff's_device