That's a classic. While the technical stuff was neat, I still remember the line "Despite being around 90% complete, the last 90% still remained to be done."
If I wanted to solve that problem, the first thing I would try is bypassing the text representation of these shaders. For a few things in the past I have generated, and dynamically patched, D3D11 shaders directly in DXBC byte code, without HLSL anywhere. The byte code format is even documented by Microsoft. The complete DXBC file format is not, but some people on the internet have reverse engineered the missing pieces.
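To make the "missing pieces" concrete, here is a minimal sketch of walking a DXBC container to find the chunk that holds the byte code. The layout used here (a 4-byte `DXBC` magic, a 16-byte checksum, then version, total size and chunk count, followed by a table of chunk offsets) comes from community reverse engineering as I understand it, not from Microsoft's documentation, so treat it as an assumption. The function name `list_dxbc_chunks` is made up for illustration.

```python
import struct

def list_dxbc_chunks(blob: bytes):
    """Return (fourcc, data_offset, data_size) for each chunk in a DXBC blob.

    Assumed layout (reverse engineered, not official):
      0..3    magic b"DXBC"
      4..19   checksum
      20..31  three little-endian uint32: version, total size, chunk count
      32..    chunk_count little-endian uint32 offsets from the blob start
    Each chunk starts with a 4-byte FourCC and a uint32 payload size.
    """
    if blob[:4] != b"DXBC":
        raise ValueError("not a DXBC container")
    _version, total_size, chunk_count = struct.unpack_from("<III", blob, 20)
    if total_size != len(blob):
        raise ValueError("size field does not match blob length")
    offsets = struct.unpack_from(f"<{chunk_count}I", blob, 32)
    chunks = []
    for off in offsets:
        fourcc, size = struct.unpack_from("<4sI", blob, off)
        # The payload begins right after the 8-byte chunk header.
        chunks.append((fourcc.decode("ascii"), off + 8, size))
    return chunks
```

The instruction tokens live in the `SHDR` or `SHEX` chunk, and that part is what Microsoft's tokenized program format header describes. The sketch only reads the container; as far as I know, a patched blob also needs its checksum recomputed before the runtime will accept it, which is not shown here.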
Even if you use byte code, the driver will still have to compile your shader for the target hardware. It won't be nearly as fast as what the game does on console, which is loading a blob into GPU memory. It will definitely be faster than compiling a text representation, but still not fast enough to skip the ubershader.
Right, I know about the JIT compiler in the user mode half of GPU drivers. It’s just much faster than the source code compiler.
It’s possible that generating DXBC byte code directly would be fast enough for their application without the overhead of the ubershader, or the complexity of their hybrid approach.