It is somewhat unclear from his writing, but did he compare this metaprogramming solution against OpenBLAS or ATLAS, or just against the reference BLAS/LAPACK implementation? If I were doing any significant linear algebra computations, I would definitely try an optimized BLAS/LAPACK implementation first.
Also, it seems the author solved a triangular system. LAPACK has special routines for that. Were they used?
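For reference, a triangular solve is a single dedicated BLAS level-2 call (dtrsv; LAPACK also has dtrtrs for multiple right-hand sides). A minimal sketch in C, assuming a CBLAS header from OpenBLAS or similar, with made-up numbers:

```c
/* Sketch: solve a 5x5 lower-triangular system L*x = b with the dedicated
 * BLAS routine dtrsv instead of a general solver. Link with -lopenblas. */
#include <cblas.h>
#include <stdio.h>

int main(void) {
    /* Row-major 5x5 lower-triangular matrix; upper entries are ignored. */
    double L[5 * 5] = {
        2, 0, 0, 0, 0,
        1, 3, 0, 0, 0,
        0, 2, 4, 0, 0,
        1, 0, 1, 2, 0,
        0, 1, 0, 3, 5
    };
    double x[5] = {2, 5, 10, 7, 14};  /* right-hand side b; overwritten with x */

    /* Forward substitution: x <- inv(L) * x */
    cblas_dtrsv(CblasRowMajor, CblasLower, CblasNoTrans, CblasNonUnit,
                5, L, 5, x, 1);

    for (int i = 0; i < 5; ++i)
        printf("x[%d] = %g\n", i, x[i]);
    return 0;
}
```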
LAPACK is typically optimized for larger systems, not 5 unknowns. Also, 5 is not a great size for vectorized operations; it might even be beneficial to pad the matrix with zeros (and ones on the diagonal) up to a SIMD-friendly size, as in the sketch below.
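A minimal sketch of what I mean by padding (sizes are illustrative; ones on the padded diagonal keep the system equivalent to the original):

```c
/* Sketch: pad a 5x5 lower-triangular system up to 8x8 so each row is a
 * multiple of the SIMD width (e.g. 4 doubles for AVX2). Zeros off the
 * diagonal and ones on it leave the original solution unchanged. */
#include <string.h>

#define N  5
#define NP 8   /* padded size; illustrative choice */

void pad_lower(const double A[N][N], const double b[N],
               double Ap[NP][NP], double bp[NP]) {
    memset(Ap, 0, sizeof(double) * NP * NP);
    memset(bp, 0, sizeof(double) * NP);
    for (int i = 0; i < N; ++i) {
        memcpy(Ap[i], A[i], sizeof(double) * N);  /* copy original rows */
        bp[i] = b[i];
    }
    for (int i = N; i < NP; ++i)
        Ap[i][i] = 1.0;  /* padded unknowns solve trivially to 0 */
}
```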
An optimized LAPACK is often 5-10x faster than the reference "basic" LAPACK.
The reference BLAS/LAPACK was originally written in the 70s/80s and sometimes unrolls loops to the tune of 5 or 7; that made sense then, not so much these days.
I didn’t read the article in detail, but there appear to be a lot of holes in it.
Personally, my biggest reason to prefer LAPACK in general is that its authors have already put a great deal of effort into correctness and numerical stability, so I don't have to. Even basic LAPACK is pretty fast, let alone the optimized libraries. Hand-optimizing my own special case is an absolute last resort.