On my system it uses twice as much CPU as plain old ls in a directory with just 13k files. To recursively list a directory with 500k leaf files, lla needs > 10x as much CPU. Apparently it is both slower and with higher complexity.
Will definitely prioritize optimization in the next releases. Planning to benchmark against ls on various systems and file counts to get this properly sorted.