
The formulation of a succinct full-text index as an "infinite n-gram model" is ingenious in connecting with the ML literature, but also turns a blind eye to all the work on using exactly this data structure (well, the FM-index) over DNA. bwa mem is the OG ∞-gram aligner.


The idea is far older than BWA. The entire point of the Burrows-Wheeler transform is to create a compact order-k "language model" for every value of k simultaneously. And then use that for data compression.
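As a quick illustration of why this works (a toy sketch, not Burrows and Wheeler's original code): the BWT is the last column of the sorted rotations of the input, and sorting the rotations groups together all occurrences of every context, of every length, at once.

```python
def bwt(s: str) -> str:
    # Append a unique sentinel that sorts before every other character,
    # so rotations are totally ordered and the transform is invertible.
    s = s + "\x00"
    # Sorting the rotations clusters equal contexts of all lengths;
    # the last column then holds the symbols preceding each context.
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)

print(repr(bwt("banana")))  # 'annb\x00aa'
```

Because symbols with similar contexts end up adjacent, a simple local model compresses the transformed string well, which was the original application.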

David Wheeler reportedly had the idea in the early 80s but rejected it as impractical. He then tried publishing it with Michael Burrows in the early 90s, but it was rejected by the Data Compression Conference. Algorithms researchers found the BWT more interesting than data compression researchers did, especially after the discovery of the FM-index in 2000. A lot of theoretical work was done in the 2000s, and the paper mentioned by GP cites some of it.

The primary application imagined for the FM-index was usually information retrieval, mostly due to the people involved. But some people considered using it for DNA sequences and started focusing on efficient construction and approximate pattern matching instead of better compression. And when sequencing technology advanced to the point where BWT-based aligners became relevant, a few of them were published almost simultaneously.

And if you ask Heng Li, he gives credit to Tak-Wah Lam: https://lh3.github.io/2024/04/12/where-did-bwa-come-from



