I am always skeptical of RNN approaches, but this paper is just sparsifying the input; it is not compressing an input of arbitrary length into a fixed-size memory. I am hopeful this might be a big breakthrough: an 11x inference speedup with no degradation, from an algorithmic improvement alone. Is it really that good? It seems almost too good to be true. Adoption over the next 6 months will tell us the truth.