There would be need to a state specifically for “the cow jumped over the” (and a...

thesz · 2025-09-23T21:16:02 1758662162

> It ‘just’ needs a state for every possible random color between.

You can use skipgrams - prefixes with holes in them.

Sparse Non-negative Matrix Language Model [1] uses them with great success.

The pure n-gram language models would have hard time computing escape weights for such contexts, but mixture of probabilities that is used in SNMLM does not need to do that.

If I may, I've implemented an online per-byte version of SNMLM [2], which allows skipgrams' use. They make performance worse, but they can be used. SNMLM's predictive performance for my implementation is within percents to performance of LSTM on enwik8.

[2] https://github.com/thesz/snmlm-per-byte