Basically, it selects every substring that appears in the corpus up to a certain length, and then (if I remember right) trims the set by discarding rare substrings, to keep the initial size down and make things faster.
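For concreteness, here's a brute-force sketch of that seed step. This is not SentencePiece's actual code (it does the enumeration with a suffix array rather than nested loops), and `max_len` / `min_freq` are illustrative parameters, not the library's real flags:

```python
from collections import Counter

def seed_candidates(corpus: str, max_len: int = 16, min_freq: int = 2) -> Counter:
    # Count every substring of length 1..max_len in the corpus.
    counts = Counter()
    for i in range(len(corpus)):
        for j in range(i + 1, min(i + max_len, len(corpus)) + 1):
            counts[corpus[i:j]] += 1
    # Trim: drop substrings rarer than min_freq to shrink the seed set.
    return Counter({s: c for s, c in counts.items() if c >= min_freq})
```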
If the library has two vocabulary learners, only one of which does the described thing, then isn't it unambiguous which implementation within the library the question refers to? And wouldn't it be ambiguous to instead say "how does Unigram do it" without referring to any particular implementation?
Anyway, the paper says "Frequent substrings can be enumerated in O(T) time and O(20T) space with the Enhanced Suffix Array algorithm (Nong et al., 2009)", which is hilariously underspecified, at least in part because a suffix array algorithm enumerates substrings and their counts; it isn't a top-k algorithm, so the paper never says which frequent substrings actually make the cut.
It appears to be the top n-grams scored by the product of frequency and length. Including the frequency weighting is a bit nonstandard among ablative methods (ones that start from a large candidate set and prune it down).
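In code, that selection rule would look something like the sketch below. Here `k` stands in for the seed size; SentencePiece exposes this as `--seed_sentencepiece_size` (defaulting to one million, if I remember correctly):

```python
import heapq

def top_seed_pieces(counts, k=1_000_000):
    # Score each candidate by frequency * length, keep the top k as seeds.
    return heapq.nlargest(k, counts.items(), key=lambda kv: kv[1] * len(kv[0]))

# Toy usage, chaining the two sketches together:
seeds = top_seed_pieces(seed_candidates("the theme of the thesis"), k=50)
```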