Our paper [1] is kind of a goofy adversarial thing where we thought "here's this cool metric, how can we break it?" The tokenizers we propose are definitely not tokenizers you should use in practice.
The original paper that proposes the metric is, imo, much more interesting theoretically [2].
[1]: https://aclanthology.org/2024.lrec-main.1469/
[2]: https://aclanthology.org/2023.acl-long.284/