Hm, maybe. It depends on how easy their training code is to use and how long retraining would take. Presumably at least a week, since the 345M model took about that long, but I'm not sure I want to spend the money on a week of a very large cloud instance (which would be what, $300?) for what is probably a substantial but not stunning improvement in generation quality.
I might rather wait for the next leap: something like a Sparse Transformer approach, which could get global coherence by attending over the entire poem at once, or a better poetry corpus with delimited poems (rather than entire books).
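For the corpus half of that leap, the preparation itself would be simple once the poems are split out of the books. A minimal sketch (assuming a hypothetical `poems` list of individual poem strings, and using GPT-2's standard `<|endoftext|>` token as the document delimiter):

```python
# Sketch: build a delimited poetry corpus for GPT-2 fine-tuning.
# Assumes `poems` is a list of individual poem strings (hypothetical input).
# GPT-2's training data separates documents with the <|endoftext|> token,
# so joining poems on that delimiter lets the model learn where one poem
# ends and the next begins, rather than seeing undifferentiated book text.

poems = [
    "Shall I compare thee to a summer's day? ...",
    "Two roads diverged in a yellow wood, ...",
]

with open("poetry_corpus.txt", "w", encoding="utf-8") as f:
    f.write("\n<|endoftext|>\n".join(poem.strip() for poem in poems))
```

The hard part isn't the formatting but reliably segmenting poems out of whole scanned books in the first place.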