Tostino on June 30, 2023 | on: XGen-7B, a new 7B foundational model trained on up...

Possibly, but perplexity has been shown to decrease when a 2048-context model is fine-tuned on larger context sizes, even for outputs within its original context limit. So, more research is needed.