Exactly, BloombergGPT performed worse on financial sentiment analysis then much ...

Exactly, BloombergGPT performed worse on financial sentiment analysis then much smaller fine-tuned Bert-based models.

For many extractive tasks BloombergGPT was quite disappointing. A 5-10% performance hit with much larger inference cost compared to smaller models is not desirable.

But the research investment for Bloomberg makes sense to take the risk: a do-it-all generative model can mean significant complexity reduction in maintenance and deployment overhead.

It didn't directly pay off for many extractive tasks, but I bet they're iterating. Bloomberg has the data moat and the business needs in their core products to make it worthwhile.