I guess this makes sense. There should be some noise from translating the text into the model's internal representation of the financial data, but the authors purposely reformatted all the reports into a consistent layout. That should let the model do less of the LLM magic and more of something like plain linear regression on the financial stats. And past performance often does have an impact on future performance, up to a point.
I wonder what the results would have been with still-anonymized but non-fully standardized statements.
Still though, impressive.