Cool to see this, but I am always surprised how often LLM output is used to train other LLMs. They used GPT-3.5 Turbo and GPT-4 for multiple tasks, even simple ones like translating English benchmarks and writing German poems to train on, in order to create an LLM that works better in German?
AFAIK, this still goes against OpenAI's TOS, and also the general idea that training on AI output tends to degrade results. Has there been some major shift in this over the years, or has it simply become the default approach because it's easy to do?
Also nice to finally see something from Hessian.AI. As a local, I've heard them talk big more than once but never saw results. I wonder what Aleph Alpha thinks about this, since they want to make "AI made in Europe to challenge OpenAI".