Could you explain how supporting multiple languages increases the parameter count so much? I'm genuinely curious.
LLMs seem comfortable with hundreds of programming languages, DSLs, and application-specific syntaxes, so how does supporting a couple more natural languages become so expensive?
I see how more training data would be needed, but I don't understand how that maps to a greater parameter count.
If you focus on English only, this can easily reduce the parameter count fivefold.