kouteiheika on June 27, 2024 | on: Gemma 2: Improving Open Language Models at a Pract...

There was no parameter creep with Llama. Llama 8B is actually a ~7B model, comparable to Mistral 7B, if you strip away the multilingual embeddings and match what Mistral 7B supports.
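To make the arithmetic concrete, here is a rough sketch of the parameter count. It is not from the comment itself. It assumes "Llama 8B" means Llama 3 8B, and it assumes the shapes in the published model configs: hidden size 4096, 32 layers, MLP size 14336, 32 attention heads with 8 KV heads, untied input/output embeddings, and vocabularies of 128,256 (Llama 3) and 32,000 (Mistral 7B). The `count_params` helper is hypothetical and written only for illustration.

```python
def count_params(vocab_size, hidden=4096, layers=32, intermediate=14336,
                 n_heads=32, n_kv_heads=8, tied_embeddings=False):
    """Return (non_embedding_params, embedding_params) for a Llama-style decoder.

    All default shapes are assumptions taken from the published configs,
    not from the comment above.
    """
    head_dim = hidden // n_heads
    kv_dim = head_dim * n_kv_heads            # grouped-query attention: fewer K/V heads
    attn = 2 * hidden * hidden + 2 * hidden * kv_dim   # Q and O, plus K and V projections
    mlp = 3 * hidden * intermediate           # gate, up, and down projections
    norms = 2 * hidden                        # two RMSNorm weight vectors per layer
    body = layers * (attn + mlp + norms) + hidden      # plus the final RMSNorm
    # Untied embeddings mean a separate input table and output (LM head) matrix.
    embed = vocab_size * hidden * (1 if tied_embeddings else 2)
    return body, embed


llama_body, llama_embed = count_params(vocab_size=128_256)
mistral_body, mistral_embed = count_params(vocab_size=32_000)

for name, body, embed in [("Llama 3 8B", llama_body, llama_embed),
                          ("Mistral 7B", mistral_body, mistral_embed)]:
    print(f"{name}: body={body / 1e9:.2f}B  embeddings={embed / 1e9:.2f}B  "
          f"total={(body + embed) / 1e9:.2f}B")
```

Under these assumptions both models come out at about 6.98B non-embedding weights. The gap in the headline number comes entirely from the embedding matrices: roughly 1.05B for the 128K vocabulary versus roughly 0.26B for the 32K one.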