
I thought there were some open-source models in the 70-120B range that were GPT3.5 quality?


Measuring LLM quality is problematic (and may not even be meaningful in a general sense: the idea that there is a measurable strict ordering of general quality that applies to all use cases, or is even strongly predictive of utility for particular uses, may be erroneous.)

If you trust Winogrande scores (one of the few benchmarks for which I could find GPT3.5 and GPT4 ratings [0] that also appears on the HuggingFace leaderboard [1]), there are a lot of models between GPT3.5 and GPT4, some of them 34B-parameter models (Yi-34b and its derivatives), and una_cybertron_7b comes close to GPT3.5.

[0] https://llm-leaderboard.streamlit.app/

[1] https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderb...
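The comparison above amounts to filtering a leaderboard for models whose benchmark score falls between two baselines. A minimal sketch of that filtering, with placeholder scores (the numbers and the `scores` dict are illustrative, not actual leaderboard values):

```python
# Hypothetical baseline Winogrande-style scores for the two reference models.
# These are placeholders, not real leaderboard numbers.
GPT35_SCORE = 73.0
GPT4_SCORE = 87.5

# Illustrative open-model scores (placeholder values).
scores = {
    "Yi-34b": 80.1,
    "una_cybertron_7b": 72.8,
    "some-70b-model": 78.0,
}

# Models strictly between the two baselines, best first.
between = sorted(
    (name for name, s in scores.items() if GPT35_SCORE < s < GPT4_SCORE),
    key=scores.get,
    reverse=True,
)
print(between)
```

With these placeholder numbers, Yi-34b and the 70B model land between the baselines, while una_cybertron_7b falls just below the GPT3.5 threshold, matching the "comes close" framing above.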


It depends on what's being evaluated, but from what I've read, Mistral is also fairly competitive at a much smaller size.

One of the biggest problems right now is that there isn't really a great way to evaluate the performance of models, which (among other issues) lets every major foundation model release claim to be competitive with the SOTA.





