
What are the most obvious standouts?

In my experience, smaller models tend to do well on benchmarks but fail at generalization. Phi-2 comes to mind.



It's multilingual. Genuinely. I compared my results with some people on Reddit, and the consensus is that the 27B is near-perfect in a few obscure languages and likely perfect in most common ones. The 9B isn't as good, but it's still coherent enough to use in a pinch.

It's literally the first omni-translation tool that actually works and that you can run offline at home. I'm amazed that Google mentioned absolutely nothing about this in their paper.
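For anyone wanting to try this at home: a minimal sketch of offline translation with Gemma 2, assuming you use the ollama Python client with a local `ollama serve` instance and have pulled the `gemma2:27b` tag. The model tag and prompt wording are my assumptions, not something from the thread.

```python
def build_translation_prompt(text: str, target_lang: str) -> str:
    """Build a plain instruction prompt asking for only the translation."""
    return (
        f"Translate the following text into {target_lang}. "
        f"Reply with only the translation:\n\n{text}"
    )


def translate(text: str, target_lang: str, model: str = "gemma2:27b") -> str:
    """Send the prompt to a locally running ollama server and return the reply."""
    # Imported here so the prompt helper works even without ollama installed.
    import ollama

    resp = ollama.chat(
        model=model,
        messages=[{"role": "user", "content": build_translation_prompt(text, target_lang)}],
    )
    return resp["message"]["content"]


if __name__ == "__main__":
    # Example: Swedish -> English, entirely offline once the model is pulled.
    print(translate("Hej, hur mår du?", "English"))
```

Swap the model string for `gemma2:9b` if you're short on VRAM; per the comments above, it's less reliable but usable in a pinch.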


Wow, that's very impressive and indeed a game changer. I've previously had trouble with various Scandinavian languages, but the last model I checked was Llama 2, and I kind of gave up on it. I had expected we were going to need special-purpose small models as a crutch for these uses, like SW-GPT3.

So I guess Gemma 2 is going to become Gemini 2.0 in their truly large and closed variants then? Or is it the open version of Gemini 1.5?



