Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Given that you can download and use the weights, the model architecture has to be includded as part of that. And I did read a paper from them recently describing their MoE architecture and how it differs from the original GShard.


Excuse me? What weights can you download from OpenAI? gpt2 does not count


Sorry I meant that DeepSeek release their models. Wrong context.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: