Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

It depends on the architecture (you very well can convert a decoder-only causal model to an embeddings model, e.g. Qwen/Mistral), but it is true the traditional embeddings models such as a BERT-based one are bidirectional, although unclear how much more compute that inherently requires.

Compare to ModernBERT, which uses more modern techniques and is still bidirectional, but it is very very speedy. https://huggingface.co/blog/modernbert



Consider applying for YC's Fall 2025 batch! Applications are open till Aug 4

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: