> I don’t know. After the model has been created (trained), I’m pretty sure that generating embeddings is much less computationally intensive than generating text.
An embedding is generated in a single pass through the model, so functionally it's the equivalent of generating a single token from a text generation model.
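For intuition, here's a rough sketch of the asymmetry using Hugging Face transformers. The model names and the pooling choice are just illustrative assumptions, not a recipe:

```python
import torch
from transformers import AutoModel, AutoModelForCausalLM, AutoTokenizer

text = "Embeddings take one forward pass."

# Embedding: a single forward pass, then pool the hidden states.
tok = AutoTokenizer.from_pretrained("bert-base-uncased")
enc = AutoModel.from_pretrained("bert-base-uncased")
inputs = tok(text, return_tensors="pt")
with torch.no_grad():
    hidden = enc(**inputs).last_hidden_state  # one pass over the sequence
embedding = hidden.mean(dim=1)                # mean pooling -> one vector

# Generation: one forward pass *per new token* (KV caching reduces,
# but does not remove, the sequential per-token cost).
tok2 = AutoTokenizer.from_pretrained("gpt2")
gen = AutoModelForCausalLM.from_pretrained("gpt2")
ids = tok2(text, return_tensors="pt").input_ids
with torch.no_grad():
    out = gen.generate(ids, max_new_tokens=50)  # ~50 sequential passes
```

So generating 50 tokens costs roughly 50x what the embedding does, modulo caching and model size differences.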
It depends on the architecture (you can very well convert a decoder-only causal model into an embedding model, e.g. Qwen/Mistral), but it is true that traditional embedding models such as BERT-based ones are bidirectional, although it's unclear how much more compute that inherently requires.
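To make the decoder-only case concrete, here's a minimal sketch of last-token pooling, which is roughly the idea behind the Mistral/Qwen-based embedding models. The model name is just an example, and in practice these models are also fine-tuned (e.g. contrastively) before they produce good embeddings:

```python
import torch
from transformers import AutoModel, AutoTokenizer

name = "mistralai/Mistral-7B-v0.1"  # assumption: any causal LM works the same way
tok = AutoTokenizer.from_pretrained(name)
model = AutoModel.from_pretrained(name, torch_dtype=torch.float16)

inputs = tok("query: how do embeddings work?", return_tensors="pt")
with torch.no_grad():
    hidden = model(**inputs).last_hidden_state  # (1, seq_len, dim)

# Under causal attention only the last position has attended to the
# whole sequence, so its hidden state serves as the text embedding.
emb = hidden[0, -1]
emb = emb / emb.norm()  # normalize for cosine similarity
```

A bidirectional model lets every position attend to every other, but either way it's still a single forward pass, which is why the extra compute cost of bidirectionality isn't obvious.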