I wrote a blog post about embeddings - and a sample application to show their use...

btbuildem · 2025-05-12T16:01:39 1747065699

How would you approach using them in a specialized discipline (think technical jargon, acronyms, etc.) where training a model from scratch is practically impossible because everyone (customers, solution providers) fiercely guards their data?

A generic embedding model does not have enough specificity to cluster the specialized terms or "code names" of specific entities (these differ across orgs but represent the same sets of concepts within the domain). A more specific model cannot be trained because the data is not available.

Quite the conundrum!

minimaxir · 2025-05-12T17:18:54 1747070334

You can fine-tune existing embedding models.
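
To make the idea concrete, here is a toy sketch of the principle rather than a real recipe. It starts from existing vectors and nudges them with a contrastive loss so that two code names for the same concept move together while an unrelated term is pushed at least `margin` away. Everything in it is an assumption for illustration: the terms (`FALCON`, `Unit-7`, `Ledger-B`), the 2-d starting vectors, and the `fine_tune` helper are hypothetical. An actual fine-tune would update the encoder's weights (for example with a library such as sentence-transformers) instead of moving vectors directly, but the labeled pairs play the same role, and a small glossary of such pairs can stay inside the organization that owns it.

```python
import math

def distance(u, v):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def fine_tune(vectors, pairs, margin=1.0, lr=0.1, epochs=100):
    """Adjust copies of `vectors` with a contrastive loss over labeled pairs.

    pairs: (term_a, term_b, same_concept) triples.
    Positive pairs minimize 0.5 * d^2; negative pairs minimize
    0.5 * max(0, margin - d)^2, where d is the Euclidean distance.
    """
    tuned = {term: list(vec) for term, vec in vectors.items()}  # leave input intact
    for _ in range(epochs):
        for a, b, same in pairs:
            va, vb = tuned[a], tuned[b]
            d = distance(va, vb)
            if same:
                scale = 1.0                  # gradient wrt va is (va - vb)
            elif 0.0 < d < margin:
                scale = -(margin - d) / d    # gradient wrt va is -(margin - d) * (va - vb) / d
            else:
                continue                     # negative pair already far enough apart
            for i in range(len(va)):
                g = scale * (va[i] - vb[i])
                va[i] -= lr * g              # gradient step for term_a
                vb[i] += lr * g              # opposite step for term_b
    return tuned

# Hypothetical "pretrained" vectors: the generic model places two code names
# for the same concept far apart and an unrelated term close to one of them.
base = {
    "FALCON": [1.0, 0.0],
    "Unit-7": [0.0, 1.0],
    "Ledger-B": [0.8, 0.3],
}
pairs = [
    ("FALCON", "Unit-7", True),     # same concept, different org's code name
    ("FALCON", "Ledger-B", False),  # unrelated term
]
tuned = fine_tune(base, pairs)

print("same concept:", distance(base["FALCON"], base["Unit-7"]),
      "->", distance(tuned["FALCON"], tuned["Unit-7"]))
print("unrelated:   ", distance(base["FALCON"], base["Ledger-B"]),
      "->", distance(tuned["FALCON"], tuned["Ledger-B"]))
```

The point of the sketch is that the supervision is just pairs of terms, not a full training corpus, which is why fine-tuning is feasible where training from scratch is not.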