I tried to use the embedding stuff a year ago but the results were lackluster, even with the larger embedding model.
With the new multimodal LLMs, a better approach might be to have one describe each image and list keywords, and then just search that text with the included Meilisearch.
That said, I see they list some models I haven't tried, so perhaps it's time to try again.
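
For what it's worth, here is a rough sketch of the describe-then-index idea, not something I have run against a real setup. The captioner is a stand-in for whatever multimodal LLM you would call, and `build_documents`, `index_documents`, `fake_captioner`, the `images` index name, and the localhost URL are all names I made up for illustration. The only real API assumed is the `meilisearch` Python client (`Client(...)`, `index(...).add_documents(...)`).

```python
import hashlib
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical interface: takes an image path, returns (description, keywords).
# In practice this would wrap a call to a multimodal LLM of your choice.
Captioner = Callable[[str], Tuple[str, List[str]]]


def build_documents(paths: List[str], captioner: Captioner) -> List[Dict]:
    """Turn image paths into plain-text documents a search engine can index."""
    docs = []
    for path in paths:
        description, keywords = captioner(path)
        docs.append({
            # Meilisearch document ids may only contain letters, digits,
            # '-' and '_', so hash the path instead of using it directly.
            "id": hashlib.sha1(path.encode("utf-8")).hexdigest(),
            "path": path,
            "description": description,
            "keywords": keywords,
        })
    return docs


def index_documents(docs: List[Dict],
                    url: str = "http://localhost:7700",  # assumed local instance
                    api_key: Optional[str] = None,
                    index_name: str = "images"):          # assumed index name
    """Send the documents to Meilisearch (requires the meilisearch package)."""
    import meilisearch  # imported here so the rest runs without it installed
    client = meilisearch.Client(url, api_key)
    return client.index(index_name).add_documents(docs)


def fake_captioner(path: str) -> Tuple[str, List[str]]:
    """Placeholder captioner so the sketch runs without any model."""
    return ("placeholder description of " + path, ["placeholder"])


docs = build_documents(["cat.jpg", "beach.png"], fake_captioner)
```

After `index_documents(docs)`, an ordinary keyword search over `description` and `keywords` would do the retrieval, with no embedding model involved.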