I tried the embedding approach a year ago, but the results were lackluster, even with the larger embedding model.

With the new multimodal LLMs, it seems a better approach might be to have one describe each image and list keywords, and then just search that text with the included Meilisearch.
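To make that idea concrete, here is a rough sketch of what I have in mind, not something I have run against a real setup. The `describe` callable stands in for whatever multimodal LLM you use and is entirely hypothetical, as are the `images` index name and the field names. The indexing step assumes the official `meilisearch` Python client and a server on the default local port.

```python
import hashlib
from typing import Callable


def build_document(image_path: str, describe: Callable[[str], dict]) -> dict:
    """Turn one image into a plain-text document a keyword search engine can index.

    `describe` is a hypothetical wrapper around a multimodal LLM. It is assumed
    to return {"description": str, "keywords": [str, ...]}.
    """
    result = describe(image_path)
    return {
        # Meilisearch document ids only allow letters, digits, '-' and '_',
        # so hash the path instead of using it directly.
        "id": hashlib.sha1(image_path.encode("utf-8")).hexdigest(),
        "path": image_path,
        "description": result["description"].strip(),
        # Normalize and de-duplicate the keywords the model returned.
        "keywords": sorted({k.strip().lower() for k in result["keywords"] if k.strip()}),
    }


def index_documents(docs, url="http://localhost:7700", api_key=None, index_name="images"):
    """Send the documents to Meilisearch (assumes the `meilisearch` package is installed)."""
    import meilisearch  # imported lazily so the rest of the sketch runs without it

    client = meilisearch.Client(url, api_key)
    return client.index(index_name).add_documents(docs)


# Stand-in for a real model call, just to show the expected shape.
def fake_describe(path: str) -> dict:
    return {"description": " A dog on a beach ", "keywords": ["Dog", "beach", "dog "]}


doc = build_document("photos/IMG_0001.jpg", fake_describe)
```

After `index_documents([doc])`, an ordinary keyword query such as "dog beach" against that index would match on the generated text, with no embeddings involved.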

That said, I see they list some models I haven't tried, so perhaps it's time to try again.