Back in the Python 2.4 days you could run ImgSeek and do content-based image retrieval on the desktop easily. ImgSeek had a little MS Paint-style window where you could crudely draw the colors of the image you wanted, and ImgSeek would find it. It really sucks it was never ported to later versions of Python, as there's been nothing like it since. https://sourceforge.net/projects/imgseek/
> It really sucks it was never ported to later versions of Python, as there's been nothing like it since.
I clearly remember a tool like that from years and years ago that was not on the desktop. It worked quite well, I'd say. For example, you could draw a very crude sun and a blue sky (a simple example) and you'd get matching pictures. Now... it may just have been someone who wrapped ImgSeek and put it up as a webapp, no clue. However, I'm 100% positive I used an online tool exactly like what you're describing.
Great article. We used something very similar to help implement similarity search at Yahoo a couple years back (https://yahooresearch.tumblr.com/post/158115871236/introduci...). We were using an indexing strategy called Locally Optimized Product Quantization, which worked great in terms of query times but required a training procedure that made successive inserts fairly inefficient.
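To make the "training procedure" concrete: here's a toy sketch of plain product quantization (not the locally-optimized variant mentioned above) on random data. The idea is to split each vector into subvectors and learn a small k-means codebook per subspace; training the codebooks is the expensive, offline step, while encoding a new vector afterward is cheap. All names and parameters here are made up for illustration.

```python
import numpy as np

def kmeans(x, k, iters=20, seed=0):
    """Minimal k-means: returns k centroids for the rows of x."""
    rng = np.random.default_rng(seed)
    centroids = x[rng.choice(len(x), k, replace=False)]
    for _ in range(iters):
        # assign each point to its nearest centroid
        d = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)
        for j in range(k):
            pts = x[labels == j]
            if len(pts):
                centroids[j] = pts.mean(0)
    return centroids

def train_pq(data, m=4, k=16):
    """The offline training step: one codebook per subspace."""
    return [kmeans(sub, k) for sub in np.split(data, m, axis=1)]

def encode(codebooks, v):
    """Quantize one vector into m small codes -- cheap once trained."""
    return [int(((cb - sub) ** 2).sum(1).argmin())
            for cb, sub in zip(codebooks, np.split(v, len(codebooks)))]

rng = np.random.default_rng(1)
data = rng.normal(size=(1000, 32)).astype(np.float32)
books = train_pq(data, m=4, k=16)   # slow, must see representative data
codes = encode(books, data[0])      # fast per-vector step
```

This is also why inserts degrade the index over time: new vectors are encoded against codebooks trained on the old distribution, so a drifting dataset eventually forces retraining.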
Thankfully, we have a much wider variety of indexing options these days (https://milvus.io/docs/index.md) in addition to powerful vector databases (https://zilliz.com/learn/what-is-vector-database). I'm glad to see the barrier to entry for semantic image retrieval becoming lower and lower as ML infrastructure matures.
Good catch on the disclosure, I edited my original comment to reflect this fact.
On the topic of vector search, Milvus is another great vector database - it's open source and we provide single-line startup scripts via `docker-compose` in addition to installation via apt & yum (https://milvus.io/docs/install_standalone-docker.md). There are also no restrictions on the number of vectors that users can store. Internally, we've successfully scaled Milvus to handle billion+ vectors, and many of our users have stored hundreds of millions of vectors in production environments as well.
We do this with Pinecone, but we use CLIP embeddings of images, and they work incredibly well. It's kind of crazy how easy it is to get semantic search of images these days.
CLIP also does caption embeddings, so you can look up images via both images and captions.
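Once you have CLIP vectors for images and captions, the lookup itself is just nearest-neighbor search by cosine similarity in the shared embedding space. A minimal sketch with NumPy - the 4-dim vectors here are made up stand-ins (real CLIP embeddings are 512-d or larger and come from the model):

```python
import numpy as np

def cosine_top_k(query, index, k=3):
    """Return indices of the k index vectors most similar to the query."""
    q = query / np.linalg.norm(query)
    m = index / np.linalg.norm(index, axis=1, keepdims=True)
    return np.argsort(-(m @ q))[:k]

# Pretend these are image embeddings from CLIP's image encoder.
images = np.array([[1.0, 0.0, 0.0, 0.0],
                   [0.9, 0.1, 0.0, 0.0],
                   [0.0, 0.0, 1.0, 0.0]])
# Pretend this is a caption embedding from CLIP's text encoder.
caption = np.array([1.0, 0.05, 0.0, 0.0])
print(cosine_top_k(caption, images, k=2))
```

At scale you'd replace the brute-force matrix product with an ANN index (which is what Pinecone, Milvus, etc. handle for you), but the ranking principle is the same.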
Seconding the recommendation of CLIP embeddings, especially compared to image histograms + requiring OpenCV.
I wrote a naive, minimal-dependency Python package to calculate image embeddings (https://github.com/minimaxir/imgbeddings), with some lookup demo notebooks, and it works well in a pinch, although it's due for an upgrade.