Hacker News
Content-Based Image Retrieval (pinecone.io)
43 points by gk1 on July 19, 2022 | 9 comments


Back in the python 2.4 days you could run ImgSeek and do content based image retrieval on the desktop easily. ImgSeek had a little MS Paint-style window to crudely draw colors like the image you wanted and ImgSeek would find it. It really sucks it was never ported to later versions of python since there's been nothing like it since. https://sourceforge.net/projects/imgseek/


Some of the code was developed a little farther in https://github.com/ricardocabral/iskdaemon. But you can still be the change you wish to see :)


> It really sucks it was never ported to later versions of python since there's been nothing like it since.

I clearly remember a tool like that that was not on the desktop, from years and years ago. It was working quite well I'd say. For example you could draw a very crude sun and a blue sky (a simple example) and you'd get matching pictures. Now... It may just have been someone who wrapped ImgSeek and put it as a webapp: no clue. However I'm 100% positive I used an online tool exactly like what you're describing.


Great article. We used something very similar to help implement similarity search at Yahoo a couple of years back (https://yahooresearch.tumblr.com/post/158115871236/introduci...). We were using an indexing strategy called Locally Optimized Product Quantization, which worked great in terms of query times but required a training procedure, which made successive inserts fairly inefficient.

Thankfully, we have a much wider variety of indexing options these days (https://milvus.io/docs/index.md) in addition to powerful vector databases (https://zilliz.com/learn/what-is-vector-database). I'm glad to see the barrier to entry for semantic image retrieval becoming lower and lower as ML infrastructure matures.

[EDIT] Disclosure: I work at Zilliz.
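
For anyone wondering why a quantization-based index needs a training step at all, here is a rough sketch of plain product quantization in pure Python. To be clear, this is not the locally optimized variant mentioned above and not anyone's production code; the helper names (`train_codebooks`, `encode`, `decode`), the toy data, and the sizes `m=4`, `k=16` are all made up for illustration. The point it tries to show is that the codebooks come from k-means over existing data, so vectors can only be encoded after that training pass has run.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(p, centroids):
    """Index of the centroid closest to p."""
    return min(range(len(centroids)), key=lambda i: sq_dist(p, centroids[i]))

def kmeans(points, k, iters=10, seed=0):
    """Tiny Lloyd's k-means; returns k centroids as lists of floats."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for p in points:
            buckets[nearest(p, centroids)].append(p)
        for i, bucket in enumerate(buckets):
            if bucket:  # keep the old centroid if a cluster went empty
                centroids[i] = [sum(col) / len(bucket) for col in zip(*bucket)]
    return centroids

def split(vec, m):
    """Cut a vector into m contiguous subvectors of equal length."""
    step = len(vec) // m
    return [vec[i * step:(i + 1) * step] for i in range(m)]

def train_codebooks(vectors, m, k):
    """The training step: one k-means codebook per subvector position."""
    return [kmeans([split(v, m)[j] for v in vectors], k, seed=j) for j in range(m)]

def encode(vec, codebooks):
    """Replace each subvector with the index of its nearest centroid."""
    return [nearest(sub, cb) for sub, cb in zip(split(vec, len(codebooks)), codebooks)]

def decode(codes, codebooks):
    """Approximate reconstruction: concatenate the chosen centroids."""
    out = []
    for c, cb in zip(codes, codebooks):
        out.extend(cb[c])
    return out

# Toy data: 200 random 8-dimensional vectors standing in for image features.
rng = random.Random(42)
data = [[rng.random() for _ in range(8)] for _ in range(200)]

codebooks = train_codebooks(data, m=4, k=16)  # must happen before any encode
codes = encode(data[0], codebooks)            # 4 small integers instead of 8 floats
approx = decode(codes, codebooks)
```

If the data distribution drifts as new vectors arrive, the codebooks fit it less well and reconstruction error grows until you retrain and re-encode, which is roughly the insert cost described above.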


If folks just want to get started with vector search faster they can try https://www.pinecone.io.

Full disclosure: I work for Pinecone. It's important to disclose you work for a company if you're going to promote their links.


Good catch on the disclosure, I edited my original comment to reflect this fact.

On the topic of vector search, Milvus is another great vector database - it's open source and we provide single-line startup scripts via `docker-compose` in addition to installation via apt & yum (https://milvus.io/docs/install_standalone-docker.md). There are also no restrictions on the number of vectors that users can store. Internally, we've successfully scaled Milvus to handle billion+ vectors, while many of our users have stored hundreds of millions of vectors in production environments as well.


We do this with Pinecone, but we use CLIP embeddings of images, and they work incredibly well. It's kind of crazy how easy it is to get semantic search of images these days.

CLIP also does caption embeddings, so you can look up images via both images and captions.
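
To make the lookup part concrete, here is a minimal sketch of what the search boils down to once images and captions live in the same embedding space: rank stored image vectors by cosine similarity to the query vector. The three-dimensional vectors, the file names, and the `top_k` helper are invented for illustration; they are not real CLIP outputs, and a vector database would replace this brute-force scan with an approximate index.

```python
import math

def normalize(v):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def top_k(query, index, k=2):
    """Brute-force search: score every stored vector against the query."""
    q = normalize(query)
    scored = [
        (sum(a * b for a, b in zip(q, normalize(vec))), name)
        for name, vec in index.items()
    ]
    scored.sort(reverse=True)  # highest cosine similarity first
    return [(name, score) for score, name in scored[:k]]

# Hypothetical image embeddings (real ones would have hundreds of dimensions).
index = {
    "beach.jpg": [0.9, 0.1, 0.0],
    "sunset.jpg": [0.7, 0.6, 0.1],
    "invoice.png": [0.0, 0.1, 0.9],
}

# Stand-in for an embedded caption; an embedded query image would work the same way.
caption_vec = [1.0, 0.2, 0.0]
results = top_k(caption_vec, index, k=2)
```

The same `top_k` call works whether the query vector came from a caption or from another image, which is what makes the shared embedding space handy.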


Seconding the recommendation of CLIP embeddings, especially compared to image histograms + requiring OpenCV.

I wrote a naive, minimal dependency Python package to calculate image embeddings (https://github.com/minimaxir/imgbeddings) with some lookup demo notebooks and it works well in a pinch, although it's due for an upgrade.


Hey! We love CLIP and plan to cover it later in this series.



