How to build a simple image and primary-colour similarity database (wearewizards.io)
70 points by teh on Jan 2, 2015 | 14 comments


My first thought would be to put this into HSV and then index it with an R-tree [1]. This way, you can do nearest-neighbor lookups, similar to geospatial indexing.

[1]: http://en.wikipedia.org/wiki/R-tree
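
Something along these lines, maybe (a rough, untested Python sketch using the rtree package; the HSV values and IDs are made-up placeholders, and since hue wraps around at 1.0 the distance on H is only approximate):

    from rtree import index

    # Rough sketch: a 3-D R-tree over (H, S, V) points. Hue wraps around at 1.0,
    # so plain nearest-neighbor distance on H is only an approximation.
    props = index.Property()
    props.dimension = 3
    idx = index.Index(properties=props)

    colours = {0: (0.95, 0.80, 0.90), 1: (0.60, 0.50, 0.70), 2: (0.58, 0.60, 0.80)}
    for image_id, (h, s, v) in colours.items():
        # Points are stored as degenerate boxes: (min_h, min_s, min_v, max_h, max_s, max_v).
        idx.insert(image_id, (h, s, v, h, s, v))

    # The query is also a degenerate box around the target colour.
    nearest = list(idx.nearest((0.59, 0.55, 0.75, 0.59, 0.55, 0.75), num_results=2))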


I've had good results from converting to Lab space and geospatial indexing the ab components.

http://en.m.wikipedia.org/wiki/Lab_color_space
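
For example, something like this (a sketch with placeholder colours, using a scipy KD-tree rather than a true geospatial index, but the idea is the same):

    import numpy as np
    from scipy.spatial import cKDTree
    from skimage import color

    # Dominant colour per image as RGB floats in [0, 1] (placeholder values).
    rgb = np.array([[0.90, 0.10, 0.20], [0.20, 0.30, 0.80], [0.95, 0.15, 0.25]])
    lab = color.rgb2lab(rgb.reshape(1, -1, 3)).reshape(-1, 3)

    # Index only the a/b (chroma) components so differences in lightness (L)
    # don't dominate the match.
    tree = cKDTree(lab[:, 1:])

    query = color.rgb2lab(np.array([[[0.85, 0.10, 0.20]]])).reshape(3)
    dists, ids = tree.query(query[1:], k=2)  # two closest images by a/b distance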


You can't use HSV because the apparent differences in H / S / V aren't linear. You need to convert it to a different color system:

https://en.wikipedia.org/wiki/Munsell_color_system

What is interesting about this color system is that it isn't symmetric (your eye is better at differentiating certain hues like blue at more maximal values than others) so more care is needed on the nearest neighbour lookups.
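
Since Munsell conversions usually go through renotation lookup tables, an easier-to-script stand-in for the same point (equal numeric steps are not equal perceptual steps) is a CIELAB delta-E, e.g. a rough sketch:

    import numpy as np
    from skimage import color

    def perceptual_distance(rgb1, rgb2):
        # CIEDE2000 difference between two RGB colours given as floats in [0, 1].
        lab1 = color.rgb2lab(np.array([[rgb1]], dtype=float))
        lab2 = color.rgb2lab(np.array([[rgb2]], dtype=float))
        return color.deltaE_ciede2000(lab1, lab2)[0, 0]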

If you are using this coloring scheme for labelling colors in a photograph then you also need to adjust for colors that are especially vibrant and overtake the photo. For example this photo:

http://1.bp.blogspot.com/-OkTS4Wo34UM/Uwg568lRl2I/AAAAAAAAFi...

is best tagged "pink", not "brown".


> A few obvious choices are a better ranker or transforming to perception-friendly colour spaces like HSV.

HSV is not good for perceptual comparison. Try HSP [1].

[1] http://alienryderflex.com/hsp.html
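
The perceived-brightness formula from that page is simple enough to drop straight into a ranker; roughly (output values approximate):

    import math

    def hsp_brightness(r, g, b):
        # Perceived brightness P = sqrt(0.299 R^2 + 0.587 G^2 + 0.114 B^2); r, g, b in 0-255.
        return math.sqrt(0.299 * r * r + 0.587 * g * g + 0.114 * b * b)

    hsp_brightness(0, 0, 255)  # pure blue  -> ~86
    hsp_brightness(0, 255, 0)  # pure green -> ~195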


As with anything in image machine learning, it's all about the complexity of your feature space and how well it captures your business needs. Nearest neighbors over color-based features is a great starting point, and it's always great to see posts that start from scratch and show how far you can get.

For those interested in some of the commercial solutions in the space, Cortexica ( http://www.cortexica.com/ ) is doing interesting work using neural networks for fashion image similarity. Sadly, they seem to be focusing on white-label solutions rather than having an API.


For those of you interested in this subject, there is a library called LIRE (http://www.semanticmetadata.net/lire/) that's been around for a long time which will index various color and textures features into a feature vector compatible with Lucene. It does end up with the interesting scenario of requiring the search query itself to be an image, but if you are looking for a solution whereby for a given image you can find other images most similar to it (and ranked, of course), this does the job surprisingly well.


Why not a quick check of the upper-left pixel color, and if that's similar to one of the identified "popular" colors, remove it? I'd suspect most of the source material would be photographed professionally on a uniform background color.

I chose upper-left simply because these sorts of photos would likely be framed in a way that the background is visible in the upper margins, while possibly cropping the object at the bottom of the frame.
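
Something like this, roughly (a Pillow sketch; the function name and distance threshold are just for illustration and would need tuning on real data):

    from PIL import Image

    def drop_background_color(path, popular_colors, threshold=40):
        # Sample the upper-left pixel and drop any "popular" color close to it.
        # popular_colors is a list of (r, g, b) tuples in 0-255.
        bg = Image.open(path).convert("RGB").getpixel((0, 0))
        def close(c):
            return sum((a - b) ** 2 for a, b in zip(c, bg)) ** 0.5 < threshold
        return [c for c in popular_colors if not close(c)]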


That's a good suggestion! Especially having a "sanity check" against a popular colour, which should increase robustness when new images arrive. Robustness is such a tricky subject, especially when the loss, i.e. the cost of returning a wrong image, is fairly high. For fashion items the occasional odd colour is OK, but accidentally returning naked people can be devastating for a product.

I want to stress that this code is massively simplified to avoid obscuring the main plot. There's so much we skipped: the code in the blog post actually computes pretty terrible results compared to an even moderately tuned pipeline :)


>> There's so much we skipped: the code in the blog post actually computes pretty terrible results compared to an even moderately tuned pipeline :)

That's fine. I am a day away from starting an image processing app and was thinking of Python and some libraries. This blog confirms that idea and gives the right amount of a "starting point" for my effort. My requirements are very different, so a more application-specific solution, while still interesting, would be of less universal value. Perhaps a part II with more application-specific methods would be good.


Doesn't seem logical to me. What if you have a white coat with black trim on a white background? Wouldn't your solution incorrectly remove white from the results?


Exceptions to every rule.

I would submit that most professional fashion photography would endeavor to include sufficient contrast between the background and the item being photographed.


The page you linked to is very pleasant to view. The colors, spacing, and layout are remarkable. Your tutorial is great, too.


That was a good read. We are currently working on a tool to match product images and it's hard (but fun) to do. I'd like to know if anyone has created anything like this already in Python and, if so, whether you'd be prepared to have a chat with us.


Great post. I was part of a team that built a commercial image comparison engine, and worked on a lot of this stuff directly. Feeling all nostalgic now.

Some of the issues that anyone attempting similar will bump into are:

(1) If you're looking at shape features, a heavy dependency on product positioning. We found you need to do classification and use secondary features to get around this.

(2) Avoiding matching on background colours. For flat-background images you can sample the corners. For gradient backgrounds you may need some kind of edge detection.

(3) Shadow removal. A lot of the colours extracted from images are shadows of the actual perceived colour. You can generally pick the perceived colour by clustering the relevant pixels by hue/saturation and picking the middle of the lightness curve, but it varies by material (a rough sketch follows this list).

(4) Data. Fashion retail products are numerous and go out of stock very quickly, so to get any real value out of this specific example you need an efficient data pipeline and the capacity to handle constant change.

(5) Query performance / index size. We used an inverted index based on Lucene to query the data. To get decent relevance you need to include more visual words than you'd typically use in a text query; in our setup we found 25-30 visual words per feature was about right. That meant keeping each index under a million records, which in turn meant a distributed index. MLT queries are useful here, but last time I looked, which was admittedly a long time ago, Lucene had an annoying performance problem that I had to work around (https://issues.apache.org/jira/browse/LUCENE-1690)

(6) Value prop. Make sure you're optimising for the right use case, because as other commenters have pointed out, the relevance of this kind of system needs to be tuned to what you want to accomplish with it and the kind of data you'll be processing.
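
To make (3) a bit more concrete, here is a very rough sketch of the clustering idea. It assumes you already have just the item's pixels (background removed) and uses HSV value as a crude stand-in for lightness:

    import numpy as np
    from skimage import color

    def dominant_perceived_colour(pixels_rgb):
        # pixels_rgb: (N, 3) float array in [0, 1] holding only the item's pixels.
        hsv = color.rgb2hsv(pixels_rgb.reshape(1, -1, 3)).reshape(-1, 3)
        # Coarsely bucket by hue/saturation (a proper k-means would do better).
        buckets = np.floor(hsv[:, :2] * 10).astype(int)
        _, inverse, counts = np.unique(buckets, axis=0, return_inverse=True, return_counts=True)
        cluster = hsv[inverse.ravel() == np.argmax(counts)]
        # "Middle of the lightness curve": the pixel whose value is closest to the median.
        rep = cluster[np.argmin(np.abs(cluster[:, 2] - np.median(cluster[:, 2])))]
        return color.hsv2rgb(rep.reshape(1, 1, 3)).reshape(3)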

Nobody should let themselves be put off by that list though. There's a lot of value in it, and it's a lot of fun to play with. Happy hunting :)



