"Maybe I should revert to storing ratings in PostgreSQL and accept what would certainly be a large performance hit during recommendation generation."
I wondered if you prematurely optimized here. Did you try Postgres in the first place? What was the performance like? I can't help but wonder if you dismissed Postgres simply because it wasn't as sexy as Redis.
To be honest, when I first wrote my site (and, later, recommendable), it was for my senior project in school. All of my programming experience up to that point had come from coursework (which was fairly different from programming in the real world), and this site was really my first real-world application. Back then I was using MySQL (!) as my main data store. I was storing Likes/Dislikes as polymorphic Rails models, but then I read about Redis, and it sounded like the perfect fit for all the set math I would have to do. So when refreshing recommendations, I would pull the ratings out of MySQL, feed them into Redis to do some fast set math, and then tear down the Redis sets. At that point only friends were on the site testing it out for me, so I didn't notice much of a problem.
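For concreteness, the "set math" here is intersections and unions over each user's rated-item IDs; Redis does this server-side with commands like SINTER and SUNION. Below is a pure-Ruby sketch with made-up data, using the Jaccard index as one common similarity measure (not necessarily recommendable's exact formula):

```ruby
require "set"

# Hypothetical ratings: the beer IDs each user has liked. In the real
# app these sets lived in Redis, and the intersections/unions below
# were computed server-side with SINTER/SUNION.
alice_likes = Set[1, 2, 3, 4]
bob_likes   = Set[3, 4, 5]

# Jaccard index: size of the intersection over size of the union.
# One common similarity formulation, used here purely for illustration.
def jaccard(a, b)
  union = a | b
  return 0.0 if union.empty?
  (a & b).size.to_f / union.size
end

puts jaccard(alice_likes, bob_likes) # => 0.4 (2 shared IDs out of 5 total)
```

The appeal of Redis was that these intersections happen in memory on the server, rather than shipping rows back and forth as a relational query would.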
Later, I switched my database to PostgreSQL, but I wasn't having any Redis issues at that point, so I let it be. Then, when goodbre.ws picked up a lot of new faces, I ran into the I/O bottleneck I mentioned in the article. I tried to solve that by moving the ratings to reside permanently in Redis, so that I'd only have to hit Redis a couple of times to regenerate someone's recommendations. Then I started storing too much in my ZSETs, and here I am now.
Now that I've reduced that memory footprint again, I feel as though attempting to move the ratings back into Postgres would be a premature optimization in itself. I was, however, looking into some combination of Postgres's array types (to replace the sets) and hstore (to replace the ZSETs, with the sorting done in the query). I think it's worth looking into, but I have some other things to focus on first!
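As a rough sketch of that idea (hypothetical schema; I haven't benchmarked any of this), an integer array can stand in for a Redis SET, and hstore pairs can be unpacked with `each()` and sorted in the query to mimic a ZSET range:

```sql
-- Hypothetical: a user's liked beer IDs as a Postgres array
-- (replacing a Redis SET)...
CREATE TABLE user_ratings (
  user_id   integer PRIMARY KEY,
  liked_ids integer[],  -- membership test: liked_ids @> ARRAY[42]
  -- ...and per-beer scores as hstore (replacing a Redis ZSET).
  -- hstore keys/values are text, so scores get cast on the way out.
  scores    hstore
);

-- ZSET-style "give me the top 10 by score": unpack and sort in the query.
SELECT key AS beer_id, value::float AS score
FROM   user_ratings, each(scores)
WHERE  user_id = 1
ORDER  BY value::float DESC
LIMIT  10;
```

Whether this beats Redis on real traffic is exactly the kind of thing that would need measuring rather than guessing.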
One thing that's commonly done is to keep the data in a normalized format, but use a view or a function to return the data in whatever format you want (arrays, JSON, etc.). This lets the application request the data in the ideal format for consumption. The database then becomes a "black box" service that returns a nested JSON graph; the application no longer needs to know how the data is stored.
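A minimal Postgres sketch of that pattern (table, column, and view names are all hypothetical), using `json_agg` to hand the application a ready-made JSON array:

```sql
-- Normalized storage: one row per rating.
CREATE TABLE likes (
  user_id integer NOT NULL,
  beer_id integer NOT NULL,
  PRIMARY KEY (user_id, beer_id)
);

-- A view that returns each user's likes as a single JSON value,
-- so the application never sees the underlying row layout.
CREATE VIEW user_likes_json AS
SELECT user_id,
       json_agg(beer_id) AS liked_beer_ids
FROM   likes
GROUP  BY user_id;

-- The app just asks the "black box" for the shape it wants:
-- SELECT liked_beer_ids FROM user_likes_json WHERE user_id = 42;
```

If the storage layout changes later, only the view has to be rewritten; the application's query stays the same.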