"Maybe I should revert to storing ratings in PostgreSQL and accept what would certainly be a large performance hit during recommendation generation."
I wondered if you prematurely optimized here. Did you try Postgres in the first place? What was the performance like? I can't help but wonder if you dismissed Postgres simply because it wasn't as sexy as Redis.
To be honest, when I first wrote my site (and, later, recommendable), it was for my senior project in school. All of my programming experience up to that point had come from coursework (which was fairly different from programming in the real world), and this site was really my first real-world application. Back then I was using MySQL (!) as my main data store. I was storing Likes/Dislikes as polymorphic Rails models, but then I read about Redis, and it sounded like the perfect fit for all the set math I would have to do. So when refreshing recommendations, I would pull the ratings out of MySQL, feed them into Redis to do some fast set math, and then tear down the Redis sets. At that point only friends were on the site testing it out for me, so I didn't notice much of a problem.
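For concreteness, the "set math" here is intersections and unions over each user's rated-item IDs; Redis does this server-side with commands like SINTER and SUNION. Below is a pure-Ruby sketch with made-up data, using the Jaccard index as one common similarity measure (not necessarily recommendable's exact formula):

```ruby
require "set"

# Hypothetical ratings: the beer IDs each user has liked. In the real
# app these sets lived in Redis, and the intersections/unions below
# were computed server-side with SINTER/SUNION.
alice_likes = Set[1, 2, 3, 4]
bob_likes   = Set[3, 4, 5]

# Jaccard index: size of the intersection over size of the union.
# One common similarity formulation, used here purely for illustration.
def jaccard(a, b)
  union = a | b
  return 0.0 if union.empty?
  (a & b).size.to_f / union.size
end

puts jaccard(alice_likes, bob_likes) # => 0.4 (2 shared IDs out of 5 total)
```

The appeal of Redis was that these intersections happen in memory on the server, rather than shipping rows back and forth as a relational query would.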
Later, I switched my database to PostgreSQL, but I wasn't having any Redis issues at that point, so I let it be. Then, when goodbre.ws picked up a lot of new faces, I ran into the I/O bottleneck I mentioned in the article. I tried to solve that by moving the ratings to reside permanently in Redis, so that I'd only have to hit Redis a couple of times to regenerate someone's recommendations. Then I started storing too much in my ZSETs, and here I am now.
Now that I've reduced that memory footprint again, I feel as though attempting to move the ratings back into Postgres would be a premature optimization in itself. I was, however, looking into some combination of Postgres's array types (to replace the sets) and hstore (to replace the ZSETs, with the sorting done in the query). I think it's worth looking into, but I have some other things to focus on first!
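As a rough sketch of that idea (hypothetical schema; I haven't benchmarked any of this), an integer array can stand in for a Redis SET, and hstore pairs can be unpacked with `each()` and sorted in the query to mimic a ZSET range:

```sql
-- Hypothetical: a user's liked beer IDs as a Postgres array
-- (replacing a Redis SET)...
CREATE TABLE user_ratings (
  user_id   integer PRIMARY KEY,
  liked_ids integer[],  -- membership test: liked_ids @> ARRAY[42]
  -- ...and per-beer scores as hstore (replacing a Redis ZSET).
  -- hstore keys/values are text, so scores get cast on the way out.
  scores    hstore
);

-- ZSET-style "give me the top 10 by score": unpack and sort in the query.
SELECT key AS beer_id, value::float AS score
FROM   user_ratings, each(scores)
WHERE  user_id = 1
ORDER  BY value::float DESC
LIMIT  10;
```

Whether this beats Redis on real traffic is exactly the kind of thing that would need measuring rather than guessing.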
One thing that's commonly done is to keep the data in a normalized format, but use a view or a function to return the data in whatever format you want (arrays, JSON, etc.). This lets the application request the data in the ideal format for consumption. The database then becomes a "black box" service that returns a nested JSON graph; the application no longer needs to know how the data is stored.
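A minimal Postgres sketch of that pattern (table, column, and view names are all hypothetical), using `json_agg` to hand the application a ready-made JSON array:

```sql
-- Normalized storage: one row per rating.
CREATE TABLE likes (
  user_id integer NOT NULL,
  beer_id integer NOT NULL,
  PRIMARY KEY (user_id, beer_id)
);

-- A view that returns each user's likes as a single JSON value,
-- so the application never sees the underlying row layout.
CREATE VIEW user_likes_json AS
SELECT user_id,
       json_agg(beer_id) AS liked_beer_ids
FROM   likes
GROUP  BY user_id;

-- The app just asks the "black box" for the shape it wants:
-- SELECT liked_beer_ids FROM user_likes_json WHERE user_id = 42;
```

If the storage layout changes later, only the view has to be rewritten; the application's query stays the same.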