I've worked with Mercury Parser in the past; it supports pluggable, site-specific parsing.
https://github.com/postlight/mercury-parser