
I guess if you're really focused on it. I built a content tagging system for an old employer that would attempt to guess context based on keywords and associations but give the writer of the content the final say in what's actually being tagged.

Sure, I could have spent a thousand hours refining it, but the improvement would have been marginal and it still would need human interaction.



Was it used for content related to that particular business? I think as long as you have relatively limited variety, you can make something that works well enough.


The client was a publisher. Writers would submit articles, the system would automatically tag them, and it was good enough that usually the writer or editor only had to weed out a false positive or two, which is about the same result as all these machine learning use cases with thousands of hours of dev time.

IIRC, terms were weighted, so that some needed more instances in an article than others in order to be included in the final tag results.

Locations were one-offs, but specific topical items required more mentions because of false positives. And then there were things we called branch-offs. A branch-off tag occurred when a topic was mentioned enough to be a tag but there was another name that some segment of the population would know it by.
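The weighting scheme described above can be sketched roughly like this. The terms and thresholds here are hypothetical, just illustrating the idea that locations tag on a single mention while topical terms demand more evidence:

```python
import re

# Hypothetical per-term thresholds: place names tag on one mention,
# topical terms need repeat mentions to filter out false positives.
TERM_THRESHOLDS = {
    "baton rouge": 1,    # location: a one-off mention is enough
    "white crappie": 2,  # topic: needs more than a passing mention
    "jig": 3,            # common word: demand strong evidence
}

def suggest_tags(article_text):
    """Return tags whose mention counts meet their thresholds."""
    text = article_text.lower()
    tags = []
    for term, threshold in TERM_THRESHOLDS.items():
        count = len(re.findall(re.escape(term), text))
        if count >= threshold:
            tags.append(term)
    return tags
```

In the real system an editor would still review the suggested list before publishing, which is the human-in-the-loop step the parent comment describes.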

For example, the fish known as the white crappie is known in Louisiana as sac-au-lait, though people would also spell it sacalait or sac-a-lait. So when we got an article from an author in the Carolinas who had no familiarity with the term that is dominant in Louisiana, the software would add the tag anyway, which also exposed the article to our site search.
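The branch-off behavior amounts to an alias table keyed on the primary tag. A minimal sketch, using the sac-au-lait example from the comment (the table structure and function names are my own, not the original system's):

```python
# Hypothetical branch-off table: when a primary topic earns a tag,
# regional names are added too, so site search finds the article
# under any spelling the reader might use.
BRANCH_OFFS = {
    "white crappie": ["sac-au-lait", "sacalait", "sac-a-lait"],
}

def expand_branch_offs(tags):
    """Append regional alias tags for any primary tag that has them."""
    expanded = list(tags)
    for tag in tags:
        for alias in BRANCH_OFFS.get(tag, []):
            if alias not in expanded:
                expanded.append(alias)
    return expanded
```

So an article from the Carolinas that never uses the Louisiana term still surfaces in a search for "sac-au-lait".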


Similarly, I think if you have a limited number of people doing the classification, you can take a good shot at it.




