
Yes, we have. What we do is not keyword extraction; our tool suggests tags based on probabilistic algorithms. For example, if your document contains the terms "Bush" and "Obama", it should be tagged as "politics" even if that word is not present in it. Compare this to the Yahoo Extraction Tool, for example. This approach will not add new keywords that would help in a search. It's only useful for getting an idea of what the document is about.
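To make the idea concrete, here is a minimal sketch of probabilistic tag suggestion. It assumes a naive Bayes model over word counts, which is only one possible choice and not necessarily what our tool or any other product uses. The function names `train` and `suggest_tags` and the toy training data are made up for illustration.

```python
import math
from collections import Counter, defaultdict


def train(tagged_docs):
    """Build count tables from (text, tags) pairs. Hypothetical helper."""
    tag_counts = Counter()              # how many times each tag was assigned
    word_counts = defaultdict(Counter)  # word frequencies seen under each tag
    vocab = set()
    for text, tags in tagged_docs:
        words = text.lower().split()
        vocab.update(words)
        for tag in tags:
            tag_counts[tag] += 1
            word_counts[tag].update(words)
    return tag_counts, word_counts, vocab


def suggest_tags(text, model, top_n=1):
    """Rank tags by naive Bayes log-probability given the words in text."""
    tag_counts, word_counts, vocab = model
    total_assignments = sum(tag_counts.values())
    words = [w for w in text.lower().split() if w in vocab]  # skip unseen words
    scores = {}
    for tag, n in tag_counts.items():
        total_words = sum(word_counts[tag].values())
        score = math.log(n / total_assignments)  # prior for the tag
        for w in words:
            # Laplace (add-one) smoothing avoids log(0) for unseen pairs.
            score += math.log((word_counts[tag][w] + 1) / (total_words + len(vocab)))
        scores[tag] = score
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


# Toy training set (assumed data, far smaller than anything realistic).
docs = [
    ("bush obama election senate", ["politics"]),
    ("obama bush debate vote", ["politics"]),
    ("goal match striker league", ["sports"]),
    ("league cup match referee", ["sports"]),
]
model = train(docs)
print(suggest_tags("Bush and Obama met today", model))  # ['politics']
```

Note that the word "politics" never appears in the input text; the tag comes from the co-occurrence statistics in the training data, which is the difference from keyword extraction.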

The main problem is not the algorithm but the input data. Our system learns from millions of tagged blog posts among other sources. The quality of the tags varies a lot, and most of the work we do is about deciding what data to use for training.
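As a rough illustration of what "deciding what data to use" can look like, here is a small sketch of one possible filter. The heuristic (normalize tags, then drop tags used by fewer than `min_tag_count` posts) is an assumption chosen for the example, not a description of our actual selection process, and `filter_training_data` is a hypothetical name.

```python
from collections import Counter


def filter_training_data(tagged_docs, min_tag_count=2):
    """Keep only tags that appear on at least min_tag_count posts."""
    # Normalize tags so "Politics" and "politics " count as the same tag.
    normalized = [
        (text, {t.strip().lower() for t in tags if t.strip()})
        for text, tags in tagged_docs
    ]
    counts = Counter(t for _, tags in normalized for t in tags)
    kept = []
    for text, tags in normalized:
        good = sorted(t for t in tags if counts[t] >= min_tag_count)
        if good:  # drop posts left with no usable tags
            kept.append((text, good))
    return kept


posts = [
    ("post a", ["Politics", "misc123"]),
    ("post b", ["politics "]),
    ("post c", ["asdf"]),
]
print(filter_training_data(posts))
# [('post a', ['politics']), ('post b', ['politics'])]
```

A frequency threshold like this only removes rare or one-off tags; it does nothing about tags that are common but wrong, so it is a starting point at best.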



When is the API coming? I can see us using this quite a bit.


The API is already available although we haven't announced it. There is a WordPress plugin that uses it, called TagMahal.

Please contact us if you'd like to use the API. If you need up to 5k queries per day or so, it shouldn't have much of an impact on our server.


I would like to see a Blogspot plug-in too! I just tried it on my latest post and it worked much better than I did. :)



