Python NLTK Bayesian Classifier for word sense disambiguation - 92% accuracy (litfuel.net)
65 points by beagledude on Nov 30, 2010 | hide | past | favorite | 25 comments


92% accuracy, unfortunately, isn't good enough.

Bag-of-words models perform pretty well at classification and search, and the main thing you need to improve search is to boost scores when words are close together.

You might think you could improve performance by using semantically better defined features, but even 92% accuracy adds enough noise to foil your plans.

It's a big problem in A.I. systems that have multiple stages. You might have 5 steps in a chain which are each 90% accurate, but put them together and you've got a system that sucks. Ultimately there's a need for a holistic approach that can use higher-level information to fix mistakes and ambiguities at the lower levels.
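The compounding effect described above is easy to check numerically: per-stage accuracies multiply, so a pipeline of five 90%-accurate stages lands around 59% end-to-end. A minimal sketch:

```python
# Error compounds across pipeline stages: assuming errors are
# independent, end-to-end accuracy is the product of the stages.
def pipeline_accuracy(stage_accuracies):
    acc = 1.0
    for a in stage_accuracies:
        acc *= a
    return acc

print(pipeline_accuracy([0.9] * 5))  # 0.9**5 = 0.59049
```

The independence assumption is optimistic, too; correlated errors between stages can make the real system even worse.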


92% in general would actually be really good for word sense disambiguation, but... "Apple" is a really easy choice. I'd like to see how he does with a trickier word like "right" (as in civil, vs. not wrong, vs. not left).


Yes, it is good, but not good enough for many applications. You're also left with the issue that one kind of "apple" is more common than the other kind of "apple" so the baseline accuracy of something that always assumes it's one kind of apple might be surprisingly good.
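The majority-class baseline point is worth making concrete: if one sense dominates the data, always predicting it can already look impressive. A small sketch with invented proportions:

```python
from collections import Counter

# Hypothetical sense distribution: if 80% of "apple" mentions are
# the company, always guessing "company" already scores 80% --
# so a classifier's 92% must be judged against that baseline.
labels = ["company"] * 80 + ["fruit"] * 20

majority, count = Counter(labels).most_common(1)[0]
baseline_accuracy = count / len(labels)
print(majority, baseline_accuracy)  # company 0.8
```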

That said, text-to-speech is a system where it's important to disambiguate a particular set of words. For instance:

"I read the news today, oh boy", "read" sounds like "red"

"I read the news every day", "read" sounds like "reed"

You need to be able to disambiguate the word sense to correctly read the word "read". There are maybe 20 or so very common words like this, so a modest amount of work in this area would be part of a good TTS system.
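Once the sense is resolved, the TTS side reduces to a lookup. A hypothetical pronunciation table (ARPAbet notation; the sense tags and table are invented for illustration):

```python
# Hypothetical table a TTS front end might consult after word
# sense disambiguation has picked the sense/tense of a heteronym.
HETERONYMS = {
    ("read", "past"): "R EH1 D",     # sounds like "red"
    ("read", "present"): "R IY1 D",  # sounds like "reed"
    ("lead", "metal"): "L EH1 D",
    ("lead", "verb"): "L IY1 D",
}

def pronounce(word, sense):
    return HETERONYMS[(word, sense)]

print(pronounce("read", "past"))  # R EH1 D
```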


Perhaps 92% is poor in a lot of scenarios, but for this type of approach it's a good accuracy (within some 1-delta confidence). The unfortunate part is that it really was just a simple naive bayes bag-of-words, and it's not surprising that it did so well on one test case (apple). Extending that to help general NLP in any way would be much more difficult.


To be fair, he did this in a couple of hours.


Sure, but the fact is that nobody actually needs a "word-sense disambiguator", they need a search system with better accuracy, or a classifier with better accuracy or an information extraction system that turns text into facts.

Many areas in NLP are like this. You can get 92% accuracy in a few hours of work, and then you can get 93% after a week of work, and then you can write a whole PhD thesis about how you got 94% accuracy.

To a certain extent, there are approaches, such as the support vector machine, that are "unreasonably effective", but once you get past that, making a real breakthrough often means confronting issues that everybody wants to sweep under the rug.

For instance, there was that NELL paper that came out a few months ago; NELL extracted facts from text, but it had no idea that "Barack Obama is the President of the United States" was true in 2010, and that "Richard Nixon is the President of the United States" was true in 1972. If you can't handle the fact that different people believe different things and that statements have expiration dates, no wonder you can only get 70% accuracy in information extraction.


As someone who has spent a considerable amount of time studying NLP, I have to say that this post outlines a pretty naive approach when it comes to disambiguating words.

Here are some questions:

- What happens when we change the language model?

- What happens when we intersperse language models (English phrases within Chinese)?

- What if someone were to just say "i love apple"?

The post title is also very misleading. The 92% accuracy reflects only one particular use case. How about attempting to disambiguate hundreds or thousands of terms?


> As someone who has spent a considerable amount of time studying NLP

I'm quite interested in how you would approach this problem.


A blinking favicon? Seriously? I couldn't finish the article because my eyes jumped to the tab bar every 5 seconds.

Has anyone else seen this elsewhere? It's new to me and I was surprised by how obnoxious it was given that the web isn't exactly a stranger to obnoxious flashing content.


It doesn't blink for me. (In case you're wondering why you're getting downvoted -- though I didn't downvote you myself.)


[EDIT: Can't edit the top message anymore, but the problem was broken browser and amusing 'blink' vs ' blink' confusion as detailed deeper in the thread. Also in chrome it doesn't appear to animate, removing the annoyance entirely.]

Interesting. I asked a few coworkers if it was just me and they confirmed it. The actual favicon doesn't blink for you? http://www.litfuel.net/favicon.ico

It reports itself as a 6 frame gif for me. Maybe your browser is just more sane than mine (Firefox 3.x) and refuses to honor animated favicons?

I figured I was getting rightly downvoted because I wasn't saying anything about NLTK.


I tried Firefox 3.6 out of curiosity, and it doesn't blink, even when I open the favicon itself in a browser tab. I think you must have changed your browser config at some point and forgotten about it.


So this is actually pretty funny.

In my local version of Firefox, the entire icon vanishes for about 5 seconds, every 5 seconds, causing a notable visual disturbance. In Chrome on another system the eyes blink every 5 seconds or so. So yeah, my browser is the broken one.

But when I asked people if it blinked ... they, of course, said yes.

EDIT: Also, as noted in an edit above, when viewed in Chrome it didn't animate unless the image was accessed directly. So for you it may blink, not blink, or really blink, depending on your browser and configuration...


Terrible choice of words to use, since the capitalization (or not) of the word carries so much information. It's hard to take this seriously given his apparent obliviousness.


It may not be the best choice of words, but capitalization plays no role in his tests as everything is transformed to lower case on line 30: http://pastebin.com/4B1xHHht


He did it for one word. Bad article title.


It was actually a method of using wikipedia to build your corpus for any ambiguous word to automatically build some word sense disambiguation in your application. One word was just a simple example of using that data.


- The article does not add anything new. Using Wikipedia for word sense disambiguation has been a hot topic for some years. [1]

- The article title implies that this is somehow a spectacular finding. Doing word sense disambiguation for one word is not that interesting, and there is no comparison with existing methods to show that this is actually a high score. I suspect that it is not that spectacular, since 'Apple' is relatively easy to disambiguate using a few context words.

[1] E.g. see:

- Using Wikipedia for Automatic Word Sense Disambiguation, R. Mihalcea, 2007, for a discussion of using Wikipedia to train a word sense disambiguator.

- Integrating multiple knowledge sources to disambiguate word sense: An exemplar-based approach, H.T. Ng and H.B. Lee, 1996, provide a good overview of types of features that can be used in disambiguation. They use features that go beyond simple 'bag of word' and 'bag of n-gram' features, e.g. by using syntactical patterns.

There is a whole lot more research of course, but just to show two examples that describe far more sophisticated approaches.


This is an interesting exercise in building a very specific word disambiguator ('apple' the company vs 'apple' the fruit).

It is a testament to NLTK that this can be accomplished in less than 100 lines.


Stemming aside, it's not hard to implement this in ~100 lines without NLTK:

- In naive Bayes classification, model parameters can usually be estimated using relative frequencies in the training data.

- WordPunctTokenizer is a very simple tokenizer that makes anything matching \w+ and [^\w\s]+ a separate token.

- Extracting bigrams from a list of tokens is trivial.

Of course, using NLTK will be very helpful in many situations, but this is hardly a testament to NLTK.
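The three pieces above fit in a short script. A minimal sketch with invented training data: relative-frequency naive Bayes with add-one smoothing, a tokenizer using the same regex pattern as WordPunctTokenizer, and plain zip-based bigrams (unused by the classifier here, just shown for completeness):

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    # Same pattern as WordPunctTokenizer: runs of word characters,
    # or runs of non-word, non-space characters.
    return re.findall(r"\w+|[^\w\s]+", text.lower())

def bigrams(tokens):
    # Trivial bigram extraction.
    return list(zip(tokens, tokens[1:]))

def train(examples):
    # examples: list of (text, label). Parameters are just
    # relative frequencies over the training data.
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = {w for c in word_counts.values() for w in c}
    return label_counts, word_counts, vocab

def classify(text, label_counts, word_counts, vocab):
    # Log-space naive Bayes with add-one smoothing.
    total = sum(label_counts.values())
    best, best_score = None, float("-inf")
    for label, n in label_counts.items():
        score = math.log(n / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for w in tokenize(text):
            score += math.log((word_counts[label][w] + 1) / denom)
        if score > best_score:
            best, best_score = label, score
    return best

# Toy training set, invented for illustration.
examples = [
    ("apple released a new iphone", "company"),
    ("apple shares rose today", "company"),
    ("i ate an apple and a banana", "fruit"),
    ("apple pie with cinnamon", "fruit"),
]
model = train(examples)
print(classify("apple stock price", *model))
```

With a corpus this tiny the predictions are fragile, of course; the point is only that none of the machinery requires a library.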


I always prefer to see these things in context (how well does a naive rule-based classifier do? what's the P/R/F-score i.e. how many Apples are apples and apples are Apples? What about Apple Records?)

It's still fun to remember how quick and easy something like this is though. Any interest in similar articles on named entity recognition, sentiment/topic classification and spam filtering? I've been meaning to do a few for a while, but you know how it is.


I'd love to see some more articles on the subject out there, definitely take the time to post something. Entity or sentiment would be my first choices :)


I'm surprised that no one mentioned this paper that first evaluated this approach to using Wikipedia data: http://www.aaai.org/Papers/IJCAI/2007/IJCAI07-259.pdf That said, the major drawback of using Wikipedia is the size. If this approach is to be used for all words (not just Apple) then the total training corpus will be several GBs. Definitely not practical...


What's not practical about it?

GBs of data are pretty easy to handle these days


GBs and TBs of data are common, but not for this task. All you are doing is word sense disambiguation, and there are WSD algorithms that work with much, much smaller training sets. I just don't think the exponential increase in training data is justified...



