
They represent the sequence as a bag of n-grams and feed that into the classifier, rather than feeding the sequence in directly. The paper basically combines variants of a few old techniques (although a few of the variants are significant and recent), but the interesting result is that, put together in the right way and tweaked a little, they're competitive in accuracy with state-of-the-art deep neural network models, at least on some problems, while being much faster to train. Section 2 of the paper, although pretty brief, is where this is laid out.
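Roughly what that representation looks like, as a toy Python sketch (the function name and the choice of word bigrams are mine, not the paper's):

    from collections import Counter

    def bag_of_ngrams(tokens, n=2):
        """Reduce an ordered token sequence to an unordered multiset
        (bag) of unigrams plus word n-grams (bigrams by default)."""
        grams = Counter(tokens)  # unigram counts
        for i in range(len(tokens) - n + 1):
            grams[" ".join(tokens[i:i + n])] += 1  # n-gram counts
        return grams

    bag_of_ngrams("the cat sat on the mat".split())
    # Counter({'the': 2, 'cat': 1, 'sat': 1, 'on': 1, 'mat': 1,
    #          'the cat': 1, 'cat sat': 1, 'sat on': 1,
    #          'on the': 1, 'the mat': 1})

All word order beyond the n-gram window is thrown away; the classifier only ever sees these counts.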



Specifically, the bag of n-grams can be viewed as a very sparse vector whose non-zero entries correspond to the n-grams in the bag, with one dimension for each n-gram in the training vocabulary. Since an n-gram not seen during training has no corresponding dimension, such n-grams have to be ignored at prediction time.
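A toy sketch of that sparse view (hypothetical names; a {index: count} dict stands in for a real sparse vector):

    def vectorize(bag, vocab):
        """Map a bag of n-grams to a sparse vector over the training
        vocabulary. `vocab` maps each n-gram seen in training to a
        column index; n-grams missing from it get no entry, i.e.
        unseen n-grams are silently ignored."""
        return {vocab[g]: c for g, c in bag.items() if g in vocab}

    vocab = {"the": 0, "cat": 1, "the cat": 2}
    vectorize({"the": 2, "the cat": 1, "dog": 1}, vocab)
    # {0: 2, 2: 1}  -- "dog" never appeared in training, so it's dropped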



