Hacker News



Interesting - looks like they're doing this via a bunch of special-case rules.

To any Google engineers reading:

Please add `really-verbatim` mode, indicated by backtick quotes, which also requires strict matching of punctuation.


I'm a Google engineer, way too far removed organisationally to ever have any say in this.

I wonder if that will ever be worth the hardware cost. Back when I did some coursework on information retrieval, it seemed that you get superlinear savings by reducing the cardinality of tokens. So you'd do stemming, remove all punctuation, and drop words that are too frequent ("do", "be", "and", "or", ...). Basically, you remove all grammar. You do the same to your search query and the index. This intuitively reduces your compute by at least an order of magnitude, especially for languages with rich grammar (e.g. stemming nouns in Polish reduces the cardinality of tokens by a factor of 7 and verbs by a factor of 162).
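
A minimal sketch of that kind of normalization, to make the cardinality argument concrete. The stop-word list, the suffix rules, and the `toy_stem` helper are all made up for illustration; a real system would use a proper stemmer, and nothing here reflects how any production search engine works.

```python
import re

# Hypothetical stop-word list and suffix rules, chosen only for illustration.
STOP_WORDS = {"do", "be", "and", "or", "the", "a", "to"}
SUFFIXES = ("ing", "ed", "es", "s")

def raw_tokens(text):
    """Split on whitespace only, keeping case and punctuation."""
    return text.split()

def toy_stem(word):
    """Strip the first matching suffix; a crude stand-in for a real stemmer."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def normalized_tokens(text):
    """Lowercase, drop punctuation and stop words, then stem."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [toy_stem(w) for w in words if w not in STOP_WORDS]

docs = [
    "Searching, searched, and searches!",
    "The search: do search or be searched.",
]
raw_vocab = {t for d in docs for t in raw_tokens(d)}
norm_vocab = {t for d in docs for t in normalized_tokens(d)}

# Compare the number of distinct tokens before and after normalization.
print(len(raw_vocab), len(norm_vocab))
```

The same `normalized_tokens` function would be applied to the query, which is why punctuation in the query cannot influence matching once the index is built this way.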


No way they'll inflate their indexes by even 20% and add complexity to their algorithms for 0.1% of queries that won't bring in any additional income.


They don't necessarily have to inflate their indexes. Backtick-quoted results ought to be a subset of double-quoted results, so they can use the standard quoted search algorithm, and then filter out imperfect matches from those results.
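
A rough sketch of this filter-as-a-last-pass idea, assuming the raw text of every candidate document is still available at filter time. The function names and the tiny corpus are hypothetical, and `quoted_search` is only a stand-in for whatever the standard quoted search actually does.

```python
import re

def normalize(text):
    """Lowercase and keep only alphanumeric runs, dropping punctuation."""
    return re.findall(r"[a-z0-9]+", text.lower())

def quoted_search(docs, query):
    """Stand-in for double-quoted search: match the normalized token sequence."""
    q = normalize(query)
    hits = []
    for doc_id, text in docs.items():
        tokens = normalize(text)
        windows = range(len(tokens) - len(q) + 1)
        if q and any(tokens[i:i + len(q)] == q for i in windows):
            hits.append(doc_id)
    return hits

def really_verbatim_search(docs, query):
    """Keep only the quoted-search hits whose raw text contains the exact string."""
    return [d for d in quoted_search(docs, query) if query in docs[d]]

docs = {
    1: "Use foo.bar() to start.",
    2: "Use foo bar to start.",
    3: "Nothing relevant here.",
}
print(quoted_search(docs, "foo.bar()"))           # documents 1 and 2 both match
print(really_verbatim_search(docs, "foo.bar()"))  # only document 1 survives
```

The approach only works when the query still contains at least one indexed token, since the first pass needs something to look up.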


Google searches ignore punctuation, so it isn't even indexed. There's no way to search for punctuation without inflating the index.


Read what I said. They can use the standard index, then filter the results as a last pass.


I did read what you said. Imagine trying to search for

  [::]
as 8192kjshad09- suggested earlier in the thread. What standard index results are you going to filter? Since "[::]" isn't in the index, you won't have anything to go on. To do your back-tick really-verbatim searches, the index has to be enlarged.
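
To illustrate the objection, here is a toy inverted index that drops punctuation, as assumed in this thread. The `normalize`, `build_index`, and `candidates` helpers and the two sample documents are invented for the example and say nothing about Google's real index.

```python
import re
from collections import defaultdict

def normalize(text):
    """Lowercase and keep only alphanumeric runs, dropping punctuation."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs):
    """Map each normalized token to the set of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in normalize(text):
            index[token].add(doc_id)
    return index

def candidates(index, query):
    """Intersect posting lists for the query tokens; no tokens means no lookup."""
    tokens = normalize(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result

docs = {
    1: "Bind the listener to [::] for all IPv6 addresses.",
    2: "Bind the listener to 0.0.0.0 for all IPv4 addresses.",
}
index = build_index(docs)
print(normalize("[::]"))                    # the query has no indexable tokens
print(candidates(index, "[::]"))            # so there is nothing to filter
print(candidates(index, "IPv6 addresses"))  # a query with real tokens still works
```

Document 1 contains the string, but the index has no entry that leads to it, so a post-filter never gets a candidate list to work on.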


Ah, sorry, I see what you mean.


I work for Google Search. We did look at this, and we'll keep looking to see if we can improve, but it turns out to be a very hard lift.



