I love this. It is such an easy-to-grasp example of what is "wrong" with search. Historically, searching was keyword based, so documents with "shirt" and "stripes" would rank highly, even though none of those pages had the keyword "without".
As humans we know immediately that the search is for documents about shirts where stripes are not present. But the term 'without' doesn't make it through to the term compositor step, which feeds terms into a binary relationship. We might write such a relationship as
Q = "shirt" AND NOT "stripes"
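To make that relationship concrete, here is a minimal sketch of evaluating it as a set difference over posting lists. The index contents and the `and_not` helper are made up for illustration; real engines use compressed, sorted posting lists rather than Python sets.

```python
# Toy inverted index: term -> set of document ids (all data here is made up).
INDEX = {
    "shirt": {1, 2, 3, 4},
    "stripes": {2, 4},
    "plain": {1},
}


def and_not(index, required, excluded):
    """Return ids of documents containing `required` but not `excluded`."""
    return index.get(required, set()) - index.get(excluded, set())


# Q = "shirt" AND NOT "stripes"
print(sorted(and_not(INDEX, "shirt", "stripes")))
```

Note that document 3 matches even though nothing says it is a plain shirt; it just never mentions stripes, which is part of why this is harder than it looks.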
You could onebox it (the Google term for a search short-circuit path that recognizes the query pattern and takes some specific action; calculations, for example, are a onebox) and then you get a box of shirts with no stripes and a bunch of query results with stripes.
You can n-gram it, by ranking the without-stripes n-gram higher than the individual terms, but that doesn't help all that much because English language documents don't call them "shirts without stripes". Generally they are referred to as "plain shirts" or "solid shirts" (plain-shirt(s) and solid-shirt(s) respectively). But you might do okay punning without-stripes => plain or solid.
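A rough sketch of that punning step, assuming a hand-built table keyed by n-gram. The `PUNS` table and `expand_puns` function are hypothetical names, and moving the replacement word to the front is a simplification that only suits adjective-style puns like these.

```python
# Hypothetical pun table: a matched n-gram -> words documents actually use.
PUNS = {
    ("without", "stripes"): ["plain", "solid"],
}


def expand_puns(query, puns=PUNS):
    """Return the original query plus one variant per pun replacement."""
    tokens = query.lower().split()
    variants = [" ".join(tokens)]
    for ngram, replacements in puns.items():
        n = len(ngram)
        for i in range(len(tokens) - n + 1):
            if tuple(tokens[i:i + n]) == ngram:
                head, tail = tokens[:i], tokens[i + n:]
                for word in replacements:
                    # English puts the adjective first: "plain shirts".
                    variants.append(" ".join([word] + head + tail))
    return variants


print(expand_puns("shirts without stripes"))
```

The variants would then be scored alongside the original query rather than replacing it.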
From a query perspective you get better accuracy with the query "shirts -stripes". This algorithmic query uses unary minus to mark a term that should not appear in the document, but it isn't very friendly to non-engineer searchers.
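For anyone who hasn't used the unary minus form, this is roughly how it could be parsed and applied. It is a sketch with made-up helper names (`parse_query`, `matches`) and naive whitespace tokenization, not how any particular engine does it.

```python
def parse_query(query):
    """Split a query into (required, excluded) term lists using unary minus."""
    required, excluded = [], []
    for token in query.lower().split():
        if token.startswith("-") and len(token) > 1:
            excluded.append(token[1:])  # drop the leading minus
        else:
            required.append(token)
    return required, excluded


def matches(document, query):
    """True if the document has every required term and no excluded term."""
    words = set(document.lower().split())
    required, excluded = parse_query(query)
    has_required = all(term in words for term in required)
    has_excluded = any(term in words for term in excluded)
    return has_required and not has_excluded


print(parse_query("shirts -stripes"))
print(matches("plain blue shirts", "shirts -stripes"))
print(matches("blue shirts with stripes", "shirts -stripes"))
```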
Finally, you can build a punning database, the kind of thing that is often done for misspellings like "britney spears" (ok so I'm dating my tenure with that :-)). It takes construction terms like "without", "with", "except", "exactly" and creates the algorithmic query that is most like the original by simple substitution. This would map "<term> without <term>" => "<term> -<term>". The risk there is that "doctors without borders" might not return the organization on the first page (compare results from "doctors without borders" and "doctors -borders", ouch!).
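Here is a small sketch of that substitution rule. The `PROTECTED` set is my own assumption about one way to dodge the "doctors without borders" problem (a list of known names that are passed through as exact phrases); the names `PROTECTED`, `WITHOUT`, and `rewrite` are all hypothetical, and the pattern only handles a single excluded term.

```python
import re

# Hypothetical list of known names that must never be rewritten.
PROTECTED = {"doctors without borders"}

# "<term> without <term>", where the excluded side is a single word.
WITHOUT = re.compile(r"^(?P<keep>.+?)\s+without\s+(?P<drop>\S+)$")


def rewrite(query, protected=PROTECTED):
    """Map '<term> without <term>' to '<term> -<term>' unless protected."""
    normalized = " ".join(query.lower().split())
    if normalized in protected:
        # Treat known names as an exact phrase instead of rewriting them.
        return '"' + normalized + '"'
    match = WITHOUT.match(normalized)
    if match:
        return f"{match.group('keep')} -{match.group('drop')}"
    return normalized


print(rewrite("shirts without stripes"))
print(rewrite("Doctors Without Borders"))
```

Of course, maintaining that protected list is its own never-ending job, which is sort of the point.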
When people get sucked into search it is this kind of problem that they spend a lot of time and debate on :-)