ApacheCon Europe 2012

Rhein-Neckar-Arena, Sinsheim, Germany

5–8 November 2012

Compound Terms Query Parser for Great Shopping Experience

Mikhail Khludnev

Audience level:
Advanced
Track:
Lucene, Solr & Friends

Wednesday 9 a.m.–9:45 a.m. in Level 1 Right

Description

Describes important query processing techniques relevant to eCommerce sites.

Abstract

eShop visitors tend to type compound terms such as brands or product families without marking them as phrases. Common practice is to use DisMaxQParser and tokenize during indexing, but this leads to false positive matches: for “calvin klein jeans dress” you should not show jeans or dresses from other brands, not even Anne Klein. I’d like to present a query parsing technique and/or a special type of query that provides outstanding precision. At the Lucene Eurocon ’11 Stump the Chump session it turned out that eCommerce has demanded this for many years, but the community lacks a vision of the solution. In this session I also want to present the following two pearls:

Entity Recognition from Search Phrases

Building on the technique above, we can recognize which particular entities the shopper has in mind, e.g. from “calvin klein jeans dress” we can conclude that a dress is wanted, not jeans, and therefore we can propose related offers or provide proper ranking. Technically it is somewhat related to LUCENE-1999.
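
As a rough illustration of the idea, the following is a minimal sketch of entity recognition by greedy longest-match against a dictionary of known compound terms. It is not the parser presented in the talk: the `VOCABULARY` contents, the entity labels and the function name `recognize_entities` are all hypothetical, and a real shop would build the dictionary from its catalog.

```python
# Hypothetical vocabulary of compound terms (assumption, not from the talk).
VOCABULARY = {
    "calvin klein": "brand",
    "calvin klein jeans": "brand",
    "anne klein": "brand",
    "jeans": "product_type",
    "dress": "product_type",
}
# Longest term length in words, used to bound the lookup window.
MAX_TERM_WORDS = max(len(term.split()) for term in VOCABULARY)


def recognize_entities(phrase):
    """Segment a search phrase into (term, label) pairs, longest match first."""
    words = phrase.lower().split()
    entities = []
    i = 0
    while i < len(words):
        # Try the longest candidate first, so that "calvin klein jeans"
        # wins over "calvin klein" followed by a separate "jeans".
        for size in range(min(MAX_TERM_WORDS, len(words) - i), 0, -1):
            candidate = " ".join(words[i:i + size])
            if candidate in VOCABULARY:
                entities.append((candidate, VOCABULARY[candidate]))
                i += size
                break
        else:
            # No dictionary term starts here; keep the word as unknown.
            entities.append((words[i], "unknown"))
            i += 1
    return entities
```

With this toy dictionary, “calvin klein jeans dress” is segmented into one brand and one product type, so “jeans” is consumed by the brand and only “dress” remains as the demanded product.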

Staged Request Handling

Usually, when you tune relevance through query parsing, you almost always have to trade precision for recall and vice versa. We’ve found a trivial way to get both of them excellent, fast.
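
The abstract does not spell out the mechanism, so the following is only one plausible reading of staged handling, sketched under stated assumptions: the query is tried against progressively more relaxed matchers, and the first stage that returns hits wins. The in-memory `CATALOG`, the three matchers and the name `staged_search` are hypothetical and stand in for real search requests.

```python
# Toy catalog standing in for a search index (assumption, not from the talk).
CATALOG = [
    {"id": 1, "title": "calvin klein jeans dress"},
    {"id": 2, "title": "anne klein dress"},
    {"id": 3, "title": "levis jeans"},
]


def match_phrase(query, doc):
    """Strict stage: the query must occur as a contiguous word sequence."""
    return f" {query} " in f" {doc['title']} "


def match_all_terms(query, doc):
    """Relaxed stage: every query word must occur, in any order."""
    title_words = set(doc["title"].split())
    return all(word in title_words for word in query.split())


def match_any_term(query, doc):
    """Most relaxed stage: at least one query word must occur."""
    title_words = set(doc["title"].split())
    return any(word in title_words for word in query.split())


# Stages ordered from highest precision to highest recall.
STAGES = [
    ("phrase", match_phrase),
    ("all_terms", match_all_terms),
    ("any_term", match_any_term),
]


def staged_search(query, catalog=CATALOG):
    """Return (stage_name, matching ids) from the first stage with any hits."""
    query = " ".join(query.lower().split())
    for name, matcher in STAGES:
        hits = [doc["id"] for doc in catalog if matcher(query, doc)]
        if hits:
            return name, hits
    return None, []
```

In this sketch a precise query is answered by the strict stage alone, and recall-oriented stages run only when the stricter ones find nothing, which is why the Anne Klein dress never appears for “calvin klein jeans dress”.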