ApacheCon Europe 2012

Rhein-Neckar-Arena, Sinsheim, Germany

5–8 November 2012

Fundamentals of Information Retrieval, Illustration with Apache Lucene

Majirus FANSI

Audience level: Intermediate
Track: Lucene, Solr & Friends

Tuesday 9:15 a.m.–10 a.m. in Press Room

Description

An introduction to the fundamentals of information retrieval, with an overview of what Lucene is and where and how it can be used. We will cover the basic Lucene concepts (index, directory, document, field, term), text analysis (tokenizing, token filtering, stop words), indexing (how to create an index, how to index documents), and searching (how to run keyword, phrase, Boolean and other queries).

Abstract

Information retrieval is becoming the principal means of access to information. It is now common for web applications to provide an interface for free-text search. In this talk we start by describing the scientific underpinnings of information retrieval. We review the two models on which the main search tools are based, i.e. the Boolean model and the vector space model. We illustrate the talk with a web application based on Lucene, and we show that Lucene combines both the Boolean and vector space models.
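As a rough illustration of how the two models can work together, the sketch below first applies a Boolean filter (a document must contain every required term) and then ranks the surviving documents by cosine similarity over TF-IDF vectors. It is a toy written in plain Python, not Lucene's API or its actual scoring formula; the sample documents, the function names, and the particular TF-IDF weighting are all assumptions made for the example.

```python
import math
from collections import Counter

# Hypothetical sample collection, invented for this sketch.
DOCS = {
    1: "lucene is a search library",
    2: "solr is a search server built on lucene",
    3: "information retrieval ranks documents",
}

def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive analyzer)."""
    return text.lower().split()

def build_vectors(docs):
    """Return TF-IDF vectors as {doc_id: {term: weight}}."""
    tokens = {d: tokenize(t) for d, t in docs.items()}
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for toks in tokens.values() for term in set(toks))
    n = len(docs)
    vectors = {}
    for d, toks in tokens.items():
        tf = Counter(toks)
        # One common TF-IDF variant; the exact formula is an assumption.
        vectors[d] = {t: tf[t] * math.log(1 + n / df[t]) for t in tf}
    return vectors

def cosine(q, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in q.items())
    norm = math.sqrt(sum(w * w for w in q.values())) * \
           math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0

def search(docs, required, optional=()):
    """Boolean step: keep documents containing every required term.
    Vector space step: rank the matches by cosine similarity to the query."""
    vectors = build_vectors(docs)
    matches = [d for d, v in vectors.items() if all(t in v for t in required)]
    query = {t: 1.0 for t in list(required) + list(optional)}
    return sorted(matches, key=lambda d: cosine(query, vectors[d]), reverse=True)
```

For example, `search(DOCS, ["search"], ["solr"])` keeps only the documents that contain "search" and places the one that also mentions "solr" first.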

The presentation will give an overview of what Lucene is, and where and how it can be used. We will cover the basic Lucene concepts (index, directory, document, field, term), text analysis (tokenizing, token filtering, stop words), indexing (how to create an index, how to index documents), and searching (how to run keyword, phrase, Boolean and other queries). We'll inspect Lucene indices with Luke.
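To make the analysis, indexing, and searching steps concrete, here is a minimal sketch of an analyzer (tokenize, lowercase, remove stop words) feeding a positional inverted index that answers keyword and phrase queries. It is plain Python and does not use Lucene; the `ToyIndex` class, its methods, and the stop-word list are hypothetical names chosen for the example.

```python
from collections import defaultdict

# Hypothetical stop-word list for the example.
STOP_WORDS = {"a", "an", "and", "is", "of", "the", "to", "with"}

def analyze(text):
    """Tokenize on whitespace, strip punctuation, lowercase, and drop
    stop words. Returns (position, term) pairs; positions keep the gaps
    left by removed stop words."""
    out = []
    for pos, raw in enumerate(text.split()):
        term = raw.strip(".,;:!?()\"'").lower()
        if term and term not in STOP_WORDS:
            out.append((pos, term))
    return out

class ToyIndex:
    """Positional inverted index: term -> doc_id -> list of positions."""

    def __init__(self):
        self.postings = defaultdict(lambda: defaultdict(list))

    def add_document(self, doc_id, text):
        """Analyze the text and record each term occurrence."""
        for pos, term in analyze(text):
            self.postings[term][doc_id].append(pos)

    def keyword(self, word):
        """Return the ids of documents containing the analyzed word."""
        terms = analyze(word)
        if not terms:
            return set()  # the word was a stop word or empty
        return set(self.postings.get(terms[0][1], {}))

    def phrase(self, text):
        """Return the ids of documents where the analyzed terms occur
        at the same relative positions as in the query."""
        terms = analyze(text)
        if not terms:
            return set()
        (first_pos, first), rest = terms[0], terms[1:]
        hits = set()
        for doc_id, positions in self.postings.get(first, {}).items():
            for start in positions:
                if all(start + (p - first_pos)
                       in self.postings.get(t, {}).get(doc_id, [])
                       for p, t in rest):
                    hits.add(doc_id)
                    break
        return hits
```

After indexing "Lucene is a search library." as document 1 and "Search the index with Lucene" as document 2, `keyword("Lucene")` matches both documents, while `phrase("search library")` matches only document 1.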

After this talk, attendees will understand the fundamentals of IR and how to apply them to build a search application with Lucene.