ApacheCon Europe 2012

Rhein-Neckar-Arena, Sinsheim, Germany

5–8 November 2012

Big Search with Big Data Principles

Eric Pugh

Audience level:
Beginner
Track:
Lucene, Solr & Friends

Wednesday 1:45 p.m.–2:45 p.m. in Level 1 Right

Description

Got hundreds of millions of documents to search? DataImportHandler blowing up while indexing? Query performance collapsing? Then you've searching at Big Data scale.

Abstract

Got hundreds of millions of documents to search? DataImportHandler blowing up while indexing? Random thread errors thrown by Solr Cellduring document extraction? Query performance collapsing? Then you've searching at Big Data scale. This talk will focus on the underlying principles of Big Data, and how to apply them to Solr. This talk isn't a deep dive into SolrCloud, though we'll talk about it. It also isn't meant to be a talk on traditional scaling of Solr. Instead we'll talk about how to apply principles of big data like "Bring the code to the data, not the data to the code" to Solr. How to answer the question "How many servers will I need?" when your volume of data is exploding. Some examples of models for predicting server and data growth, and how to look back and see how good your models are! You'll leave this session armed with an understanding of why Big Data is the buzzword of the year, and how you can apply some of the principles to your own search environment.