ApacheCon Europe 2012

Rhein-Neckar-Arena, Sinsheim, Germany

5–8 November 2012

Taking the guesswork out of your Hadoop Infrastructure

Steve Watt

Audience level:
Intermediate
Track:
Big Data

Tuesday 9:15 a.m.–10 a.m. in Level 2 Left

Description

Commodity infrastructure for Apache Hadoop has advanced greatly in recent years. In this talk we'll discuss the lessons learned and outcomes from the work HP has done to optimally design and configure infrastructure for both MapReduce and HBase.

Abstract

Apache Hadoop is clearly one of the fastest-growing big data platforms for storing and analyzing arbitrarily structured data in search of business insights. However, the applicable commodity infrastructure has advanced greatly in recent years, and there is little information to help the community optimally design and configure Hadoop infrastructure for specific requirements. For example, how many disks and controllers should you use? Should you buy processors with 4 or 6 cores? Do you need a 1GbE or 10GbE network? Should you use SATA or MDL SAS? Small or large form factor disks? How much memory do you need? How do you characterize your Hadoop workloads to figure out whether you are I/O, CPU, network, or memory bound? In this talk we'll discuss the lessons learned and outcomes from the work HP has done to optimally design and configure infrastructure for both MapReduce and HBase.
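
As context for the last question, the following is a minimal sketch (not from the talk) of how one might take a first-order reading on whether a worker node is I/O, CPU, network, or memory bound while a representative job runs. It samples node-level counters with the psutil library; the observation window and the saturation ceilings for the disk array and NIC are illustrative assumptions, not measurements from HP's work.

```python
# Minimal sketch: sample node-level counters with psutil to get a first-order
# read on whether a Hadoop worker is CPU, I/O, network, or memory bound.
# The window length and saturation ceilings below are illustrative assumptions.

import psutil

SAMPLE_SECONDS = 60        # assumed observation window during a representative job
DISK_CEILING_MB_S = 400    # illustrative throughput ceiling for the node's disk array
NIC_CEILING_MB_S = 117     # illustrative usable throughput of a 1GbE link (MB/s)


def sample(interval=SAMPLE_SECONDS):
    """Average CPU %, disk MB/s, network MB/s and memory % over the window."""
    disk0 = psutil.disk_io_counters()
    net0 = psutil.net_io_counters()
    cpu = psutil.cpu_percent(interval=interval)  # blocks for the whole window
    disk1 = psutil.disk_io_counters()
    net1 = psutil.net_io_counters()

    disk_mb_s = ((disk1.read_bytes - disk0.read_bytes) +
                 (disk1.write_bytes - disk0.write_bytes)) / interval / 1e6
    net_mb_s = ((net1.bytes_sent - net0.bytes_sent) +
                (net1.bytes_recv - net0.bytes_recv)) / interval / 1e6
    mem = psutil.virtual_memory().percent
    return cpu, disk_mb_s, net_mb_s, mem


if __name__ == "__main__":
    cpu, disk_mb_s, net_mb_s, mem = sample()
    print("cpu=%.0f%%  disk=%.0f MB/s  net=%.0f MB/s  mem=%.0f%%"
          % (cpu, disk_mb_s, net_mb_s, mem))

    # Crude classification: whichever resource sits closest to its ceiling
    # during the job is the likely bottleneck.
    if disk_mb_s > 0.8 * DISK_CEILING_MB_S:
        print("Likely I/O bound: more spindles/controllers per node may help.")
    elif net_mb_s > 0.8 * NIC_CEILING_MB_S:
        print("Likely network bound: consider 10GbE or better data locality.")
    elif cpu > 85:
        print("Likely CPU bound: consider more or faster cores per node.")
    elif mem > 90:
        print("Likely memory bound: consider more RAM or fewer task slots.")
    else:
        print("No single resource saturated in this window.")
```

In practice this kind of node-level sampling would be repeated across the cluster and correlated with job-level metrics, but even a crude reading narrows the sizing questions above (spindle and controller count, cores per socket, 1GbE vs. 10GbE, memory per task slot).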