ApacheCon Europe 2012

Rhein-Neckar-Arena, Sinsheim, Germany

5–8 November 2012

Cassandra and Hadoop: Combining Realtime and Analytics for Big Data

Sam Tunnicliffe

Audience level: Intermediate
Track: Big Data

Description

Apache Cassandra is widely regarded as the most performant and scalable of the NoSQL datastores. Its fast reads and even faster writes have long made it a great fit for real-time use cases where low latency and high throughput are key requirements. Less frequently discussed is Cassandra's excellent integration with the Hadoop ecosystem, which enables support for a range of batch analytics workflows.

Abstract

In this talk, we'll dive into those integration points and check out some of the key benefits of integrating these two powerful systems.

Introduction

  • Cassandra architecture & data model
  • Cluster configuration

Integration Points

  • MapReduce
  • Pig
  • Hive
  • Oozie
  • Mixing analytics and realtime workloads