ApacheCon Europe 2012

Rhein-Neckar-Arena, Sinsheim, Germany

5–8 November 2012

From Incubation to Continuous Ingestion - The Story of Apache Gora

Renato Marroquin, Lewis John McGibbney

Audience level: Intermediate
Track: Big Data

Wednesday 11 a.m.–noon in Level 2 Right

Description

Since early 2012 Gora has been proudly participating as an honorary Incubator post-grad within the ASF. This presentation brings together the events of the last year in the form of a case study based upon Gora's Continuous Ingestion integration testing platform. At stake? Accumulo, Cassandra, HBase, MySQL, HSQLDB and Amazon's DynamoDB fight it out to earn their place on the Continuous Ingestion podium.

Abstract

The Apache Gora open source framework provides an in-memory data model and persistence for big data. Gora supports persisting to column stores, key-value stores, document stores and RDBMSs, and analyzing the data with extensive Apache Hadoop MapReduce support. We first take the audience through an introduction to the Gora framework, focusing on where Gora has come from and where the project stands at the time of speaking. Following Gora's successful acceptance into GSoC 2012, a development drive focused on extending the Gora API to accommodate web services in general, with Amazon's DynamoDB as an initial use case. Next comes a live demonstration using Gora with DynamoDB to persist, query and delete data in the cloud. We then move on to the body of the presentation: a case study based on Gora's Continuous Ingestion integration testing suite. The test suite verifies that data is not lost at scale by running many ingest clients that continually create linked lists containing 25 million nodes. At some point the clients are stopped and a MapReduce job is run to ensure no linked list has a hole. A hole indicates data was lost. The case study builds on this test suite, testing Accumulo, Cassandra, HBase, MySQL, HSQLDB and Amazon's DynamoDB against a set of metrics to determine how each performs. In addition, we have added some technical challenges which each datastore must complete; these provide a nice twist to the outcomes. This presentation offers users not only a fully substantiated account of the Gora framework and API but also real-life examples of how they can easily get up to speed using Gora to solve problems with their web services, NoSQL and big data requirements.
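The hole check described in the abstract can be illustrated with a small in-memory sketch. This is not Gora's test suite or its API: the record layout (each node stores the key of the previous node), the function names `ingest`, `map_phase`, `reduce_phase` and `verify`, and the use of a plain dictionary in place of a datastore are all assumptions made for illustration, and the lists here are tiny compared with 25 million nodes.

```python
import random
from collections import defaultdict


def ingest(store, num_lists, list_length, rng):
    """Write linked lists into `store` (hypothetical layout: key -> previous key).

    The first node of each list has no predecessor, so it stores None.
    """
    for _ in range(num_lists):
        prev = None
        for _ in range(list_length):
            key = rng.getrandbits(63)
            while key in store:  # avoid overwriting an existing node
                key = rng.getrandbits(63)
            store[key] = prev
            prev = key


def map_phase(store):
    """Emit a DEFINED record per stored node and a REFERENCED record per link."""
    for key, prev in store.items():
        yield key, "DEFINED"
        if prev is not None:
            yield prev, "REFERENCED"


def reduce_phase(pairs):
    """Group records by key; a key that is referenced but never defined is a hole."""
    grouped = defaultdict(set)
    for key, tag in pairs:
        grouped[key].add(tag)
    return sorted(key for key, tags in grouped.items() if "DEFINED" not in tags)


def verify(store):
    """Return the keys of missing nodes (an empty list means no data was lost)."""
    return reduce_phase(map_phase(store))
```

Under these assumptions, calling `verify` on a freshly ingested store returns an empty list, while deleting any node that another node points to makes `verify` return that node's key. The map and reduce steps are kept as separate functions only to mirror the shape of a MapReduce job; in the real suite the grouping would be done by the framework rather than by a dictionary.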