2018 – PhoenixCon Presentations

Presenter(s)

Session Title


Welcome

Tulasi Paradarami, Marcell Ortutay (23andMe Inc.)

Applications of HBase/Phoenix at 23andMe (Video)

Vincent Poon (Salesforce)

Phoenix Global Mutable Secondary Indexes – Past, Present, and Future (Video)

Sergey Soldatov (Hortonworks)

PhoenixStorageHandler for Hive: Basics, Use Cases, Known Limitations, Tips & Tricks (Video)

Anil Gupta (TrueCar, Inc)

Tuning Apache Phoenix (Slides | Video)

Anirudha Jadhav, Gabriel Jimenez, Rohit Jhunjhunwala (Bloomberg)

Introducing Big-SQL: Apache Ignite + Apache Phoenix on Spring Boot (Video)

Josh Elser (Hortonworks)

Using Python with Phoenix (Video)

Ohad Shacham, Edward Bortnikov (Yahoo Research/Oath)

Omid: Scalable and Highly Available Transaction Processing for Apache Phoenix (Video)

Sahil Ramrakhyani, Seshank Kalvala (Salesforce)

Usage of Phoenix Statistics to Scale Big Object Queries on the Salesforce Platform (Video)

Thomas D’Silva (Salesforce), Rahul Gidwani (Yahoo!/Flurry)

Solving Metadata Management Bottlenecks Now and in the Future (Video)

This event was co-hosted with HBaseCon West 2018.


2017 – PhoenixCon Presentations

Presenter(s)

Session Title

Abstract

James Taylor

Welcome


Gary Horen, Matthew Van Wely (Salesforce)

Phoenix Use Cases in the Salesforce Chatter Product

Phoenix is in wide use at Salesforce, and a great help when we build features on HBase. The team that builds the Chatter product has been an early adopter. This talk will look at some of the different ways we use HBase and how Phoenix helped us succeed. We'll show queries and the resulting plans, and share some performance numbers too.

Thomas D'Silva, Samarth Jain (Salesforce)

Column Mapping and Immutable Data Encoding (Slides)

Storage formats, a new feature introduced in Phoenix 4.10, reduce the on-disk size of tables and thereby improve performance.

The column mapping feature allows us to use numbers instead of column names as HBase column qualifiers. This improves performance by allowing us to derive the ordinal position from the column qualifier of a cell, which is used to perform a lookup in the sorted list of cells returned by HBase. It also enables fast DDL operations such as column renames and metadata-level column drops. As these number-based qualifiers are generally smaller (1 to 4 bytes) than column names, the disk size of tables shrinks, which improves performance for most queries.

The immutable data encoding feature packs all column values of a column family into a single HBase cell. Column values are serialized into a byte array with the offsets stored at the end. Enabling this feature on a table improves upsert times and speeds up count queries as well as queries that filter or group by non-PK columns.

We recommend enabling the column mapping feature unless you expect the number of columns in a table (and also its views) to exceed 2147483647. The immutable data encoding feature is recommended when the data is sufficiently dense (about 50% of columns have values); as sparseness grows, the overhead of storing offset information negatively affects performance. With the default HBase block size of 64K, performance starts to degrade once the size of the packed cell exceeds 50 KB. We do not recommend using this feature for immutable tables that have views, because of the way column qualifiers are assigned to view columns, which makes data sparse (especially when columns are added to views).
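As a minimal DDL sketch of the two features described above (the table and column names are hypothetical), both are enabled via table properties at creation time in Phoenix 4.10+:

```sql
-- Column mapping: use 2-byte numeric column qualifiers instead of column names.
-- Immutable data encoding: pack all values of a column family into a single cell.
CREATE TABLE metrics (
    host VARCHAR NOT NULL,
    ts   DATE    NOT NULL,
    cpu  DOUBLE,
    mem  DOUBLE
    CONSTRAINT pk PRIMARY KEY (host, ts)
)
IMMUTABLE_ROWS = true,
COLUMN_ENCODED_BYTES = 2,
IMMUTABLE_STORAGE_SCHEME = SINGLE_CELL_ARRAY_WITH_OFFSETS;
```

`COLUMN_ENCODED_BYTES` controls how many bytes each numeric qualifier uses (and hence the maximum column count), while `SINGLE_CELL_ARRAY_WITH_OFFSETS` is the packed-cell encoding that requires the table's rows to be immutable.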

Rajeshbabu Chintaguntla (Hortonworks)

Local secondary indexes in Apache Phoenix

Apache Phoenix supports two kinds of secondary indexes, global and local, to provide orthogonal ways to access data beyond its primary access path. A local secondary index is 'local' in the sense that data is indexed region-wise and the index data co-resides in the same region, but in a different column family (or families). Local indexes are very useful for write-heavy, space-constrained use cases, and helpful for systems that require many alternative access paths beyond the primary key along with a 100% consistent view of the base table. In this talk we discuss the architecture and data model, how writes remain 100% atomic, and how the 'local' nature helps keep the same write performance even with multiple indexes present in the system.
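For illustration (the table and column names below are hypothetical), a local index is declared with the `LOCAL` keyword; its data is stored alongside the base table's regions rather than in a separate index table:

```sql
-- Base table keyed by order id; we also need to query by customer.
CREATE TABLE orders (
    order_id    BIGINT PRIMARY KEY,
    customer_id BIGINT,
    total       DECIMAL
);

-- Local index: index rows are co-located with the data region they cover,
-- so index maintenance stays region-local even with multiple indexes.
CREATE LOCAL INDEX idx_customer ON orders (customer_id);
```

By contrast, dropping `LOCAL` would create a global index, which is maintained in its own table and trades write cost for faster reads.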

Anirudha Jadhav, Gabriel Jimenez (Bloomberg)

Phoenix @Bloomberg and Hardening Phoenix (Slides)

Phoenix has seen a growing presence at Bloomberg, supporting applications from diverse business areas with a multitude of access patterns.

Phoenix helps us bridge the gap with HBase by delivering SQL simplicity to our Big Data developers. We will share some of our success stories and a few challenges from working with Apache Phoenix.


We have been working on Cursors, Phoenix/HBase cross-DC replication, Calcite in Phoenix, and Harden Phoenix.


Harden Phoenix attempts to reuse the HBase test harnesses and port the HBase BigLinkedList tests to Apache Phoenix, performing the operations on a real cluster on a nightly basis to test features and track benchmarks per component. This process is still a work in progress and can be seen at https://github.com/apache/phoenix/pull/245

Josh Elser (Hortonworks), Rahul Shrivastava (Salesforce)

Breaking the mold with the Phoenix Query Server (Slides)

This talk will cover some new features being added to the Phoenix Query Server (PQS) and present a vision of the future with Apache Avatica. We will discuss metrics instrumentation being added to PQS that provides automatic insight into the performance of the system. We'll also present an in-progress load balancer that provides both high availability and service discovery for PQS. Finally, we'll take a step back to envision an alternate reality where Avatica, the technology powering the Phoenix Query Server and its lightweight client driver, simplifies client access to all of the databases that an organization runs. We'll describe a reality where classpath and dependency issues are a problem of the past, with a universal client for users to access all kinds of JDBC-compatible databases.

Anil Gupta, Amey Hegde (Truecar)

Dynamic Columns for SQL on NoSql Data (Slides)

Truecar needs to provide OEM incentives to consumers. These incentives are time-bound and based on zipcode(s) and member networks. Each incentive can be valid for anywhere from 1 to 40K zipcodes. We also need to keep the historical incentive data for analytics.

To facilitate filtering by zipcode, we used the zipcode as the column name. We then used the Dynamic Columns feature of Phoenix to query incentives while leveraging the NoSQL nature of HBase. In the Incentives table, each row has anywhere from 10 to 30K columns. Even with such wide rows, we were able to get a 3x performance boost from the Phoenix/HBase solution as compared to our Elasticsearch solution for incentives.

Along with Dynamic Columns, we also leveraged column families to group zipcodes and bloom filters for performance improvements.


In this presentation, we would like to share our journey of migrating incentives from Elasticsearch to Phoenix/HBase while achieving better scalability and performance.
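As a sketch of the Phoenix dynamic-columns syntax described above (the table and column names are hypothetical, not TrueCar's actual schema), zipcode columns are not declared in the DDL but are supplied inline at upsert and query time:

```sql
-- Static schema: only the incentive id is declared up front.
CREATE TABLE incentives (
    incentive_id BIGINT PRIMARY KEY
);

-- Write a per-zipcode flag as a dynamic column, declared inline with its type.
UPSERT INTO incentives (incentive_id, zip_90210 BOOLEAN) VALUES (1, true);

-- Query the same dynamic column by declaring it in the FROM clause.
SELECT incentive_id FROM incentives (zip_90210 BOOLEAN) WHERE zip_90210 = true;
```

Because dynamic columns live only in HBase and not in Phoenix's metadata, a table can carry tens of thousands of such columns per row without bloating the schema.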

Ohad Shacham, Edward Bortnikov (Yahoo)

Omid: Scalable and Highly Available Transaction Processing for Phoenix (Slides)

Transaction processing is a critical service in databases that ensures that all data accesses satisfy the ACID properties. Traditionally, Phoenix has used the Apache Tephra transaction processing technology. Recently, we introduced into Phoenix support for Apache Omid, an open source transaction processor for HBase that is used at large scale at Yahoo. A single Omid instance sustains hundreds of thousands of transactions per second, and provides high availability at zero cost for mainstream processing. Omid and Tephra are now configurable choices for the Phoenix transaction processing backend, enabled by the newly introduced Transaction Abstraction Layer (TAL) API. In this talk, we walk through the challenges of the project, focusing on the new use cases introduced by Phoenix and how we address them in Omid.
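Assuming the table properties introduced by Phoenix's transaction support (the Omid provider value is what this work makes selectable), a transactional table backed by Omid could be declared roughly as:

```sql
-- Mark the table transactional and pick Omid via the TAL-backed provider property.
CREATE TABLE accounts (
    id      BIGINT PRIMARY KEY,
    balance DECIMAL
)
TRANSACTIONAL = true,
TRANSACTION_PROVIDER = 'OMID';
```

With the provider left unspecified, Phoenix falls back to its configured default transaction backend, so existing Tephra users are unaffected.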


2016 – PhoenixCon Presentations

Track

Presenter(s)

Session Title

Phoenix Use Cases

Jan Fernando (Salesforce)

Salesforce's Trusted Enterprise Platform and Apache Phoenix (Slides)


Vijay Vangapandu (eHarmony)

Phoenix and eHarmony, a Perfect Match (Slides)


Masayasu Suzuki (Sony)

Five major tips to maximize performance on a 200+ SQL HBase/Phoenix cluster (Slides)


Bhinav Sura (Salesforce)

Argus: Time series metrics data through Phoenix

Phoenix Internals

Maryann Xue (Intel)

Phoenix on Calcite


Poorna Chandra (Cask)

ACID Transactions in Phoenix (Slides)


Josh Elser (Hortonworks)

Phoenix Query Server (Slides)


Samarth Jain (Salesforce)

Column Encoding (Slides)


Anil Gupta (Truecar)

Don't be a byte code Jedi (Slides)

Combined


Roadmap discussion