Apache Arrow DataFusion: Vectorized execution framework for maximum performance

Liu Kun (刘昆)

Chinese Session 2023-08-18 14:30 GMT+8  #olap

Apache Arrow DataFusion is a fast, extensible, vectorized execution framework implemented in Rust that uses Apache Arrow as its in-memory data format. DataFusion provides extension interfaces at multiple levels, so users can embed it in their own database or query system, benefit from its performance, and avoid reimplementing a query engine from scratch.

This presentation covers:

  1. What DataFusion is and its history
  2. DataFusion’s architecture
  3. The extension capabilities DataFusion provides (UDFs, logical plans, execution plans/nodes, etc.)
  4. The scenarios DataFusion is suited to
  5. Current DataFusion use cases

Speakers:


Liu Kun: Big data engineer at eBay. Graduated from the School of Software, Tsinghua University, and currently works on eBay's big data development team. Apache Arrow PMC member and Apache IoTDB PMC member, working mainly on databases, storage engines, and query engines.