Federated Cross-platform SQL with Apache Wayang

Kaustubh Beedkar

English Session 2023-08-20 14:00 GMT+8  #olap

Federated query processing enables distributed query processing across multiple data sources, eliminating silos and improving data accessibility. It allows organizations to seamlessly query and analyze diverse databases or systems as a unified virtual database. By leveraging federated query processing, businesses gain deeper insights from distributed data sources, while data remains in its original location. This approach simplifies data integration, enhances governance, and empowers informed decision-making.

In this talk, I will present how we can achieve federated cross-platform query processing with Apache Wayang. Apache Wayang (incubating) is a scalable cross-platform system that decouples applications with data processing platforms and hence it frees developers from developing applications for specific platforms. It provides an abstraction layer on top of existing data processing platforms, such as Apache Spark and Apache Flink, with the aim of enabling cross-platform optimization and interoperability. It automatically selects the best data processing platforms for a given task and also handles cross-platform execution. Apache Wayang comes with a cross-platform optimizer at its core to achieve this.

To enable federated SQL analytics, we have built a library on top of Wayang that provides a unified SQL interface for cross-platform SQL processing. The SQL library allows users to embed SQL queries in their cross-platform applications. I will talk about how we utilize Apache Calcite to support cross-platform SQL. The major benefit of Calcite integration in Wayang is that of platform independence and opportunistic cross-platform data processing. Apache Wayang with Calcite integration leads to a powerful system capable of federated data processing in a platform-agnostic way.

Speakers:


Kaustubh Beedkar: Indian Institute of Technology Delhi, Assistant Professor, –Experience– [April 2023 – present] Assistant Professor, Indian Institute of Technology, Delhi [May 2023 – present] Committer and PPMC Apache Wayang, The Apache Software Foundation [2022 – present] Co-Founder, Databloom AI [June 2021 – March 2023] Junior Fellow, The Berlin Institute for the Foundations of Learning and Data (BIFOLD) [June 2017 – March 2023] Senior Researcher, Technical University of Berlin, Germany [Oct 2014 – Dec 2016] Researcher, University of Mannheim, Germany [April 2012 – Sept 2014] Researcher, Max-Planck-Institute for Informatics, Germany [Jul 2011 – April 2012] Visiting Scholar, Max-Planck-Institute for Informatics, Germany

–Education– [2017] Ph.D. in Computer Science, University of Mannheim [2008] MS in Computer Science, Georgia Institute of Technology, USA [2007] B.Tech. in Information Technology, Amrita University, India