Confidential - Do Not Share or Distribute
Apache Arrow Flight SQL
A universal standard for high performance
data transfers from databases
Alex Merced
Developer Advocate @ Dremio
Agenda
● What if we could have for our data transfers…
● What options do we have?
● What is Apache Arrow? Is it enough?
● What is Arrow Flight? Is it enough?
● What is Arrow Flight SQL? Is it enough?
● What are the Arrow Flight SQL ODBC & JDBC Drivers?
What if we could have for our data transfers…
✓ Standardized APIs
✓ No need for many different drivers
✓ High performance transfers
✓ Parallel data transfers (1:many, many:many)
✓ Easy to implement on server side, easy to use on client side
Why don’t ODBC & JDBC fit in today’s analytics world?
Yesterday
● ODBC and JDBC were built in 1992 & 1997, respectively
○ Inherently row-based
○ Prevented the many:many of client-to-database APIs
○ But resulted in one:many of API-to-database protocols
Today
● Heterogeneous big data workloads
● Most tools involved with OLAP workloads today are columnar
○ But there is no columnar transfer standard
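To make "inherently row-based" concrete, here is a minimal sketch in plain Python. It does not use the ODBC, JDBC, or Arrow APIs; the table contents and the two helper functions are made up for illustration. It shows the two pivots that a columnar server and a columnar client pay for when the transfer in between is row-oriented.

```python
# Illustrative only: plain-Python stand-ins, not the ODBC/JDBC or Arrow APIs.

# A columnar engine holds each column as one contiguous sequence.
columns = {
    "id": [1, 2, 3],
    "city": ["Oslo", "Lima", "Pune"],
    "temp_c": [4.5, 19.0, 31.2],
}


def columns_to_rows(cols):
    """Pivot column-oriented data into row tuples (what a row-based API hands out)."""
    names = list(cols)
    return names, list(zip(*(cols[name] for name in names)))


def rows_to_columns(names, rows):
    """Pivot row tuples back into columns (what a columnar client does on receipt)."""
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}


# Columnar server -> row-based transfer -> columnar client: two pivots
# that a columnar-to-columnar transfer would not need.
names, rows = columns_to_rows(columns)
roundtrip = rows_to_columns(names, rows)

print(rows[0])
print(roundtrip == columns)
```

The data survives the round trip unchanged, so both pivots are pure overhead; that overhead is what a columnar transfer standard aims to remove.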
Increasingly, columnar → columnar is the standard
[Figure: Yesterday vs. Today]
What options do we have to keep things columnar?
● Custom protocols
○ One per server
○ Implemented individually by each client
Pick your poison
                                               JDBC/ODBC   Custom protocol
Standardize client interface                   ✅          ❌
Standardize client database access interface   ✅          ❌
Standardize server interface                   ❌          ❌
Prevent unnecessary serialization              ❌          ✅
Enter: Apache Arrow Flight
What if?
Enter: Arrow Flight
Before we go there:
Apache Arrow
Introduction to Apache Arrow
● An in-memory columnar format
○ Includes libraries for working with the format
■ E.g., computation engine, IPC, serialization / deserialization from file formats
○ Support for many languages (12 currently)
● Co-created by Dremio and Wes McKinney
● Rapid adoption, popularity, and capability growth
Isn’t Arrow enough?
● Arrow is primarily geared towards operations on data on a single node
○ No specification for cross-network transfers
■ Within a cluster
■ Client-server
Enter: Arrow Flight
What is Arrow Flight?
● A general-purpose client-server framework to simplify high-performance transport of large datasets over network interfaces
● Designed for data in the modern world
○ Column-oriented, rather than row-oriented
○ Arrow’s columnar format is designed for high compressibility and large numbers of rows
● Makes it easier to include the client side in distributed computing
● Enables parallel data transfers
Parallel data transfers with Arrow Flight
[Diagram: a client issues DoGet(<ticket>) to each of Node 1, Node 2, … Node N in parallel; the client and every node have their own CPU and memory]
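The parallel pattern can be sketched in a few lines of plain Python. This is a simulation under stated assumptions, not the Flight client API: `NODE_PARTITIONS`, `do_get`, and `fetch_all` are hypothetical stand-ins, and the "network call" is just a dictionary lookup. The point is the shape of the client logic: one DoGet per ticket, issued concurrently, with the partitions concatenated column by column.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for N server nodes: each ticket identifies the
# partition of the result set that one node holds in memory.
NODE_PARTITIONS = {
    b"ticket-node-1": {"id": [1, 2], "value": [10, 20]},
    b"ticket-node-2": {"id": [3, 4], "value": [30, 40]},
    b"ticket-node-3": {"id": [5], "value": [50]},
}


def do_get(ticket):
    """Simulated DoGet: return the columnar partition behind a ticket."""
    return NODE_PARTITIONS[ticket]


def fetch_all(tickets):
    """Issue one DoGet per ticket concurrently, then concatenate column by column."""
    with ThreadPoolExecutor(max_workers=len(tickets)) as pool:
        partitions = list(pool.map(do_get, tickets))  # map preserves ticket order
    merged = {}
    for part in partitions:
        for name, values in part.items():
            merged.setdefault(name, []).extend(values)
    return merged


result = fetch_all(list(NODE_PARTITIONS))
print(result)
```

With a real Flight client, each `do_get` would be a network call to the node named in the endpoint, so the transfers overlap instead of queuing behind a single connection.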
Isn’t Arrow Flight Enough?
                                               JDBC/ODBC   Custom protocols   Arrow Flight
Standardize client interface                   ✅          ❌                 ✅
Standardize client database access interface   ✅          ❌                 ❌
Standardize server interface                   ❌          ❌                 ✅
Prevent unnecessary serialization              ❌          ✅                 ✅
Isn’t Arrow Flight Enough?
● Client sends an arbitrary byte stream, server sends a result
○ The byte stream only has meaning for a particular server that’s chosen to support it
○ Example – Dremio interprets the byte stream to be a UTF-8 encoded SQL query string
● Flight is meant to serve any tabular data, not databases in particular
○ Catalog information is not part of Arrow Flight’s design
● Haven’t we seen these problems before?
○ ODBC and JDBC standardize database activities (e.g., query execution, catalog access)
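A small sketch of the "arbitrary byte stream" problem, in plain Python. Both server functions are hypothetical; neither is a real Flight server. They receive identical bytes and assign them different meanings, which is why a client written for one cannot assume anything about the other.

```python
# Illustrative only: two hypothetical servers given the same command bytes.
command = "SELECT 1".encode("utf-8")


def sql_server(cmd: bytes) -> str:
    """A server that, like the Dremio example above, reads the bytes as UTF-8 SQL."""
    return "run SQL: " + cmd.decode("utf-8")


def dataset_server(cmd: bytes) -> str:
    """A different server that chose to read the same bytes as a dataset name."""
    return "open dataset: " + cmd.decode("utf-8")


print(sql_server(command))
print(dataset_server(command))
```

Nothing in the bytes themselves says which reading is right; that agreement has to come from a layer above plain Flight.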
Enter: Arrow Flight SQL
What is Arrow Flight SQL?
● Client-server protocol for interacting with SQL databases built on Arrow Flight
○ Allows databases to use Arrow Flight as the transport protocol
○ Leverage the performance of Arrow and Flight for database access
● Set of commands to standardize a SQL interface on Flight:
○ Query execution
○ Prepared statements
○ Database catalog metadata (tables, columns, data types)
○ SQL syntax capabilities
● Language and database-agnostic
○ A single set of client libraries to connect to any Flight SQL server
○ Clients can talk to any database implementing the protocol
○ No need for a client-side driver per database
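[Sequence diagram: Client and Database, results land in client memory]
Listing tables:
1 - Client → Database: GetFlightInfo(GetTables)
2 - Database → Client: FlightInfo<Schema, Endpoints>
3 - Client → Database: DoGet(<ticket>)
4 - Database → Client: Arrow record batches
Querying a table:
5 - Client → Database: GetFlightInfo(CommandStatementQuery)
6 - Database → Client: FlightInfo<Schema, Endpoints>
7 - Client → Database: DoGet(<ticket>)
8 - Database → Client: Arrow record batches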
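The two-step exchange (GetFlightInfo, then DoGet) can be walked through with a toy, in-process stand-in. Everything here is an assumption made for illustration: `ToyFlightSqlServer` is hypothetical, `sqlite3` plays the database, the FlightInfo is reduced to a dictionary of endpoints (the schema is omitted), and plain Python lists stand in for Arrow record batches. It is a sketch of the message flow, not of the Flight SQL wire protocol.

```python
import sqlite3
import uuid

# Hypothetical in-process stand-in for a Flight SQL server; sqlite3 plays the database.


class ToyFlightSqlServer:
    def __init__(self):
        self.db = sqlite3.connect(":memory:")
        self.db.execute("CREATE TABLE trips (id INTEGER, km REAL)")
        self.db.executemany("INSERT INTO trips VALUES (?, ?)", [(1, 2.5), (2, 7.0)])
        self.pending = {}  # ticket -> SQL to run when the ticket is redeemed

    def get_flight_info(self, command, query=None):
        """Steps 1-2 and 5-6: accept a command and hand back a ticket."""
        if command == "GetTables":
            sql = "SELECT name FROM sqlite_master WHERE type = 'table'"
        elif command == "CommandStatementQuery":
            sql = query
        else:
            raise ValueError(f"unsupported command: {command}")
        ticket = uuid.uuid4().bytes
        self.pending[ticket] = sql
        return {"endpoints": [{"ticket": ticket}]}

    def do_get(self, ticket):
        """Steps 3-4 and 7-8: redeem the ticket and return the data column by column."""
        cursor = self.db.execute(self.pending.pop(ticket))
        names = [col[0] for col in cursor.description]
        rows = cursor.fetchall()
        return {name: [row[i] for row in rows] for i, name in enumerate(names)}


server = ToyFlightSqlServer()

# Listing tables (steps 1-4).
info = server.get_flight_info("GetTables")
tables = server.do_get(info["endpoints"][0]["ticket"])

# Querying a table (steps 5-8).
info = server.get_flight_info("CommandStatementQuery", "SELECT id, km FROM trips ORDER BY id")
trips = server.do_get(info["endpoints"][0]["ticket"])

print(tables)
print(trips)
```

Separating "describe the result" from "fetch the result" is what lets a server return several endpoints, each with its own ticket, so a client can fetch them in parallel.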
Arrow Flight SQL Commands
Queries:
● CommandStatementQuery
● CommandStatementUpdate
Prepared statements:
● ActionClosePreparedStatementRequest
● ActionCreatePreparedStatementRequest
● CommandPreparedStatementQuery
● CommandPreparedStatementUpdate
Metadata requests:
● CommandGetCatalogs
● CommandGetCrossReference
● CommandGetDbSchemas
● CommandGetExportedKeys
● CommandGetImportedKeys
● CommandGetPrimaryKeys
● CommandGetSqlInfo
● CommandGetTables
● CommandGetTableTypes
Isn’t Arrow Flight SQL Enough?
                                               JDBC/ODBC   Custom protocols   Arrow Flight   Arrow Flight SQL
Standardize client interface                   ✅          ❌                 ✅             ✅
Standardize client database access interface   ✅          ❌                 ❌             ✅
Standardize server interface                   ❌          ❌                 ✅             ✅
Prevent unnecessary serialization              ❌          ✅                 ✅             ✅
Isn’t Arrow Flight SQL Enough?
● Arrow Flight SQL is fairly new and will take time to be widely adopted by existing client tools (especially enterprise BI tools)
○ Most BI tools already support ODBC or JDBC
Enter: Arrow Flight SQL ODBC & JDBC Drivers
Arrow Flight SQL ODBC & JDBC Drivers
● ODBC & JDBC Drivers built on top of Flight SQL client libraries
○ A single driver to connect to any Flight SQL server
○ Supports arbitrary server-side options as connection properties
● The drivers will work with any ODBC/JDBC client tool without any code changes
● Completely open source
Traditional ODBC/JDBC vs Arrow Flight SQL ODBC/JDBC
Driver Management
● Traditional ODBC/JDBC: install and manage a driver for each database
● Arrow Flight SQL ODBC/JDBC: install and manage a single driver for all databases
Performance
● Traditional ODBC/JDBC: row-based transfer, so not great for large amounts of data
● Arrow Flight SQL ODBC/JDBC: column-based transfer, so benefit from compression of data over the wire
When wouldn’t I use Arrow Flight SQL ODBC/JDBC Drivers?
● Generally preferable to have native Flight SQL applications
○ Easier to harness future features like multiple endpoints
○ Better performance if your client is column-based
[Diagram: layers Network, gRPC, Arrow, Arrow Flight, Arrow Flight SQL, and JDBC serving three kinds of client (Legacy Client, SQL Native Client, Non-SQL Client), with clients marked as row-based or column-based]
What if we could have for our data transfers…
✓ Standardized APIs
○ Arrow Flight for arbitrary tabular transfers, Arrow Flight SQL for transfers from databases
✓ No need for many different drivers
○ Both client and server side APIs are standardized
✓ High performance transfers
○ Transfers via highly optimized Arrow format
✓ Parallel data transfers (1:many, many:many)
○ Lays the foundation for parallel data transfers
✓ Easy to implement on server side, easy to use on client side
○ Server-side implementation is as easy as implementing a few RPC request handlers
○ A single set of client libraries to connect to any Flight SQL server
Q&A
Thanks!
jason@dremio.com

Data Engineer's Lunch #77: Apache Arrow Flight SQL: A Universal Standard for High-Performance Data Transfers from Data
