Jayjeet Chakraborty
Towards an Arrow-Native Storage System
SkyhookDM
Mentored by: Carlos Maltzahn, Ivo Jimenez, Jeff LeFevre
1
Who am I?
• Incoming Grad Student at UC Santa Cruz
• CS Graduate from NIT Durgapur, India
• IRIS-HEP Fellow Summer 2020
• Twitter: @heyjc25
• GitHub: JayjeetAtGithub
• LinkedIn: https://www.linkedin.com/in/jayjeet-chakraborty-077579162/
• E-Mail: jchakra1@ucsc.edu
2
Problem
• The CPU is the new bottleneck now that network and storage devices are high speed.
• Client-side processing of data stored in highly efficient formats such as Parquet and ORC exhausts the client CPUs.
• Scalability is severely hampered.
Our Solution
• Offload computation from the client to the storage layer.
• Take advantage of the idle CPUs of storage systems for increased processing rates and faster queries.
• This results in less data movement and network traffic.
3
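The offloading idea can be illustrated with a toy model. This is a minimal sketch in plain Python, not SkyhookDM code: the partition layout, the predicate, and the function names `client_side` and `offloaded` are all assumptions made for illustration. It counts how many rows cross the network and how many rows the client has to scan in each case.

```python
# Toy model: four storage nodes, each holding one partition of 1000 rows.
partitions = [list(range(i * 1000, (i + 1) * 1000)) for i in range(4)]

def predicate(row):
    # Hypothetical filter that keeps 1% of the rows.
    return row % 100 == 0

def client_side(partitions, predicate):
    """Every row is shipped to the client, which does all the filtering."""
    rows_moved = sum(len(p) for p in partitions)
    client_rows_scanned = rows_moved
    result = [row for p in partitions for row in p if predicate(row)]
    return result, rows_moved, client_rows_scanned

def offloaded(partitions, predicate):
    """Each storage node filters its own partition; only matches are shipped."""
    per_node = [[row for row in p if predicate(row)] for p in partitions]
    result = [row for part in per_node for row in part]
    rows_moved = len(result)
    client_rows_scanned = 0
    return result, rows_moved, client_rows_scanned

result_a, moved_a, scanned_a = client_side(partitions, predicate)
result_b, moved_b, scanned_b = offloaded(partitions, predicate)
print("client-side:", moved_a, "rows moved,", scanned_a, "rows scanned by client")
print("offloaded:  ", moved_b, "rows moved,", scanned_b, "rows scanned by client")
```

Both strategies return the same rows; the difference is where the scan work happens and how much data has to travel.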
Introduction to Ceph
1. Provides three types of storage interface: file, object, and block.
2. No central point of failure. Uses CRUSH maps that contain the object-to-OSD mapping. Each client holds a CRUSH map, so clients talk directly to OSDs.
3. Highly extensible object storage layer via the Ceph Object Classes SDK.
4
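To show why clients need no central lookup service, here is a minimal sketch of deterministic placement. It uses rendezvous (highest-random-weight) hashing as a simplified stand-in; it is not the CRUSH algorithm, and the OSD names, the object name, and the `place` function are hypothetical.

```python
import hashlib

def place(object_name, osds, replicas=2):
    """Pick `replicas` OSDs for an object by ranking a hash of (object, osd).

    Any client that knows the same OSD list computes the same answer,
    so no central directory has to be consulted.
    """
    def score(osd):
        digest = hashlib.sha256(f"{object_name}:{osd}".encode()).digest()
        return int.from_bytes(digest[:8], "big")
    return sorted(osds, key=score, reverse=True)[:replicas]

OSDS = ["osd.0", "osd.1", "osd.2", "osd.3"]  # hypothetical cluster
placement = place("dataset.parquet.0", OSDS)
print(placement)
```

Real CRUSH additionally accounts for failure domains and device weights, which this sketch leaves out.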
Apache Arrow
• Language-independent columnar memory format for flat and hierarchical data, organised for efficient analytic operations on modern hardware.
• Shares data between processes without serialization overhead.
[Figure: data exchange between systems before Arrow and after Arrow]
5
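To make "columnar" concrete, the following is a minimal sketch using only Python's standard `array` module. It is not Arrow itself and does not follow Arrow's actual buffer layout; the sample rows and the variable names are made up for illustration.

```python
from array import array

# Row-oriented layout: one tuple per record.
rows = [(1, 10.5), (2, 20.0), (3, 30.25)]

# Columnar layout: one contiguous, typed buffer per field.
ids = array("q", (r[0] for r in rows))
values = array("d", (r[1] for r in rows))

# An analytic operation touches only the column it needs.
total = sum(values)

# A memoryview exposes the column's buffer without copying it.
view = memoryview(values)
print(total, view[1], view.nbytes)
```

Keeping each column in its own typed buffer is what lets a consumer read a field without decoding whole records.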
Components of Arrow
[Figure: Arrow components used by Skyhook]
6
Design Paradigm
• Extend the client and storage layers of programmable storage systems with data access libraries.
• Embed a filesystem (FS) shim inside storage nodes to provide a file-like view over objects.
• Allow direct interaction with objects in an object store, bypassing the filesystem layer by utilising FS metadata.
7
Architecture
• Arrow data access libraries are embedded inside Ceph OSDs to allow file fragment scanning inside the storage layer.
• The functionality is exposed through the Arrow Dataset API by creating a new file format abstraction, “RadosParquetFileFormat”.
8
File Layout Design
• Large multi-gigabyte Parquet files are split into smaller ~128 MB Parquet files.
• Each Parquet file is stored in a single RADOS object for SkyhookDM to access.
9
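One way to picture the splitting step is as a packing problem over row groups. The sketch below is an assumption about how such a split could be planned, not the tool SkyhookDM uses: `pack_row_groups` and the row-group sizes are hypothetical, and only sizes are modelled, not real Parquet data.

```python
TARGET_BYTES = 128 * 1024 * 1024  # ~128 MB per output file / RADOS object

def pack_row_groups(sizes, target=TARGET_BYTES):
    """Greedily group consecutive row groups so each output file stays within `target`.

    `sizes` holds the byte size of each row group in the large input file.
    Returns a list of lists of row-group indices, one inner list per output file.
    """
    files, current, current_size = [], [], 0
    for index, size in enumerate(sizes):
        if current and current_size + size > target:
            files.append(current)
            current, current_size = [], 0
        current.append(index)
        current_size += size
    if current:
        files.append(current)
    return files

# Hypothetical input: five row groups of 64 MB each.
row_group_sizes = [64 * 1024 * 1024] * 5
plan = pack_row_groups(row_group_sizes)
print(plan)
```

Each inner list in the plan would become one small Parquet file, stored as a single object.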
Experiments: Latency
• Offloading makes more selective queries (those returning a smaller fraction of the rows) faster, because less data is moved around the system. Also, less time goes into data (de)serialization and more into processing.
• With LZ4-compressed Arrow IPC files (bottom), SkyhookDM performs better than with Parquet files (top), since IPC files are faster to read and write.
[Figures: latency with Parquet on disk (top) and LZ4 IPC on disk (bottom)]
10
Experiments: CPU Usage
• SkyhookDM offloads CPU usage from the client layer to the storage layer, for example with 4 OSDs and 100% selectivity:
[Figures: CPU usage without Skyhook and with Skyhook]
11
Experiments: Network Traffic
• SkyhookDM saves network bandwidth by transferring only the data that the client requests.
• At 100% selectivity we end up transferring slightly more data, because LZ4-compressed Arrow is larger than Parquet binary data.
[Figure: network traffic at 1%, 10%, and 100% selectivity]
12
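The trade-off at 100% selectivity can be worked through with simple arithmetic. The numbers below are hypothetical, not measurements from the experiments: a 1000 MB Parquet dataset and an assumed LZ4 Arrow encoding that is 1.2x the Parquet size. The model also ignores column projection and row-group skipping on the client side.

```python
def megabytes_on_wire(parquet_mb, arrow_ratio_pct, selectivity_pct, offload):
    """Return the MB sent to the client for one scan (integer arithmetic)."""
    if not offload:
        # Without offloading, the client reads every Parquet file in full.
        return parquet_mb
    # With offloading, only matching rows come back, encoded as LZ4 Arrow.
    arrow_mb = parquet_mb * arrow_ratio_pct // 100
    return arrow_mb * selectivity_pct // 100

PARQUET_MB = 1000       # hypothetical dataset size
ARROW_RATIO_PCT = 120   # hypothetical: LZ4 Arrow is 1.2x the Parquet size

traffic = {
    pct: (
        megabytes_on_wire(PARQUET_MB, ARROW_RATIO_PCT, pct, offload=False),
        megabytes_on_wire(PARQUET_MB, ARROW_RATIO_PCT, pct, offload=True),
    )
    for pct in (1, 10, 100)
}
for pct, (without, with_offload) in traffic.items():
    print(f"{pct}% selectivity: {without} MB without offload, {with_offload} MB with offload")
```

Under these assumptions offloading wins clearly at 1% and 10%, and loses slightly at 100%, which matches the shape of the result described on the slide.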
Experiments: Crash Recovery
• In SkyhookDM, processing is colocated with storage nodes, so the crash recovery and consistency semantics of the storage layer apply naturally to query processing.
[Figure: plot with the crash point marked]
13
Coffea + SkyhookDM
• Implemented a run_parquet_job executor method in Coffea that reads from Parquet files using the Arrow Dataset API. This in turn allowed Coffea to be integrated with SkyhookDM seamlessly.
14
Ongoing Work
• Arrow’s memory layout requires internal memory copies to serialize it to a contiguous on-the-wire format, and this has a very high overhead.
Sending uncompressed IPC:
  [6] Serialize Result Table: 41.5%
  [5] Scan Parquet Data: 30.5%
  [7] Result Transfer: 24.6%
  [4] Disk I/O: 3.34%
  [3] Deserialize Scan Request: 0.103%
  [1] Stat Fragment: 0.0324%
  [8] Deserialize Result Table: 0.00855%
  [2] Serialize Scan Request: 0.00511%
Sending LZ4 compressed IPC:
  [5] Scan Parquet Data: 48.3%
  [6] Serialize Result Table: 29.5%
  [7] Result Transfer: 11.7%
  [8] Deserialize Result Table: 5.37%
  [4] Disk I/O: 5.11%
  [3] Deserialize Scan Request: 0.0513%
  [1] Stat Fragment: 0.0304%
  [2] Serialize Scan Request: 0.00771%
• Collaborating with the ServiceX and Coffea teams to integrate SkyhookDM into the larger analysis facility ecosystem.
15
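The serialization cost mentioned above comes from gathering separate column buffers into one contiguous message. The sketch below illustrates that copy with a made-up length-prefixed framing built on the standard library; it is not Arrow's IPC format, and `serialize`, `deserialize`, and the sample columns are hypothetical.

```python
import struct
from array import array

def serialize(columns):
    """Copy each column buffer into one contiguous, length-prefixed message."""
    parts = [struct.pack("<I", len(columns))]
    for col in columns:
        raw = col.tobytes()  # first copy: column buffer -> bytes
        parts.append(struct.pack("<cQ", col.typecode.encode(), len(raw)))
        parts.append(raw)
    return b"".join(parts)   # second copy: gather into one contiguous message

def deserialize(message):
    """Rebuild the column buffers from a message produced by serialize()."""
    (ncols,) = struct.unpack_from("<I", message, 0)
    offset = 4
    columns = []
    for _ in range(ncols):
        typecode, nbytes = struct.unpack_from("<cQ", message, offset)
        offset += 9  # 1-byte typecode + 8-byte length
        col = array(typecode.decode())
        col.frombytes(message[offset:offset + nbytes])
        offset += nbytes
        columns.append(col)
    return columns

columns = [array("q", [1, 2, 3]), array("d", [0.5, 1.5, 2.5])]
message = serialize(columns)
restored = deserialize(message)
print(len(message), restored == columns)
```

Every byte of the result table is copied at least once on the way out, which is why the serialization step grows with the result size.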
Check out our work
• GitHub Repository: https://github.com/uccross/skyhookdm-arrow
• Docker containers: https://github.com/uccross/skyhookdm-arrow-docker
• arXiv Paper: https://arxiv.org/pdf/2105.09894.pdf
• Coffea Skyhook Plugin: https://github.com/CoffeaTeam/coffea/tree/master/docker/coffea_rados_parquet
• Several bugs found and reported in Apache Arrow: ARROW-13161, ARROW-13126, ARROW-13088.
16
Thank You
Questions?
17
