© Cloudera, Inc. All rights reserved.
HADOOP STORAGE IN THE CLOUD NATIVE ERA
Nandakumar Vadivelu
nanda@apache.org
Figure: five Applications, each running in a Container, on a Container Orchestrator backed by Hadoop Storage.
HDDS
Ozone
Quadra
CSI
HADOOP DISTRIBUTED DATA STORE
HDFS - NameNode
Namespace + Blockspace
NameNode Layering
Namespace: File -> B1, B2, B3
Block Management Layer: B1 -> Dn1, Dn2, Dn3
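The two NameNode layers above can be sketched as two lookups: the namespace maps a file to its blocks, and the block management layer maps each block to the datanodes holding a replica. This is a minimal in-memory sketch for illustration only; the file path and the locations of B2 and B3 are made up, and only `B1 -> Dn1, Dn2, Dn3` comes from the slide.

```python
from typing import Dict, List, Tuple

# Namespace layer: file path -> ordered list of block IDs (path is hypothetical)
namespace: Dict[str, List[str]] = {
    "/data/file.txt": ["B1", "B2", "B3"],
}

# Block management layer: block ID -> datanodes holding a replica
# (B1 matches the slide; B2 and B3 locations are illustrative)
block_map: Dict[str, List[str]] = {
    "B1": ["Dn1", "Dn2", "Dn3"],
    "B2": ["Dn2", "Dn3", "Dn4"],
    "B3": ["Dn1", "Dn3", "Dn4"],
}


def locate(path: str) -> List[Tuple[str, List[str]]]:
    """Resolve a file to (block ID, replica locations) using both layers."""
    blocks = namespace[path]                     # namespace lookup
    return [(b, block_map[b]) for b in blocks]   # block-layer lookup


if __name__ == "__main__":
    for block, nodes in locate("/data/file.txt"):
        print(block, nodes)
```

Keeping the two maps separate is what lets the block layer be swapped out (for HDDS, in the slides that follow) without touching the namespace.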
Namespace
layered over
HDDS (Block Storage)
STORAGE CONTAINER
Blocks + Metadata
ARCHITECTURE
Storage Container Manager
Datanode  Datanode  Datanode
Container Protocol
● Create Container
● Get Container
● List Container
● Close Container
● Delete Container
Container Operations
● Read Block
● Write Block
● Delete Block
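The container protocol and container operations listed above can be modeled as a small in-memory store. This is a hypothetical sketch: the class and method names only mirror the slide's operations and are not the real HDDS (Java) API, and it assumes that a closed container rejects further writes.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Container:
    container_id: int
    open: bool = True
    blocks: Dict[int, bytes] = field(default_factory=dict)   # local ID -> block data
    metadata: Dict[str, str] = field(default_factory=dict)   # per-container metadata


class ContainerStore:
    """Hypothetical datanode-side store holding many containers."""

    def __init__(self) -> None:
        self._containers: Dict[int, Container] = {}

    # --- Container protocol ---
    def create_container(self, cid: int) -> Container:
        if cid in self._containers:
            raise ValueError(f"container {cid} already exists")
        self._containers[cid] = Container(cid)
        return self._containers[cid]

    def get_container(self, cid: int) -> Container:
        return self._containers[cid]

    def list_containers(self) -> List[int]:
        return sorted(self._containers)

    def close_container(self, cid: int) -> None:
        self._containers[cid].open = False  # assumed: closed containers are immutable

    def delete_container(self, cid: int) -> None:
        del self._containers[cid]

    # --- Container operations ---
    def write_block(self, cid: int, local_id: int, data: bytes) -> None:
        c = self._containers[cid]
        if not c.open:
            raise PermissionError(f"container {cid} is closed")
        c.blocks[local_id] = data

    def read_block(self, cid: int, local_id: int) -> bytes:
        return self._containers[cid].blocks[local_id]

    def delete_block(self, cid: int, local_id: int) -> None:
        del self._containers[cid].blocks[local_id]
```

Grouping many blocks into one container means the block-management layer only has to track containers, not individual blocks.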
OZONE
Ozone Manager
Key 1 -> List <block Id>
Key 2 -> List <block Id>
Key 3 -> List <block Id>
Key 4 -> List <block Id>
Key 5 -> List <block Id>
…
Key n -> List <block Id>
Block Id -> [Container Id + Local Id]
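The Ozone Manager mapping above (key to a list of block IDs, each block ID being a container ID plus a local ID) can be sketched as follows. This is a simplified, hypothetical model, not the Ozone Manager's actual code; the `vol/bucket/key` naming in the usage example is illustrative.

```python
from typing import Dict, List, NamedTuple, Set


class BlockID(NamedTuple):
    container_id: int   # which storage container holds the block
    local_id: int       # block identifier within that container


class OzoneManagerSketch:
    """Hypothetical key table: key name -> list of block IDs."""

    def __init__(self) -> None:
        self._keys: Dict[str, List[BlockID]] = {}

    def put_key(self, key: str, blocks: List[BlockID]) -> None:
        self._keys[key] = list(blocks)

    def lookup_key(self, key: str) -> List[BlockID]:
        return self._keys[key]

    def containers_for(self, key: str) -> Set[int]:
        # Container IDs are what the Storage Container Manager uses
        # to find the datanodes that hold the data.
        return {b.container_id for b in self._keys[key]}


if __name__ == "__main__":
    om = OzoneManagerSketch()
    om.put_key("vol/bucket/key1", [BlockID(5, 101), BlockID(5, 102), BlockID(9, 7)])
    print(om.lookup_key("vol/bucket/key1"))
    print(om.containers_for("vol/bucket/key1"))
```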
ARCHITECTURE
Ozone Manager
HDDS
Storage Container Manager
Datanode  Datanode  Datanode
INTERFACES
● Object Store API (RPC)
● OzoneFileSystem (HCFS) Connector
● S3 Connector
NAMENODE’
HDFS-10419
ARCHITECTURE
NameNode’
HDDS
Storage Container Manager
Datanode  Datanode  Datanode
QUADRA
HDFS-11118
QUADRA
• LUN-like raw block storage
• Backed by HDDS
• Volumes can be mounted as disk filesystems
• Volume: a raw block device that can be used to create a mountable disk
• Filesystems such as ext4 or XFS can be created on the volumes
• POSIX semantics
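One way to picture a raw block volume "backed by HDDS" is to cut the volume into fixed-size segments and store each segment as one HDDS block, so every volume I/O is translated into per-segment pieces. This is a hypothetical sketch: the segmenting scheme and the 256 MiB `SEGMENT_SIZE` are assumptions for illustration, not Quadra's documented layout.

```python
from typing import List, Tuple

SEGMENT_SIZE = 256 * 1024 * 1024  # bytes per backing HDDS block (assumed value)


def split_io(offset: int, length: int) -> List[Tuple[int, int, int]]:
    """Split a volume I/O into (segment index, offset in segment, length) pieces."""
    if offset < 0 or length < 0:
        raise ValueError("offset and length must be non-negative")
    pieces = []
    while length > 0:
        seg, seg_off = divmod(offset, SEGMENT_SIZE)
        n = min(length, SEGMENT_SIZE - seg_off)  # never cross a segment boundary
        pieces.append((seg, seg_off, n))
        offset += n
        length -= n
    return pieces


if __name__ == "__main__":
    # A 4 KiB write that straddles the first segment boundary
    print(split_io(SEGMENT_SIZE - 1024, 4096))
```

A filesystem such as ext4 sees only a flat device; the translation to HDDS blocks happens below it, which is why the volume can offer ordinary POSIX semantics.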
ARCHITECTURE
Quadra Manager  iSCSI Server
HDDS
Storage Container Manager
Datanode  Datanode  Datanode
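USAGE
• Create a volume
• quadra -c foo datavolume 4TB
• Attach the volume over iSCSI (register the target, then log in)
• iscsiadm -m node -o new -T foo:datavolume -p localhost:3260
• iscsiadm -m node -T foo:datavolume -p localhost:3260 --login
• Format the volume (the device name, /dev/sdb here, depends on the host)
• mkfs.ext4 -b 4096 /dev/sdb
• Mount the filesystem
• mkdir datavol; mount /dev/sdb datavol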
Figure: Quadra on a HOST. Components: HDDS (Storage Container Manager, Datanodes), Quadra Volume Manager, Quadra Plugin with JSCSI (user space), SCSI Initiator (kernel); links labeled Volume API and Data Path.
HADOOP STORAGE ECOSYSTEM
• Block Store: Quadra Manager
• Object Store: Ozone Manager
• File Store: NameNode’
All three run on HDDS (Storage Container Manager + Datanodes).
CONTAINER STORAGE INTERFACE
WHY?
Each container orchestrator has its own Pluggable Storage Interface, so a storage provider has to write a plugin for each container orchestrator.
CONTAINER STORAGE INTERFACE
• Specification
• Interoperable
• Vendor neutral
• Control plane only
Figure: CSI as the single Pluggable Storage Interface between container orchestrators and the Storage Provider.
PLUGINS
Figure: the Container Orchestrator and the Storage Provider connected through two plugins, a Control Plugin and a Node Plugin.
Control Plugin
• Can run anywhere
• Handles storage volume creation and deletion
Node Plugin
• Runs on all the nodes
• Handles storage volume mounting and unmounting
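The division of work between the two plugins can be sketched as two small classes. The method names mirror the CSI RPCs CreateVolume/DeleteVolume (Controller service; the CSI specification calls the Control Plugin the "Controller Plugin") and NodePublishVolume/NodeUnpublishVolume (Node service), but this is a plain in-memory model, not the gRPC interface defined by the specification. The idempotency rules follow the spec's intent: repeating a compatible CreateVolume returns the same volume, and deleting a missing volume is not an error.

```python
from typing import Dict, Set


class ControllerPlugin:
    """Can run anywhere; creates and deletes volumes in the storage backend."""

    def __init__(self) -> None:
        self.volumes: Dict[str, int] = {}  # volume ID -> capacity in bytes

    def create_volume(self, name: str, capacity_bytes: int) -> str:
        existing = self.volumes.get(name)
        if existing is not None and existing != capacity_bytes:
            raise ValueError(f"volume {name} exists with a different capacity")
        self.volumes[name] = capacity_bytes  # repeating the same request is a no-op
        return name

    def delete_volume(self, volume_id: str) -> None:
        self.volumes.pop(volume_id, None)  # deleting a missing volume succeeds


class NodePlugin:
    """Runs on every node; mounts and unmounts volumes for local workloads."""

    def __init__(self, node_id: str) -> None:
        self.node_id = node_id
        self.mounts: Dict[str, Set[str]] = {}  # volume ID -> mount target paths

    def publish_volume(self, volume_id: str, target_path: str) -> None:
        # A real driver would attach the device and mount it at target_path here.
        self.mounts.setdefault(volume_id, set()).add(target_path)

    def unpublish_volume(self, volume_id: str, target_path: str) -> None:
        paths = self.mounts.get(volume_id, set())
        paths.discard(target_path)
        if not paths:
            self.mounts.pop(volume_id, None)
```

The orchestrator calls the controller once per volume, but calls a node plugin on each node where a workload actually uses the volume.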
CSI DRIVER FOR HADOOP STORAGE
HDDS-1382
Figure: five Applications on a Container Orchestrator reach Hadoop Storage through the CSI Driver. Hadoop Storage consists of the Quadra Manager and iSCSI Server on top of HDDS (Storage Container Manager, Datanodes).
Figure: Hadoop CSI Driver. Components: Control Plugin and Node Plugin (Hadoop CSI Driver), Quadra Volume Manager, Quadra Plugin with JSCSI, HDDS (Storage Container Manager, Datanodes); links labeled Volume API and Data Path.
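A Hadoop CSI driver built on Quadra could map the CSI calls onto the manual steps from the USAGE slide. The sketch below only builds argument lists and executes nothing. Its assumptions are not taken from HDDS-1382: the Control Plugin shells out to the `quadra` CLI exactly as shown on the USAGE slide, `foo` is treated as an owner name, the Node Plugin attaches the volume with `iscsiadm`, and the block device path is already known. The function names are hypothetical.

```python
from typing import List

Command = List[str]


def controller_create(owner: str, volume: str, size: str) -> List[Command]:
    # Same invocation as the USAGE slide: quadra -c foo datavolume 4TB
    return [["quadra", "-c", owner, volume, size]]


def node_publish(owner: str, volume: str, portal: str,
                 device: str, target_path: str, format_fs: bool) -> List[Command]:
    target = f"{owner}:{volume}"  # iSCSI target name in the form used on the USAGE slide
    cmds: List[Command] = [
        ["iscsiadm", "-m", "node", "-o", "new", "-T", target, "-p", portal],  # register target
        ["iscsiadm", "-m", "node", "-T", target, "-p", portal, "--login"],    # attach device
    ]
    if format_fs:  # only format a freshly created volume, never one holding data
        cmds.append(["mkfs.ext4", "-b", "4096", device])
    cmds += [["mkdir", "-p", target_path], ["mount", device, target_path]]
    return cmds


if __name__ == "__main__":
    for cmd in node_publish("foo", "datavolume", "localhost:3260",
                            "/dev/sdb", "/mnt/datavol", format_fs=True):
        print(" ".join(cmd))
```

Building the command lists separately from running them keeps the mapping easy to inspect; a real driver would also need to discover the device name after login rather than take it as a parameter.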
DEMO
CURRENT STATUS
● Apache Hadoop Ozone 0.4.0-alpha – Released on May 7
● Implementing NameNode on top of HDDS (HDFS-10419) – Design Discussion
● Quadra (HDFS-11118) – Design Discussion (POC)
● CSI Server for Ozone (HDDS-1382) – In development
Q & A
THANK YOU
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfOverkill Security
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native ApplicationsWSO2
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesrafiqahmad00786416
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024The Digital Insurer
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...apidays
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWERMadyBayot
 

Dernier (20)

presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu SubbuApidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 

Hadoop Storage in the Cloud Native Era

  • 1. © Cloudera, Inc. All rights reserved. HADOOP STORAGE IN THE CLOUD NATIVE ERA Nandakumar Vadivelu nanda@apache.org
  • 2. Hadoop Storage, Container Orchestrator, Application (Container) ×5
  • 3. HDDS, Ozone, Quadra, CSI
  • 4. HADOOP DISTRIBUTED DATA STORE
  • 5. HDFS NameNode: Namespace + Blockspace
  • 6. Namenode Layering: Namespace (File -> B1, B2, B3), Block Management Layer (B1 -> Dn1, Dn2, Dn3)
  • 7. Namespace, HDDS (Block Storage)
  • 8. STORAGE CONTAINER: Blocks, Metadata
  • 9. ARCHITECTURE: Storage Container Manager, Datanode, Datanode, Datanode
  • 10. Container Protocol ● Create Container ● Get Container ● List Container ● Close Container ● Delete Container; Container Operations ● Read Block ● Write Block ● Delete Block
  • 11. OZONE
  • 12. Ozone Manager: Key 1 -> List <block Id>, Key 2 -> List <block Id>, Key 3 -> List <block Id>, Key 4 -> List <block Id>, Key 5 -> List <block Id>, …, Key n -> List <block Id>; Block Id -> [Container Id + Local Id]
  • 13. ARCHITECTURE: Ozone Manager, HDDS (Storage Container Manager, Datanode, Datanode, Datanode)
  • 14. INTERFACES: Object Store API (RPC), OzoneFileSystem (HCFS) Connector, S3 Connector
  • 15. NAMENODE’ (HDFS-10419)
  • 16. ARCHITECTURE: NameNode’, HDDS (Storage Container Manager, Datanode, Datanode, Datanode)
  • 17. QUADRA (HDFS-11118)
  • 18. QUADRA • LUN-like raw block storage • Backed by HDDS • Mountable disk filesystem volume • Volume: a raw block device that can be used to create a mountable disk • Filesystems such as ext4 or XFS can be created on the volumes • POSIX semantics
  • 19. ARCHITECTURE: Quadra Manager, iSCSI Server, HDDS (Storage Container Manager, Datanode, Datanode, Datanode)
  • 20. USAGE • Create a volume: quadra -c foo datavolume 4TB • Mount the volume: iscsiadm -m node -o new -T foo:datavolume -p localhost:3260 • Format the volume: mkfs.ext4 -b 4096 /dev/sdb • Mount the filesystem: mkdir datavol; mount /dev/sdb datavol
  • 21. Storage Container Manager, HDDS, Datanode, Datanode, Datanode, Quadra Volume Manager, Quadra Plugin, JSCSI, Kernel, User, SCSI Initiator, Volume API, Data Path, HOST
  • 22. HADOOP STORAGE ECOSYSTEM: Block Store (Quadra Manager), Object Store (Ozone Manager), File Store (NameNode’), all on HDDS (Storage Container Manager, Datanode, Datanode, Datanode)
  • 23. CONTAINER STORAGE INTERFACE
  • 24. WHY? Each container orchestrator has its own pluggable storage interface, so a storage provider has to write a plugin for each container orchestrator.
  • 25. CONTAINER STORAGE INTERFACE • Specification • Interoperable • Vendor neutral • Control plane only
  • 26. Pluggable Storage Interface, Storage Provider, CSI
  • 27. PLUGINS: Control Plugin, Node Plugin, Container Orchestrator, Storage Provider
  • 28. Control Plugin • Can run anywhere • Handles storage volume creation and deletion; Node Plugin • Runs on all the nodes • Handles storage volume mounting and unmounting
  • 29.
  • 30. CSI DRIVER FOR HADOOP STORAGE (HDDS-1382)
  • 31. Hadoop Storage: HDDS (Storage Container Manager, Datanode, Datanode, Datanode), Quadra Manager, iSCSI Server; CSI Driver; Container Orchestrator; Application ×5
  • 32. Hadoop CSI Driver (Control Plugin, Node Plugin), Quadra Volume Manager, Quadra Plugin, JSCSI, HDDS (Storage Container Manager, Datanode, Datanode, Datanode), Volume API, Data Path
  • 33. DEMO
  • 34.
  • 35. CURRENT STATUS ● Apache Hadoop Ozone 0.4.0-alpha: released on May 7 ● Implementing Namenode on top of HDDS (HDFS-10419): design discussion ● Quadra (HDFS-11118): design discussion (POC) ● CSI server for Ozone (HDDS-1382): in development
  • 36. Q & A
  • 37. THANK YOU
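Slide 6 separates the NameNode into two layers. The namespace maps a file to its blocks (File -> B1, B2, B3). The block management layer maps each block to the datanodes holding it (B1 -> Dn1, Dn2, Dn3). The Python sketch below models that two-step lookup with plain dictionaries. It is an illustration only, not HDFS code. The class names, the path /data/file, and the replica locations for B2 and B3 are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Namespace:
    """Namespace layer: file path -> ordered list of block IDs."""
    files: dict[str, list[str]] = field(default_factory=dict)

    def add_file(self, path: str, block_ids: list[str]) -> None:
        self.files[path] = list(block_ids)

    def blocks_of(self, path: str) -> list[str]:
        return self.files[path]


@dataclass
class BlockManager:
    """Block management layer: block ID -> datanodes holding a replica."""
    locations: dict[str, list[str]] = field(default_factory=dict)

    def add_block(self, block_id: str, datanodes: list[str]) -> None:
        self.locations[block_id] = list(datanodes)

    def locate(self, block_id: str) -> list[str]:
        return self.locations[block_id]


def resolve(ns: Namespace, bm: BlockManager, path: str) -> dict[str, list[str]]:
    """Two-step lookup: file -> blocks (namespace), then block -> datanodes."""
    return {block: bm.locate(block) for block in ns.blocks_of(path)}


# Hypothetical example data; only B1's locations come from the slide.
ns = Namespace()
bm = BlockManager()
ns.add_file("/data/file", ["B1", "B2", "B3"])
bm.add_block("B1", ["Dn1", "Dn2", "Dn3"])
bm.add_block("B2", ["Dn2", "Dn3", "Dn4"])
bm.add_block("B3", ["Dn1", "Dn3", "Dn4"])
layout = resolve(ns, bm, "/data/file")
```

Because the two maps are independent, the block layer can be reused under a different namespace. That reuse is the layering idea the later HDDS slides build on.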
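Slides 8 and 10 describe a storage container as a collection of blocks plus metadata, managed through a small container protocol. The following toy, in-memory model mirrors those operations one to one. Several details are assumptions made for this sketch:

* the Python names;
* the integer IDs;
* the rule that a closed container rejects further writes.

It does not reproduce the HDDS wire protocol.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Container:
    """A storage container: blocks plus container-level metadata."""
    container_id: int
    metadata: dict[str, str] = field(default_factory=dict)
    blocks: dict[int, bytes] = field(default_factory=dict)  # local ID -> data
    is_open: bool = True


class ContainerStore:
    """Toy, in-memory model of the container protocol on one datanode."""

    def __init__(self) -> None:
        self._containers: dict[int, Container] = {}

    # --- Container protocol ---
    def create_container(self, container_id: int, **metadata: str) -> Container:
        if container_id in self._containers:
            raise ValueError(f"container {container_id} already exists")
        container = Container(container_id, dict(metadata))
        self._containers[container_id] = container
        return container

    def get_container(self, container_id: int) -> Container:
        return self._containers[container_id]

    def list_containers(self) -> list[int]:
        return sorted(self._containers)

    def close_container(self, container_id: int) -> None:
        # Assumption for this sketch: a closed container accepts no new writes.
        self._containers[container_id].is_open = False

    def delete_container(self, container_id: int) -> None:
        del self._containers[container_id]

    # --- Container operations (block level) ---
    def write_block(self, container_id: int, local_id: int, data: bytes) -> None:
        container = self._containers[container_id]
        if not container.is_open:
            raise PermissionError(f"container {container_id} is closed")
        container.blocks[local_id] = data

    def read_block(self, container_id: int, local_id: int) -> bytes:
        return self._containers[container_id].blocks[local_id]

    def delete_block(self, container_id: int, local_id: int) -> None:
        del self._containers[container_id].blocks[local_id]


store = ContainerStore()
store.create_container(1, owner="demo")
store.write_block(1, 100, b"hello")
store.close_container(1)
data = store.read_block(1, 100)
try:
    store.write_block(1, 101, b"late")
    rejected = False
except PermissionError:
    rejected = True
```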
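Slide 12 shows the Ozone Manager's core mapping: each key resolves to a list of block IDs, and each block ID is a container ID plus a local ID. A minimal sketch of that table follows. The class and method names and the key name "key1" are hypothetical. The sketch deliberately leaves out where containers physically live, which the architecture slides assign to the Storage Container Manager.

```python
from __future__ import annotations

from typing import NamedTuple


class BlockID(NamedTuple):
    """Block ID = container ID + local ID within that container."""
    container_id: int
    local_id: int


class KeyTable:
    """Toy model of the Ozone Manager mapping: key -> list of block IDs."""

    def __init__(self) -> None:
        self._keys: dict[str, list[BlockID]] = {}

    def put_key(self, key: str, blocks: list[BlockID]) -> None:
        self._keys[key] = list(blocks)

    def lookup_key(self, key: str) -> list[BlockID]:
        return self._keys[key]

    def containers_of(self, key: str) -> set[int]:
        # Which containers hold this key's data; where those containers live
        # is tracked elsewhere (by SCM in the architecture slides), not here.
        return {block.container_id for block in self._keys[key]}


table = KeyTable()
table.put_key("key1", [BlockID(7, 1), BlockID(7, 2), BlockID(9, 1)])
blocks = table.lookup_key("key1")
containers = table.containers_of("key1")
```

Keeping only container IDs in the key table is what lets the Ozone Manager stay small. Block placement detail lives with the containers rather than with every key.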
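Slides 18 and 20 present a Quadra volume as LUN-like raw block storage backed by HDDS, on which a regular filesystem such as ext4 can be created. One way to picture this is a flat byte address space split into fixed-size chunks, each stored as a separate backing block. The sketch below is a hypothetical model under that assumption. The following are illustrative choices, not Quadra's documented design:

* the 4096-byte chunk size;
* sparse, zero-filled reads;
* an in-memory dictionary standing in for HDDS blocks.

```python
from __future__ import annotations


class SparseVolume:
    """Hypothetical raw block volume over fixed-size backing chunks.

    Each chunk stands in for one HDDS block. Unwritten chunks read back
    as zeros, like a thin-provisioned LUN.
    """

    def __init__(self, size: int, chunk_size: int = 4096) -> None:
        self.size = size
        self.chunk_size = chunk_size
        self.chunks: dict[int, bytearray] = {}  # chunk index -> backing data

    def _check(self, offset: int, length: int) -> None:
        if offset < 0 or length < 0 or offset + length > self.size:
            raise ValueError("I/O outside volume bounds")

    def write(self, offset: int, data: bytes) -> None:
        self._check(offset, len(data))
        pos = 0
        while pos < len(data):
            # Split the write at chunk boundaries.
            index, within = divmod(offset + pos, self.chunk_size)
            n = min(self.chunk_size - within, len(data) - pos)
            chunk = self.chunks.setdefault(index, bytearray(self.chunk_size))
            chunk[within:within + n] = data[pos:pos + n]
            pos += n

    def read(self, offset: int, length: int) -> bytes:
        self._check(offset, length)
        out = bytearray()
        pos = 0
        while pos < length:
            index, within = divmod(offset + pos, self.chunk_size)
            n = min(self.chunk_size - within, length - pos)
            chunk = self.chunks.get(index)
            # Missing chunks were never written: return zeros.
            out += chunk[within:within + n] if chunk is not None else bytes(n)
            pos += n
        return bytes(out)


vol = SparseVolume(size=1 << 20)
vol.write(4090, b"hello world")  # spans chunks 0 and 1
data = vol.read(4090, 11)
head = vol.read(0, 4)
```

A filesystem created on such a volume only ever issues offset and length reads and writes. Any block store that can serve fixed-size chunks can therefore sit underneath it.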
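Slide 28 splits a CSI driver into two parts. The control plugin runs anywhere and creates and deletes volumes. The node plugin runs on every node and mounts and unmounts volumes. The sketch below models that split in plain Python. The method names loosely follow the CSI CreateVolume, DeleteVolume, NodePublishVolume and NodeUnpublishVolume RPCs, but nothing here uses the real gRPC interface. The Backend class, node name and mount path are hypothetical stand-ins for the Hadoop CSI driver (HDDS-1382).

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Backend:
    """Hypothetical stand-in for the storage provider (e.g. HDDS-backed volumes)."""
    volumes: dict[str, int] = field(default_factory=dict)  # volume ID -> bytes


class ControllerPlugin:
    """Can run anywhere: only talks to the storage backend."""

    def __init__(self, backend: Backend) -> None:
        self.backend = backend

    def create_volume(self, name: str, capacity_bytes: int) -> str:
        # Repeating the call with the same name is a no-op
        # (CSI expects CreateVolume to be idempotent).
        self.backend.volumes.setdefault(name, capacity_bytes)
        return name

    def delete_volume(self, volume_id: str) -> None:
        # Deleting a volume that no longer exists is treated as success.
        self.backend.volumes.pop(volume_id, None)


class NodePlugin:
    """Runs on every node: makes a volume available at a path on that node."""

    def __init__(self, node_id: str) -> None:
        self.node_id = node_id
        self.mounts: dict[str, str] = {}  # target path -> volume ID

    def publish_volume(self, volume_id: str, target_path: str) -> None:
        # A real node plugin would attach the device (e.g. over iSCSI) and mount it.
        self.mounts[target_path] = volume_id

    def unpublish_volume(self, target_path: str) -> None:
        self.mounts.pop(target_path, None)


backend = Backend()
controller = ControllerPlugin(backend)
node = NodePlugin("node-1")
volume_id = controller.create_volume("datavolume", 4 * 1024**4)  # 4 TiB
node.publish_volume(volume_id, "/mnt/datavol")
mounted = dict(node.mounts)
node.unpublish_volume("/mnt/datavol")
controller.delete_volume(volume_id)
```

Volume lifecycle lives in the control plugin, so that plugin needs no access to node-local devices, which is why it can run anywhere. Only the node plugin has to be deployed on every host.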