Hadoop Enhancements Using Next-Gen Intel® Platform Technologies
Anoop Sam John – PMC member for Apache HBase
Rakesh R – Committer for Apache ZooKeeper and PMC member for Apache BookKeeper
About Us
• Anoop Sam John
  - PMC member for Apache HBase and Phoenix
  - anoopsamjohn@apache.org
  - https://www.linkedin.com/in/anoopsamjohn
• Rakesh R
  - Committer for Apache ZooKeeper and BookKeeper
  - Apache Hadoop contributor
  - rakeshr@apache.org
  - https://www.linkedin.com/in/rakeshadr
Agenda
• Intel Enhancements on Hadoop platform
• HDFS
  - Erasure coding using ISA-L library
  - Encryption using AES-NI
• HBase
  - Go Big Cache
HDFS – Distributed FileSystem
“Between the birth of the world and 2003, there were 5 Exabytes of information created. We now create 5 Exabytes every two days.”
Eric Schmidt, Executive Chairman of Alphabet, Inc.
HDFS – Current Replication Strategy
• Inherits 3-way replication from Google File System to increase data availability
  - 3x storage overhead
• Expensive for:
  - Massive amounts of data
  - Geo-distributed data recovery
[Diagram: DFSClient and replicas r1, r2, r3 on Datanode1 and Datanode2 across Rack-1 and Rack-2 (3x replication)]
HDFS – Erasure Coding
• k data blocks + m parity blocks (k + m)
  - Example: Reed-Solomon 6 + 3
• Saves disk space
• 1.5x storage overhead

Sample codec (XOR based): data bits X and Y, parity bit X ⊕ Y

  X   Y   X ⊕ Y
  0   0   0
  0   1   1
  1   0   1
  1   1   0

[Diagram: blocks b1 to b9 stored on D1 to D9: 6 data blocks, 3 parity blocks]
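The XOR table above can be turned into a small worked example. The sketch below is a toy illustration, not the HDFS codec: it computes a single parity block as the byte-wise XOR of the data blocks and rebuilds one lost block from the survivors. The function names and the sample blocks are made up for illustration.

```python
from functools import reduce


def xor_parity(blocks):
    """Return the byte-wise XOR of equally sized blocks."""
    size = len(blocks[0])
    if any(len(block) != size for block in blocks):
        raise ValueError("all blocks must have the same length")
    # zip(*blocks) walks the blocks column by column, one byte position at a time.
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def recover_missing(surviving_blocks, parity):
    """Rebuild the single missing data block from the survivors and the parity."""
    # XOR-ing the parity with every surviving block cancels them out,
    # leaving only the block that was lost.
    return xor_parity(list(surviving_blocks) + [parity])


# Hypothetical sample data: three 4-byte data blocks.
data = [b"HDFS", b"EC!!", b"demo"]
parity = xor_parity(data)

# Pretend the second block was lost and rebuild it.
rebuilt = recover_missing([data[0], data[2]], parity)
```

A single XOR parity block can only repair one lost block. Reed-Solomon 6 + 3 uses more general arithmetic so that any three of the nine blocks can be lost.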
Durability & Efficiency

                      3-way data replication    Erasure coding: RS-(6,3)
  Data durability     2                         3
  Storage efficiency  1/3 (33.33%)              6/9 (67%)

Data Durability = How many simultaneous failures can be tolerated?
Storage Efficiency = What portion of storage holds useful data?

[Diagram: 3-way data replication: Replica1, Replica2, Replica3 on Datanode1, Datanode2, Datanode3; one replica is useful data, the others are redundant data]
[Diagram: RS-(6,3) erasure coding: blocks b1 to b9 on D1 to D9; 6 data blocks (useful data), 3 parity blocks (extra data)]
• Released version – Apache Hadoop 3.0.0-alpha1
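The durability and efficiency figures in the comparison follow directly from the two definitions. A minimal sketch of that arithmetic, using exact fractions; the function names and dictionary keys are illustrative, not part of any Hadoop API:

```python
from fractions import Fraction


def replication_stats(copies):
    """Durability and efficiency when every block is stored `copies` times."""
    return {
        "tolerated_failures": copies - 1,       # all but one copy may be lost
        "efficiency": Fraction(1, copies),      # useful data / raw storage
        "overhead": Fraction(copies, 1),        # raw storage / useful data
    }


def erasure_coding_stats(k, m):
    """Durability and efficiency for k data blocks plus m parity blocks."""
    return {
        "tolerated_failures": m,                # any m of the k + m blocks may be lost
        "efficiency": Fraction(k, k + m),
        "overhead": Fraction(k + m, k),
    }


three_way = replication_stats(3)
rs_6_3 = erasure_coding_stats(6, 3)

for name, stats in (("3-way replication", three_way), ("RS-(6,3)", rs_6_3)):
    print(
        f"{name}: tolerates {stats['tolerated_failures']} failures, "
        f"efficiency {float(stats['efficiency']):.2%}, "
        f"overhead {float(stats['overhead']):.1f}x"
    )
```

The same two functions can be used to compare other (k, m) choices against replication.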
Microbenchmark: Codec Calculation
[Chart: codec calculation throughput in MB per second. Image courtesy Cloudera]
• New Intel architecture solutions for storage (ISA-L)
• Intel® Intelligent Storage Acceleration Library provides a solution to deploy erasure coding (EC) with better performance.
https://01.org/intel%C2%AE-storage-acceleration-library-open-source-version
HDFS – Encryption
• Data sensitivity and privacy management are very important for big data analytics
• Encryption is a regulatory requirement for many business sectors
  - Finance
  - Government
  - Healthcare, etc.
[Diagram: DFSClient, KMS, and HDFS cluster. Labels: per-file key operations; encryption key ops; data ops (read/write encrypted data); encryption library; data at rest on disk]
• Released version – Apache Hadoop 2.6.0
Encryption Algorithm
• Data encryption/decryption is computationally costly
• Encryption ciphers
  - AES-CTR (Advanced Encryption Standard, Counter Mode) is the most popular
  - Keys of 128, 192, or 256 bits
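To make the counter-mode structure concrete, here is a minimal sketch of how CTR turns a block function into a stream cipher: each 16-byte keystream block is derived from the key, a nonce, and a block counter, and the data is XOR-ed with that keystream. Important assumption: the block function below is a truncated SHA-256 used only as a stand-in so the example runs with the standard library. It is not AES and not secure. All names are illustrative.

```python
import hashlib

BLOCK_SIZE = 16  # AES block size in bytes


def keystream_block(key, nonce, counter):
    """STAND-IN for AES_encrypt(key, nonce || counter). Not AES, not secure."""
    counter_bytes = counter.to_bytes(8, "big")
    return hashlib.sha256(key + nonce + counter_bytes).digest()[:BLOCK_SIZE]


def ctr_transform(key, nonce, data, offset=0):
    """Encrypt or decrypt `data` that starts at byte `offset` of the stream.

    In counter mode encryption and decryption are the same XOR operation.
    """
    if len(key) not in (16, 24, 32):  # 128-, 192-, or 256-bit keys
        raise ValueError("key must be 16, 24, or 32 bytes")
    out = bytearray(len(data))
    for i in range(len(data)):
        position = offset + i
        # Recomputed per byte for clarity; a real implementation works per block.
        block = keystream_block(key, nonce, position // BLOCK_SIZE)
        out[i] = data[i] ^ block[position % BLOCK_SIZE]
    return bytes(out)


# Hypothetical key, nonce, and message for illustration only.
key = bytes(range(16))
nonce = b"\x00" * 8
plaintext = b"counter mode turns a block cipher into a stream"
ciphertext = ctr_transform(key, nonce, plaintext)
roundtrip = ctr_transform(key, nonce, ciphertext)
```

Because each keystream block depends only on the counter value, a reader can decrypt from any byte offset without processing the preceding data, which the `offset` parameter illustrates.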
Encryption AES-CTR
• Two implementations of AES-CTR
  1. JCE (Java Cryptography Extension) software implementation
  2. OpenSSL hardware-accelerated AES-NI (Intel® Advanced Encryption Standard New Instructions) implementation
• AES-NI available in Westmere (2010) and newer Intel CPUs
• AES-NI further optimized in Haswell (2013)
https://software.intel.com/en-us/articles/intel-advanced-encryption-standard-instructions-aes-ni
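Whether a given machine can use the hardware path can be checked before benchmarking. The sketch below assumes an x86 Linux host, where the kernel lists the `aes` flag in `/proc/cpuinfo` when AES-NI is available. The helper names are illustrative, and the parsing is kept separate from the file read so it can be exercised on any platform.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of CPU flags from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        name, _, value = line.partition(":")
        if name.strip() == "flags":
            return set(value.split())
    return set()


def has_aes_ni(cpuinfo_text):
    """True if the 'aes' flag (AES-NI on x86) is listed."""
    return "aes" in cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as cpuinfo:
            print("AES-NI available:", has_aes_ni(cpuinfo.read()))
    except OSError:
        # Not Linux, or /proc is not mounted.
        print("/proc/cpuinfo is not readable on this system")
```

The flag only shows that the CPU supports the instructions. Whether they are actually used still depends on the OpenSSL-based implementation being installed and selected.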
Microbenchmark: Encrypt/Decrypt 1 GB Byte Array
Test environment:
  - Run locally on a single Haswell machine
  - Single-threaded, excluding HDFS overheads (checksumming, network, copies)
[Chart: image courtesy Cloudera]
Apache Commons Crypto
• The cryptographic layer is incubated as a new Apache component
http://commons.apache.org/proper/commons-crypto/
• Apache Commons Crypto was also integrated with Apache Spark for shuffle encryption.
HBase
• NoSQL database in the Hadoop ecosystem
• Accumulates writes in memory and flushes to HDFS
• Caches data
  - Reduced read latency
  - Better read throughput
• Memory-hungry processes
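The write and read paths listed above can be sketched as a toy model. This is not HBase's implementation. It assumes a plain dict as the in-memory write buffer, a list of dicts standing in for flushed files on HDFS, and a small LRU cache of individual values (HBase caches blocks rather than single values). The class and attribute names are illustrative.

```python
from collections import OrderedDict


class ToyStore:
    """Toy model: buffer writes in memory, flush to 'files', cache reads."""

    def __init__(self, flush_threshold=3, cache_capacity=2):
        self.flush_threshold = flush_threshold
        self.cache_capacity = cache_capacity
        self.memstore = {}            # in-memory write buffer
        self.flushed_files = []       # stand-in for files on HDFS, newest last
        self.cache = OrderedDict()    # LRU read cache
        self.disk_reads = 0           # counts lookups that had to touch a file

    def put(self, key, value):
        self.memstore[key] = value
        self.cache.pop(key, None)     # drop any stale cached value
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self):
        if self.memstore:
            self.flushed_files.append(dict(sorted(self.memstore.items())))
            self.memstore = {}

    def get(self, key):
        if key in self.memstore:      # newest data lives in memory
            return self.memstore[key]
        if key in self.cache:         # cache hit: no file access needed
            self.cache.move_to_end(key)
            return self.cache[key]
        for flushed in reversed(self.flushed_files):  # newest file first
            self.disk_reads += 1
            if key in flushed:
                self.cache[key] = flushed[key]
                if len(self.cache) > self.cache_capacity:
                    self.cache.popitem(last=False)    # evict least recently used
                return flushed[key]
        return None


store = ToyStore(flush_threshold=2, cache_capacity=1)
store.put("row1", "a")
store.put("row2", "b")        # reaches the threshold and triggers a flush
first = store.get("row1")     # served from the flushed file, then cached
second = store.get("row1")    # served from the cache
```

The larger the cache, the more reads avoid the file lookup, which is where the memory appetite comes from.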
Big Memory
• Hadoop platforms are no longer only for commodity hardware.
• Systems are moving towards faster CPUs and bigger memories
Big Data => Big Storage + Big Memory
• Non-volatile memory technology
• 3D XPoint™ DIMMs from Intel®
• Higher memory capacity
• Lower cost vs DDR
HBase – Go Big Cache
• JVM GC tuning continues to be a challenge with larger heaps (new GC algos)
• A much bigger cache in off-heap memory for faster random reads
• More predictable latency
• Building blocks for supporting 3D XPoint™ products
[Diagram: Client, HBase (JVM, off-heap memory cache), and data in HDFS, with read paths from the client to the cache and to HDFS]
HBase – Go Big Cache
[Charts: performance before off-heaping and performance after off-heaping. Image courtesy Alibaba Inc.]
• Alibaba adopted this feature for their 1600-node cluster.
• Used in the Double 11 online sale
Questions
Thank You
https://software.intel.com/en-us/bigdata/apache-big-data-stack
Backup slides
[Charts: Microbenchmark, Codec Calculation, in MB per second]

Giridih Escorts Service Girl ^ 9332606886, WhatsApp Anytime Giridih
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
 
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...TrafficWave Generator Will Instantly drive targeted and engaging traffic back...
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...
 
Top Call Girls in Balaghat 9332606886Call Girls Advance Cash On Delivery Ser...
Top Call Girls in Balaghat  9332606886Call Girls Advance Cash On Delivery Ser...Top Call Girls in Balaghat  9332606886Call Girls Advance Cash On Delivery Ser...
Top Call Girls in Balaghat 9332606886Call Girls Advance Cash On Delivery Ser...
 
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
 
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
 

Hadoop enhancements using next gen IA technologies

  • 1. Hadoop Enhancements Using Next-Gen Intel® Platform Technologies • Anoop Sam John: PMC member for Apache HBase • Rakesh R: committer for Apache ZooKeeper and PMC member for Apache BookKeeper
  • 2. Hadoop Enhancements Using Next-Gen Intel® Platform Technologies • Anoop Sam John • Rakesh R
  • 3. About Us • Anoop Sam John • PMC member for Apache HBase and Phoenix • anoopsamjohn@apache.org • https://www.linkedin.com/in/anoopsamjohn • Rakesh R • Committer for Apache ZooKeeper and BookKeeper • Apache Hadoop contributor • rakeshr@apache.org • https://www.linkedin.com/in/rakeshadr
  • 4. Agenda • Intel enhancements on the Hadoop platform • HDFS: erasure coding using the ISA-L library; encryption using AES-NI • HBase: Go Big Cache
  • 5. HDFS – Distributed FileSystem “Between the birth of the world and 2003, there were 5 Exabytes of information created. We now create 5 Exabytes every two days.” Eric Schmidt, Executive Chairman of Alphabet, Inc.
  • 6. HDFS: Current Replication Strategy • Inherits 3-way replication from the Google File System to increase data availability, at 3x storage overhead • Expensive for massive amounts of data and for geo-distributed data recovery • Diagram labels: DFSClient; replicas r1, r2, r3; Datanode1, Datanode2; Rack-1, Rack-2; 3x replication
  • 7. HDFS: Erasure Coding • k data blocks + m parity blocks (k + m); example: Reed-Solomon 6 + 3 • Saves disk space • 1.5x storage overhead • Sample codec (XOR based), data bits X, Y and parity bit X ⊕ Y: (0, 0) → 0; (0, 1) → 1; (1, 0) → 1; (1, 1) → 0 • Diagram labels: blocks b1 to b9 on D1 to D9; 6 data blocks, 3 parity blocks
  • 8. Durability & Efficiency • Data durability (how many simultaneous failures can be tolerated?): 2 for 3-way data replication; 3 for erasure coding RS-(6,3) • Storage efficiency (what portion of storage holds useful data?): 1/3 (33.33%) for 3-way data replication; 6/9 (67%) for erasure coding RS-(6,3) • Diagram labels, 3-way data replication: Replica1, Replica2, Replica3 on Datanode1, Datanode2, Datanode3; useful data vs redundant data • Diagram labels, RS-(6,3) erasure coding: blocks b1 to b9 on D1 to D9; 6 data blocks (useful data), 3 parity blocks (extra data) • Released version: Apache Hadoop 3.0.0-alpha1
  • 9. Microbenchmark: Codec Calculation • Chart: codec throughput in MB per second (image courtesy Cloudera) • New Intel architecture solutions for storage (ISA-L) • The Intel® Intelligent Storage Acceleration Library provides a way to deploy EC with better performance. https://01.org/intel%C2%AE-storage-acceleration-library-open-source-version
  • 11. HDFS Encryption • Data sensitivity and privacy management are very important for big data analytics • Encryption is a regulatory requirement in many business sectors: finance, government, healthcare, etc. • Diagram labels: DFSClient; KMS (per-file key operations, encryption key ops); HDFS cluster (data ops: read/write encrypted data); encryption library; data at rest on disk • Released version: Apache Hadoop 2.6.0
  • 12. Encryption Algorithm • Data encryption/decryption is computationally costly • Encryption ciphers: AES-CTR (Advanced Encryption Standard, Counter Mode) is the most popular, with 128-, 192-, or 256-bit keys
  • 13. Encryption: AES-CTR • Two implementations of AES-CTR: 1. JCE (Java Cryptography Extension) software implementation; 2. OpenSSL hardware-accelerated AES-NI (Intel® Advanced Encryption Standard New Instructions) implementation • AES-NI is available in Westmere (2010) and newer Intel CPUs • AES-NI was further optimized in Haswell (2013) https://software.intel.com/en-us/articles/intel-advanced-encryption-standard-instructions-aes-ni
  • 14. Microbenchmark: Encrypt/Decrypt 1 GB Byte Array • Test environment: run locally on a single Haswell machine; single threaded, excluding HDFS overheads (checksumming, network, copies) • Image courtesy Cloudera
  • 15. Apache Commons Crypto • The cryptographic layer is incubated as a new Apache component: http://commons.apache.org/proper/commons-crypto/ • Apache Commons Crypto was also integrated with Apache Spark for shuffle encryption.
  • 16. HBase
  • 17. HBase • NoSQL database in the Hadoop ecosystem • Accumulates writes in memory and flushes them to HDFS • Caches data • Reduced read latency • Better read throughput • Memory-hungry processes
  • 18. Big Memory • Hadoop platforms are no longer only for commodity hardware • Systems are moving towards faster CPUs and bigger memories: Big Data => Big Storage + Big Memory • Non-volatile memory technology • 3D XPoint™ DIMMs from Intel® • Higher memory capability • Lower cost vs DDR
  • 19. HBase: Go Big Cache • Diagram labels: Client; reads; HBase JVM; off-heap memory cache holding data blocks; HDFS • JVM GC tuning continues to be a challenge with larger heaps (new GC algorithms) • A much bigger cache in off-heap memory for faster random reads • More predictable latency • Building blocks for supporting 3D XPoint™ products
  • 20. HBase: Go Big Cache • Charts: performance before off-heaping and performance after off-heaping (image courtesy Alibaba Inc.) • Alibaba adopted this feature for their 1600-node cluster • Used in the Double 11 online sale
  • 25. Microbenchmark: Codec Calculation (chart, MB per second)
  • 26. Microbenchmark: Codec Calculation (chart, MB per second)
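
The erasure coding slides (7 and 8) describe an XOR-based sample codec and compare storage overhead and efficiency for 3-way replication against RS-(6,3). The following is a minimal Python sketch of that arithmetic, not HDFS code: the function names and sample blocks are hypothetical, and the XOR codec shown can rebuild only one lost block, whereas Reed-Solomon (6,3) tolerates three.

```python
from functools import reduce


def xor_parity(blocks):
    """Return the byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def storage_overhead(data_units, redundant_units):
    """Raw storage consumed per unit of useful data (3.0 means 3x)."""
    return (data_units + redundant_units) / data_units


def storage_efficiency(data_units, redundant_units):
    """Fraction of raw storage that holds useful data."""
    return data_units / (data_units + redundant_units)


# Hypothetical sample blocks, all the same length.
data = [b"hadoop", b"hbase!", b"zkeepr"]
parity = xor_parity(data)

# Simulate losing data[1] and rebuild it from the survivors plus parity.
recovered = xor_parity([data[0], data[2], parity])

if __name__ == "__main__":
    print("recovered block:", recovered)
    # 3-way replication: 1 useful copy + 2 redundant copies.
    print("replication overhead:", storage_overhead(1, 2))
    print("replication efficiency:", storage_efficiency(1, 2))
    # RS-(6,3): 6 data blocks + 3 parity blocks.
    print("RS-(6,3) overhead:", storage_overhead(6, 3))
    print("RS-(6,3) efficiency:", storage_efficiency(6, 3))
```

XOR-ing the surviving blocks with the parity block cancels every term except the missing one, which is why a single parity block is enough to recover a single failure.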