Introduction to HBase
Apekshit Sharma
Presented at Java One 2015
1.
Apache HBase: Overview, Hands-On, and Use Cases
Apekshit Sharma, Esteban Gutierrez
© Cloudera, Inc. All rights reserved.
2.
Who we are
Apekshit Sharma
• Distributed Software Engineer, Cloudera
• Software Engineer, Google
• Apache HBase contributor
• Performance improvements and configuration framework
Esteban Gutierrez
• Software Engineer, Cloudera
• Apache HBase committer
• 15+ years of experience on Internet-scale platforms
3.
Contents
• Motivation
• Introduction to Apache HBase
• Data model
• Hands-On: Installation, HBase shell
• A more in-depth introduction to Apache HBase
• Apache Hadoop HDFS
• HBase Architecture
4.
Motivation
http://www.internetlivestats.com/total-number-of-websites/
5.
Motivation
• Indexing the internet has challenges:
  • Scale
  • Volume
  • Rate
  • Diversity of content
  • URLs
  • High-resolution images
  • Video
6.
Motivation
7.
Motivation
• What if you’re not trying to index the internet?
8.
Motivation
• Data for analytical processing
• User-facing real-time platforms
9.
Apache HBase
• Open source
• Random-access, low-latency, and consistent data store
• Horizontally scalable
• Built on top of Apache Hadoop
10.
Apache HBase is open source
• Apache 2.0 License
• A community project with committers and contributors from diverse organizations
  • Cloudera, Facebook, Salesforce.com, Huawei, TrendMicro, eBay, Hortonworks, Intel, Twitter, …
• The code license means anyone can modify and use the code.
11.
Apache HBase is consistent
• CAP theorem
  • Consistency
  • Availability
  • Partition tolerance
[Diagram: HBase positioned among the three CAP properties]
12.
Apache HBase is horizontally scalable
• Adding more servers linearly increases performance and capacity
  • Storage capacity
  • Input/output operations
• Store and access data on commodity servers
• Largest cluster: > 3000 nodes, > 100 PB
• Average cluster: 10-40 nodes, 100-400 TB
[Chart: performance (IOPS/storage/throughput) vs. number of servers]
13.
Apache HBase is horizontally scalable
• Commodity servers (circa 2015)
  • 12-24 hard disks of 2-4 TB each
  • 2 (quad/hexa/octa)-core CPUs, 2-3 GHz
  • 64-512 GB of RAM
  • 10 Gbps Ethernet
  • $5,000-$10,000 per machine
14.
Data model
Row key   info:name   info:age   comp:base   comp:stocks
121       ‘tom’       ‘28’       ‘125k’
145       ‘bob’       ‘32’       ‘110k’      ‘50’ (ts=2012), ‘100’ (ts=2014)
Diagram labels: row keys; columns, written <column family>:<column qualifier>; cells
15.
Data model
• Data is stored… in a big table
• Sorted rows with primary row keys. Supports billions of rows.
• Column: <column family>:<column qualifier>. Supports millions of columns.
• Cell: intersection of a row and a column.
  • Can be empty, with no storage or processing overhead
  • Can hold several values with different timestamps
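The data model above can be pictured as a sorted map of maps. The following is a minimal sketch in Python, not HBase code: the class `SketchTable` and its methods are hypothetical names, row keys are plain strings, and the timestamps marked as assumed are invented for illustration (the slide only gives timestamps for comp:stocks).

```python
from collections import defaultdict


class SketchTable:
    """Toy model of the layout: row key -> column -> {timestamp: value}."""

    def __init__(self):
        self.rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row_key, column, value, ts):
        # A column is always named "<column family>:<column qualifier>".
        self.rows[row_key][column][ts] = value

    def get(self, row_key, column, ts=None):
        versions = self.rows.get(row_key, {}).get(column, {})
        if not versions:
            return None  # empty cell: nothing was ever stored for it
        return versions.get(max(versions) if ts is None else ts)

    def scan(self):
        # Rows come back sorted by row key, whatever the insertion order.
        for row_key in sorted(self.rows):
            yield row_key, dict(self.rows[row_key])


table = SketchTable()
table.put("145", "info:name", "bob", ts=2012)   # timestamp assumed
table.put("121", "info:name", "tom", ts=2012)   # timestamp assumed
table.put("145", "comp:stocks", "50", ts=2012)
table.put("145", "comp:stocks", "100", ts=2014)

latest = table.get("145", "comp:stocks")           # newest version of the cell
older = table.get("145", "comp:stocks", ts=2012)   # an explicit older version
missing = table.get("121", "comp:stocks")          # empty cell
row_order = [row_key for row_key, _ in table.scan()]
```

An empty cell costs nothing here because it simply has no entry in the inner map, which mirrors the "can be empty" bullet.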
16.
Hands On
• Apache HBase installation
• Play with the HBase shell
http://pastebin.com/60EUN75f
17.
Understanding basic architecture of Hadoop (HDFS)
18.
Apache Hadoop
• Open source
• Commodity servers
• Horizontally scalable
• Highly fault-tolerant
• Massive processing power
19.
Apache Hadoop
2 Core Components
• HDFS (Hadoop Distributed File System)
• MapReduce + YARN
20.
History
2003
21.
GFS
• Distributed file system
• Commodity servers
• Horizontally scalable
• Highly fault-tolerant
• Proprietary
22.
HDFS
• Distributed file system
• Commodity servers
• Horizontally scalable
• Highly fault-tolerant
• Open source
23.
HDFS API
• File
  • Open, Close, Read, Write, Move, etc.
• Directories
  • Create, Delete, etc.
• Permissions
  • Owners, Groups, rwx permissions
24.
Basic Architecture of HDFS
25.
Local file system
• The file system will split the file into blocks (B1, B2, B3)
[Diagram: a file split into blocks B1, B2, B3, all stored on one disk]
26.
HDFS
• File distributed across machines
[Diagram: blocks B1, B2, B3 spread over DataNode 1 to DataNode 4]
27.
HDFS DataNode
[Diagram: B1 on DataNode 1, B2 on DataNode 2, B3 on DataNode 3, and DataNode 4]
28.
HDFS NameNode
[Diagram: a NameNode alongside DataNode 1 to DataNode 4, which hold blocks B1, B2, B3]
29.
HDFS: Reading a file
1. The client asks the NameNode for file ‘foo’.
2. The NameNode verifies that the client has permission to read the file.
3. The NameNode returns the list of foo’s blocks and their DataNodes.
[Diagram: Client, NameNode, and DataNode 1 to DataNode 4 holding blocks B1, B2, B3]
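The read path above can be sketched as a toy simulation. This is not the HDFS client API: `SketchNameNode`, `write_file`, and `read_file` are hypothetical names, DataNodes are plain dictionaries, blocks are placed round-robin, and the block size is a few bytes so the example stays readable. The final step, in which the client fetches block contents from the DataNodes, is implied by the slide rather than listed on it.

```python
BLOCK_SIZE = 4  # bytes, tiny on purpose; real HDFS blocks are measured in MB


class SketchNameNode:
    """Holds metadata only: which blocks make up a file and where they live."""

    def __init__(self):
        self.files = {}    # file name -> ordered list of (block id, DataNode name)
        self.readers = {}  # file name -> set of users allowed to read it

    def get_block_locations(self, name, user):
        # Step 2: verify the client has permission to read the file.
        if user not in self.readers.get(name, set()):
            raise PermissionError(f"{user} may not read {name}")
        # Step 3: return the list of the file's blocks and their DataNodes.
        return list(self.files[name])


def write_file(namenode, datanodes, name, data, readers):
    """Split data into blocks and spread them over the DataNodes round-robin."""
    node_names = sorted(datanodes)
    placements = []
    for offset in range(0, len(data), BLOCK_SIZE):
        index = offset // BLOCK_SIZE
        block_id = f"{name}-B{index + 1}"
        node = node_names[index % len(node_names)]
        datanodes[node][block_id] = data[offset:offset + BLOCK_SIZE]
        placements.append((block_id, node))
    namenode.files[name] = placements
    namenode.readers[name] = set(readers)


def read_file(namenode, datanodes, name, user):
    # Step 1: the client names the file; the NameNode answers with locations.
    locations = namenode.get_block_locations(name, user)
    # The client then fetches each block from the DataNode that holds it.
    return b"".join(datanodes[node][block_id] for block_id, node in locations)


datanodes = {f"DataNode {i}": {} for i in range(1, 5)}
namenode = SketchNameNode()
write_file(namenode, datanodes, "foo", b"0123456789", readers={"alice"})
content = read_file(namenode, datanodes, "foo", "alice")
```

Note that the NameNode in this sketch never touches file contents; it only answers the question of where the blocks are.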
30.
HDFS: Fault tolerance
[Diagram: NameNode and DataNode 1 to DataNode 4 holding blocks B1, B2, B3]
31.
HDFS: Redundancy
[Diagram: NameNode and DataNode 1 to DataNode 4; each of the blocks B1, B2, B3 is stored three times]
32.
HDFS: Horizontal Scalability
[Diagram: the same replicated blocks, with DataNode 5 added to the cluster]
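To see why three copies of each block give fault tolerance, here is a small sketch. It assumes a simple round-robin placement, which is a deliberate simplification (real HDFS placement also takes rack topology into account); `place_replicas` and `readable_blocks` are hypothetical helper names.

```python
REPLICATION = 3  # the HDFS default replication factor


def place_replicas(blocks, datanodes, replication=REPLICATION):
    """Put each block on `replication` distinct DataNodes, round-robin."""
    if replication > len(datanodes):
        raise ValueError("need at least as many DataNodes as replicas")
    placement = {}
    for i, block in enumerate(blocks):
        placement[block] = [
            datanodes[(i + r) % len(datanodes)] for r in range(replication)
        ]
    return placement


def readable_blocks(placement, failed):
    """Blocks that still have at least one replica on a live DataNode."""
    return {
        block
        for block, nodes in placement.items()
        if any(node not in failed for node in nodes)
    }


nodes = ["DataNode 1", "DataNode 2", "DataNode 3", "DataNode 4"]
placement = place_replicas(["B1", "B2", "B3"], nodes)

# Losing any two DataNodes still leaves every block readable.
after_two_failures = readable_blocks(placement, {"DataNode 1", "DataNode 2"})

# Adding DataNode 5 gives new blocks more places to go.
bigger_cluster = place_replicas(["B1", "B2", "B3", "B4"], nodes + ["DataNode 5"])
```

With a replication factor of 3, a block becomes unreadable only if all three DataNodes holding it fail at the same time.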
33.
Let’s look at some existing HDFS systems...
34.
• Yahoo! HDFS clusters
  • 40k+ servers, 100k+ CPUs, 450 PB of data
  • Spam filtering, web indexing, personalization, etc.
• Facebook HDFS cluster
  • 15 TB of new data per day
  • 1200+ machines, 30 PB in one cluster
  • Data warehouse, Messages, etc.
35.
• 1-10 PB HDFS clusters
  • Production-critical workloads for websites, retail, finance, telecom, government, research, …
• Sub-PB HDFS clusters
  • PoCs, R&D, production in early stages, …
36.
But… there are restrictions!
It’s not a magic wand!
37.
Files are append only
• Access model: write-once-read-many
• Cannot change existing contents
38.
Not designed for small files
• HDFS block sizes are in MB (default 128 MB)
• Designed for file sizes that are typically GBs or TBs
• Normal file systems have a 4 KB block size!
• The NameNode becomes a bottleneck because the metadata grows too big
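A rough back-of-the-envelope calculation shows how quickly metadata grows with small files. The sketch below assumes, as a simplification, that the NameNode tracks one object per file plus one object per block; the function names and the 1 TB workload are made up for illustration.

```python
import math

KB = 1024
MB = 1024 * KB
GB = 1024 * MB
TB = 1024 * GB

BLOCK_SIZE = 128 * MB  # default HDFS block size quoted on the slide


def block_count(file_size, block_size=BLOCK_SIZE):
    """Number of blocks needed for one file (the last block may be partial)."""
    return math.ceil(file_size / block_size)


def namenode_objects(num_files, file_size):
    """Assumed metadata load: one entry per file plus one per block."""
    return num_files * (1 + block_count(file_size))


# The same 1 TB of data, stored two different ways.
as_large_files = namenode_objects(num_files=TB // GB, file_size=GB)
as_small_files = namenode_objects(num_files=TB // (4 * KB), file_size=4 * KB)
ratio = as_small_files / as_large_files
```

Under these assumptions the 4 KB layout needs tens of thousands of times more metadata entries than the 1 GB layout for the same amount of data.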
39.
Summary
HDFS is a great distributed file system!
• Store massive data
• Scalable
• High throughput
• Fault tolerance
40.
MapReduce
• Distributed processing framework
• Commodity machines
• Fault tolerance
41.
MapReduce
• HBase uses and extends MR APIs for data access
Further reading: http://hadoop.apache.org
43.
HBase Architecture
Unique id   Name      price   weight  store1  store2  store3
“1000000”   snickers  $9.99   4 Oz    Yes     Yes     Yes
“3000000”   almonds   $9.99   8 Oz    Yes     No      Yes
“8000000”   coke      $9.99   16 Oz   Yes     Yes     Yes
44.
HBase Architecture
Unique id   Name      price   weight  store1  store2  store3
“1000000”   snickers  $9.99   4 Oz    Yes     Yes     Yes
“3000000”   almonds   $9.99   8 Oz    Yes     No      Yes
“8000000”   coke      $9.99   16 Oz   Yes     Yes     Yes
“4000000”   new       $34.63  16 Oz   No      Yes     Yes
45.
HBase Architecture
Unique id   Name      price   weight  store1  store2  store3
“1000000”   snickers  $9.99   4 Oz    Yes     Yes     Yes
“3000000”   almonds   $9.99   8 Oz    Yes     No      Yes
“8000000”   coke      $9.99   16 Oz   Yes     Yes     Yes
“4000000”   foo       $34.63  16 Oz   No      Yes     Yes
“5000000”   bar       $22.54  16 Oz   Yes     Yes     Yes
46.
HBase Architecture
Unique id   Name      price   weight  store1  store2  store3
“1000000”   snickers  $9.99   4 Oz    Yes     Yes     Yes
“3000000”   almonds   $9.99   8 Oz    Yes     No      Yes
“8000000”   coke      $9.99   16 Oz   Yes     Yes     Yes
“4000000”   foo       $34.63  16 Oz   No      Yes     Yes
“5000000”   bar       $22.54  16 Oz   Yes     Yes     Yes
“9000000”   new1      $2.5    16 Oz   Yes     Yes     Yes
“7000000”   new2      $6.4    16 Oz   Yes     Yes     Yes
“2000000”   new3      $6.4    16 Oz   Yes     Yes     Yes
47.
HBase Architecture | Regions
Region [“”, “5000000”)
Row Key     Name      brand  price   weight  store1  store2  store3
“1000000”   snickers  xxx    $9.99   4 Oz    Yes     Yes     Yes
“2000000”   new3      xxx    $6.4    16 Oz   Yes     Yes     Yes
“3000000”   almonds   xxx    $9.99   8 Oz    Yes     No      Yes
“4000000”   foo       xxx    $34.63  16 Oz   No      Yes     Yes
Region [“5000000”, “”)
Row Key     Name      brand  price   weight  store1  store2  store3
“5000000”   bar       xxx    $22.54  16 Oz   Yes     Yes     Yes
“7000000”   new2      xxx    $6.4    16 Oz   Yes     Yes     Yes
“8000000”   coke      xxx    $9.99   16 Oz   Yes     Yes     Yes
“9000000”   new1      xxx    $2.5    16 Oz   Yes     Yes     Yes
48.
HBase Architecture | RegionServer
Server 12
Row Key     Name      price   weight  ….
“1000000”   snickers  $9.99   4 Oz    ….
“2000000”   new3      $6.4    16 Oz   ….
“3000000”   almonds   $9.99   8 Oz    ….
“4000000”   foo       $34.63  16 Oz   ….
Server 7
Row Key     Name      price   weight  ….
“5000000”   bar       $22.54  16 Oz   ….
“7000000”   new2      $6.4    16 Oz   ….
“8000000”   coke      $9.99   16 Oz   ….
“9000000”   new1      $2.5    16 Oz   ….
49.
HBase Architecture | Regions
• Horizontal split of tables
• Regions are served by RegionServers
• Automatically (based on configurations). Can be done manually too.
Relationship:
• A Region is served by only a single RegionServer at a time.
• A RegionServer can serve multiple Regions.
Region Server X: Region ["", 10), Region [40, 50)
Region Server Y: Region [10, 30), Region [30, 40), Region [50, "")
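Because Regions cover contiguous, sorted key ranges, finding the Region (and RegionServer) for a row key is a search over the Region start keys. The sketch below uses Python's `bisect` for that; `RegionMap` is a hypothetical name, the empty string stands for an open-ended boundary as on the slides, and the server assignment is taken from the example for illustration only.

```python
import bisect


class RegionMap:
    """Maps a row key to its Region [start, end) and the server hosting it."""

    def __init__(self, split_keys, servers):
        # N sorted split keys define N + 1 Regions, each with one server.
        if len(servers) != len(split_keys) + 1:
            raise ValueError("need exactly one server per Region")
        self.split_keys = list(split_keys)
        self.servers = list(servers)

    def locate(self, row_key):
        # bisect_right makes the start key inclusive and the end key exclusive.
        index = bisect.bisect_right(self.split_keys, row_key)
        start = self.split_keys[index - 1] if index > 0 else ""
        end = self.split_keys[index] if index < len(self.split_keys) else ""
        return (start, end), self.servers[index]


regions = RegionMap(split_keys=["5000000"], servers=["Server 12", "Server 7"])

almonds_region, almonds_server = regions.locate("3000000")
bar_region, bar_server = regions.locate("5000000")    # boundary key
new1_region, new1_server = regions.locate("9000000")
```

Each lookup returns exactly one server, which matches the rule that a Region is served by a single RegionServer at a time.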
50.
HBase Architecture
Row Key     Name      price   weight  store1  store2  store3
“1000000”   snickers  $9.99   4 Oz    Yes     Yes     Yes
“2000000”   new3      $6.4    16 Oz   Yes     Yes     Yes
“3000000”   almonds   $9.99   8 Oz    Yes     No      Yes
“4000000”   foo       $34.63  16 Oz   No      Yes     Yes
“5000000”   bar       $22.54  16 Oz   Yes     Yes     Yes
“7000000”   new2      $6.4    16 Oz   Yes     Yes     Yes
“8000000”   coke      $9.99   16 Oz   Yes     Yes     Yes
“9000000”   new1      $2.5    16 Oz   Yes     Yes     Yes
What if you have many more columns?
51.
HBase Architecture | Column family
Row Key     info:name  info:price  info:weight  availability:store1  availability:store2  availability:store3
“1000000”   snickers   $9.99       4 Oz         Yes                  Yes                  Yes
“2000000”   new3       $6.4        16 Oz        Yes                  Yes                  Yes
“3000000”   almonds    $9.99       8 Oz         Yes                  No                   Yes
“4000000”   foo        $34.63      16 Oz        No                   Yes                  Yes
“5000000”   bar        $22.54      16 Oz        Yes                  Yes                  Yes
“7000000”   new2       $6.4        16 Oz        Yes                  Yes                  Yes
“8000000”   coke       $9.99       16 Oz        Yes                  Yes                  Yes
“9000000”   new1       $2.5        16 Oz        Yes                  Yes                  Yes
52.
HBase Architecture | Column family
• Set of columns that have similar access patterns
• Data stored in separate files
• Tune performance
  • In-memory
  • Compression
  • Version retention policies
  • Cache priority
• Needs to be specified by the user
53.
HBase Architecture
Region
Column Family
Row Key     info:name  info:price  info:weight
“1000000”   snickers   $9.99       4 Oz
“2000000”   new3       $6.4        16 Oz
“3000000”   almonds    $9.99       8 Oz
Column Family
Row Key     available:store1  available:store2  available:store3
“1000000”   Yes               Yes               Yes
“2000000”   Yes               Yes               Yes
“3000000”   Yes               No                Yes
54.
HBase Architecture | Put
1. The client creates a row to put.
2. The client checks which RegionServer hosts this row.
3. The row is written into the write-ahead log (WAL) on disk for persistence.
4. The row is written to the MemStore (in-memory storage).
55.
HBase Architecture | Put
[Diagram: a Put arrives at RegionServer 2, which has a WAL (disk) and a Region with two column families, each with a MemStore (RAM)]
56.
HBase Architecture | HFile
[Diagram: a Put arrives at RegionServer 2; each column family’s MemStore (RAM) is flushed to HFiles on disk; the WAL is also on disk]
57.
HBase Architecture | HFile
• As more rows are added to the table:
  • Write to WAL
  • Write to MemStore
• When the MemStore gets full, its contents are flushed to HDFS as HFiles
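The write path described on the Put and HFile slides can be sketched as follows. This is a toy model, not HBase code: `SketchRegionStore` is a hypothetical name, the WAL is a Python list standing in for an on-disk log, HFiles are sorted lists, and the flush threshold counts rows, whereas a real MemStore is flushed based on its size in memory.

```python
class SketchRegionStore:
    """Toy write path: WAL first, then MemStore, flushed to sorted HFiles."""

    def __init__(self, flush_threshold=3):
        self.wal = []        # stands in for the on-disk write-ahead log
        self.memstore = {}   # in-memory store, row key -> value
        self.hfiles = []     # oldest first; each one a sorted, immutable list
        self.flush_threshold = flush_threshold  # rows, an assumption

    def put(self, row_key, value):
        self.wal.append((row_key, value))  # persist to the WAL before anything else
        self.memstore[row_key] = value     # then write to the MemStore
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # A full MemStore is written out as one new sorted HFile.
        if self.memstore:
            self.hfiles.append(sorted(self.memstore.items()))
            self.memstore = {}

    def get(self, row_key):
        if row_key in self.memstore:
            return self.memstore[row_key]
        # Every additional HFile is one more place a read may have to look.
        for hfile in reversed(self.hfiles):  # newest first
            for key, value in hfile:
                if key == row_key:
                    return value
        return None


store = SketchRegionStore(flush_threshold=3)
for row_key, name in [("1000000", "snickers"), ("3000000", "almonds"),
                      ("8000000", "coke"), ("4000000", "foo")]:
    store.put(row_key, name)

flushed = store.hfiles[0]
still_in_memory = dict(store.memstore)
```

The third put fills the MemStore and triggers a flush, so the fourth row is the only one still held in memory, while the WAL has recorded all four puts.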
58.
HBase Architecture | HFile
• Side effects of more HFiles in a Region
  • More disk seeks on read
  • More disk usage (overwritten values, deletes, etc.)
[Diagram: RegionServer 2 hosting a Region with two column families, each with a MemStore (RAM) and HFiles on disk]
59.
HBase Architecture | Compactions
Merging multiple HFiles into a single HFile.
• Minor compactions
• Major compactions
60.
HBase Architecture
[Diagram: RegionServer 2 hosting a Region with two column families, each with a MemStore (RAM) and HFiles on disk]
61.
HBase Architecture
[Diagram: same layout as slide 60]
62.
HBase Architecture
[Diagram: same layout as slide 60]
63.
HBase Architecture | Compactions
• Minor compactions
  • Can be finely tuned using multiple configurations
  • Controlled by policy (pluggable)
• Major compactions
  • Periodic (set from configuration) or manually triggered.
  • Tend to be run during off-peak times.
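A compaction is essentially a merge of sorted files. The sketch below is a simplified illustration, not the HBase implementation: cells are `(row key, timestamp, value)` tuples, `DELETE` is a made-up marker for a deleted row, the minor compaction merges a chosen subset of HFiles and keeps every cell, and the major compaction merges all of them while keeping only the newest version of each row and dropping deleted rows. Keeping exactly one version is an assumption made for brevity, and the value "cashews" is invented for the example.

```python
import heapq

DELETE = "<delete-marker>"  # hypothetical stand-in for a delete marker


def minor_compact(hfiles):
    """Merge several sorted HFiles into one sorted HFile, keeping every cell."""
    return list(heapq.merge(*hfiles))


def major_compact(hfiles):
    """Merge all HFiles, keep the newest version per row, drop deleted rows."""
    newest = {}
    for row_key, ts, value in heapq.merge(*hfiles):
        if row_key not in newest or ts > newest[row_key][0]:
            newest[row_key] = (ts, value)
    return [
        (row_key, ts, value)
        for row_key, (ts, value) in sorted(newest.items())
        if value != DELETE
    ]


# Each HFile is sorted by (row key, timestamp).
hfile_1 = [("1000000", 1, "snickers"), ("3000000", 1, "almonds")]
hfile_2 = [("3000000", 2, "cashews"), ("4000000", 2, "foo")]
hfile_3 = [("4000000", 3, DELETE)]

after_minor = minor_compact([hfile_1, hfile_2])
after_major = major_compact([hfile_1, hfile_2, hfile_3])
```

After the major compaction the overwritten value and the deleted row no longer take up space, which addresses the extra disk usage and extra read seeks caused by many HFiles.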
64.
HBase Architecture | Splits
• Eventually, Regions become imbalanced.
• HBase can split a Region into two.
• Daughter Regions can be moved to a different RegionServer, if necessary.
Gives HBase horizontal scalability!
65.
HBase Architecture | Splits
• Region has rows [0, 100)
• Splitting
  • Middle of current Region’s row range
  • All column families are split
[Diagram: one Region on RegionServer 2 holding rows 10, 20, …, 100]
66.
HBase Architecture | Splits
• Splits
  • [0, 50)
  • [50, 100)
[Diagram: two daughter Regions on RegionServer 2, holding rows 10 to 50 and rows 60 to 100]
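The split of [0, 100) into [0, 50) and [50, 100) can be written down in a few lines. This sketch follows the slide literally by splitting at the middle of the row range and uses integer row keys for readability; `split_region` and `rows_per_region` are hypothetical names, and real row keys are byte strings rather than integers.

```python
def split_region(start, end):
    """Split the half-open range [start, end) at its middle into two daughters."""
    middle = (start + end) // 2
    if middle == start:
        raise ValueError("region is too small to split")
    return (start, middle), (middle, end)


def rows_per_region(row_keys, regions):
    """Group row keys by the Region whose [start, end) range contains them."""
    return {
        region: [key for key in row_keys if region[0] <= key < region[1]]
        for region in regions
    }


parent = (0, 100)
daughters = split_region(*parent)

row_keys = list(range(0, 100, 10))  # 0, 10, ..., 90
assignment = rows_per_region(row_keys, daughters)
```

Every row lands in exactly one daughter because the ranges are half-open, and either daughter can then be moved to another RegionServer independently.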
67.
HBase Architecture | Bulk Import
• This write path is durable, but if you’re importing a lot of data:
  • Every put goes into the WAL, which means disk seeks. Lots of puts mean lots of disk seeks.
  • Lots of data in MemStores means lots of flushing to disk.
  • Lots of flushing to disk might mean lots of compactions.
So we have other, better options like…
68.
HBase Architecture | Bulk Loading
• Bypass the conventional write path
  • Extract data from the source
  • Transform the data directly into HFiles (done with a MapReduce job)
  • Tell the RegionServers to serve these HFiles
69.
HBase Architecture | APIs
• Java API
  • Most full-featured.
• REST API
  • Easily accessible.
• Thrift API
  • Support for many languages (e.g. C, C++, Perl, Ruby, Python).
70.
Enough of Architecture
71.
That’s all folks!
72.
Try Hadoop Now
cloudera.com/live
(Slide footer: 10/17/14, Strata + Hadoop World 2014, George and Hsieh)
73.
Join the Discussion
Get community help or provide feedback
cloudera.com/community
74.
Sources
• A Survey of HBase Application Archetypes
  • Lars George, Jon Hsieh
  • http://www.slideshare.net/HBaseCon/case-studies-session-7
• Hadoop and HBase: Motivations, Use cases and Trade-offs
  • Jon Hsieh’s slides
75.
Questions?