Apache HBase Internals you Hoped you Never Needed to Understand
Josh Elser
Future of Data, NYC
2016/10/11
© Hortonworks Inc. 2011 – 2016. All Rights Reserved
Engineer at Hortonworks, Member of the Apache Software Foundation
Top-Level Projects
• Apache Accumulo®
• Apache Calcite™
• Apache Commons™
• Apache HBase®
• Apache Phoenix™
ASF Incubator
• Apache Fluo™
• Apache Gossip™
• Apache Pirk™
• Apache Rya™
• Apache Slider™
These Apache project names are trademarks or registered
trademarks of the Apache Software Foundation.
Apache HBase for storing your data!
CC BY 3.0 US: http://hbase.apache.org/
What happens when things go wrong?
CC BY-ND 2.0: https://www.flickr.com/photos/widnr/6588151679
The BigTable Architecture
• BigTable’s architecture is simple
• Debugging a distributed system is not simple
• How can we break down a complex system?
• How do we write resilient software?

• Log-Structured Merge Tree
• Write-Ahead Logs
• Distributed Coordination
• Row-based, Auto-Sharding
• Strong Consistency
• Read Isolation
• Coprocessors
• Security (AuthN/AuthZ)
• Backups
Naming Conventions
• Servers
– Hostname, Port, and Timestamp
– RegionServer: r01n01.domain.com,16201,1475691463147
– Master: r02n01.domain.com,16000,1475691462616
• Regions
– Table, Start RowKey, Region ID (timestamp), Replica ID, Encoded name
– T1,\x04\x00\x00,1470324608597.c04d94cd4ee9797da2fb906b4dcd2e3c.
– Or simply c04d94cd4ee9797da2fb906b4dcd2e3c
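The naming conventions above can be taken apart mechanically. The following is a minimal sketch, not HBase's own parser: the type and function names are hypothetical, the start key is kept as its printable escaped form, and names that carry a Replica ID are not handled.

```python
from typing import NamedTuple


class ServerName(NamedTuple):
    hostname: str
    port: int
    start_timestamp_ms: int


class RegionName(NamedTuple):
    table: str
    start_key: str
    region_id: int
    encoded_name: str


def parse_server_name(name: str) -> ServerName:
    """Split 'hostname,port,timestamp' into its three fields."""
    hostname, port, timestamp = name.split(",")
    return ServerName(hostname, int(port), int(timestamp))


def parse_region_name(name: str) -> RegionName:
    """Split 'table,startkey,regionid.encoded.' into its fields.

    The start key may itself contain commas, so the table is taken from the
    left and the region ID from the right. Replica IDs are not handled here.
    """
    if not name.endswith("."):
        raise ValueError("expected a trailing '.' after the encoded name")
    body, encoded = name[:-1].rsplit(".", 1)
    table, rest = body.split(",", 1)
    start_key, region_id = rest.rsplit(",", 1)
    return RegionName(table, start_key, int(region_id), encoded)


if __name__ == "__main__":
    print(parse_server_name("r01n01.domain.com,16201,1475691463147"))
    print(parse_region_name(
        r"T1,\x04\x00\x00,1470324608597.c04d94cd4ee9797da2fb906b4dcd2e3c."))
```

When reading logs, the encoded name alone is usually the handiest thing to search for, since it contains no commas or binary row key bytes.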
Regions
• A sorted “shard” of a table
• At least one “column family”
– Physical partitions
• Each family can have zero to many files
• Hosted by at most one RegionServer
– Can have many hosting RegionServers for reads
• In-memory locks for certain intra-row operations
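Because each Region is a sorted shard, finding the Region that owns a row key comes down to a search over sorted start keys. The sketch below illustrates the idea with a hypothetical three-region table; the region names and split points are made up, and this is not how the HBase client is implemented.

```python
import bisect
from typing import List, Tuple

# Hypothetical table layout: (start_key, region_name) pairs sorted by start key.
# The first region starts at the empty key, so every row key maps to a region.
REGIONS: List[Tuple[bytes, str]] = [
    (b"", "region-a"),
    (b"g", "region-b"),
    (b"p", "region-c"),
]


def find_region(row_key: bytes,
                regions: List[Tuple[bytes, str]] = REGIONS) -> str:
    """Return the region whose [start_key, next_start_key) range holds row_key."""
    start_keys = [start for start, _ in regions]
    # bisect_right gives the index of the first start key greater than row_key;
    # the region just before that index owns the key.
    index = bisect.bisect_right(start_keys, row_key) - 1
    return regions[index][1]


if __name__ == "__main__":
    for key in (b"apple", b"g", b"zebra"):
        print(key, "->", find_region(key))
```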
Region Assignment
• Coordinated by the HBase Master
• A Region must be hosted by only one RegionServer
• State tracked in hbase:meta
– hbck to fix issues
• Region splits/merges make a hard problem even harder
• Moving towards ProcedureV2
Figure: Normal Region Assignment States (Closed, Offline, Pending Open, Opening, Open)
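The assignment states can be modelled as a small state machine. This is only a sketch: the slide names the states, while the ordering of the path and the `advance` helper are assumptions made for illustration, not the Master's actual code.

```python
from enum import Enum


class RegionState(Enum):
    CLOSED = "Closed"
    OFFLINE = "Offline"
    PENDING_OPEN = "Pending Open"
    OPENING = "Opening"
    OPEN = "Open"


# Assumed ordering of the normal assignment path; the slide only names the states.
NORMAL_PATH = [
    RegionState.CLOSED,
    RegionState.OFFLINE,
    RegionState.PENDING_OPEN,
    RegionState.OPENING,
    RegionState.OPEN,
]
NEXT_STATE = dict(zip(NORMAL_PATH, NORMAL_PATH[1:]))


def advance(state: RegionState) -> RegionState:
    """Move one step along the assumed normal path, rejecting anything else."""
    if state not in NEXT_STATE:
        raise ValueError(f"no normal transition out of {state.value}")
    return NEXT_STATE[state]


if __name__ == "__main__":
    state = RegionState.CLOSED
    while state is not RegionState.OPEN:
        state = advance(state)
        print(state.value)
```

A region that sits in an intermediate state for a long time is a useful thing to look for when assignment appears stuck.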
The File System
• HDFS “Compatible”
– Distributed, durable, “write leases”
• Physical storage of HBase Tables (HFiles)
• Write-ahead logs
• A parent directory in that FileSystem (hbase.rootdir)
The File System
Physical Separation by HBase Namespace
/hbase/data/
/hbase/data/default/<table1>
/hbase/data/default/<table1>/.tabledesc/.tableinfo…
/hbase/data/default/<table2>/<region_id1>
/hbase/data/default/<table2>/<region_id2>
/hbase/data/my_custom_ns/<table3>/…
/hbase/data/hbase/meta/…
/hbase/archive/…
/hbase/WALs/<regionserver_name>/…
/hbase/oldWALs/…
/hbase/corrupt/…
The File System for one Region
/hbase/data/default/<table2>/<region_id1>
…/.regioninfo
…/.tmp
…/<family1>/<hfile>
…/<family1>/<hfile>
…/<family2>/<hfile>
…/<family3>/<hfile>
…/recovered.edits/<number>.seqid
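The layout above is regular enough to build paths from their parts. The helpers below are a hypothetical sketch that only joins strings; they assume `hbase.rootdir` is `/hbase` as in the listing and do not touch a real file system.

```python
from pathlib import PurePosixPath


def region_dir(root_dir: str, namespace: str, table: str,
               encoded_region: str) -> PurePosixPath:
    """<rootdir>/data/<namespace>/<table>/<region>"""
    return PurePosixPath(root_dir) / "data" / namespace / table / encoded_region


def hfile_path(region: PurePosixPath, family: str, hfile: str) -> PurePosixPath:
    """<region dir>/<family>/<hfile>"""
    return region / family / hfile


def wal_dir(root_dir: str, regionserver_name: str) -> PurePosixPath:
    """<rootdir>/WALs/<regionserver_name>"""
    return PurePosixPath(root_dir) / "WALs" / regionserver_name


if __name__ == "__main__":
    # Table, family, and file names here are placeholders.
    region = region_dir("/hbase", "default", "table2",
                        "c04d94cd4ee9797da2fb906b4dcd2e3c")
    print(region)
    print(hfile_path(region, "family1", "some_hfile"))
    print(wal_dir("/hbase", "r01n01.domain.com,16201,1475691463147"))
```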
Writes into HBase
• Mutations inserted into sorted in-memory structure and WAL
– Fast lookups of recent data
– Append-only log for durability and speed
• Mutations are collected by destination Region
• Beware of hot-spotting
• Data in memory eventually flushed into sorted (H)files
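The write path described here can be sketched with a toy model. Everything below is a simplification under stated assumptions: `ToyRegion` is a hypothetical class, a plain dict sorted at flush time stands in for the sorted in-memory structure, Python lists stand in for the WAL and HFiles, and the flush threshold counts keys rather than bytes.

```python
from typing import Dict, List, Tuple


class ToyRegion:
    """Toy model of one Region's write path: WAL append, memory, flush."""

    def __init__(self, flush_threshold: int = 3) -> None:
        self.wal: List[Tuple[str, str]] = []            # append-only log
        self.memstore: Dict[str, str] = {}              # recent, unflushed data
        self.hfiles: List[List[Tuple[str, str]]] = []   # sorted files, oldest first
        self.flush_threshold = flush_threshold

    def put(self, key: str, value: str) -> None:
        # Record the mutation in the log before acknowledging it, so it can
        # be replayed if the in-memory copy is lost.
        self.wal.append((key, value))
        self.memstore[key] = value
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self) -> None:
        """Write the in-memory data out as one new sorted file."""
        if not self.memstore:
            return
        self.hfiles.append(sorted(self.memstore.items()))
        self.memstore.clear()


if __name__ == "__main__":
    region = ToyRegion(flush_threshold=2)
    for key, value in [("b", "1"), ("a", "2"), ("c", "3")]:
        region.put(key, value)
    print(region.hfiles)    # one flushed, sorted file
    print(region.memstore)  # data not yet flushed
```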
Compactions and Flushes
• Flush: Taking Key-Values from the In-Memory map and creating an HFile
• Minor Compaction: Rewriting a subset of HFiles for a Region into one HFile
• Major Compaction: Rewriting all HFiles for a Region into one HFile
• Compactions balance improved query performance with cost of rewriting data
– Compactions are good!
– Must understand SLAs to properly tune compactions
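The difference between minor and major compaction can be shown on lists of sorted key-value pairs. This is a sketch only: the function names are hypothetical, picking the oldest files for a minor compaction is an assumption rather than HBase's selection policy, and deletes and versions are ignored.

```python
from typing import Dict, List, Tuple

HFile = List[Tuple[str, str]]  # sorted (key, value) pairs


def compact(hfiles: List[HFile]) -> HFile:
    """Rewrite the given files (oldest first) into one sorted file.

    When a key appears in several files, the value from the newest file wins.
    """
    merged: Dict[str, str] = {}
    for hfile in hfiles:
        merged.update(hfile)
    return sorted(merged.items())


def minor_compaction(hfiles: List[HFile], count: int) -> List[HFile]:
    """Rewrite the `count` oldest files into one and keep the rest as they are."""
    if count < 2:
        raise ValueError("a compaction needs at least two input files")
    return [compact(hfiles[:count])] + hfiles[count:]


def major_compaction(hfiles: List[HFile]) -> List[HFile]:
    """Rewrite every file into a single file."""
    return [compact(hfiles)]


if __name__ == "__main__":
    files = [[("a", "old"), ("b", "1")], [("a", "new")], [("c", "2")]]
    print(minor_compaction(files, 2))
    print(major_compaction(files))
```

Fewer files means fewer streams to merge on every read, which is the query-performance side of the trade-off; the cost is the I/O spent rewriting data that has not changed.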
Reads into HBase
• Merge-Sort over multiple streams of data
– Memory
– Disk (many files)
• hbase:meta is the definitive source of where to find Regions
Figure: locating the Region for a RowKey (labels: RowKey, Region, hbase:meta, RegionServer, ZooKeeper)
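The merge-sort read can be sketched with the standard library's `heapq.merge`. The cell layout `(row key, timestamp, value)` and the `scan` function are assumptions for illustration; each input must already be sorted by row key and then by newest timestamp first, and delete markers are left out.

```python
import heapq
from typing import Iterable, Iterator, List, Tuple

Cell = Tuple[str, int, str]  # (row key, timestamp, value)


def scan(sources: List[Iterable[Cell]]) -> Iterator[Tuple[str, str]]:
    """Merge sorted sources and yield the newest value for each row key."""
    # Order by row key ascending, then timestamp descending, so the first
    # cell seen for a row is always its most recent one.
    merged = heapq.merge(*sources, key=lambda cell: (cell[0], -cell[1]))
    last_row = None
    for row, _timestamp, value in merged:
        if row != last_row:
            yield row, value
            last_row = row


if __name__ == "__main__":
    memory = [("a", 3, "a3"), ("c", 5, "c5")]
    file_1 = [("a", 1, "a1"), ("b", 2, "b2")]
    file_2 = [("c", 4, "c4")]
    print(list(scan([memory, file_1, file_2])))
```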
Apache ZooKeeper™
• Distributed coordination is really hard
• Obvious use cases
– Service Discovery
– Cluster Membership
– “Root Table”
• Non-obvious use cases
– Assignment (sometimes)
– Region Recovery
– WAL Splitting
– Cluster Replication
– Distributed Procedures
– HBase Snapshots
Apache ZooKeeper is a trademark of the Apache Software Foundation
Apache ZooKeeper™
• Discovery/Leader ZNodes
– /hbase/rs/…
– /hbase/master/…
– /hbase/backup-masters/…
• Consensus
– /hbase/splitWAL/…
– /hbase/flush-table-proc/...
– /hbase/table-lock/...
– /hbase/region-in-transition/...
– /hbase/recovering-regions/...
Distributed Procedures
• Resiliency in an unreliable system
– How do we create a table?
• “Procedure V2”
– Resilient, finite state machine
• HBase operations represented as “procedures”
• Clients are agnostic of Master state
– Clients track procedure state
https://issues.apache.org/jira/secure/attachment/12679960/ProcedureV2.pdf
Distributed Procedures
• Procedures are durable via Write-Ahead Log
– /hbase/MasterProcWALs/…
• Procedures only executed by the active HBase Master
• Reusable framework for the future
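The idea of a durable, resumable procedure can be sketched as a state machine that records each finished step in a log. This is a toy model, not the Procedure V2 framework: `ProcedureExecutor`, the step names for "create table", and the in-memory list standing in for the procedure WAL are all hypothetical.

```python
import json
from typing import Callable, Dict, List, Set

# Hypothetical steps for a "create table" procedure, in execution order.
STEPS = ["WRITE_FS_LAYOUT", "ADD_TO_META", "ASSIGN_REGIONS"]


class ProcedureExecutor:
    """Runs a procedure step by step, logging each completed step."""

    def __init__(self, log: List[str]) -> None:
        # The list stands in for a durable write-ahead log that survives
        # a crash of the executor.
        self.log = log

    def run(self, proc_id: int, actions: Dict[str, Callable[[], None]]) -> None:
        done = self._completed_steps(proc_id)
        for step in STEPS:
            if step in done:
                continue  # finished before an earlier crash, do not repeat
            actions[step]()
            # A crash between the action and this append would cause the
            # step to run again, so actions should be idempotent.
            self.log.append(json.dumps({"proc_id": proc_id, "step": step}))

    def _completed_steps(self, proc_id: int) -> Set[str]:
        records = (json.loads(line) for line in self.log)
        return {rec["step"] for rec in records if rec["proc_id"] == proc_id}


if __name__ == "__main__":
    log: List[str] = []
    actions = {step: (lambda s=step: print("running", s)) for step in STEPS}
    ProcedureExecutor(log).run(1, actions)
    # A second executor sharing the same log finds nothing left to do.
    ProcedureExecutor(log).run(1, actions)
```

Because progress lives in the log rather than in the executor, a newly active executor can pick up where the previous one stopped.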
HBase RPCs
Overview
• Internal and External HBase Communication
• Half-Sync/Half-Async Model
• Many knobs to tweak
Components
• Listener
• Readers
• Scheduler
• Call Queues
• Call Runners/Handlers
HBase RPCs
Figure: Request to Execution. Listener → Readers (four shown) → Scheduler → Call Queues (Priority, Read, Write, Replication) → Handlers
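The back half of this pipeline, from scheduler to handlers, can be sketched with queues and threads. This is a toy model under assumptions: the `serve` function is hypothetical, the listener and readers (socket accept and request parsing) are left out, and routing by an explicit call type is a simplification of what a real scheduler decides.

```python
import queue
import threading
from typing import Dict, List, Tuple

Call = Tuple[str, str]  # (call type, payload)
QUEUE_NAMES = ["priority", "read", "write", "replication"]


def serve(calls: List[Call], handlers_per_queue: int = 2) -> Dict[str, List[str]]:
    """Route calls to per-type queues and let handler threads drain them."""
    call_queues: Dict[str, queue.Queue] = {n: queue.Queue() for n in QUEUE_NAMES}
    handled: Dict[str, List[str]] = {n: [] for n in QUEUE_NAMES}
    lock = threading.Lock()

    def handler(name: str) -> None:
        while True:
            payload = call_queues[name].get()
            if payload is None:  # sentinel: no more work for this handler
                return
            with lock:
                handled[name].append(payload)

    threads = [
        threading.Thread(target=handler, args=(name,))
        for name in QUEUE_NAMES
        for _ in range(handlers_per_queue)
    ]
    for thread in threads:
        thread.start()

    # Scheduler: place each call on the queue for its type.
    for call_type, payload in calls:
        call_queues[call_type].put(payload)

    # One sentinel per handler so that every thread exits.
    for name in QUEUE_NAMES:
        for _ in range(handlers_per_queue):
            call_queues[name].put(None)
    for thread in threads:
        thread.join()
    return handled


if __name__ == "__main__":
    print(serve([("read", "get r1"), ("write", "put r2"), ("read", "scan t1")]))
```

Separate queues keep a flood of one call type from starving the others; the number of handlers per queue is one of the knobs to tweak.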
Disaster Recovery
• Multiple tools to ensure copies of data in the face of catastrophic failure
• CopyTable
– MapReduce job which reads all data from a source, writing to destination
• Snapshots
– A collection of Regions, their HFiles, and metadata
• Backup & Restore
– HBASE-7912, currently targeted for HBase 2.0.0
– Incremental and full backup/restore
Kerberos
• Strong authentication for untrusted networks
• “Standard” across Apache Hadoop and friends
• Requirements:
– Forward/Reverse DNS
– Unlimited Strength Java Cryptography Extension
• SASL used to build RPC systems
• “Practical Kerberos with Apache HBase” https://goo.gl/y0d9ZO
Finding a Hypothesis
• Logs logs logs
• Application and System
• Metrics exposed by JMX
• Graphing solutions
– Ambari Metrics Server + Grafana
Thank You
jelser@hortonworks.com / elserj@apache.org

Contenu connexe

Tendances

HBase state of the union
HBase   state of the unionHBase   state of the union
HBase state of the unionenissoz
 
Apache HBase + Spark: Leveraging your Non-Relational Datastore in Batch and S...
Apache HBase + Spark: Leveraging your Non-Relational Datastore in Batch and S...Apache HBase + Spark: Leveraging your Non-Relational Datastore in Batch and S...
Apache HBase + Spark: Leveraging your Non-Relational Datastore in Batch and S...DataWorks Summit/Hadoop Summit
 
HBase Read High Availability Using Timeline Consistent Region Replicas
HBase  Read High Availability Using Timeline Consistent Region ReplicasHBase  Read High Availability Using Timeline Consistent Region Replicas
HBase Read High Availability Using Timeline Consistent Region Replicasenissoz
 
Mapreduce over snapshots
Mapreduce over snapshotsMapreduce over snapshots
Mapreduce over snapshotsenissoz
 
Hortonworks Technical Workshop: HBase and Apache Phoenix
Hortonworks Technical Workshop: HBase and Apache Phoenix Hortonworks Technical Workshop: HBase and Apache Phoenix
Hortonworks Technical Workshop: HBase and Apache Phoenix Hortonworks
 
Apache Phoenix with Actor Model (Akka.io) for real-time Big Data Programming...
Apache Phoenix with Actor Model (Akka.io)  for real-time Big Data Programming...Apache Phoenix with Actor Model (Akka.io)  for real-time Big Data Programming...
Apache Phoenix with Actor Model (Akka.io) for real-time Big Data Programming...Trieu Nguyen
 
Five major tips to maximize performance on a 200+ SQL HBase/Phoenix cluster
Five major tips to maximize performance on a 200+ SQL HBase/Phoenix clusterFive major tips to maximize performance on a 200+ SQL HBase/Phoenix cluster
Five major tips to maximize performance on a 200+ SQL HBase/Phoenix clustermas4share
 
Apache Phoenix and HBase: Past, Present and Future of SQL over HBase
Apache Phoenix and HBase: Past, Present and Future of SQL over HBaseApache Phoenix and HBase: Past, Present and Future of SQL over HBase
Apache Phoenix and HBase: Past, Present and Future of SQL over HBaseDataWorks Summit/Hadoop Summit
 
Apache Phoenix: Use Cases and New Features
Apache Phoenix: Use Cases and New FeaturesApache Phoenix: Use Cases and New Features
Apache Phoenix: Use Cases and New FeaturesHBaseCon
 
Dancing with the elephant h base1_final
Dancing with the elephant   h base1_finalDancing with the elephant   h base1_final
Dancing with the elephant h base1_finalasterix_smartplatf
 
Operating and supporting HBase Clusters
Operating and supporting HBase ClustersOperating and supporting HBase Clusters
Operating and supporting HBase Clustersenissoz
 
Meet HBase 2.0 and Phoenix-5.0
Meet HBase 2.0 and Phoenix-5.0Meet HBase 2.0 and Phoenix-5.0
Meet HBase 2.0 and Phoenix-5.0DataWorks Summit
 
HBase and Impala Notes - Munich HUG - 20131017
HBase and Impala Notes - Munich HUG - 20131017HBase and Impala Notes - Munich HUG - 20131017
HBase and Impala Notes - Munich HUG - 20131017larsgeorge
 
Apache Big Data EU 2015 - HBase
Apache Big Data EU 2015 - HBaseApache Big Data EU 2015 - HBase
Apache Big Data EU 2015 - HBaseNick Dimiduk
 
Meet hbase 2.0
Meet hbase 2.0Meet hbase 2.0
Meet hbase 2.0enissoz
 

Tendances (20)

HBase state of the union
HBase   state of the unionHBase   state of the union
HBase state of the union
 
Apache Hive on ACID
Apache Hive on ACIDApache Hive on ACID
Apache Hive on ACID
 
Apache HBase + Spark: Leveraging your Non-Relational Datastore in Batch and S...
Apache HBase + Spark: Leveraging your Non-Relational Datastore in Batch and S...Apache HBase + Spark: Leveraging your Non-Relational Datastore in Batch and S...
Apache HBase + Spark: Leveraging your Non-Relational Datastore in Batch and S...
 
HBase Read High Availability Using Timeline Consistent Region Replicas
HBase  Read High Availability Using Timeline Consistent Region ReplicasHBase  Read High Availability Using Timeline Consistent Region Replicas
HBase Read High Availability Using Timeline Consistent Region Replicas
 
Apache Phoenix + Apache HBase
Apache Phoenix + Apache HBaseApache Phoenix + Apache HBase
Apache Phoenix + Apache HBase
 
April 2014 HUG : Apache Phoenix
April 2014 HUG : Apache PhoenixApril 2014 HUG : Apache Phoenix
April 2014 HUG : Apache Phoenix
 
Mapreduce over snapshots
Mapreduce over snapshotsMapreduce over snapshots
Mapreduce over snapshots
 
Hortonworks Technical Workshop: HBase and Apache Phoenix
Hortonworks Technical Workshop: HBase and Apache Phoenix Hortonworks Technical Workshop: HBase and Apache Phoenix
Hortonworks Technical Workshop: HBase and Apache Phoenix
 
Apache Phoenix with Actor Model (Akka.io) for real-time Big Data Programming...
Apache Phoenix with Actor Model (Akka.io)  for real-time Big Data Programming...Apache Phoenix with Actor Model (Akka.io)  for real-time Big Data Programming...
Apache Phoenix with Actor Model (Akka.io) for real-time Big Data Programming...
 
Five major tips to maximize performance on a 200+ SQL HBase/Phoenix cluster
Five major tips to maximize performance on a 200+ SQL HBase/Phoenix clusterFive major tips to maximize performance on a 200+ SQL HBase/Phoenix cluster
Five major tips to maximize performance on a 200+ SQL HBase/Phoenix cluster
 
Apache Phoenix and HBase: Past, Present and Future of SQL over HBase
Apache Phoenix and HBase: Past, Present and Future of SQL over HBaseApache Phoenix and HBase: Past, Present and Future of SQL over HBase
Apache Phoenix and HBase: Past, Present and Future of SQL over HBase
 
Apache Phoenix: Use Cases and New Features
Apache Phoenix: Use Cases and New FeaturesApache Phoenix: Use Cases and New Features
Apache Phoenix: Use Cases and New Features
 
Dancing with the elephant h base1_final
Dancing with the elephant   h base1_finalDancing with the elephant   h base1_final
Dancing with the elephant h base1_final
 
Operating and supporting HBase Clusters
Operating and supporting HBase ClustersOperating and supporting HBase Clusters
Operating and supporting HBase Clusters
 
Meet HBase 2.0 and Phoenix-5.0
Meet HBase 2.0 and Phoenix-5.0Meet HBase 2.0 and Phoenix-5.0
Meet HBase 2.0 and Phoenix-5.0
 
The Heterogeneous Data lake
The Heterogeneous Data lakeThe Heterogeneous Data lake
The Heterogeneous Data lake
 
HBase and Impala Notes - Munich HUG - 20131017
HBase and Impala Notes - Munich HUG - 20131017HBase and Impala Notes - Munich HUG - 20131017
HBase and Impala Notes - Munich HUG - 20131017
 
Apache Hive 2.0: SQL, Speed, Scale
Apache Hive 2.0: SQL, Speed, ScaleApache Hive 2.0: SQL, Speed, Scale
Apache Hive 2.0: SQL, Speed, Scale
 
Apache Big Data EU 2015 - HBase
Apache Big Data EU 2015 - HBaseApache Big Data EU 2015 - HBase
Apache Big Data EU 2015 - HBase
 
Meet hbase 2.0
Meet hbase 2.0Meet hbase 2.0
Meet hbase 2.0
 

En vedette

HBaseConEast2016: HBase and Spark, State of the Art
HBaseConEast2016: HBase and Spark, State of the ArtHBaseConEast2016: HBase and Spark, State of the Art
HBaseConEast2016: HBase and Spark, State of the ArtMichael Stack
 
Apache HBase 入門 (第2回)
Apache HBase 入門 (第2回)Apache HBase 入門 (第2回)
Apache HBase 入門 (第2回)tatsuya6502
 
Apache Spark streaming and HBase
Apache Spark streaming and HBaseApache Spark streaming and HBase
Apache Spark streaming and HBaseCarol McDonald
 
HBaseとSparkでセンサーデータを有効活用 #hbasejp
HBaseとSparkでセンサーデータを有効活用 #hbasejpHBaseとSparkでセンサーデータを有効活用 #hbasejp
HBaseとSparkでセンサーデータを有効活用 #hbasejpFwardNetwork
 
Free Code Friday - Spark Streaming with HBase
Free Code Friday - Spark Streaming with HBaseFree Code Friday - Spark Streaming with HBase
Free Code Friday - Spark Streaming with HBaseMapR Technologies
 
Apache HBase 入門 (第1回)
Apache HBase 入門 (第1回)Apache HBase 入門 (第1回)
Apache HBase 入門 (第1回)tatsuya6502
 

En vedette (7)

HBaseConEast2016: HBase and Spark, State of the Art
HBaseConEast2016: HBase and Spark, State of the ArtHBaseConEast2016: HBase and Spark, State of the Art
HBaseConEast2016: HBase and Spark, State of the Art
 
Apache HBase 入門 (第2回)
Apache HBase 入門 (第2回)Apache HBase 入門 (第2回)
Apache HBase 入門 (第2回)
 
Apache Spark streaming and HBase
Apache Spark streaming and HBaseApache Spark streaming and HBase
Apache Spark streaming and HBase
 
HBaseとSparkでセンサーデータを有効活用 #hbasejp
HBaseとSparkでセンサーデータを有効活用 #hbasejpHBaseとSparkでセンサーデータを有効活用 #hbasejp
HBaseとSparkでセンサーデータを有効活用 #hbasejp
 
Spark + HBase
Spark + HBase Spark + HBase
Spark + HBase
 
Free Code Friday - Spark Streaming with HBase
Free Code Friday - Spark Streaming with HBaseFree Code Friday - Spark Streaming with HBase
Free Code Friday - Spark Streaming with HBase
 
Apache HBase 入門 (第1回)
Apache HBase 入門 (第1回)Apache HBase 入門 (第1回)
Apache HBase 入門 (第1回)
 

Similaire à Apache HBase Internals you hoped you Never Needed to Understand

HBaseCon 2013: Apache HBase and HDFS - Understanding Filesystem Usage in HBase
HBaseCon 2013: Apache HBase and HDFS - Understanding Filesystem Usage in HBaseHBaseCon 2013: Apache HBase and HDFS - Understanding Filesystem Usage in HBase
HBaseCon 2013: Apache HBase and HDFS - Understanding Filesystem Usage in HBaseCloudera, Inc.
 
HBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBaseHBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBaseenissoz
 
HBase for Architects
HBase for ArchitectsHBase for Architects
HBase for ArchitectsNick Dimiduk
 
Meet HBase 2.0 and Phoenix-5.0
Meet HBase 2.0 and Phoenix-5.0Meet HBase 2.0 and Phoenix-5.0
Meet HBase 2.0 and Phoenix-5.0DataWorks Summit
 
Big data spain keynote nov 2016
Big data spain keynote nov 2016Big data spain keynote nov 2016
Big data spain keynote nov 2016alanfgates
 
Hadoop and HBase in the Real World
Hadoop and HBase in the Real WorldHadoop and HBase in the Real World
Hadoop and HBase in the Real WorldCloudera, Inc.
 
Meet HBase 2.0 and Phoenix 5.0
Meet HBase 2.0 and Phoenix 5.0Meet HBase 2.0 and Phoenix 5.0
Meet HBase 2.0 and Phoenix 5.0DataWorks Summit
 
Meet HBase 2.0 and Phoenix 5.0
Meet HBase 2.0 and Phoenix 5.0Meet HBase 2.0 and Phoenix 5.0
Meet HBase 2.0 and Phoenix 5.0Ankit Singhal
 
HBase Read High Availabilty using Timeline Consistent Region Replicas
HBase Read High Availabilty using Timeline Consistent Region ReplicasHBase Read High Availabilty using Timeline Consistent Region Replicas
HBase Read High Availabilty using Timeline Consistent Region ReplicasDataWorks Summit
 
Facebook keynote-nicolas-qcon
Facebook keynote-nicolas-qconFacebook keynote-nicolas-qcon
Facebook keynote-nicolas-qconYiwei Ma
 
支撑Facebook消息处理的h base存储系统
支撑Facebook消息处理的h base存储系统支撑Facebook消息处理的h base存储系统
支撑Facebook消息处理的h base存储系统yongboy
 
Facebook Messages & HBase
Facebook Messages & HBaseFacebook Messages & HBase
Facebook Messages & HBase强 王
 
Cloud Austin Meetup - Hadoop like a champion
Cloud Austin Meetup - Hadoop like a championCloud Austin Meetup - Hadoop like a champion
Cloud Austin Meetup - Hadoop like a championAmeet Paranjape
 
Hadoop and h base in the real world
Hadoop and h base in the real worldHadoop and h base in the real world
Hadoop and h base in the real worldJoey Echeverria
 
The Enterprise and Connected Data, Trends in the Apache Hadoop Ecosystem by A...
The Enterprise and Connected Data, Trends in the Apache Hadoop Ecosystem by A...The Enterprise and Connected Data, Trends in the Apache Hadoop Ecosystem by A...
The Enterprise and Connected Data, Trends in the Apache Hadoop Ecosystem by A...Big Data Spain
 
Storage and-compute-hdfs-map reduce
Storage and-compute-hdfs-map reduceStorage and-compute-hdfs-map reduce
Storage and-compute-hdfs-map reduceChris Nauroth
 

Similaire à Apache HBase Internals you hoped you Never Needed to Understand (20)

HBaseCon 2013: Apache HBase and HDFS - Understanding Filesystem Usage in HBase
HBaseCon 2013: Apache HBase and HDFS - Understanding Filesystem Usage in HBaseHBaseCon 2013: Apache HBase and HDFS - Understanding Filesystem Usage in HBase
HBaseCon 2013: Apache HBase and HDFS - Understanding Filesystem Usage in HBase
 
HBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBaseHBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBase
 
HBase for Architects
HBase for ArchitectsHBase for Architects
HBase for Architects
 
Meet HBase 2.0 and Phoenix-5.0
Meet HBase 2.0 and Phoenix-5.0Meet HBase 2.0 and Phoenix-5.0
Meet HBase 2.0 and Phoenix-5.0
 
Hbase mhug 2015
Hbase mhug 2015Hbase mhug 2015
Hbase mhug 2015
 
Big data spain keynote nov 2016
Big data spain keynote nov 2016Big data spain keynote nov 2016
Big data spain keynote nov 2016
 
Hadoop and HBase in the Real World
Hadoop and HBase in the Real WorldHadoop and HBase in the Real World
Hadoop and HBase in the Real World
 
Meet HBase 2.0 and Phoenix 5.0
Meet HBase 2.0 and Phoenix 5.0Meet HBase 2.0 and Phoenix 5.0
Meet HBase 2.0 and Phoenix 5.0
 
Meet HBase 2.0 and Phoenix 5.0
Meet HBase 2.0 and Phoenix 5.0Meet HBase 2.0 and Phoenix 5.0
Meet HBase 2.0 and Phoenix 5.0
 
HBase Read High Availabilty using Timeline Consistent Region Replicas
HBase Read High Availabilty using Timeline Consistent Region ReplicasHBase Read High Availabilty using Timeline Consistent Region Replicas
HBase Read High Availabilty using Timeline Consistent Region Replicas
 
ApacheCon-HBase-2016
ApacheCon-HBase-2016ApacheCon-HBase-2016
ApacheCon-HBase-2016
 
Facebook keynote-nicolas-qcon
Facebook keynote-nicolas-qconFacebook keynote-nicolas-qcon
Facebook keynote-nicolas-qcon
 
支撑Facebook消息处理的h base存储系统
支撑Facebook消息处理的h base存储系统支撑Facebook消息处理的h base存储系统
支撑Facebook消息处理的h base存储系统
 
Facebook Messages & HBase
Facebook Messages & HBaseFacebook Messages & HBase
Facebook Messages & HBase
 
Cloud Austin Meetup - Hadoop like a champion
Cloud Austin Meetup - Hadoop like a championCloud Austin Meetup - Hadoop like a champion
Cloud Austin Meetup - Hadoop like a champion
 
Hadoop and h base in the real world
Hadoop and h base in the real worldHadoop and h base in the real world
Hadoop and h base in the real world
 
Mar 2012 HUG: Hive with HBase
Mar 2012 HUG: Hive with HBaseMar 2012 HUG: Hive with HBase
Mar 2012 HUG: Hive with HBase
 
The Enterprise and Connected Data, Trends in the Apache Hadoop Ecosystem by A...
The Enterprise and Connected Data, Trends in the Apache Hadoop Ecosystem by A...The Enterprise and Connected Data, Trends in the Apache Hadoop Ecosystem by A...
The Enterprise and Connected Data, Trends in the Apache Hadoop Ecosystem by A...
 
Storage and-compute-hdfs-map reduce
Storage and-compute-hdfs-map reduceStorage and-compute-hdfs-map reduce
Storage and-compute-hdfs-map reduce
 
Apache NiFi in the Hadoop Ecosystem
Apache NiFi in the Hadoop Ecosystem Apache NiFi in the Hadoop Ecosystem
Apache NiFi in the Hadoop Ecosystem
 

Plus de Josh Elser

Effective Testing of Apache Accumulo Iterators
Effective Testing of Apache Accumulo IteratorsEffective Testing of Apache Accumulo Iterators
Effective Testing of Apache Accumulo IteratorsJosh Elser
 
Apache Accumulo 1.8.0 Overview
Apache Accumulo 1.8.0 OverviewApache Accumulo 1.8.0 Overview
Apache Accumulo 1.8.0 OverviewJosh Elser
 
Calcite meetup-2016-04-20
Calcite meetup-2016-04-20Calcite meetup-2016-04-20
Calcite meetup-2016-04-20Josh Elser
 
Designing and Testing Accumulo Iterators
Designing and Testing Accumulo IteratorsDesigning and Testing Accumulo Iterators
Designing and Testing Accumulo IteratorsJosh Elser
 
Alternatives to Apache Accumulo’s Java API
Alternatives to Apache Accumulo’s Java APIAlternatives to Apache Accumulo’s Java API
Alternatives to Apache Accumulo’s Java APIJosh Elser
 
Data-Center Replication with Apache Accumulo
Data-Center Replication with Apache AccumuloData-Center Replication with Apache Accumulo
Data-Center Replication with Apache AccumuloJosh Elser
 
RPInventory 2-25-2010
RPInventory 2-25-2010RPInventory 2-25-2010
RPInventory 2-25-2010Josh Elser
 

Plus de Josh Elser (7)

Effective Testing of Apache Accumulo Iterators
Effective Testing of Apache Accumulo IteratorsEffective Testing of Apache Accumulo Iterators
Effective Testing of Apache Accumulo Iterators
 
Apache Accumulo 1.8.0 Overview
Apache Accumulo 1.8.0 OverviewApache Accumulo 1.8.0 Overview
Apache Accumulo 1.8.0 Overview
 
Calcite meetup-2016-04-20
Calcite meetup-2016-04-20Calcite meetup-2016-04-20
Calcite meetup-2016-04-20
 
Designing and Testing Accumulo Iterators
Designing and Testing Accumulo IteratorsDesigning and Testing Accumulo Iterators
Designing and Testing Accumulo Iterators
 
Alternatives to Apache Accumulo’s Java API
Alternatives to Apache Accumulo’s Java APIAlternatives to Apache Accumulo’s Java API
Alternatives to Apache Accumulo’s Java API
 
Data-Center Replication with Apache Accumulo
Data-Center Replication with Apache AccumuloData-Center Replication with Apache Accumulo
Data-Center Replication with Apache Accumulo
 
RPInventory 2-25-2010
RPInventory 2-25-2010RPInventory 2-25-2010
RPInventory 2-25-2010
 

Dernier

Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...aditisharan08
 
What is Binary Language? Computer Number Systems
What is Binary Language?  Computer Number SystemsWhat is Binary Language?  Computer Number Systems
What is Binary Language? Computer Number SystemsJheuzeDellosa
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsAlberto González Trastoy
 
Salesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantSalesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantAxelRicardoTrocheRiq
 
Introduction to Decentralized Applications (dApps)
Introduction to Decentralized Applications (dApps)Introduction to Decentralized Applications (dApps)
Introduction to Decentralized Applications (dApps)Intelisync
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdfWave PLM
 
Asset Management Software - Infographic
Asset Management Software - InfographicAsset Management Software - Infographic
Asset Management Software - InfographicHr365.us smith
 
DNT_Corporate presentation know about us
DNT_Corporate presentation know about usDNT_Corporate presentation know about us
DNT_Corporate presentation know about usDynamic Netsoft
 
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...gurkirankumar98700
 
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdfThe Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdfkalichargn70th171
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...harshavardhanraghave
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...kellynguyen01
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comFatema Valibhai
 
Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...OnePlan Solutions
 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)OPEN KNOWLEDGE GmbH
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...MyIntelliSource, Inc.
 
EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityNeo4j
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...ICS
 
Cloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackCloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackVICTOR MAESTRE RAMIREZ
 

Dernier (20)

Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...
 
What is Binary Language? Computer Number Systems
What is Binary Language?  Computer Number SystemsWhat is Binary Language?  Computer Number Systems
What is Binary Language? Computer Number Systems
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
 
Salesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantSalesforce Certified Field Service Consultant
Salesforce Certified Field Service Consultant
 
Introduction to Decentralized Applications (dApps)
Introduction to Decentralized Applications (dApps)Introduction to Decentralized Applications (dApps)
Introduction to Decentralized Applications (dApps)
 
5 Signs You Need a Fashion PLM Software.pdf

Apache HBase Internals you hoped you Never Needed to Understand

  • 1. Apache HBase Internals you Hoped you Never Needed to Understand Josh Elser Future of Data, NYC 2016/10/11
  • 2. © Hortonworks Inc. 2011 – 2016. All Rights Reserved
      – Engineer at Hortonworks, Member of the Apache Software Foundation
      – Top-Level Projects: Apache Accumulo®, Apache Calcite™, Apache Commons™, Apache HBase®, Apache Phoenix™
      – ASF Incubator: Apache Fluo™, Apache Gossip™, Apache Pirk™, Apache Rya™, Apache Slider™
      – These Apache project names are trademarks or registered trademarks of the Apache Software Foundation.
  • 3. Apache HBase for storing your data! (Image: CC BY 3.0 US, http://hbase.apache.org/)
  • 4. What happens when things go wrong? (Image: CC BY-ND 2.0, https://www.flickr.com/photos/widnr/6588151679)
  • 5. The BigTable Architecture
      – BigTable’s architecture is simple
      – Debugging a distributed system is not simple
      – How can we break down a complex system?
      – How do we write resilient software?
      – Log-Structured Merge Tree; Write-Ahead Logs; Distributed Coordination; Row-based, Auto-Sharding; Strong Consistency; Read Isolation; Coprocessors; Security (AuthN/AuthZ); Backups
  • 6. Naming Conventions
      – Servers: Hostname, Port, and Timestamp
          RegionServer: r01n01.domain.com,16201,1475691463147
          Master: r02n01.domain.com,16000,1475691462616
      – Regions: Table, Start RowKey, Region ID (timestamp), Replica ID, Encoded name
          T1,x04x00x00,1470324608597.c04d94cd4ee9797da2fb906b4dcd2e3c.
          Or simply c04d94cd4ee9797da2fb906b4dcd2e3c
  • 7. Regions
      – A sorted “shard” of a table
      – At least one “column family” (physical partitions)
      – Each family can have zero to many files
      – Hosted by at most one RegionServer (can have many hosting RegionServers for reads)
      – In-memory locks for certain intra-row operations
  • 8. Region Assignment
      – Coordinated by the HBase Master
      – A Region must only be hosted by one RegionServer
      – State tracked in hbase:meta (hbck to fix issues)
      – Region splits/merges make a hard problem even harder
      – Moving towards ProcedureV2
      – [Figure: Normal Region Assignment States: Closed, Offline, Pending Open, Opening, Open]
  • 9. The File System
      – HDFS “Compatible”: distributed, durable, “write leases”
      – Physical storage of HBase Tables (HFiles)
      – Write-ahead logs
      – A parent directory in that FileSystem (hbase.rootdir)
  • 10. The File System: Physical Separation by HBase Namespace
      /hbase/data/
      /hbase/data/default/<table1>
      /hbase/data/default/.tabledesc/.tableinfo…
      /hbase/data/default/<table2>/<region_id1>
      /hbase/data/default/<table2>/<region_id2>
      /hbase/data/my_custom_ns/<table3>/…
      /hbase/data/hbase/meta/…
      /hbase/archive/…
      /hbase/WALs/<regionserver_name>/…
      /hbase/oldWALs/…
      /hbase/corrupt/…
  • 11. The File System for one Region
      /hbase/data/default/<table2>/<region_id1>
      …/.regioninfo
      …/.tmp
      …/<family1>/<hfile>
      …/<family1>/<hfile>
      …/<family2>/<hfile>
      …/<family3>/<hfile>
      …/recovered.edits/<number>.seqid
  • 12. Writes into HBase
      – Mutations are inserted into a sorted in-memory structure and the WAL
          Fast lookups of recent data
          Append-only log for durability and speed
      – Mutations are collected by destination Region
      – Beware of hot-spotting
      – Data in memory is eventually flushed into sorted (H)files
  • 13. Compactions and Flushes
      – Flush: taking Key-Values from the in-memory map and creating an HFile
      – Minor Compaction: rewriting a subset of HFiles for a Region into one HFile
      – Major Compaction: rewriting all HFiles for a Region into one HFile
      – Compactions balance improved query performance against the cost of rewriting data
          Compactions are good!
          Must understand SLAs to properly tune compactions
  • 14. Reads into HBase
      – Merge-sort over multiple streams of data: memory and disk (many files)
      – hbase:meta is the definitive source of where to find Regions
      – [Figure labels: RowKey, Region, hbase:meta, RegionServer, ZooKeeper]
  • 15. Apache ZooKeeper™
      – Distributed coordination is really hard
      – Obvious use cases: Service Discovery, Cluster Membership, “Root Table”
      – Non-obvious use cases: Assignment (sometimes), Region Recovery, WAL Splitting, Cluster Replication, Distributed Procedures, HBase Snapshots
      – Apache ZooKeeper is a trademark of the Apache Software Foundation
  • 16. Apache ZooKeeper™
      – Discovery/Leader ZNodes: /hbase/rs/…, /hbase/master/…, /hbase/backup-masters/…
      – Consensus: /hbase/splitWAL/…, /hbase/flush-table-proc/..., /hbase/table-lock/..., /hbase/region-in-transition/..., /hbase/recovering-regions/...
  • 17. Distributed Procedures
      – Resiliency in an unreliable system: how do we create a table?
      – “Procedure V2”: a resilient, finite state machine
      – HBase operations represented as “procedures”
      – Clients are agnostic of Master state; clients track procedure state
      – https://issues.apache.org/jira/secure/attachment/12679960/ProcedureV2.pdf
  • 18. Distributed Procedures
      – Procedures are durable via Write-Ahead Log: /hbase/MasterProcWALs/…
      – Procedures only executed by the active HBase Master
      – Reusable framework for the future
  • 19. HBase RPCs
      – Overview: Internal and External HBase Communication; Half-Sync/Half-Async Model; Many knobs to tweak
      – Components: Listener, Readers, Scheduler, Call Queues, Call Runners/Handlers
  • 20. HBase RPCs
      – [Figure: Request to Execution. Listener → Readers → Scheduler → Call Queues (Priority, Read, Write, Replication) → Handlers]
  • 21. Disaster Recovery
      – Multiple tools to ensure copies of data in the face of catastrophic failure
      – CopyTable: MapReduce job which reads all data from a source, writing to a destination
      – Snapshots: a collection of Regions, their HFiles, and metadata
      – Backup & Restore: HBASE-7912, currently targeted for HBase-2.0.0; incremental and full backup/restore
  • 22. Kerberos
      – Strong authentication for untrusted networks
      – “Standard” across Apache Hadoop and friends
      – Requirements: Forward/Reverse DNS; Unlimited Strength Java Cryptography Extension
      – SASL used to build RPC systems
      – “Practical Kerberos with Apache HBase” https://goo.gl/y0d9ZO
  • 23. Finding a Hypothesis
      – Logs logs logs: Application and System
      – Metrics exposed by JMX
      – Graphing solutions: Ambari Metrics Server + Grafana
  • 24. Thank You
      – jelser@hortonworks.com / elserj@apache.org
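
The naming conventions on slide 6 are what you end up grepping for in logs, so it can help to pull the names apart programmatically. The following is a minimal sketch, assuming the "hostname,port,timestamp" and "table,startkey,regionid.encoded." layouts shown on that slide; the function and class names are hypothetical and not part of any HBase API.

```python
from typing import NamedTuple


class ServerName(NamedTuple):
    hostname: str
    port: int
    start_code: int  # the timestamp component from the slide


def parse_server_name(name: str) -> ServerName:
    """Split 'hostname,port,timestamp' into its three parts."""
    hostname, port, start_code = name.split(",")
    return ServerName(hostname, int(port), int(start_code))


def encoded_region_name(region_name: str) -> str:
    """Return the encoded name at the end of a full region name.

    Assumes the 'table,startkey,regionid.encoded.' layout from slide 6.
    """
    # The start row key may contain arbitrary bytes, so split from the right.
    return region_name.rstrip(".").rsplit(".", 1)[1]


rs = parse_server_name("r01n01.domain.com,16201,1475691463147")
print(rs.hostname, rs.port, rs.start_code)
print(encoded_region_name(
    "T1,x04x00x00,1470324608597.c04d94cd4ee9797da2fb906b4dcd2e3c."))
```

Splitting the region name from the right is a deliberate choice here: the encoded name is the stable token, while the start key in the middle can contain almost anything.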
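
Slide 14 describes a read as a merge-sort over the in-memory data and many sorted files. A rough illustration of that idea, not HBase's actual scanner code: each source is a list of (row_key, seq_id, value) tuples already sorted by row key, and the newest cell per row wins. The tuple layout and the use of a single `seq_id` for versioning are simplifying assumptions made for this sketch.

```python
import heapq


def read_merged(memstore, hfiles):
    """Yield (row_key, value) pairs in row order, newest version per row.

    memstore: sorted list of (row_key, seq_id, value) tuples
    hfiles:   list of such sorted lists, one per (hypothetical) file
    """
    sources = [memstore, *hfiles]
    # Order by row key, then by descending seq_id so the newest cell comes first.
    merged = heapq.merge(*sources, key=lambda cell: (cell[0], -cell[1]))
    last_row = None
    for row_key, _seq_id, value in merged:
        if row_key != last_row:
            yield row_key, value
            last_row = row_key


memstore = [("a", 5, "a-new"), ("c", 6, "c1")]
hfiles = [
    [("a", 1, "a-old"), ("b", 2, "b1")],
    [("b", 3, "b2"), ("d", 4, "d1")],
]
for row, value in read_merged(memstore, hfiles):
    print(row, value)
```

The more files a Region has, the more streams this merge has to juggle, which is the cost that compactions (slide 13) reduce.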

Editor's notes

  1. Architecture-wise, BigTable as a system is well understood and simple; it has been a decade since the paper. Distributed systems are complex! They are easier to reason about if we consider them as smaller units.
  2. It is important to be able to grep! Know what to look for. DNS is important for consistent naming across all nodes.
  3. HBase needs a distributed and resilient filesystem (see also Azure tech). Data that is written and sync’ed must be present! HBase relies on one writer per file (HDFS leases). HBase tables are not just Key-Values (HFiles) but also serialized table metadata. WAL durability is key here.
  4. /hbase/data = all table data; /hbase/archive = HFiles before deletion; /hbase/WALs = write-ahead logs; /hbase/oldWALs = WALs before deletion; /hbase/corrupt = corrupt WALs.
  5. .regioninfo = metadata about this region; .tmp = general temporary space (compactions); recovered.edits = artifact of WAL recovery.
  6. Compactions == fewer files, more efficient lookups
  7. “What happens when meta is unassigned?”
  8. ZooKeeper provides authentication and authorization as well (for HBase, no auth or Kerberos auth via SASL). ACLs are used to prevent users from changing sensitive data in ZK – only HBase nodes can change them.
  9. Resilience is hard. How do we make sure that an operation will succeed if servers fail? How do we distinguish between previous failed attempts and users trying to concurrently perform the same operation? Table creation: unique name, directories in HDFS, create initial region in HDFS, update meta, enable the table, etc.
  10. ProcV2 implementation is tricky/complicated, but provides an internal API to make operations easy to implement and reason about in the future. Easy to inspect state. Model is proven in Accumulo’s FATE
  11. Lots of knobs because we want to be able to optimize things like throughput, latency, and fairness, which are often mutually exclusive
  12. The Listener does the socket accept and dispatches to Readers. Readers read a number of bytes off the wire (the Selector channel) and send the deserialized request to the Scheduler, which places it on a call queue that a handler will eventually process.
  13. Aka “you dun goofed up”. CopyTable: slow, requires source and destination to be up. Not really desirable. Snapshots: great for one-offs. Can grow DFS usage though. Requires coordination of a flush for a full backup. B&R: snapshots with the ability to track WALs for incremental backups since the last full backup.
  14. Brutally-sparse Kerberos talk
  15. JMX: JVisualVM, HBase web UIs, Hadoop metrics2 sink (AMS)
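
Note 9 lists the steps of table creation, and slides 17 and 18 describe procedures as a finite state machine made durable by a write-ahead log. The toy sketch below shows why that combination survives a Master failure: completed steps are recorded, and a restart resumes after the last recorded step. `ProcedureLog`, `run_procedure`, and the step names are hypothetical stand-ins for illustration, not the Procedure V2 API, and the "log" here is just an in-memory list.

```python
CREATE_TABLE_STEPS = [
    "check_name_is_unique",
    "create_directories",
    "create_initial_region",
    "update_meta",
    "enable_table",
]


class ProcedureLog:
    """In-memory stand-in for a durable procedure write-ahead log."""

    def __init__(self):
        self.records = []

    def append(self, proc_id, step_index):
        self.records.append((proc_id, step_index))

    def last_completed(self, proc_id):
        done = [i for pid, i in self.records if pid == proc_id]
        return max(done, default=-1)


def run_procedure(proc_id, steps, log, execute, fail_before=None):
    """Run the steps not yet recorded in the log.

    fail_before simulates a crash just before the step with that index.
    A crash between execute() and log.append() would replay that step,
    so in this sketch steps are assumed to be idempotent.
    """
    start = log.last_completed(proc_id) + 1
    for index in range(start, len(steps)):
        if index == fail_before:
            raise RuntimeError("simulated Master failure")
        execute(steps[index])
        log.append(proc_id, index)


executed = []
log = ProcedureLog()
try:
    run_procedure(1, CREATE_TABLE_STEPS, log, executed.append, fail_before=2)
except RuntimeError:
    pass  # a new active Master would pick the procedure up from the log
run_procedure(1, CREATE_TABLE_STEPS, log, executed.append)
print(executed)
```

Because progress lives in the log rather than in the Master's memory, a client only needs the procedure ID to ask about state, which matches the slide's point that clients are agnostic of Master state.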
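
Note 12 walks a request from the Listener through Readers and the Scheduler onto a call queue and finally to a handler. The single-threaded sketch below mirrors that hand-off so the stages are easy to see. The "kind:payload" wire format, the round-robin choice of Reader, and draining the queues in a fixed order are all assumptions made for illustration; the real server is multi-threaded and configurable.

```python
from collections import deque

QUEUE_NAMES = ["priority", "read", "write", "replication"]  # from slide 20


class Scheduler:
    """Places each deserialized call on the queue matching its kind."""

    def __init__(self, queue_names):
        self.call_queues = {name: deque() for name in queue_names}

    def dispatch(self, call):
        self.call_queues[call["kind"]].append(call)


def reader(raw: bytes) -> dict:
    """Deserialize bytes off the wire (hypothetical 'kind:payload' format)."""
    kind, _, payload = raw.decode("utf-8").partition(":")
    return {"kind": kind, "payload": payload}


def listener(connections, readers, scheduler):
    """Accept each connection and hand it to a Reader, round-robin."""
    for i, raw in enumerate(connections):
        read = readers[i % len(readers)]
        scheduler.dispatch(read(raw))


def run_handlers(scheduler, order):
    """Drain the call queues in the given order and return what was handled."""
    handled = []
    for name in order:
        queue = scheduler.call_queues[name]
        while queue:
            handled.append((name, queue.popleft()["payload"]))
    return handled


scheduler = Scheduler(QUEUE_NAMES)
listener([b"read:get r1", b"priority:meta lookup", b"write:put r2"],
         [reader, reader], scheduler)
print(run_handlers(scheduler, QUEUE_NAMES))
```

Each stage boundary in this sketch corresponds to a place where the real system exposes tuning knobs, such as how many Readers, queues, and handlers exist.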