1. © 2014 MapR Technologies 1
© MapR Technologies, confidential
Big Data Everywhere
Tel Aviv, June 2014
Building HBase Applications
2. © 2014 MapR Technologies 2
Me, Us
• Ted Dunning, Chief Application Architect, MapR
Committer PMC member, Mahout, Zookeeper, Drill
Bought the beer at the first HUG
• MapR
Distributes more open source components for Hadoop
Adds major technology for performance, HA, industry-standard APIs
• Info
Hash tag - #mapr #DataIsreal
See also - @ApacheMahout @ApacheDrill
@ted_dunning and @mapR
3. © 2014 MapR Technologies 4
Topics For Today
• What is special about HBase applications?
• Example: Time Series Database
• Example: Web-fronted Dashboard
• Questions and Discussion
4. © 2014 MapR Technologies 5
Disks have gotten slower
then: Fujitsu Eagle
380 MB / 1.8 MB/s = 211 s
now: WD4001FAEX
4 TB / 154 MB/s = 26 k s = 7.2 hours
5. © 2014 MapR Technologies 6
Memory has gotten smaller
then:
64MB / 1 x Fujitsu Eagle = 0.168
now:
128 GB / 12 x WD4001FAEX = 2.7 × 10⁻³
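As a sanity check, the arithmetic on the last two slides can be reproduced directly (rounded capacities and transfer rates from the slides, not exact vendor specs):

```python
# Back-of-envelope check of the drive-scan and memory-ratio figures above.

eagle_cap_mb, eagle_rate_mbs = 380, 1.8      # Fujitsu Eagle (then)
wd_cap_mb, wd_rate_mbs = 4_000_000, 154      # WD4001FAEX, 4 TB (now)

scan_then = eagle_cap_mb / eagle_rate_mbs    # seconds to read the whole drive
scan_now = wd_cap_mb / wd_rate_mbs

ram_ratio_then = 64 / (1 * eagle_cap_mb)             # 64 MB RAM vs 1 drive
ram_ratio_now = 128_000 / (12 * wd_cap_mb)           # 128 GB RAM vs 12 drives

print(f"then: {scan_then:.0f} s, now: {scan_now / 3600:.1f} h")
print(f"RAM/disk then: {ram_ratio_then:.3f}, now: {ram_ratio_now:.1e}")
```

The point of the ratios: a full scan used to take minutes and RAM covered a sixth of the disk; now a scan takes hours and RAM covers a fraction of a percent.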
6. © 2014 MapR Technologies 7
The Task Has Changed
The primary job for databases
is to discard data
(speaking historically)
7. © 2014 MapR Technologies 8
Modern Database Goals
• Use modern disks fully
• Work around lack of memory
• Retain all the data
8. © 2014 MapR Technologies 9
Modern Database Methods
• Use large sequential I/O transfers
• Use many machines
• Handle write-mostly workloads
• Store related data elements together
• Relax constraints (ACID? Schema? Indexes?)
9. © 2014 MapR Technologies 10
How Does This Work?
• Split data into tablets
– Store tablets on many computers
• Allow many columns
– Only store data for live columns
– Allows for innovative data arrangement
• Allow applications to encode data
• Buffer data to allow updates to be organized before writing
• Previously written data may be merged periodically to improve
organization, but avoid rewrite storms
10. © 2014 MapR Technologies 11
MapR, HBase Table Architecture
• Tables are divided into key ranges (tablets or regions)
• Tablets are served automatically by MapR FS or region-server
• Columns are divided into access groups (column families)
CF1 CF2 CF3 CF4 CF5
R1
R2
R3
R4
11. © 2014 MapR Technologies 12
MapR, HBase Tables are Divided into Regions
• A table is divided into one or more regions
• Each region holds a contiguous key range (start and end keys) and is typically 1-5 GB in size
• Each region is contained within a single container (MapR)
• Initially one Region per Table
– Support pre-split tables using HBase APIs and HBase shell
– You can also pre-split when access patterns are known
• Important to spread regions across all available nodes
• Region splits into 2 different regions when it becomes too large
– Split is very quick
• Uses MapR FS to spread data, manage space (MapR)
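The key-range lookup implied above can be sketched as a binary search over sorted region start keys; the boundaries and region names below are invented for illustration:

```python
# Minimal sketch of how a client locates the region (tablet) serving a key:
# regions hold sorted, non-overlapping key ranges, so a binary search over
# region start keys finds the owner.
import bisect

# (start_key, region_name); the first region starts at the empty key
regions = [(b"", "r1"), (b"g", "r2"), (b"p", "r3")]
starts = [s for s, _ in regions]

def region_for(row_key: bytes) -> str:
    # rightmost region whose start key is <= row_key
    i = bisect.bisect_right(starts, row_key) - 1
    return regions[i][1]

print(region_for(b"apple"))   # r1
print(region_for(b"melon"))   # r2
print(region_for(b"zebra"))   # r3
```

When a region grows too large and splits, only the boundary list changes; the lookup stays the same, which is one reason splits can be quick.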
12. © 2014 MapR Technologies 13
RDBMS versus MapR Tables
RDBMS tables                               | MapR tables
ACID                                       | Row-based ACID
Sharding/partitioning                      | Distributed regions
SQL built in                               | Key lookup / key-range scans
No Unix file metadata operations on tables | Unix file metadata operations on tables
Indexes (B+Tree, R-Tree)                   | Row key only, no built-in secondary index
Primitive data types                       | Byte arrays
In-place update                            | Cell versioning
13. © 2014 MapR Technologies 14
HBase versus MapR Tables
HBase tables                               | MapR tables
Table/region/column family                 | Table/region/column family
Distributed regions                        | Distributed regions
Wide variation in latency                  | Consistent latency
No Unix file metadata operations on tables | Unix file metadata operations on tables
Limited column family count                | 64 column families
Fuzzy snapshots                            | Precise snapshots
Replication API                            | Not supported
14. © 2014 MapR Technologies 16
Column Families
• Columns are defined per row
• Columns in HBase and MapR tables are grouped into column families
– MapR supports up to 64 column families
– Column families can be marked in-memory to keep hot data cached
• Grouping should facilitate common access patterns, not just reflect
logical connection
– Columns written or read together make good column families
– Rarely needed columns should probably be in their own column family
– Radically different cardinality may suggest separate column families
• Physically, all column family members are stored together on the file
system
– This makes access fast
15. © 2014 MapR Technologies 19
Technical Summary
• Tables are split into tablets or regions
• Regions contain column families, stored together
• Columns are only stored where needed
• Many, many, many columns are allowed
• Rows are accessed by a single key, filters are allowed on scans
• Values are byte arrays
16. © 2014 MapR Technologies 20
Technical Summary
• Tables are split into tablets or regions
• Regions contain column families, stored together
• Columns are only stored where needed
• Many, many, many columns are allowed
• Rows are accessed by a single key, filters are allowed on scans
• Values are byte arrays
• You get low-level access to control speed, allow scaling
• This is not your father’s RDBMS!
17. © 2014 MapR Technologies 21
Cost/Benefits Summary
• Pro
– Predictable disk layout
– Flexibility in key design, data format
– Allows nested, document or relational models
– Superb scalability, speed are possible
• Con
– Technically more demanding than a small Postgres instance
– Hotspot risk requires proper design
– Latency can be highly variable (for vanilla HBase, not MapR)
21. © 2014 MapR Technologies 25
Time Series Database Example
• Let’s build a time series database
• See http://opentsdb.net/
22. © 2014 MapR Technologies 26
The Problem
• We have about 100,000 metrics with an average of about 10,000
distinct measurements per second
• Some things change over seconds, some over hours
• We want to query over time ranges from seconds to months to
produce plots of time window aggregates
– What is max latency per hour for last three weeks on all web tier
machines?
23. © 2014 MapR Technologies 27
Non-solution
• Munin, RRD, Ganglia, Graphite all discard data
– Remember the primary job of a classic database?
• We want full resolution for historical comparisons
• Size is no longer an issue, big has gotten quite small
– 10¹² data points << 10 nodes @ 12 x 4 TB per node
– We can piggy back on another cluster
24. © 2014 MapR Technologies 28
Why is This Hard?
• 10,000 points per second x 86,400 seconds/day x 1000 days
• That is nearly a trillion data points! (0.86 × 10¹²)
• Queries require summarizing hundreds of thousands of points in
200 ms
• We want the solution to be low impact and inexpensive
– And be ready to scale by several orders of magnitude
25. © 2014 MapR Technologies 29
Step 1: Compound keys give
control over layout
26. © 2014 MapR Technologies 30
Key Composition #1
Time   Metric  Node  Value
10667  load1m  n1    1.3
10667  load5m  n1    1.0
10668  load1m  n2    0.1
10668  load5m  n2    0.1
10727  load1m  n1    0.9
10727  load5m  n1    0.9
All samples go to a single machine for a long time
27. © 2014 MapR Technologies 31
Key Composition #2
Metric  Time   Node  Value
load1m  10667  n1    1.3
load1m  10668  n2    0.1
load1m  10727  n1    0.9
load5m  10667  n1    1.0
load5m  10668  n2    0.1
load5m  10727  n1    0.9
All samples for the same metric go to a single machine
Queries commonly focus on one or a few metrics at a time
28. © 2014 MapR Technologies 32
Key Composition #3
Node  Metric  Time   Value
n1    load1m  10667  1.3
n1    load1m  10727  0.9
n1    load5m  10667  1.0
n1    load5m  10727  0.9
n2    load1m  10668  0.1
n2    load5m  10668  0.1
All samples for the same node go to a single machine
Unfortunately, queries commonly require data for a single metric across many machines
29. © 2014 MapR Technologies 33
Lesson: Pick door #2
Maximize density of desired data
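A minimal sketch of why door #2 wins: under HBase's lexicographic row ordering, putting the metric first makes all samples for one metric sort into one contiguous key range. The zero-padded time encoding is an assumption to keep numeric and byte order in agreement:

```python
# Composite key with the metric leading: samples for one metric cluster
# together under a lexicographic (byte-order) sort, as HBase stores rows.
samples = [  # (time, metric, node, value)
    (10667, "load1m", "n1", 1.3),
    (10667, "load5m", "n1", 1.0),
    (10668, "load1m", "n2", 0.1),
    (10727, "load1m", "n1", 0.9),
]

def key_metric_first(t: int, metric: str, node: str) -> bytes:
    # zero-padded time keeps numeric and lexicographic order in agreement
    return f"{metric}\x00{t:010d}\x00{node}".encode()

keys = sorted(key_metric_first(t, m, n) for t, m, n, _ in samples)
for k in keys:
    print(k)
# all load1m keys sort ahead of load5m: a scan for one metric
# reads one dense, contiguous range on one machine
```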
30. © 2014 MapR Technologies 34
Protip: Add key-value pairs to
end of key for additional tags
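One way the protip might look in code; the separator and encoding choices are assumptions for illustration, not OpenTSDB's actual binary key format:

```python
# Append sorted tag=value pairs to the composite key: rows for the same
# metric and time still cluster, and tags remain addressable by prefix scan.
def make_key(metric: str, t: int, tags: dict) -> bytes:
    # sorting tags makes key construction deterministic
    tag_part = "\x00".join(f"{k}={v}" for k, v in sorted(tags.items()))
    return f"{metric}\x00{t:010d}\x00{tag_part}".encode()

k = make_key("load1m", 10667, {"host": "n1", "dc": "tlv"})
print(k)  # b'load1m\x000000010667\x00dc=tlv\x00host=n1'
```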
32. © 2014 MapR Technologies 36
Step 2: Relational not
required
33. © 2014 MapR Technologies 37
Tall and Skinny? Or Wide and Fat?
Metric Time Node Value
34. © 2014 MapR Technologies 38
Tall and Skinny? Or Wide and Fat?
Metric  | Window | Node | +17 | +18 | +77 | +78 | +137
load1m  | 13:00  | n1   | 1.3 |     | 0.9 |     |
load1m  | 13:00  | n2   |     | 0.1 |     | 0.1 |
load5m  | 13:00  | n1   | 1.0 |     | 0.9 |     |
load5m  | 13:00  | n2   |     | 0.1 |     |     |
Filtering overhead is non-trivial …
wide and fat has to filter fewer rows
35. © 2014 MapR Technologies 39
Or non-relational?
Metric  | Window | Node | Compressed
load1m  | 13:00  | n1   | {t:[17,77],v:[1.3,0.9]}
load1m  | 13:00  | n2   | {t:[18,78],v:[0.1,0.1]}
load5m  | 13:00  | n1   | {t:[17,77],v:[1.0,0.9]}
load5m  | 13:00  | n2   | {t:[18,78],v:[0.1,0.1]}
Cleanup process can sweep up old
values after the hour is finished.
Blob data can be compressed using
fancy tricks.
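The compressed rows above can be produced by folding an hour of samples per metric/node into one blob keyed by the window start. JSON is used here only to keep the sketch readable; a real system would use a compact binary encoding:

```python
# Fold raw samples into one {t:[...], v:[...]} cell per (metric, hour, node).
import json
from collections import defaultdict

raw = [  # (metric, node, epoch_second, value)
    ("load1m", "n1", 46817, 1.3), ("load1m", "n1", 46877, 0.9),
    ("load1m", "n2", 46818, 0.1), ("load1m", "n2", 46878, 0.1),
]

rows = defaultdict(lambda: {"t": [], "v": []})
for metric, node, ts, val in raw:
    window = ts - ts % 3600              # start of the hour the sample falls in
    cell = rows[(metric, window, node)]
    cell["t"].append(ts - window)        # small in-hour offsets compress well
    cell["v"].append(val)

for key, blob in sorted(rows.items()):
    print(key, json.dumps(blob, separators=(",", ":")))
```

A cleanup pass, as the slide notes, can sweep up the raw per-point cells once the hour's blob is written.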
36. © 2014 MapR Technologies 40
Lesson: Schemas can be very
flexible and can even
change on the fly
38. © 2014 MapR Technologies 42
Step 3: Sequential reads hide
many sins if density is high
39. © 2014 MapR Technologies 43
Which Queries? Which Data?
• Most common is 1-3 metrics for 5-100% of nodes based on tags
– Which nodes have unusual load?
– Do any nodes stand out for response latency?
– Alarm bots
• Also common to get 5-20 metrics for single node
– Render dashboard for particular machine
• Result density should be high for all common queries
• Most data is never read but is retained as insurance policy
– Can’t predict what you will need to diagnose future failure modes
40. © 2014 MapR Technologies 44
Lesson: Have to know the
queries to design in
performance
42. © 2014 MapR Technologies 46
Step 4: Time to Level up!
43. © 2014 MapR Technologies 47
What About the Major Leagues?
• Industrial sensors can dwarf current TSDB loads
– Assume 100 (drill rigs | generators | heating systems | turbines)
– Each has 10,000 sensors
– Each is sampled once per second
– Total sample rate is 10⁶ samples / s (100x faster than before)
• Industrial applications require extensive testing at scale
– Want to load years of test data in a few days
– Sample rate for testing is 100 × 10⁶ samples / s (10,000x faster)
• And you thought the first example was extreme
45. © 2014 MapR Technologies 49
Rough Design Outline
• Want to record and query 100M samples / s at full resolution
• Each MapR node serving tables can do ~20-40k inserts per
second @ 1kB/record, ~60k inserts/s @ 100B / record
• Each MapR node serving files can insert at ~1GB / s
• We can buffer data in file system until we get >1000 samples per
metric
• Once data is sufficient, we do one insert per metric per hour
– 3600x fewer inserts
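The buffering arithmetic above, worked out explicitly:

```python
# Land raw samples in flat files, and only write a table row once a metric
# has a full hour of data (3600 samples at 1 Hz). This turns a 100M
# point-insert/s stream into a row-insert rate a handful of nodes can serve.
samples_per_sec = 100_000_000
points_per_row = 3600

row_inserts_per_sec = samples_per_sec / points_per_row
print(f"{row_inserts_per_sec:,.0f} rows/s")   # ~28k, within reach of 4 nodes

# reduction factor versus one insert per point
print(f"{samples_per_sec / row_inserts_per_sec:.0f}x fewer inserts")
```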
46. © 2014 MapR Technologies 50
Data Flow – High Speed TSDB
[Diagram] Measurement systems feed a web tier of data catchers; the catchers land samples in flat files; consolidators roll those files up into TSDB tables, which the browser queries.
47. © 2014 MapR Technologies 51
Quick Results
• Estimated data volumes
– 100 M p / s / (3600 p/row) = 28 k row / s
• Estimated potential throughput
– 4 nodes @ 10 k row / s = 40 k row / s = 144 M p / s
• Observed throughput for 2 day prototype
– 1 feeder node, 4 table nodes, 10 M p / s
– Feeder node CPU bound, table nodes < 5% CPU, disk ~ 0
• Simple prototype is limited by generator
• Compare to observed max 100 k p / s on SQL Server
– “Only” 100x faster
48. © 2014 MapR Technologies 52
Lesson: Very high rates look
plausible with hybrid design
50. © 2014 MapR Technologies 54
Quick Example: Xactly
Dashboard
51. © 2014 MapR Technologies 55
Xactly: Sales Performance Management
Xactly Insights: delivering incentive compensation data for sales operations
OBJECTIVES
• Provide cloud-based performance management solutions to sales ops teams to help them design/manage optimal incentive compensation plans
CHALLENGES
• RDBMS-based platform was unable to scale in a cost-effective way
• Stringent performance and responsiveness expectations of users in a SaaS application
SOLUTION
• Multi-tenancy capabilities in MapR helped ensure each customer’s data was isolated and separate from other customers in the SaaS application
• MapR delivered on Xactly’s need for scale and low operational overhead
BUSINESS IMPACT
• Highly responsive application that scaled to a growing customer base
• MapR’s higher performance solution is far more efficient and cost-effective: “I can do something on a 10-node cluster that might require a 20-node cluster from a different Hadoop vendor.” — CTO & SVP of Engineering
52. © 2014 MapR Technologies 56
Dashboard Problem
• Hundreds to thousands of customers have hundreds to
thousands of sales team members
• Want to be able to compare stats for each team member, team,
company against relevant roll-ups
• Prototyped system in RDB, Mongo, MapR tables
– Natural fit to relational cubes
– Easy extension to Mongo documents with indexes
– HBase application architecture has only one index
• Production solution used special key design in MapR tables
– Disk-based speed matched in-memory speed of Mongo
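Since the actual Xactly key design is not public, here is a hedged sketch of the kind of composite key that makes such dashboards fast with only the one row-key index; all names below are hypothetical:

```python
# Leading with customer, then team, then member keeps each tenant's rows
# contiguous, so a team dashboard or rollup is one prefix-bounded range scan
# rather than a filtered full scan or a secondary-index lookup.
def stat_key(customer: str, team: str, member: str, period: str) -> bytes:
    return f"{customer}\x00{team}\x00{member}\x00{period}".encode()

rows = sorted(
    stat_key(c, t, m, p)
    for c, t, m, p in [
        ("acme", "west", "alice", "2014-06"),
        ("acme", "west", "bob", "2014-06"),
        ("acme", "east", "carol", "2014-06"),
        ("zenith", "north", "dan", "2014-06"),
    ]
)
# a "scan" for one team is just the keys under its prefix
prefix = b"acme\x00west\x00"
team_rows = [r for r in rows if r.startswith(prefix)]
print(len(team_rows))  # 2 contiguous rows for acme/west
```

The same layering also gives the multi-tenant isolation property for free: every customer's data lives under its own key prefix.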
53. © 2014 MapR Technologies 57
Lesson: Obviously relational
problems often have effective
non-relational solutions
54. © 2014 MapR Technologies 58
Summary
• HBase and MapR tables are conceptually very simple
– But require careful design
– Composite key design crucial
– Non-relational column usage often important
• Practical systems can exceed relational throughput by many
orders of magnitude with very small clusters
• Composite file/table designs can be very powerful
– The world is not a database
55. © 2014 MapR Technologies 59
Me, Us
• Ted Dunning, Chief Application Architect, MapR
Committer PMC member, Mahout, Zookeeper, Drill
Bought the beer at the first HUG
• MapR
Distributes more open source components for Hadoop
Adds major technology for performance, HA, industry-standard APIs
• Info
Hash tag - #mapr
See also - @ApacheMahout @ApacheDrill
@ted_dunning and @mapR
Editor's notes: See http://hbase.apache.org/book.html#regions.arch
In-memory column families mean that data is high priority to keep in memory (cache), e.g. via a recently-used algorithm. See http://outerthought.org/blog/417-ot.html
While I don't agree with the suggestion of using versioning to add a dimension, because I believe it will encourage a bad design pattern, his blog is spot on.