Apache HBase Application Archetypes 
Strata + Hadoop World Barcelona, November 20th, 2014 
Lars George | @larsgeorge | Cloudera EMEA Chief Architect | HBase PMC 
Jonathan Hsieh | @jmhsieh | Cloudera HBase Tech Lead | HBase PMC
2 
About Lars and Jon 
Lars George 
• EMEA Chief Architect 
@Cloudera 
– Apache HBase PMC 
– O’Reilly Author of HBase – 
The Definitive Guide 
• Contact 
– lars@cloudera.com 
– @larsgeorge 
Jon Hsieh 
• Tech Lead HBase Team 
@Cloudera 
– Apache HBase PMC 
– Apache Flume founder 
• Contact 
– jon@cloudera.com 
– @jmhsieh 
11/20/14 Strata+Hadoop World Barcelona 2014. George and Hsieh
3 
About Supporting HBase at Cloudera 
• Supporting Customers using HBase since 2011 
– HBase Training 
– Professional Services 
• Team has experience supporting and running HBase since 2009 
– 9 committers on staff 
– 2 HBase book authors 
• As of Jan 2014, ~20,000 HBase nodes (in aggregate) under 
management 
• Information in this presentation is either aggregated customer data 
or from public sources. 
4 
An Apache HBase Timeline (2008-2015) 
• Summer '09: StumbleUpon goes production on HBase ~0.20 
• Apr '11: CDH3 GA with HBase 0.90.1 
• Summer '11: Messages on HBase 
• Summer '11: Web Crawl Cache 
• Sept '11: HBase: The Definitive Guide published 
• Nov '11: Cassini on HBase 
• May '12: HBaseCon 2012 
• Nov '12: HBase in Action published 
• Jan '13: Phoenix on HBase 
• Jun '13: HBaseCon 2013 
• Aug '13: Flurry 1k-1k node cluster replication 
• Jan '14: Cloudera has ~20k HBase nodes under management 
• May '14: HBaseCon 2014 
• Fall '14 / Winter '15: HBase v1.0.0 released
5 
Apache HBase “Nascar” Slide 
6 
Outline 
• Definitions 
• Archetypes 
–The Good 
–The Bad 
–The Maybe 
• Conclusion 
7 
Definitions 
A vocabulary for HBase Archetypes
8 
Defining HBase Archetypes 
• There are a lot of HBase applications 
– Some successful, some less so 
– They have common architecture patterns 
– They have common tradeoffs 
• Archetypes are common architecture patterns 
– Common across multiple use-cases 
– Extracted to be repeatable 
• Our Goal: Define patterns à la “Gang of Four” (Gamma, Helm, 
Johnson, Vlissides) 
9 
So you want to use HBase? 
• What data is being stored? 
– Entity data 
– Event data 
• Why is the data being stored? 
– Operational use cases 
– Analytical use cases 
• How does the data get in and out? 
– Real time vs. Batch 
– Random vs. Sequential 
10 
What is being stored? 
There are primarily two kinds of big data workloads, and they have 
different storage requirements. 
Entities Events 
11 
Entity Centric Data 
• Entity data is information about current state 
– Generally real time reads and writes 
• Examples: 
– Accounts 
– Users 
– Geolocation points 
– Click Counts and Metrics 
– Current Sensor Readings 
• Scales up with # of Humans and # of Machines/Sensors 
– Billions of distinct entities 
12 
Event Centric Data 
• Event-centric data are time-series data points, recorded successively 
over time intervals. 
– Generally real time write, some combination of real time read or batch read 
• Examples: 
– Sensor data over time 
– Historical Stock Ticker data 
– Historical Metrics 
– Click time-series 
• Scales up due to finer grained intervals, retention policies, and the 
passage of time 
13 
Events about Entities 
• The majority of Big Data use cases deal with event-based data 
– |Entities| * |Events| = Big Data 
• When you ask questions, do you home in on the entity first? 
• When you ask questions, do you home in on time ranges first? 
• Your answer will help you determine where and how to store your data. 
14 
Why are you storing the data? 
• So what kind of questions are you asking of the data? 
• Entity-centric questions 
– Give me everything about entity e 
– Give me the most recent event v about entity e 
– Give me the n most recent events V about entity e 
– Give me all events V about e between time [t1,t2] 
• Event and Time-centric questions 
– Give me an aggregate for each entity between time [t1,t2] 
– Give me an aggregate for each time interval for entity e 
– Find events V that match some other given criteria 
15 
How does data get in and out of HBase? 
[Diagram: data paths in and out of HBase — writes enter via the HBase client (Put, Incr, Append), bulk import, and HBase replication; reads exit via the HBase client (Gets, short scans), the HBase scanner (full scans, MapReduce), and HBase replication] 
16 
How does data get in and out of HBase? 
[Diagram: the same data paths, annotated — client Gets and short scans are low latency; full scans and MapReduce via the HBase scanner are high throughput] 
17 
What system is most efficient? 
• It is all physics 
• You have a limited I/O budget 
– Use all of your I/O by parallelizing access and reading/writing sequentially 
– Choose the system and features that reduce I/O in general 
• Pick the system that is best for your workload's IOPS per disk 
18 
The physics of Hadoop Storage Systems 

Workload          HBase                               HDFS 
Low Latency       ms, cached                          mins (MR), seconds (Impala) 
Random Read       primary index                       - index?, small-files problem 
Short Scan        sorted + partitioned                - 
Full Scan         live table, + MR on snapshots       MR, Hive, Impala 
Random Write      log structured                      - not supported 
Sequential Write  HBase overhead, + bulk load         minimal overhead 
Updates           log structured                      - not supported 
21 
The Archetypes 
HBase Applications
22 
HBase application use cases 
• The Good 
– Simple Entities 
– Messaging Store 
– Graph Store 
– Metrics Store 
• The Bad 
– Large Blobs 
– Naïve RDBMS port 
– Analytic Archive 
• The Maybe 
– Time series DB 
– Combined workloads 
23 
Archetypes: The Good 
HBase, you are my soul mate.
24 
Archetype: Simple Entities 
• Purely entity data, no relation between entities 
– Batch or real-time, random writes 
– Real-time, random reads 
– Could be a well-done denormalized RDBMS port 
– Often from many different sources, with poly-structured data 
• Schema: 
– Row per entity 
– Row key => entity ID, or hash of entity ID 
– Col qualifier => Property / field, possibly time stamp 
• Examples: 
– Geolocation data 
– Search index building 
– Use Solr to make text data searchable 
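The row-key choice above (entity ID, or a hash of the entity ID) can be sketched in a few lines. This is an illustrative Python helper, not code from the deck; the 4-character hash prefix length is an arbitrary choice:

```python
import hashlib

def entity_row_key(entity_id: str, prefix_len: int = 4) -> bytes:
    # A short hash prefix spreads otherwise sequential entity IDs
    # across regions, while lookups stay deterministic: the same
    # entity ID always maps to the same row key.
    digest = hashlib.md5(entity_id.encode("utf-8")).hexdigest()[:prefix_len]
    return f"{digest}-{entity_id}".encode("utf-8")

key = entity_row_key("user12345")
```

All reads for one entity still hit a single, deterministic row; only the distribution of rows across regions changes.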
25 
Simple Entities access pattern 
[Diagram: the standard access paths — low-latency Puts/Gets/short scans via the HBase client, plus a high-throughput full scan / MapReduce path that feeds Solr for index building] 
26 
Archetype: Messaging Store 
• Messaging Data: 
– Realtime random writes: Emails, SMS, MMS, IM 
– Realtime random updates: Msg read, starred, moved, deleted 
– Reading of top-N entries, sorted by time 
– Records are of varying size 
– Some time series, but mostly random read/write 
• Schema: 
– Row = users/feed/inbox 
– Row key = UID or UID + time 
– Column Qualifier = time or conversation id + time. 
– Use column families (CFs) for indexes 
• Examples: 
– Facebook Messages, Xiaomi Messages 
– Telco SMS/MMS services 
– Feeds like Tumblr, Pinterest 
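The UID + time key above is commonly built with a reversed timestamp, so that "top-N most recent" reads become cheap forward scans. A minimal sketch (illustrative Python; names and the separator byte are our own choices, not from the deck):

```python
import struct

MAX_LONG = 2**63 - 1

def message_row_key(uid: str, ts_millis: int) -> bytes:
    # HBase sorts rows in ascending byte order, so storing
    # (MAX_LONG - timestamp) makes the newest message sort first:
    # a "top-N recent" read is a short scan from the UID prefix.
    return uid.encode("utf-8") + b"#" + struct.pack(">q", MAX_LONG - ts_millis)

older = message_row_key("u1", 1_000)
newer = message_row_key("u1", 2_000)
```

Byte-wise, the newer message's key sorts before the older one, so a scan starting at `u1#` returns messages newest-first.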
27 
Facebook Messages - Statistics 
Source: HBaseCon 2012 - Anshuman Singh 
28 
Messages Access Pattern 
[Diagram: the standard access paths; messaging traffic is dominated by low-latency random writes (Put, Incr, Append) and random reads (Gets, short scans) via the HBase client] 
29 
Archetype: Graph Data 
• Graph Data: All entities and relations 
– Batch or realtime, random writes 
– Batch or realtime, random reads 
– It's an entity with relation edges 
• Schema: 
– Row = Node. 
– Row key => Node ID. 
– Col qualifier => Edge ID, or properties:values 
• Examples: 
– Web Caches – Yahoo!, Trend Micro 
– Titan Graph DB with HBase storage backend 
– Sessionization (financial transactions, clicks streams, network traffic) 
– Government (connect the bad guy) 
30 
Graph Data Access Pattern 
[Diagram: the standard access paths; graph workloads use both the low-latency client path and the high-throughput full scan / MapReduce path, for batch or real-time reads and writes] 
31 
Archetype: Metrics 
• Frequently updated metrics 
– Increments 
– Roll ups generated by MR and bulk loaded to HBase 
• Schema 
– Row: Entity for a time period 
– Row key: entity-<yymmddhh> (granular time) 
– Col Qualifier: property -> count 
• Examples 
– Campaign Impression/Click counts (Ad tech) 
– Sensor data (Energy, Manufacturing, Auto) 
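The entity-&lt;yymmddhh&gt; key pattern above can be sketched as follows (illustrative Python; the deck only specifies the key shape, the helper name is ours):

```python
from datetime import datetime, timezone

def metrics_row_key(entity: str, ts: datetime) -> bytes:
    # One row per entity per hour bucket (entity-<yymmddhh>); column
    # qualifiers within the row then hold per-property counters that
    # are updated with HBase increments or bulk-loaded rollups.
    return f"{entity}-{ts.strftime('%y%m%d%H')}".encode("utf-8")

key = metrics_row_key("campaign42", datetime(2014, 11, 20, 9, tzinfo=timezone.utc))
# → b"campaign42-14112009"
```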
32 
Metrics Access Pattern 
[Diagram: the standard access paths; metrics use low-latency increments via the HBase client plus bulk import of MR-generated rollups] 
33 
Archetypes: The Bad 
These are not the droids you are looking for
34 
Current HBase weak spots 
• HBase's architecture can handle a lot 
– Engineering tradeoffs optimize for some use cases and against others 
– HBase can still do things it is not optimal for 
– However, other systems are fundamentally more efficient for some workloads 
• We've seen folks forcing apps into HBase 
– If there is only one workload on the data, consider another system 
– If there is a mixed workload, some of these cases become "maybes" 
• Just because it is not good today doesn't mean it can't be better tomorrow! 
35 
Bad Archetype: Large Blob Store 
• Saving large objects >3MB per cell 
• Schema: 
– Normal entity pattern, but with some columns with large cells 
• Examples 
– Raw photo or video storage in HBase 
– Large frequently updated structs as a single cell 
• Problems: 
– Write amplification when reoptimizing data for read (compactions on large 
unchanging data) 
– Write amplification when large structs rewritten to update subfields. Cells are 
atomic, and HBase must rewrite an entire cell 
• Note: Medium Binary Object (MOB) support coming (lots of 100KB-10MB 
cells) 
– See HBASE-11339 for more details. 
36 
Bad Archetype: Naïve RDBMS port 
• A naïve port of an RDBMS into HBase, directly copying the schema 
• Schema 
– Many tables, just like an RDBMS schema 
– Row key: primary key or auto-incrementing key, like RDBMS schema 
– Column qualifiers: field names 
– Manually do joins, or secondary indexes (not consistent) 
• Solution: 
– HBase is not a SQL database 
– No multi-region/multi-table transactions in HBase (yet) 
– No built-in join support; you must denormalize your schema to use HBase 
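What "denormalize your schema" means in practice can be sketched as collapsing a one-to-many join into one wide row per entity. Illustrative Python with made-up column family names (`profile:`, `orders:`), not from the deck:

```python
def denormalized_order_row(user: dict, orders: list) -> dict:
    # Instead of a users table and an orders table joined at query
    # time, one wide row per user holds the profile fields plus one
    # column per order, qualified by the order ID.
    cells = {f"profile:{k}": str(v) for k, v in user.items()}
    for o in orders:
        cells[f"orders:{o['id']}"] = f"{o['item']}|{o['amount']}"
    return cells

row = denormalized_order_row(
    {"name": "Ada", "email": "ada@example.com"},
    [{"id": "o1", "item": "book", "amount": 12}],
)
```

One Get on the user's row key now returns profile and orders together, with no join.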
37 
Large blob store and naïve RDBMS port access patterns 
[Diagram: the standard access paths, shown for the large blob store and naïve RDBMS port anti-patterns] 
38 
Bad Archetype: Analytic archive 
• Store purely chronological data, partitioned by time 
– Real time writes, chronological time as primary index 
– Column-centric aggregations over all rows 
– Bulk reads out, generally for generating periodic reports 
• Schema 
– Row key: date+xxx or salt+date+xxx 
– Column qualifiers: properties with data or counters 
• Example 
– Machine logs organized by date (causes write hotspotting) 
– Full-fidelity clickstream organized by date (as opposed to by campaign) 
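The salt+date+xxx key mentioned above can be sketched like this (illustrative Python; the bucket count and salt function are our own choices):

```python
def salted_date_key(date_str: str, suffix: str, buckets: int = 8) -> bytes:
    # A deterministic salt derived from the suffix spreads a purely
    # chronological key space across `buckets` regions. The tradeoff:
    # writes no longer hotspot one region, but a time-range read must
    # fan out one scan per bucket.
    salt = sum(suffix.encode("utf-8")) % buckets
    return f"{salt:02d}-{date_str}-{suffix}".encode("utf-8")

key = salted_date_key("20141120", "hostA")
```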
39 
Bad Archetype: Analytic Archive Problems 
• HBase is non-optimal when this is the primary use case 
– Will get crushed by frequent full table scans 
– Will get crushed by large compactions 
– Will get crushed by write-side region hot spotting 
• Solution: 
– Store in HDFS; Use Parquet columnar data storage + Impala/Hive 
– Build rollups in HDFS+MR; store and serve rollups in HBase 
40 
Analytic Archive access patterns 
[Diagram: the standard access paths; the analytic archive leans on real-time writes plus high-throughput full scans / MapReduce through the HBase scanner] 
41 
Archetypes: The Maybe 
And this is crazy | But here’s my data, | serve it, 
maybe!
42 
The Maybe’s 
• For some applications, doing it right gets complicated. 
• More sophisticated or nuanced cases 
• Require considering these questions: 
– When do you choose HBase vs HDFS storage for time series data? 
– Are there times where bad archetypes are ok? 
43 
Time Series: in HBase or HDFS? 
• Timeseries IO Pattern Physics: 
– Reads: Collocate related data 
• Make reads cheap and fast 
– Writes: Spread writes out as much as possible 
• Maximize write throughput 
• HBase: Tension between these goals 
– Spreading writes spreads data, making reads inefficient 
– Colocating on write causes hotspots and underutilizes resources by limiting write throughput 
• HDFS: The sweet spot 
– Sequential writes and sequential reads 
– Just write more files in date-dirs; this physically spreads writes but logically groups data 
– Reads for time-centric queries: just read the files in the date-dir 
44 
Time Series data flows 
• Ingest 
– Flume or similar direct tool via app 
• HDFS for historical 
– No real time serving 
– Batch queries and generate rollups in Hive/MR 
– Faster queries in Impala 
• HBase for recent 
– Serve individual events 
– Serve pre-computed aggregates 
45 
Archetype: Entity Time Series 
• Full fidelity historical record of metrics 
– Random write to event data, random read specific event or 
aggregate data 
• Schema: 
– Rowkey: entity-timestamp or hash(entity)-timestamp, possibly 
with salt added after entity 
– Col qualifiers: granular time stamps -> value 
– Use custom aggregation to consolidate old data 
– Use TTLs to bound and age off old data 
• Examples: 
– OpenTSDB is a system on HBase that handles this for numeric 
values 
• Lazily aggregates cells for better performance 
– Facebook Insights, ODS 
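The hash(entity)-timestamp row key with granular-timestamp column qualifiers can be sketched roughly in the OpenTSDB style the slide cites (illustrative Python; the 4-byte hash and one-hour bucket are our own choices):

```python
import hashlib
import struct

def ts_cell(entity: str, epoch_secs: int, bucket_secs: int = 3600):
    # One row collects an entity-hour of points: the row key is a
    # short hash of the entity plus the bucket start time, and the
    # column qualifier is the offset within the bucket.
    base = epoch_secs - (epoch_secs % bucket_secs)
    row = hashlib.md5(entity.encode("utf-8")).digest()[:4] + struct.pack(">I", base)
    qualifier = struct.pack(">H", epoch_secs - base)
    return row, qualifier

row, qual = ts_cell("sensor1", 7205)
```

Points in the same hour share one row, so a single short scan (or Get) retrieves them in time order.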
46 
Entity Time Series access pattern 
[Diagram: the standard access paths, with Flume or a custom app feeding the low-latency write path] 
47 
Archetypes: Hybrid Entity Time Series 
• Essentially a combo of the Metric Archetype and Entity Time Series 
Archetype, with bulk loads of rollups via HDFS 
– Land data in HDFS and HBase 
– Keep all data in HDFS for future use 
– Aggregate in HDFS and write to HBase 
– HBase can do some aggregates too (counters) 
– Keep serve-able data in HBase 
– Use TTL to discard old values from HBase 
48 
Hybrid time series access pattern 
[Diagram: Flume lands data in both HDFS and HBase; Hive or MR jobs bulk import rollups from HDFS into HBase; low-latency Gets and short scans serve the results] 
49 
Meta Archetype: Combined workloads 
• In these cases, the use of HBase depends on the workload 
• Cases where we have multiple workload styles 
– In many cases we want to do multiple things with the same data 
– Primary use case (real time, random access) 
– Secondary use case (analytical) 
– Pick for your primary; here are some patterns for how to do your secondary 
50 
Operational with Analytical access pattern 
[Diagram: running MapReduce full scans against the serving cluster — high throughput, but poor latency: the full scans interfere with low-latency requests] 
51 
Operational with Analytical access pattern 
[Diagram: HBase replication copies data to a second cluster — the serving cluster keeps low latency, isolated from full scans, while the replica handles high-throughput MapReduce and bulk imports] 
52 
MR over Table Snapshots (0.98, CDH5.0) 
• Previously, MapReduce jobs over HBase required an online full-table scan 
• Take a snapshot and run the MR job over the snapshot files 
– Doesn't use the HBase client 
– Avoids affecting HBase caches 
– 3-5x performance boost 
– Still requires more IOPS than raw HDFS files 
[Diagram: two MapReduce jobs — one scanning the live table through the HBase scanner, one reading snapshot files directly from HDFS] 
53 
Analytic Archive access pattern 
[Diagram: the standard access paths for the analytic archive — live full scans / MapReduce through the HBase scanner alongside low-latency client traffic] 
54 
Analytic Archive Snapshot access pattern 
[Diagram: the snapshot variant — low-latency Gets and short scans hit the live table, while MR reads a table snapshot directly from HDFS at higher throughput] 
55 
Request Scheduling 
• We want to run MR for analytics while serving low-latency requests in one cluster 
• Performance isolation (proposed) 
– Limit the performance impact that load on one table has on others (HBASE-6721) 
• Request prioritization and scheduling 
– Current default is FIFO; a Deadline scheduler has been added 
– Prioritize short requests before long scans (HBASE-10994) 
• Throttling 
– Limit the request throughput of MR jobs 
[Diagram: with FIFO, a mixed workload's short requests are delayed behind long scan requests; with rescheduling, new short requests get priority; an isolated workload avoids the interference entirely] 
56 
Conclusions
57 
Big Data Workloads 
[Chart: workloads plotted by latency (low latency vs. batch) and access pattern (random access, short scan, full scan) — HBase for low-latency random access and short scans; HDFS + Impala for low-latency full scans; HBase + MR and HBase + Snapshots -> HDFS + MR for batch scans; HDFS + MR (Hive/Pig) for batch full scans] 
58 
Big Data Workloads 
[Chart: the same workload matrix with the archetypes placed on it — Simple Entities, Graph data, Current Metrics, and Messages on HBase; Analytic Archive, Entity Time Series, and Hybrid Entity Time Series rollup generation on HDFS (+ Impala or MR, Hive/Pig); Hybrid Entity Time Series rollup serving on HBase + MR; index building on HBase + Snapshots -> HDFS + MR] 
59 
HBase is evolving to be an Operational Database 
• Excels at consistent row-centric operations 
– Dev efforts aim at using all machine resources efficiently, reducing MTTR, and improving latency predictability 
– Projects built on HBase enable secondary indexing and multi-row transactions 
– Apache Phoenix or Impala provide a SQL skin for simplified application development 
– Evolution towards OLTP workloads 
• Analytic workloads? 
– Can be done, but will be beaten by direct HDFS + MR/Spark/Impala 
60 
Join the Discussion 
Get community help or provide feedback 
cloudera.com/community 
61 
Try Hadoop Now 
cloudera.com/live 
Thank you! 
More questions? 
Join us at Office Hours 
4pm @ Table B

Apache hadoop: POSH Meetup Palo Alto, CA April 2014
 
OpenStack Trove Day (19 Aug 2014, Cambridge MA) - Sahara
OpenStack Trove Day (19 Aug 2014, Cambridge MA)  - SaharaOpenStack Trove Day (19 Aug 2014, Cambridge MA)  - Sahara
OpenStack Trove Day (19 Aug 2014, Cambridge MA) - Sahara
 
Real Time and Big Data – It’s About Time
Real Time and Big Data – It’s About TimeReal Time and Big Data – It’s About Time
Real Time and Big Data – It’s About Time
 
Real Time and Big Data – It’s About Time
Real Time and Big Data – It’s About TimeReal Time and Big Data – It’s About Time
Real Time and Big Data – It’s About Time
 
HBase and Hadoop at Urban Airship
HBase and Hadoop at Urban AirshipHBase and Hadoop at Urban Airship
HBase and Hadoop at Urban Airship
 
A Graph Service for Global Web Entities Traversal and Reputation Evaluation B...
A Graph Service for Global Web Entities Traversal and Reputation Evaluation B...A Graph Service for Global Web Entities Traversal and Reputation Evaluation B...
A Graph Service for Global Web Entities Traversal and Reputation Evaluation B...
 
Change Data Capture to Data Lakes Using Apache Pulsar and Apache Hudi - Pulsa...
Change Data Capture to Data Lakes Using Apache Pulsar and Apache Hudi - Pulsa...Change Data Capture to Data Lakes Using Apache Pulsar and Apache Hudi - Pulsa...
Change Data Capture to Data Lakes Using Apache Pulsar and Apache Hudi - Pulsa...
 
201305 hadoop jpl-v3
201305 hadoop jpl-v3201305 hadoop jpl-v3
201305 hadoop jpl-v3
 
Hadoop demo ppt
Hadoop demo pptHadoop demo ppt
Hadoop demo ppt
 
The Nuts and Bolts of Hadoop and it's Ever-changing Ecosystem, Presented by J...
The Nuts and Bolts of Hadoop and it's Ever-changing Ecosystem, Presented by J...The Nuts and Bolts of Hadoop and it's Ever-changing Ecosystem, Presented by J...
The Nuts and Bolts of Hadoop and it's Ever-changing Ecosystem, Presented by J...
 
Big_data_1674238705.ppt is a basic background
Big_data_1674238705.ppt is a basic backgroundBig_data_1674238705.ppt is a basic background
Big_data_1674238705.ppt is a basic background
 
Big Data/Hadoop Option Analysis
Big Data/Hadoop Option AnalysisBig Data/Hadoop Option Analysis
Big Data/Hadoop Option Analysis
 
Big Data Concepts
Big Data ConceptsBig Data Concepts
Big Data Concepts
 
Interactive SQL-on-Hadoop and JethroData
Interactive SQL-on-Hadoop and JethroDataInteractive SQL-on-Hadoop and JethroData
Interactive SQL-on-Hadoop and JethroData
 
A Reference Architecture for ETL 2.0
A Reference Architecture for ETL 2.0 A Reference Architecture for ETL 2.0
A Reference Architecture for ETL 2.0
 
Survey of Real-time Processing Systems for Big Data
Survey of Real-time Processing Systems for Big DataSurvey of Real-time Processing Systems for Big Data
Survey of Real-time Processing Systems for Big Data
 
Storage Engine Considerations for Your Apache Spark Applications with Mladen ...
Storage Engine Considerations for Your Apache Spark Applications with Mladen ...Storage Engine Considerations for Your Apache Spark Applications with Mladen ...
Storage Engine Considerations for Your Apache Spark Applications with Mladen ...
 

More from Cloudera, Inc.

Partner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxPartner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxCloudera, Inc.
 
Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera, Inc.
 
2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards FinalistsCloudera, Inc.
 
Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Cloudera, Inc.
 
Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Cloudera, Inc.
 
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Cloudera, Inc.
 
Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Cloudera, Inc.
 
Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Cloudera, Inc.
 
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Cloudera, Inc.
 
Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Cloudera, Inc.
 
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Cloudera, Inc.
 
Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Cloudera, Inc.
 
Extending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformExtending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformCloudera, Inc.
 
Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Cloudera, Inc.
 
Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Cloudera, Inc.
 
Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Cloudera, Inc.
 
Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Cloudera, Inc.
 

More from Cloudera, Inc. (20)

Partner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxPartner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptx
 
Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists
 
2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists
 
Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019
 
Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19
 
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
 
Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19
 
Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19
 
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
 
Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19
 
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
 
Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18
 
Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3
 
Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2
 
Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1
 
Extending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformExtending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the Platform
 
Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18
 
Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360
 
Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18
 
Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18
 

Apache HBase Application Archetypes

  • 7. Definitions: A vocabulary for HBase Archetypes
  • 8. Defining HBase Archetypes
    • There are a lot of HBase applications
      – Some successful, some less so
      – They have common architecture patterns
      – They have common tradeoffs
    • Archetypes are common architecture patterns
      – Common across multiple use cases
      – Extracted to be repeatable
    • Our goal: define patterns à la the "Gang of Four" (Gamma, Helm, Johnson, Vlissides)
  • 9. So you want to use HBase?
    • What data is being stored?
      – Entity data
      – Event data
    • Why is the data being stored?
      – Operational use cases
      – Analytical use cases
    • How does the data get in and out?
      – Real time vs. batch
      – Random vs. sequential
  • 10. What is being stored? There are primarily two kinds of big data workloads, and they have different storage requirements: entities and events.
  • 11. Entity-Centric Data
    • Entity data is information about current state
      – Generally real-time reads and writes
    • Examples:
      – Accounts
      – Users
      – Geolocation points
      – Click counts and metrics
      – Current sensor readings
    • Scales up with the number of humans and the number of machines/sensors
      – Billions of distinct entities
  • 12. Event-Centric Data
    • Event-centric data are time-series data points recording successive events over time intervals
      – Generally real-time writes, with some combination of real-time or batch reads
    • Examples:
      – Sensor data over time
      – Historical stock ticker data
      – Historical metrics
      – Click time series
    • Scales up with finer-grained intervals, retention policies, and the passage of time
  • 13. Events about Entities
    • The majority of big data use cases deal with event-based data
      – |Entities| * |Events| = big data
    • When you ask questions, do you hone in on the entity first, or on time ranges first?
    • Your answer will help you determine where and how to store your data.
  • 14. Why are you storing the data?
    • What kind of questions are you asking of the data?
    • Entity-centric questions:
      – Give me everything about entity e
      – Give me the most recent event v about entity e
      – Give me the n most recent events V about entity e
      – Give me all events V about e between time [t1, t2]
    • Event- and time-centric questions:
      – Give me an aggregate for each entity between time [t1, t2]
      – Give me an aggregate for each time interval for entity e
      – Find events V that match some other given criteria
  • 15. How does data get in and out of HBase? (diagram: Put/Incr/Append via the HBase client; Gets and short scans via the HBase client; full scans and MapReduce via the HBase scanner; bulk import; HBase replication)
  • 16. How does data get in and out of HBase? (same diagram, annotated along a low-latency to high-throughput axis: Gets and short scans on the low-latency end; full scans/MapReduce, bulk import, and replication on the high-throughput end)
  • 17. What system is most efficient?
    • It is all physics: you have a limited I/O budget (IOPS per disk)
      – Use all your I/O by parallelizing access and reading/writing sequentially
      – Choose the system and features that reduce I/O in general
    • Pick the system that is best for your workload
  • 18. The physics of Hadoop storage systems

    Workload          HBase                            HDFS
    ----------------  -------------------------------  -------------------------------
    Low latency       ms, cached                       mins (MR) to seconds (Impala)
    Random read       primary index                    index?, small-files problem
    Short scan        sorted, partitioned              -
    Full scan         live table, or MR on snapshots   MR, Hive, Impala
    Random write      log structured                   not supported
    Sequential write  HBase overhead; bulk load        minimal overhead
    Updates           log structured                   not supported
  • 21. The Archetypes: HBase Applications
  • 22. HBase application use cases
    • The Good: Simple Entities, Messaging Store, Graph Store, Metrics Store
    • The Bad: Large Blobs, Naïve RDBMS Port, Analytic Archive
    • The Maybe: Time Series DB, Combined Workloads
  • 23. Archetypes: The Good ("HBase, you are my soul mate.")
  • 24. Archetype: Simple Entities
    • Purely entity data, no relations between entities
      – Batch or real-time, random writes
      – Real-time, random reads
      – Could be a well-done denormalized RDBMS port
      – Often from many different sources, with poly-structured data
    • Schema:
      – Row per entity
      – Row key => entity ID, or hash of entity ID
      – Column qualifier => property/field, possibly with a timestamp
    • Examples:
      – Geolocation data
      – Search index building (use Solr to make text data searchable)
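  A common way to realize the "hash of entity ID" row key above is to prefix the ID with a short hash, so that lexically adjacent IDs (e.g. auto-incrementing ones) spread across regions instead of hotspotting one region server. A minimal sketch; the helper name and the 4-character prefix length are illustrative, not from the slides:

  ```python
  import hashlib

  def entity_row_key(entity_id: str) -> bytes:
      # Short hash prefix distributes writes; keeping the readable ID
      # after the prefix preserves point lookups by entity.
      prefix = hashlib.md5(entity_id.encode("utf-8")).hexdigest()[:4]
      return f"{prefix}-{entity_id}".encode("utf-8")
  ```

  The tradeoff: with a hashed prefix you give up meaningful range scans over entity IDs, which is fine for this archetype since access is by single entity.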
  • 25. Simple Entities access pattern (diagram: random Put/Incr/Append and Gets/short scans on the low-latency side; bulk import, replication, and full scans/MapReduce feeding Solr on the high-throughput side)
  • 26. Archetype: Messaging Store
    • Messaging data:
      – Real-time random writes: emails, SMS, MMS, IM
      – Real-time random updates: message read, starred, moved, deleted
      – Reads of the top-N entries, sorted by time
      – Records of varying size
      – Some time series, but mostly random read/write
    • Schema:
      – Row = user/feed/inbox
      – Row key = UID, or UID + time
      – Column qualifier = time, or conversation ID + time
      – Use column families for indexes
    • Examples:
      – Facebook Messages, Xiaomi Messages
      – Telco SMS/MMS services
      – Feeds like Tumblr, Pinterest
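  The "top-N entries, sorted by time" read is usually served with a reverse-timestamp key component: HBase sorts cells lexicographically ascending, so storing a zero-padded (MAX - timestamp) value makes the newest message sort first and a short scan returns the top N cheaply. A hedged sketch; MAX_TS and the padding width are illustrative choices, not from the slides:

  ```python
  MAX_TS = 10 ** 13  # illustrative ceiling for epoch milliseconds

  def message_qualifier(ts_millis: int) -> bytes:
      # Newest message gets the lexically smallest qualifier, so it is
      # returned first by an ascending scan.
      return f"{MAX_TS - ts_millis:013d}".encode("ascii")

  # Sorting models HBase's ascending cell order within a row.
  quals = sorted(message_qualifier(t) for t in (1000, 2000, 3000))
  ```

  In practice the conversation ID would be prepended to the qualifier, as the slide's "conversation ID + time" scheme suggests.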
  • 27. Facebook Messages statistics (source: HBaseCon 2012, Anshuman Singh)
  • 28. Messages access pattern (diagram: random Put/Incr/Append plus Gets and short scans dominate on the low-latency side)
  • 29. Archetype: Graph Data
    • Graph data: all entities and relations
      – Batch or real-time, random writes
      – Batch or real-time, random reads
      – It is an entity with relation edges
    • Schema:
      – Row = node
      – Row key => node ID
      – Column qualifier => edge ID, or property:value pairs
    • Examples:
      – Web crawl caches (Yahoo!, Trend Micro)
      – Titan graph DB with an HBase storage backend
      – Sessionization (financial transactions, click streams, network traffic)
      – Government (connect the bad guy)
  • 30. Graph data access pattern (diagram: random reads and writes on the low-latency side, full scans/MapReduce on the high-throughput side)
  • 31. Archetype: Metrics
    • Frequently updated metrics
      – Increments
      – Roll-ups generated by MapReduce and bulk loaded into HBase
    • Schema:
      – Row: entity for a time period
      – Row key: entity-<yymmddhh> (granular time)
      – Column qualifier: property -> count
    • Examples:
      – Campaign impression/click counts (ad tech)
      – Sensor data (energy, manufacturing, automotive)
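  The entity-<yymmddhh> key gives one row per entity per hour, with each property counted under its own qualifier. A toy sketch of the scheme, assuming hourly granularity; the `incr` dict stands in for HBase's atomic Increment and all names are illustrative:

  ```python
  from collections import defaultdict
  from datetime import datetime, timezone

  def metric_row_key(entity: str, ts: datetime) -> str:
      # One row per entity per hour: entity-<yymmddhh>, as on the slide.
      return f"{entity}-{ts.strftime('%y%m%d%H')}"

  # In-memory stand-in for HBase Increment: one counter per (row, qualifier).
  counters = defaultdict(int)

  def incr(row: str, qualifier: str, amount: int = 1) -> None:
      counters[(row, qualifier)] += amount

  t = datetime(2014, 11, 20, 9, 30, tzinfo=timezone.utc)
  incr(metric_row_key("campaign42", t), "clicks")
  incr(metric_row_key("campaign42", t), "clicks")
  ```

  Because all increments for an hour land in one row, per-hour reporting is a single Get, while coarser roll-ups come from the bulk-loaded MapReduce output.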
  • 32. Metrics access pattern (diagram: increments and Gets on the low-latency side, bulk-loaded roll-ups on the high-throughput side)
  • 33. Archetypes: The Bad ("These are not the droids you are looking for")
  • 34. Current HBase weak spots
    • HBase's architecture can handle a lot
      – Engineering tradeoffs optimize for some use cases and against others
      – HBase can still do things it is not optimal for
      – However, other systems are fundamentally more efficient for some workloads
    • We've seen folks forcing apps into HBase
      – If there is only one workload on the data, consider another system
      – If there is a mixed workload, some of these cases become "maybes"
    • Just because it is not good today doesn't mean it can't be better tomorrow!
  • 35. Bad Archetype: Large Blob Store
    • Saving large objects, >3 MB per cell
    • Schema:
      – Normal entity pattern, but with some columns holding large cells
    • Examples:
      – Raw photo or video storage in HBase
      – Large, frequently updated structs as a single cell
    • Problems:
      – Write amplification when re-optimizing data for reads (compactions on large, unchanging data)
      – Write amplification when large structs are rewritten to update subfields; cells are atomic, so HBase must rewrite an entire cell
    • Note: Medium-sized Object (MOB) support is coming for lots of 100 KB-10 MB cells; see HBASE-11339 for details
  • 36. Bad Archetype: Naïve RDBMS Port
    • A naïve port of an RDBMS into HBase, directly copying the schema
    • Schema:
      – Many tables, just like an RDBMS schema
      – Row key: primary key or auto-incrementing key, as in the RDBMS schema
      – Column qualifiers: field names
      – Joins and secondary indexes done manually (not consistent)
    • Solution:
      – HBase is not a SQL database
      – No multi-region/multi-table transactions in HBase (yet)
      – No built-in join support; you must denormalize your schema to use HBase
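  To make the "denormalize your schema" advice concrete, here is a toy sketch of collapsing a one-to-many join into one wide row per parent entity. Plain dicts stand in for HBase rows, and the table and column-family names are illustrative:

  ```python
  # Toy relational data: a users table and an orders table that an
  # RDBMS would join on user_id.
  users = {"u1": {"name": "Ada"}}
  orders = [("u1", "o1", "book"), ("u1", "o2", "lamp")]

  # Denormalized: one wide row per user, each order stored under its own
  # column qualifier ("orders:<order_id>"), so a single Get fetches the
  # user and all of its orders without a join.
  wide_rows = {}
  for uid, user in users.items():
      row = {f"info:{field}": value for field, value in user.items()}
      for order_uid, order_id, item in orders:
          if order_uid == uid:
              row[f"orders:{order_id}"] = item
      wide_rows[uid] = row
  ```

  The cost of this design is duplicated data and read-modify-write maintenance on updates, which is the tradeoff HBase schemas accept in exchange for join-free reads.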
  • 37. Large blob store and naïve RDBMS port access patterns (diagram)
  • 38. Bad Archetype: Analytic Archive
    • Stores purely chronological data, partitioned by time
      – Real-time writes, with chronological time as the primary index
      – Column-centric aggregations over all rows
      – Bulk reads out, generally for generating periodic reports
    • Schema:
      – Row key: date+xxx, or salt+date+xxx
      – Column qualifiers: properties with data or counters
    • Examples:
      – Machine logs organized by date (causes write hotspotting)
      – Full-fidelity clickstream organized by date (as opposed to by campaign)
  • 39. Analytic Archive: problems
    • HBase is non-optimal as the primary store for this use case
      – It will get crushed by frequent full table scans
      – It will get crushed by large compactions
      – It will get crushed by write-side region hotspotting
    • Solution:
      – Store in HDFS; use Parquet columnar storage plus Impala/Hive
      – Build roll-ups in HDFS with MapReduce; store and serve the roll-ups in HBase
  • 40. Analytic Archive access pattern (diagram: heavy full scans and bulk reads dominate the high-throughput side)
  • 41. Archetypes: The Maybe ("And this is crazy | But here's my data, | serve it, maybe!")
  • 42. The Maybes
    • For some applications, doing it right gets complicated
    • These are the more sophisticated or nuanced cases
    • They require considering these questions:
      – When do you choose HBase vs. HDFS storage for time-series data?
      – Are there times when the bad archetypes are OK?
  • 43. Time series: in HBase or HDFS?
    • Time-series I/O pattern physics:
      – Reads: collocate related data to make reads cheap and fast
      – Writes: spread writes out as much as possible to maximize write throughput
    • HBase: tension between these goals
      – Spreading writes spreads data, making reads inefficient
      – Collocating on write causes hotspots and underutilizes resources by limiting write throughput
    • HDFS: the sweet spot
      – Sequential writes and sequential reads
      – Just write more files into date-dirs; this physically spreads writes but logically groups data
      – Reads for time-centric queries just read the files in the date-dir
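  The date-dir idea is simply a deterministic path layout: files land under per-day directories, so writers can append new files anywhere (physically spread) while a time-range query only has to list the matching directories (logically grouped). A minimal sketch, assuming a day-level layout; the base path and helper name are illustrative:

  ```python
  from datetime import datetime, timezone

  def date_dir(base: str, ts: datetime) -> str:
      # /base/YYYY/MM/DD: a reader covering [t1, t2] enumerates only
      # the day directories in that range.
      return f"{base}/{ts:%Y/%m/%d}"

  example = date_dir("/data/events", datetime(2014, 11, 20, tzinfo=timezone.utc))
  ```

  Hive and Impala formalize the same layout as date-based table partitions, which is how the batch side of the slides' pipeline queries it.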
  • 44. Time-series data flows
    • Ingest: Flume or a similar direct tool via the app
    • HDFS for historical data
      – No real-time serving
      – Batch queries and roll-up generation in Hive/MapReduce
      – Faster queries in Impala
    • HBase for recent data
      – Serve individual events
      – Serve pre-computed aggregates
  • 45. Archetype: Entity Time Series
    • Full-fidelity historical record of metrics
      – Random writes of event data; random reads of specific events or aggregate data
    • Schema:
      – Row key: entity-timestamp or hash(entity)-timestamp, possibly with a salt added after the entity
      – Column qualifiers: granular timestamps -> values
      – Use custom aggregation to consolidate old data
      – Use TTLs to bound and age off old data
    • Examples:
      – OpenTSDB is a system on HBase that handles this for numeric values; it lazily aggregates cells for better performance
      – Facebook Insights, ODS
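  One variant of the row key above derives a small salt from the entity: different entities spread across regions, while each entity's rows stay contiguous and sorted by time, so a range scan over one entity's history is a single short scan. A sketch under those assumptions; the bucket count, delimiter, and padding width are illustrative:

  ```python
  import hashlib

  SALT_BUCKETS = 8  # illustrative; roughly match the region count

  def ts_row_key(entity: str, ts_epoch: int) -> bytes:
      # Salt derived from the entity is deterministic, so reads for one
      # entity still target a single key range; zero-padding makes the
      # lexicographic order of keys match the numeric time order.
      salt = int(hashlib.md5(entity.encode()).hexdigest(), 16) % SALT_BUCKETS
      return f"{salt}-{entity}-{ts_epoch:010d}".encode()
  ```

  OpenTSDB's actual key layout differs in detail (fixed-width binary metric and tag IDs), but the ordering principle is the same.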
  • 46. Entity Time Series access pattern (diagram: Flume or a custom app feeds writes; Gets and short scans serve reads)
  • 47. Archetype: Hybrid Entity Time Series
    • Essentially a combination of the Metrics archetype and the Entity Time Series archetype, with bulk loads of roll-ups via HDFS
      – Land data in both HDFS and HBase
      – Keep all data in HDFS for future use
      – Aggregate in HDFS and write the results to HBase
      – HBase can do some aggregates too (counters)
      – Keep serveable data in HBase
      – Use TTLs to discard old values from HBase
  • 48. Hybrid time series access pattern (diagram: Flume lands data in HDFS; Hive or MapReduce bulk-imports roll-ups into HBase; Gets and short scans serve reads)
  • 49. Meta-Archetype: Combined Workloads
    • In these cases, whether to use HBase depends on the workload
    • Cases where we have multiple workload styles
      – In many cases we want to do multiple things with the same data
      – Primary use case: real-time, random access
      – Secondary use case: analytical
      – Pick for your primary; here are some patterns for handling the secondary
  • 50. Operational with analytical access pattern (diagram: on a single cluster, full scans interfere with serving, so Get/Scan latency is poor)
  • 51. Operational with analytical access pattern, isolated (diagram: replication to a second cluster keeps low-latency serving isolated from high-throughput full scans)
  • 52. 52 MR over Table Snapshots (0.98, CDH5.0) • Previously, MapReduce jobs over HBase required an online full table scan • Take a snapshot and run the MR job over the snapshot files – Doesn't use the HBase client – Avoids affecting HBase caches – 3-5x perf boost – Still requires more IOPS than raw HDFS files map map map map map map map map reduce reduce reduce map map map map map map map map reduce reduce reduce snapshot 11/20/14 Strata+Hadoop World Barcelona 2014. George and Hsieh
  • 53. 53 Analytic Archive access pattern Put, Incr, Append HBase Client Get, Scan HBase Client Bulk Import HBase Client HBase Replication HBase Replication low latency high throughput Gets Short scan Full Scan, MapReduce HBase Scanner 11/20/14 Strata+Hadoop World Barcelona 2014. George and Hsieh
  • 54. 54 Analytic Archive Snapshot access pattern HDFS Put, Incr, Append HBase Client HBase Client Snapshot Scan, MR HBase Scanner Bulk Import HBase Client HBase Replication HBase Replication low latency Table snapshot Higher throughput Gets Short scan 11/20/14 Strata+Hadoop World Barcelona 2014. George and Hsieh
  • 55. 55 Request Scheduling • We want to run MR for analytics while serving low-latency requests on one cluster • Performance Isolation (proposed) – Limit the performance impact that load on one table has on others (HBASE-6721) • Request prioritization and scheduling – Current default is FIFO; a Deadline policy has been added – Prioritize short requests before long scans (HBASE-10994) • Throttling – Limit the request throughput of MR jobs Mixed workload Delayed by long scan requests 1 1 2 1 1 3 1 1 1 1 1 1 2 3 Rescheduled so new requests get priority Isolated workload 11/20/14 Strata+Hadoop World Barcelona 2014. George and Hsieh
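The FIFO-versus-deadline contrast on this slide can be illustrated with a toy scheduler. This is a conceptual Python sketch only, not HBase's actual RPC scheduler; the request names and cost units are made up:

```python
def fifo_order(requests):
    """Default FIFO: requests complete strictly in arrival order,
    so a long scan at the head delays every short get behind it."""
    return [name for name, _cost in requests]

def deadline_order(requests):
    """Deadline-style ordering (in the spirit of HBASE-10994): cheaper
    requests such as point gets are served before long scans, with
    ties broken by arrival order so same-cost requests stay FIFO."""
    indexed = ((cost, arrival, name)
               for arrival, (name, cost) in enumerate(requests))
    return [name for _cost, _arrival, name in sorted(indexed)]
```

With a long scan arriving first, FIFO makes the two gets wait behind it, while the deadline ordering lets both gets (and the shorter scan) finish before the 100-unit scan.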
  • 57. 57 Big Data Workloads Low latency Batch HDFS + Impala HDFS + MR (Hive/Pig) HBase HBase + MR HBase + Snapshots -> HDFS + MR Random Access Short Scan Full Scan 11/20/14 Strata+Hadoop World Barcelona 2014. George and Hsieh
  • 58. 58 Big Data Workloads Low latency Batch HDFS + Impala Analytic archive Entity Time series Hybrid Entity Time series + Rollup generation HDFS + MR (Hive/Pig) Simple Entities Graph data HBase Current Metrics Messages HBase + MR Hybrid Entity Time series + Rollup serving HBase + Snapshots -> HDFS + MR Index building Random Access Short Scan Full Scan 11/20/14 Strata+Hadoop World Barcelona 2014. George and Hsieh
  • 59. 59 HBase is evolving to be an Operational Database • Excels at consistent row-centric operations – Dev efforts aimed at using all machine resources efficiently, reducing MTTR, and improving latency predictability. – Projects built on HBase that enable secondary indexing and multi-row transactions – Apache Phoenix or Impala provide a SQL skin for simplified application development – Evolution towards OLTP workloads • Analytic workloads? – Can be done but will be beaten by direct HDFS + MR/Spark/Impala 11/20/14 Strata+Hadoop World Barcelona 2014. George and Hsieh
  • 60. 60 Join the Discussion Get community help or provide feedback cloudera.com/community 11/20/14 Strata+Hadoop World Barcelona 2014. George and Hsieh
  • 61. 61 Try Hadoop Now cloudera.com/live 11/20/14 Strata+Hadoop World Barcelona 2014. George and Hsieh
  • 62. Thank you! More questions? Join us at Office Hours 4pm @ Table B

Editor's Notes

  1. Given that HBase stores a large sorted map, the API looks similar to a map. You can get or put individual rows, or scan a range of rows. There is also a very efficient way of incrementing a particular cell – this can be useful for maintaining high-performance counters or statistics. Lastly, it’s possible to write MapReduce jobs that analyze the data in HBase.
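The "large sorted map" picture in this note can be sketched in a few lines. This is a toy in-memory Python illustration of the get/put/scan/increment semantics described above, not the HBase client API (class and method names are chosen for the sketch):

```python
import bisect

class SortedMapModel:
    """Toy model of HBase's data model: a map kept in sorted row-key
    order, supporting get/put, range scans, and atomic-style increments.
    Illustrative only -- real HBase distributes this across regions."""

    def __init__(self):
        self._keys = []   # sorted, like HBase's lexicographic row order
        self._data = {}

    def put(self, row, value):
        if row not in self._data:
            bisect.insort(self._keys, row)  # keep row keys sorted
        self._data[row] = value

    def get(self, row):
        return self._data.get(row)

    def scan(self, start, stop):
        """Return (row, value) pairs with start <= row < stop."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        return [(k, self._data[k]) for k in self._keys[lo:hi]]

    def increment(self, row, amount=1):
        """Read-modify-write on a counter cell (trivially atomic here,
        since this toy model is single-threaded)."""
        new_value = (self.get(row) or 0) + amount
        self.put(row, new_value)
        return new_value
```

Because the keys stay sorted, `scan` is an efficient contiguous range read, which is exactly why row-key design matters so much in the archetypes above.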