Apache HBase Application Archetypes
1. Apache HBase
Application
Archetypes
Strata + Hadoop World Barcelona.
November 20th, 2014
Lars George | @larsgeorge | Cloudera EMEA Chief Architect |
HBase PMC
Jonathan Hsieh | @jmhsieh | Cloudera HBase Tech lead | HBase
PMC
2. 2
About Lars and Jon
Lars George
• EMEA Chief Architect
@Cloudera
– Apache HBase PMC
– O’Reilly Author of HBase –
The Definitive Guide
• Contact
– lars@cloudera.com
– @larsgeorge
Jon Hsieh
• Tech Lead HBase Team
@Cloudera
– Apache HBase PMC
– Apache Flume founder
• Contact
– jon@cloudera.com
– @jmhsieh
11/20/14 Strata+Hadoop World Barcelona 2014. George and Hsieh
3. 3
About Supporting HBase at Cloudera
• Supporting Customers using HBase since 2011
– HBase Training
– Professional Services
• Team has experience supporting and running HBase since 2009
– 9 committers on staff
– 2 HBase book authors
• As of Jan 2014, ~20,000 HBase nodes (in aggregate) under
management
• Information in this presentation is either aggregated customer data
or from public sources.
11/20/14 Strata+Hadoop World Barcelona 2014. George and Hsieh
4. 4
An Apache HBase Timeline
Summer '09: StumbleUpon goes production on HBase ~0.20
Apr '11: CDH3 GA with HBase 0.90.1
Summer '11: Messages on HBase; Web Crawl Cache
Sept '11: HBase TDG published
Nov '11: Cassini on HBase
May '12: HBaseCon 2012
Nov '12: HBase in Action published
Jan '13: Phoenix on HBase
Jun '13: HBaseCon 2013
Aug '13: Flurry 1k-node to 1k-node cluster replication
Jan '14: Cloudera has ~20k HBase nodes under management
May '14: HBaseCon 2014
Fall '14/Winter '15: HBase v1.0.0 released
5. 5
Apache HBase “Nascar” Slide
6. 6
Outline
• Definitions
• Archetypes
–The Good
–The Bad
–The Maybe
• Conclusion
8. 8
Defining HBase Archetypes
• There are a lot of HBase applications
– Some successful, some less so
– They have common architecture patterns
– They have common tradeoffs
• Archetypes are common architecture patterns
– Common across multiple use-cases
– Extracted to be repeatable
• Our Goal: Define patterns à la “Gang of Four” (Gamma, Helm,
Johnson, Vlissides)
9. 9
So you want to use HBase?
• What data is being stored?
– Entity data
– Event data
• Why is the data being stored?
– Operational use cases
– Analytical use cases
• How does the data get in and out?
– Real time vs. Batch
– Random vs. Sequential
10. 10
What is being stored?
There are primarily two kinds of big data workloads, and they have
different storage requirements.
Entities Events
11. 11
Entity Centric Data
• Entity data is information about current state
– Generally real time reads and writes
• Examples:
– Accounts
– Users
– Geolocation points
– Click Counts and Metrics
– Current Sensor Readings
• Scales up with # of Humans and # of Machines/Sensors
– Billions of distinct entities
12. 12
Event Centric Data
• Event-centric data are time-series: successive data points recorded
over time intervals.
– Generally real time write, some combination of real time read or batch read
• Examples:
– Sensor data over time
– Historical Stock Ticker data
– Historical Metrics
– Click time-series
• Scales up due to finer grained intervals, retention policies, and the
passage of time
13. 13
Events about Entities
• The majority of big data use cases deal with event-based data
– |Entities| * |Events| = Big data
• When you ask questions, do you home in on an entity first?
• When you ask questions, do you home in on time ranges first?
• Your answer will help you determine where and how to store your
data.
14. 14
Why are you storing the data?
• So what kind of questions are you asking the data?
• Entity-centric questions
– Give me everything about entity e
– Give me the most recent event v about entity e
– Give me the n most recent events V about entity e
– Give me all events V about e between time [t1,t2]
• Event and Time-centric questions
– Give me an aggregate for each entity between time [t1,t2]
– Give me an aggregate for each time interval for entity e
– Find events V that match some other given criteria
15. 15
How does data get in and out of HBase?
[Diagram: data paths in and out of HBase — Put/Incr/Append via the HBase client; Gets and short scans via the HBase client; full scans/MapReduce via the HBase scanner; bulk import; replication in and out.]
16. 16
How does data get in and out of HBase?
[Diagram: the same access paths, annotated — Put/Incr/Append and Get/Scan via the HBase client are low latency; full scans/MapReduce via the HBase scanner and bulk import are high throughput; replication in and out.]
17. 17
What system is most efficient?
• It is all physics
• You have a limited I/O budget
– Use all of your I/O by parallelizing access and
reading/writing sequentially
– Choose the system and features that reduce
I/O overall
• Pick the system that is best for your
workload's IOPS per disk
18. 18
The physics of Hadoop Storage Systems
Workload          HBase                            HDFS
Low Latency       ms, cached                       mins (MR); seconds (Impala)
Random Read       primary index                    index?; small-files problem
Short Scan        sorted + partitioned             -
Full Scan         live table; + MR on snapshots    MR, Hive, Impala
Random Write      log structured                   not supported
Sequential Write  HBase overhead; bulk load        minimal overhead
Updates           log structured                   not supported
22. 22
HBase application use cases
• The Good
– Simple Entities
– Messaging Store
– Graph Store
– Metrics Store
• The Bad
– Large Blobs
– Naïve RDBMS port
– Analytic Archive
• The Maybe
– Time series DB
– Combined workloads
24. 24
Archetype: Simple Entities
• Purely entity data, no relation between entities
– Batch or real-time, random writes
– Real-time, random reads
– Could be a well-done denormalized RDBMS port
– Often from many different sources, with poly-structured data
• Schema:
– Row per entity
– Row key => entity ID, or hash of entity ID
– Col qualifier => Property / field, possibly time stamp
• Examples:
– Geolocation data
– Search index building
– Use Solr to make text data searchable
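The row-key choice above can be sketched in Python. This is a minimal illustration of "entity ID, or hash of entity ID" — the helper name, hash length, and field serialization are illustrative, not from the talk:

```python
import hashlib

def entity_row_key(entity_id: str, hashed: bool = True) -> bytes:
    # "Row key => entity ID, or hash of entity ID": a hash prefix spreads
    # otherwise-sequential IDs evenly across regions.
    if hashed:
        prefix = hashlib.md5(entity_id.encode()).hexdigest()[:8]
        return f"{prefix}-{entity_id}".encode()
    return entity_id.encode()

# One row per entity; one column qualifier per property/field.
row = {
    "key": entity_row_key("user42"),
    "cf:name": b"Alice",
    "cf:geo": b"41.39,2.17",  # e.g. a geolocation point
}
```

The hash must be stable (same entity, same row key) so point Gets can recompute it; the trade-off is that range scans over natural entity-ID order are lost.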
25. 25
Simple Entities access pattern
[Diagram: low-latency Put/Incr/Append and Gets/short scans via the HBase client; bulk import; replication in and out; full scans/MapReduce via the HBase scanner feed Solr for index building.]
26. 26
Archetype: Messaging Store
• Messaging Data:
– Realtime random writes: Emails, SMS, MMS, IM
– Realtime random updates: Msg read, starred, moved, deleted
– Reading of top-N entries, sorted by time
– Records are of varying size
– Some time series, but mostly random read/write
• Schema:
– Row = users/feed/inbox
– Row key = UID or UID + time
– Column Qualifier = time or conversation id + time.
– Use column families (CFs) for indexes
• Examples:
– Facebook Messages, Xiaomi Messages
– Telco SMS/MMS services
– Feeds like tumblr, pinterest
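The "column qualifier = time" bullet usually means a reversed timestamp, so that the newest messages sort first and a short scan returns the top-N entries. A minimal sketch of that trick (the base constant mirrors Java's Long.MAX_VALUE; the qualifier layout is illustrative):

```python
MAX_TS = 2**63 - 1  # Long.MAX_VALUE, the usual reverse-timestamp base

def inbox_qualifier(ts_millis: int, conversation_id: str) -> bytes:
    # Column qualifier = reversed timestamp + conversation id, so a short
    # scan over one user's row returns the newest messages first.
    return f"{MAX_TS - ts_millis:019d}-{conversation_id}".encode()

older = inbox_qualifier(1_000, "conv1")
newer = inbox_qualifier(2_000, "conv1")
assert newer < older  # newer message sorts (and is read) first
```

Zero-padding to a fixed width matters: lexicographic byte order only matches numeric order when the encoded timestamps are the same length.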
27. 27
Facebook Messages - Statistics
Source: HBaseCon 2012 - Anshuman Singh
28. 28
Messages Access Pattern
[Diagram: the standard access paths — Put/Incr/Append and Gets/short scans via the HBase client (low latency); full scans/MapReduce via the HBase scanner and bulk import (high throughput); replication in and out.]
29. 29
Archetype: Graph Data
• Graph Data: All entities and relations
– Batch or realtime, random writes
– Batch or realtime, random reads
– It's an entity with relation edges
• Schema:
– Row = Node.
– Row key => Node ID.
– Col qualifier => Edge ID, or properties:values
• Examples:
– Web Caches – Yahoo!, Trend Micro
– Titan Graph DB with HBase storage backend
– Sessionization (financial transactions, clicks streams, network traffic)
– Government (connect the bad guy)
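The "row = node, col qualifier = edge ID" schema is an adjacency list. A minimal sketch, assuming a hypothetical `e:` family for edges and a simple key=value serialization of edge properties (both illustrative choices):

```python
def node_row(node_id: str, edges: dict) -> dict:
    # One row per node: row key = node ID, one column qualifier per edge,
    # edge properties serialized into the cell value.
    row = {"key": node_id.encode()}
    for dst, props in sorted(edges.items()):
        value = ";".join(f"{k}={v}" for k, v in sorted(props.items()))
        row[f"e:{dst}".encode()] = value.encode()
    return row

# A web-cache-style node with two outgoing links:
row = node_row("pageA", {"pageB": {"anchor": "home"}, "pageC": {"anchor": "faq"}})
```

One Get then fetches a node and all its out-edges in a single row read, which is what makes random graph traversal cheap here.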
30. 30
Graph Data Access Pattern
[Diagram: the standard access paths — Put/Incr/Append and Gets/short scans via the HBase client (low latency); full scans/MapReduce via the HBase scanner and bulk import (high throughput); replication in and out.]
31. 31
Archetype: Metrics
• Frequently updated metrics
– Increments
– Roll ups generated by MR and bulk loaded to HBase
• Schema
– Row: Entity for a time period
– Row key: entity-<yymmddhh> (granular time)
– Col Qualifier: property -> count
• Examples
– Campaign Impression/Click counts (Ad tech)
– Sensor data (Energy, Manufacturing, Auto)
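The entity-<yymmddhh> row key above can be built directly with strftime. A minimal sketch (function name is illustrative):

```python
from datetime import datetime, timezone

def metrics_row_key(entity: str, ts: datetime) -> bytes:
    # Row key = entity-<yymmddhh>: one row per entity per hour; property
    # counters in that row are incremented in place as events arrive.
    return f"{entity}-{ts:%y%m%d%H}".encode()

key = metrics_row_key("campaign7", datetime(2014, 11, 20, 9, tzinfo=timezone.utc))
# All of campaign7's hourly rows sort together, ordered by time.
```

Leading with the entity (not the time) keeps one campaign's rows contiguous, so "this campaign, this day" is a cheap short scan rather than a hotspotting date-first key.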
32. 32
Metrics Access Pattern
[Diagram: the standard access paths — Put/Incr/Append and Gets/short scans via the HBase client (low latency); full scans/MapReduce via the HBase scanner and bulk import (high throughput); replication in and out.]
34. 34
Current HBase weak spots
• HBase’s architecture can handle a lot
– Engineering tradeoffs optimize for some use cases and against others
– HBase can still do things it is not optimal for
– However, other systems are fundamentally more efficient for some workloads
• We've seen folks forcing apps into HBase
– If there is only one workload on the data, consider another system
– If there is a mixed workload, some of these cases become "maybes"
• Just because it is not good today, doesn’t mean it can’t be better
tomorrow!
35. 35
Bad Archetype: Large Blob Store
• Saving large objects >3MB per cell
• Schema:
– Normal entity pattern, but with some columns with large cells
• Examples
– Raw photo or video storage in HBase
– Large frequently updated structs as a single cell
• Problems:
– Write amplification when reoptimizing data for read (compactions on large
unchanging data)
– Write amplification when large structs rewritten to update subfields. Cells are
atomic, and HBase must rewrite an entire cell
• Note: Medium Binary Object (MOB) support coming (lots of 100KB-10MB
cells)
– See HBASE-11339 for more details.
36. 36
Bad Archetype: Naïve RDBMS port
• A naïve port of an RDBMS into HBase, directly copying the schema
• Schema
– Many tables, just like an RDBMS schema
– Row key: primary key or auto-incrementing key, like RDBMS schema
– Column qualifiers: field names
– Manually do joins, or secondary indexes (not consistent)
• Solution:
– HBase is not a SQL database
– No multi-region/multi-table transactions in HBase (yet)
– No built-in join support; you must denormalize your schema to use HBase
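What denormalization looks like in practice: fold the child table's rows into the parent's row instead of joining at read time. A minimal sketch with hypothetical `orders`/`order_items` tables and illustrative family prefixes:

```python
# RDBMS: SELECT ... FROM orders JOIN order_items ON orders.id = order_items.order_id
# HBase: fold the child rows into the parent row, so one Get replaces the join.
def denormalized_order(order_id: str, customer: str, items: list) -> dict:
    row = {"key": order_id.encode(), b"o:customer": customer.encode()}
    for i, (sku, qty) in enumerate(items):
        # one column qualifier per line item, zero-padded to keep sort order
        row[f"i:{i:04d}".encode()] = f"{sku}:{qty}".encode()
    return row

row = denormalized_order("order-1001", "alice", [("sku-9", 2), ("sku-3", 1)])
```

The cost is write-side: updating a shared attribute (say, the customer's name) now means rewriting it in every row that embeds it.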
37. 37
Large blob store, Naïve RDBMS port access patterns
[Diagram: the standard access paths — Put/Incr/Append and Gets/short scans via the HBase client (low latency); full scans/MapReduce via the HBase scanner and bulk import (high throughput); replication in and out.]
38. 38
Bad Archetype: Analytic archive
• Store purely chronological data, partitioned by time
– Real time writes, chronological time as primary index
– Column-centric aggregations over all rows
– Bulk reads out, generally for generating periodic reports
• Schema
– Row key: date+xxx or salt+date+xxx
– Column qualifiers: properties with data or counters
• Example
– Machine logs organized by date (causes write hotspotting)
– Full-fidelity clickstream organized by date (as opposed to by campaign)
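The salt+date+xxx key mentioned above is the standard mitigation for the write hotspotting this archetype causes. A minimal sketch (bucket count and helper names are illustrative):

```python
import hashlib

N_BUCKETS = 16  # roughly the number of regions you want writes spread over

def salted_key(date: str, event_id: str) -> bytes:
    # "salt+date+xxx": a stable salt derived from the event spreads a
    # date-ordered key space across N_BUCKETS regions instead of one hot one.
    salt = int(hashlib.md5(event_id.encode()).hexdigest(), 16) % N_BUCKETS
    return f"{salt:02d}-{date}-{event_id}".encode()

# The cost: a date-range read becomes N_BUCKETS parallel scans, one per bucket.
```

The salt must be derivable from data the reader already has (here, the event id), otherwise point reads would have to probe every bucket.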
39. 39
Bad Archetype: Analytic Archive Problems
• HBase is non-optimal when this is the primary use case
– Will get crushed by frequent full table scans
– Will get crushed by large compactions
– Will get crushed by write-side region hot spotting
• Solution:
– Store in HDFS; Use Parquet columnar data storage + Impala/Hive
– Build rollups in HDFS+MR; store and serve rollups in HBase
40. 40
Analytic Archive access patterns
[Diagram: the standard access paths — Put/Incr/Append and Gets/short scans via the HBase client (low latency); full scans/MapReduce via the HBase scanner and bulk import (high throughput); replication in and out.]
41. 41
Archetypes: The Maybe
And this is crazy | But here’s my data, | serve it,
maybe!
42. 42
The Maybe’s
• For some applications, doing it right gets complicated.
• More sophisticated or nuanced cases
• Require considering these questions:
– When do you choose HBase vs HDFS storage for time series data?
– Are there times where bad archetypes are ok?
43. 43
Time Series: in HBase or HDFS?
• Timeseries IO Pattern Physics:
– Reads: Collocate related data
• Make reads cheap and fast
– Writes: Spread writes out as much as possible
• Maximize write throughput
• HBase: Tension between these goals
– Spreading writes spreads data, making reads inefficient
– Colocating on write causes hotspots and underutilizes resources by limiting write throughput
• HDFS: The sweet spot
– Sequential writes and sequential reads
– Just write more files in date-dirs; this physically spreads writes but logically groups data
– Reads for time-centric queries: just read the files in the matching date-dirs
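The date-dir layout above can be sketched as a path-building helper. The Hive-style `year=/month=/day=` partitioning and the base path are illustrative conventions, not from the talk:

```python
from datetime import datetime, timezone

def date_dir(base: str, ts: datetime) -> str:
    # Writes append new files under today's directory (physically spread
    # across files), while a time-centric query reads only matching date-dirs.
    return f"{base}/year={ts:%Y}/month={ts:%m}/day={ts:%d}"

path = date_dir("/data/clicks", datetime(2014, 11, 20, tzinfo=timezone.utc))
```

This is the "sweet spot" in file-system terms: sequential appends on write, and a time-range query prunes to a handful of directories instead of scanning everything.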
44. 44
Time Series data flows
• Ingest
– Flume or similar direct tool via app
• HDFS for historical
– No real time serving
– Batch queries and generate rollups in Hive/MR
– Faster queries in Impala
• HBase for recent
– Serve individual events
– Serve pre-computed aggregates
45. 45
Archetype: Entity Time Series
• Full fidelity historical record of metrics
– Random write to event data, random read specific event or
aggregate data
• Schema:
– Rowkey: entity-timestamp or hash(entity)-timestamp, possibly
with salt added after entity
– Col qualifiers: granular time stamps -> value
– Use custom aggregation to consolidate old data
– Use TTL’s to bound and age off old data
• Examples:
– OpenTSDB is a system on HBase that handles this for numeric
values
• Lazily aggregates cells for better performance
– Facebook Insights, ODS
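The hash(entity)-timestamp row key with granular-timestamp qualifiers can be sketched as below — a simplified take on the OpenTSDB-style layout, with an hourly bucket size and helper names chosen for illustration:

```python
import hashlib

BUCKET_MS = 3_600_000  # one row per entity per hour (simplified OpenTSDB-style)

def ets_row_key(entity: str, ts_millis: int) -> bytes:
    # hash(entity)-timestamp: the hash spreads entities across regions, the
    # bucketed base timestamp keeps one entity's hour of data in one row.
    h = hashlib.md5(entity.encode()).hexdigest()[:8]
    return f"{h}-{ts_millis - ts_millis % BUCKET_MS:013d}".encode()

def ets_qualifier(ts_millis: int) -> bytes:
    # Column qualifier = millisecond offset within the row's bucket.
    return f"{ts_millis % BUCKET_MS:07d}".encode()

# Two readings in the same hour share a row; qualifiers order them in time.
assert ets_row_key("sensor1", 7_200_123) == ets_row_key("sensor1", 7_200_456)
assert ets_qualifier(7_200_123) < ets_qualifier(7_200_456)
```

Packing many granular timestamps into one row is also what makes the lazy aggregation mentioned above possible: the row can later be rewritten as a single consolidated cell.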
46. 46
Entity Time Series access pattern
[Diagram: Flume or a custom app drives low-latency Put/Incr/Append via the HBase client; Gets and short scans serve reads; full scans/MapReduce via the HBase scanner and bulk import run at high throughput; replication in and out.]
47. 47
Archetypes: Hybrid Entity Time Series
• Essentially a combo of the Metric Archetype and Entity Time Series
Archetype, with bulk loads of rollups via HDFS
– Land data in HDFS and HBase
– Keep all data in HDFS for future use
– Aggregate in HDFS and write to HBase
– HBase can do some aggregates too (counters)
– Keep serve-able data in HBase
– Use TTL to discard old values from HBase
48. 48
Hybrid time series access pattern
[Diagram: Flume ingests into both HDFS and HBase; Hive or MR aggregates in HDFS and bulk-imports rollups into HBase; the HBase client serves low-latency Put/Incr/Append and Gets/short scans; full scans/MapReduce via the HBase scanner; replication in and out.]
49. 49
Meta Archetype: Combined workloads
• In these cases, the use of HBase depends on workload
• Cases where we have multiple workload styles
– In many cases we want to do multiple things with the same data
– Primary use case (real time, random access)
– Secondary use case (analytical)
– Pick for your primary use case; here are some patterns for handling the
secondary
50. 50
Operational with Analytical access pattern
[Diagram: a single cluster serving both workloads — Get/Scan via the HBase client suffers poor latency because full-scan MapReduce through the HBase scanner interferes; Put/Incr/Append, bulk import, and replication share the same cluster.]
51. 51
Operational with Analytical access pattern
[Diagram: two clusters linked by HBase replication — one serves low-latency Get/Scan, isolated from full scans; MapReduce through the HBase scanner and bulk import run at high throughput on the other.]
52. 52
MR over Table Snapshots (0.98, CDH5.0)
• Previously, MapReduce jobs over HBase required an online full table scan
• Take a snapshot and run the MR job over the snapshot files
– Doesn't use the HBase client
– Avoids affecting HBase caches
– 3-5x performance boost
– Still requires more IOPS than raw HDFS files
[Diagram: MapReduce job graphs (maps feeding reduces) reading from a table snapshot's files rather than from the live table.]
53. 53
Analytic Archive access pattern
[Diagram: the standard access paths — Put/Incr/Append and Gets/short scans via the HBase client (low latency); full scans/MapReduce via the HBase scanner and bulk import (high throughput); replication in and out.]
54. 54
Analytic Archive Snapshot access pattern
[Diagram: low-latency Put/Incr/Append and Gets/short scans via the HBase client against the live table; MapReduce scans a table snapshot directly from HDFS at higher throughput; bulk import; replication in and out.]
55. 55
Request Scheduling
• We want to run MR analytics while serving low-latency requests in one
cluster
• Performance isolation (proposed)
– Limit the performance impact that load on one table has on others
(HBASE-6721)
• Request prioritization and scheduling
– Current default is FIFO; a Deadline option has been added
– Prioritize short requests before long scans (HBASE-10994)
• Throttling
– Limit the request throughput of MR jobs
[Diagram: in a mixed workload, short requests queued FIFO are delayed behind long scan requests; after rescheduling, new short requests get priority; an isolated workload avoids the contention entirely.]
57. 57
Big Data Workloads
[Matrix: latency (rows: low latency, batch) vs access pattern (columns: random access, short scan, full scan). Low latency + random access/short scan: HBase; low latency + full scan: HDFS + Impala; batch + random access/short scan: HBase + MR; batch + full scan: HDFS + MR (Hive/Pig), or HBase + snapshots -> HDFS + MR.]
58. 58
Big Data Workloads
[Same matrix with the archetypes placed: HBase — Simple Entities, Graph data, Current Metrics, Messages; HBase + MR — Hybrid Entity Time Series rollup serving; HBase + snapshots -> HDFS + MR — Index building; HDFS + Impala and HDFS + MR (Hive/Pig) — Analytic archive, Entity Time Series, Hybrid Entity Time Series rollup generation.]
59. 59
HBase is evolving to be an Operational
Database
• Excels at consistent row-centric operations
– Dev efforts aimed at using all machine resources efficiently, reducing MTTR,
and improving latency predictability.
– Projects built on HBase that enable secondary indexing and multi-row
transactions
– Apache Phoenix or Impala provide a SQL skin for simplified application
development
– Evolution towards OLTP workloads
• Analytic workloads?
– Can be done but will be beaten by direct HDFS + MR/Spark/Impala
60. 60
Join the Discussion
Get community
help or provide
feedback
cloudera.com/community
61. 61
Try Hadoop
Now
cloudera.com/live
62. Thank you!
More questions?
Join us at Office
Hours
4pm @ Table B
Editor's Notes
Given that HBase stores a large sorted map, the API looks similar to a map. You can get or put individual rows, or scan a range of rows. There is also a very efficient way of incrementing a particular cell – this can be useful for maintaining high-performance counters or statistics. Lastly, it's possible to write MapReduce jobs that analyze the data in HBase.
Given that Hbase stores a large sorted map, the API looks similar to a map. You can get or put individual rows, or scan a range of rows. There is also a very efficient way of incrementing a particular cell – this can be useful for maintaining high performance counters or statistics. Lastly, it’s possible to write MapReduce jobs that analyze the data in Hbase.
Given that Hbase stores a large sorted map, the API looks similar to a map. You can get or put individual rows, or scan a range of rows. There is also a very efficient way of incrementing a particular cell – this can be useful for maintaining high performance counters or statistics. Lastly, it’s possible to write MapReduce jobs that analyze the data in Hbase.
Given that Hbase stores a large sorted map, the API looks similar to a map. You can get or put individual rows, or scan a range of rows. There is also a very efficient way of incrementing a particular cell – this can be useful for maintaining high performance counters or statistics. Lastly, it’s possible to write MapReduce jobs that analyze the data in Hbase.
Given that Hbase stores a large sorted map, the API looks similar to a map. You can get or put individual rows, or scan a range of rows. There is also a very efficient way of incrementing a particular cell – this can be useful for maintaining high performance counters or statistics. Lastly, it’s possible to write MapReduce jobs that analyze the data in Hbase.
Given that Hbase stores a large sorted map, the API looks similar to a map. You can get or put individual rows, or scan a range of rows. There is also a very efficient way of incrementing a particular cell – this can be useful for maintaining high performance counters or statistics. Lastly, it’s possible to write MapReduce jobs that analyze the data in Hbase.
Given that Hbase stores a large sorted map, the API looks similar to a map. You can get or put individual rows, or scan a range of rows. There is also a very efficient way of incrementing a particular cell – this can be useful for maintaining high performance counters or statistics. Lastly, it’s possible to write MapReduce jobs that analyze the data in Hbase.
Given that Hbase stores a large sorted map, the API looks similar to a map. You can get or put individual rows, or scan a range of rows. There is also a very efficient way of incrementing a particular cell – this can be useful for maintaining high performance counters or statistics. Lastly, it’s possible to write MapReduce jobs that analyze the data in Hbase.
Given that Hbase stores a large sorted map, the API looks similar to a map. You can get or put individual rows, or scan a range of rows. There is also a very efficient way of incrementing a particular cell – this can be useful for maintaining high performance counters or statistics. Lastly, it’s possible to write MapReduce jobs that analyze the data in Hbase.
Given that Hbase stores a large sorted map, the API looks similar to a map. You can get or put individual rows, or scan a range of rows. There is also a very efficient way of incrementing a particular cell – this can be useful for maintaining high performance counters or statistics. Lastly, it’s possible to write MapReduce jobs that analyze the data in Hbase.
Given that Hbase stores a large sorted map, the API looks similar to a map. You can get or put individual rows, or scan a range of rows. There is also a very efficient way of incrementing a particular cell – this can be useful for maintaining high performance counters or statistics. Lastly, it’s possible to write MapReduce jobs that analyze the data in Hbase.
Given that Hbase stores a large sorted map, the API looks similar to a map. You can get or put individual rows, or scan a range of rows. There is also a very efficient way of incrementing a particular cell – this can be useful for maintaining high performance counters or statistics. Lastly, it’s possible to write MapReduce jobs that analyze the data in Hbase.
Given that Hbase stores a large sorted map, the API looks similar to a map. You can get or put individual rows, or scan a range of rows. There is also a very efficient way of incrementing a particular cell – this can be useful for maintaining high performance counters or statistics. Lastly, it’s possible to write MapReduce jobs that analyze the data in Hbase.
Given that Hbase stores a large sorted map, the API looks similar to a map. You can get or put individual rows, or scan a range of rows. There is also a very efficient way of incrementing a particular cell – this can be useful for maintaining high performance counters or statistics. Lastly, it’s possible to write MapReduce jobs that analyze the data in Hbase.