6. HBase
• open-source
• high-performance
• modeled on Google's BigTable
• fast
• distributed
• NoSQL datastore
• scalable
• built upon Hadoop
• fault-tolerant
Cool and fun to work with!
9. Hadoop stack
By my count — and it’s very possible I’m missing someone —
Hadoop-‐‑based startups have raised $104.5 million since May.
The same set of companies has raised $159.7 million since 2009
when Cloudera closed its first round.
By comparison, the handful of popular NoSQL database vendors,
often lumped into the big data category as well, and similar to
Hadoop in their focus on unstructured data, have announced just
more than $90 million in funding overall.
via (http://gigaom.com/cloud/with-40m-for-cloudera-how-much-is-hadoop-worth/)
12. Related projects:
• Chukwa
o Log analysis tool
• Hive
o Or, if Hive is slow:
• Pig
o High-level data manipulation language
o Don’t write all MapReduce jobs by hand!
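To show what Pig saves you from, here is a minimal, single-process sketch of the map, shuffle, and reduce phases that a hand-written word-count job has to spell out. It is plain Python with no Hadoop involved, and the function names are hypothetical; a real Hadoop job would also need a driver, job configuration, and input/output formats.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(lines: Iterable[str]) -> Iterator[tuple[str, int]]:
    # Emit (word, 1) for every whitespace-separated token.
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    # Group values by key, as the framework does between map and reduce.
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: dict[str, list[int]]) -> dict[str, int]:
    # Sum the partial counts for each word.
    return {word: sum(counts) for word, counts in groups.items()}


def word_count(lines: Iterable[str]) -> dict[str, int]:
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    print(word_count(["HBase is fun", "Hadoop is fun too"]))
```

Pig Latin expresses the same pipeline with built-in operators such as GROUP and COUNT, so none of this plumbing is written by hand.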
19. How to start hacking?
Grab Hadoop
http://hadoop.apache.org/
and HBase
http://hbase.apache.org/
Spend an eon learning more than you wanted about
plumbing
20. How to start hacking?
Better (faster) way:
Grab a VM/packages from
21. Pro tip
Don’t run HBase on Windows, or face problems
It’s doable
(http://hbase.apache.org/docs/r0.20.6/cygwin.html)
but VMs are faster!
22. How to start hacking?
Situation will improve, since
23. modes
Develop with
• local mode
o single instance, single JVM
Then
• Pseudo-distributed
o multiple instances, single machine
For production
• Distributed mode
o many nodes
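Moving between these modes is mostly a matter of configuration in hbase-site.xml. The sketch below renders that file for each mode, assuming the commonly documented properties `hbase.cluster.distributed`, `hbase.rootdir`, and `hbase.zookeeper.quorum`. The paths, the HDFS port, and the host names are hypothetical placeholders; check the reference guide for your HBase version before relying on them.

```python
import xml.etree.ElementTree as ET

# Hypothetical example values; replace paths and hosts with your own.
MODE_PROPERTIES = {
    # Local mode: single instance, single JVM, data on the local filesystem.
    "local": {
        "hbase.cluster.distributed": "false",
        "hbase.rootdir": "file:///tmp/hbase",
    },
    # Pseudo-distributed: separate daemons on one machine, data in HDFS.
    "pseudo-distributed": {
        "hbase.cluster.distributed": "true",
        "hbase.rootdir": "hdfs://localhost:8020/hbase",
    },
    # Distributed: many nodes, with the ZooKeeper quorum listed explicitly.
    "distributed": {
        "hbase.cluster.distributed": "true",
        "hbase.rootdir": "hdfs://namenode.example.com:8020/hbase",
        "hbase.zookeeper.quorum": "zk1.example.com,zk2.example.com,zk3.example.com",
    },
}


def hbase_site_xml(mode: str) -> str:
    """Render a <configuration> document for one of the modes above."""
    root = ET.Element("configuration")
    for name, value in MODE_PROPERTIES[mode].items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(hbase_site_xml("pseudo-distributed"))
```

Keeping the three variants side by side makes the main differences visible: where the data lives (`file://` vs `hdfs://`) and whether HBase runs its daemons as separate processes.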
26. Example from X
• Customer-provided user data
• Schema varying between customers
o kept in an RDBMS
• Data in HBase
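One way to picture this split: the per-customer schema lives in a relational table, while the records go into a sparse, wide-column store where each customer's rows carry only the columns that customer defined. The sketch below uses an in-memory SQLite database as a stand-in for the RDBMS and a plain dict as a stand-in for an HBase table. The table name `customer_schema`, the column family `d`, and the `customer#user` row-key format are hypothetical, not details of the system on the slide.

```python
import sqlite3

# Stand-in for the RDBMS holding each customer's field definitions.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE customer_schema (customer TEXT, field TEXT)")
db.executemany(
    "INSERT INTO customer_schema VALUES (?, ?)",
    [("acme", "email"), ("acme", "plan"), ("globex", "email"), ("globex", "region")],
)

# Stand-in for an HBase table: row key -> {"family:qualifier": value}.
# Rows are sparse, so different customers can use different columns.
hbase_table: dict[str, dict[str, str]] = {}


def allowed_fields(customer: str) -> set[str]:
    # Look up the customer's schema in the relational store.
    rows = db.execute(
        "SELECT field FROM customer_schema WHERE customer = ?", (customer,)
    )
    return {field for (field,) in rows}


def store_user(customer: str, user_id: str, record: dict[str, str]) -> None:
    unknown = set(record) - allowed_fields(customer)
    if unknown:
        raise ValueError(f"fields not in {customer}'s schema: {sorted(unknown)}")
    # Prefix the row key with the customer so each customer's rows sort together.
    row_key = f"{customer}#{user_id}"
    hbase_table[row_key] = {f"d:{name}": value for name, value in record.items()}


store_user("acme", "u1", {"email": "a@example.com", "plan": "gold"})
store_user("globex", "u7", {"email": "g@example.com", "region": "EU"})
```

Changing a customer's schema then touches only the relational table; existing HBase rows need no migration because columns are not fixed per table.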
27. Example from Facebook
HBase drives Facebook messages
• Key: UserId
• Column: Word
• Version: MessageId
See the InfoQ presentation for more details
(http://www.infoq.com/presentations/HBase-at-Facebook)
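Read as an inverted index, this layout turns "which of this user's messages contain a given word?" into a read of one cell: row = user, column = word, and the cell's versions are the matching message IDs. The in-memory sketch below models that idea; it assumes the column keeps every version (HBase retains only a configurable number of versions per cell) and that message IDs grow over time. The function names are hypothetical, not Facebook's code.

```python
from collections import defaultdict

# Row key: user ID -> column: word -> versions: set of message IDs.
index: dict[int, dict[str, set[int]]] = defaultdict(lambda: defaultdict(set))


def index_message(user_id: int, message_id: int, text: str) -> None:
    # Add one version (the message ID) to cell (user_id, word) per distinct word.
    for word in set(text.lower().split()):
        index[user_id][word].add(message_id)


def search(user_id: int, word: str) -> list[int]:
    # HBase returns a cell's versions newest first; assuming message IDs
    # increase over time, descending order mimics that.
    row = index.get(user_id, {})
    return sorted(row.get(word.lower(), set()), reverse=True)


index_message(42, 1001, "Lunch tomorrow")
index_message(42, 1002, "Sure lunch at noon")
index_message(7, 1003, "lunch is overrated")
```

Because each user's data sits in one row, a search never touches other users' rows.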
28. When to use HBase?
• Lots of key/value data
• Need good scalability
• Need good query times with random access
• Data analytics
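Fast random access comes from HBase keeping rows sorted by row key: a get looks up one key, and a scan reads a contiguous key range, with the start row inclusive and the stop row exclusive. The sketch below imitates that with a sorted key list and binary search. It is an in-memory illustration only; the class name and the `user#date` row-key format are hypothetical.

```python
import bisect
from typing import Optional


class SortedTable:
    """In-memory stand-in for a table whose rows are sorted by row key."""

    def __init__(self) -> None:
        self._keys: list[str] = []
        self._rows: dict[str, str] = {}

    def put(self, key: str, value: str) -> None:
        # Keep the key list sorted so range reads stay cheap.
        if key not in self._rows:
            bisect.insort(self._keys, key)
        self._rows[key] = value

    def get(self, key: str) -> Optional[str]:
        # Point lookup by row key.
        return self._rows.get(key)

    def scan(self, start: str, stop: str) -> list[tuple[str, str]]:
        # Range read over [start, stop): binary search finds both ends,
        # then rows come back in key order.
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        return [(k, self._rows[k]) for k in self._keys[lo:hi]]


table = SortedTable()
table.put("alice#2011-03-01", "login")
table.put("bob#2011-03-02", "login")
table.put("alice#2011-03-05", "upload")
# '$' sorts right after '#', so this range covers exactly alice's rows.
print(table.scan("alice#", "alice$"))
```

The design choice this illustrates: put the attribute you query by most into the row key, since the row key is the only thing the store keeps sorted.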
29. What is HBase poor at?
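• multi-row transactions (atomicity is guaranteed only within a single row)
• queries relying on secondary indexes (only the row key is indexed)
• security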
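The index and transaction weaknesses are related. Only the row key is indexed, so finding rows by any other value means scanning the table unless the application maintains its own index table. And because HBase guarantees atomicity only within one row, the write to the data table and the write to the index table are two separate operations that can fall out of sync. The in-memory sketch below illustrates both effects; the table layout, the `info:email` column, and the helper names are hypothetical.

```python
from typing import Optional

# Data table: row key (user ID) -> columns. Index table: email -> user ID.
users: dict[str, dict[str, str]] = {}
email_index: dict[str, str] = {}


def put_user(user_id: str, email: str, crash_before_index: bool = False) -> None:
    # Write 1: the data row (atomic on its own).
    users[user_id] = {"info:email": email}
    if crash_before_index:
        return  # simulates a client failure between the two writes
    # Write 2: the index entry lives in a different row, so it is a separate operation.
    email_index[email] = user_id


def find_by_email_scan(email: str) -> tuple[Optional[str], int]:
    # No secondary index: examine rows one by one until a match is found.
    examined = 0
    for user_id, columns in users.items():
        examined += 1
        if columns.get("info:email") == email:
            return user_id, examined
    return None, examined


def find_by_email_index(email: str) -> Optional[str]:
    # A single lookup in the hand-maintained index table.
    return email_index.get(email)


for i in range(1000):
    put_user(f"user{i:04d}", f"u{i}@example.com")
put_user("user9999", "late@example.com", crash_before_index=True)
```

Here the scan examines all 1,000 rows to find the last regular user, while the index lookup is a single get; the interrupted write leaves a user that the index will never find.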