HBase: Tame your BigData (JDD)

Andrzej Grzesik, LunarLogicPolska

HBase

An open-source, high-performance, fast, scalable, distributed and fault-tolerant NoSQL datastore in the style of Google's BigTable, built upon Hadoop.

Cool and fun to work with!

Hadoop stack
By my count — and it’s very possible I’m missing someone —
Hadoop-based startups have raised $104.5 million since May.
The same set of companies has raised $159.7 million since 2009
when Cloudera closed its first round.

By comparison, the handful of popular NoSQL database vendors,
often lumped into the big data category as well, and similar to
Hadoop in their focus on unstructured data, have announced just
more than $90 million in funding overall.

via (http://gigaom.com/cloud/with-40m-for-cloudera-how-much-is-hadoop-worth/)

Architecture (stack, top to bottom)
•  HBase, coordinated by ZooKeeper
•  Hadoop: MapReduce (m/r) and HDFS
•  Servers: node, node, node

Related projects:
•  Chukwa
o  Log analysis tool
•  Hive
o  Or, if Hive is slow:
•  Pig
o  High-level data manipulation language
o  Don’t write all MapReduce jobs by hand!

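Brewer’s CAP theorem
Pick 2 of Consistency, Availability and Partition tolerance:
•  Consistency + Availability: RDBMS
•  Consistency + Partition tolerance: HBase
•  Availability + Partition tolerance: CouchDB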

Data organisation
Rows are sorted by row key, and contiguous key ranges form regions:
•  Region 1: Rowkey 1 … Rowkey n
•  Region 2: Rowkey n+1 …
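Since rows are kept sorted by row key and each region holds one contiguous key range, finding the region that owns a row boils down to a "greatest start key not above this key" lookup. Below is a toy in-memory sketch of that idea in plain Java, not the HBase client API: the `RegionMap` class, its `split` method and the region names are hypothetical, and keys are assumed to be ASCII strings (HBase itself compares raw bytes).

```java
import java.util.Map;
import java.util.TreeMap;

// Toy model of row-to-region mapping (plain Java, not the HBase client API).
// Each region is identified by the first row key it holds and covers every
// key up to, but not including, the next region's start key.
final class RegionMap {
    // Region start key -> region name, kept sorted like HBase row keys.
    private final TreeMap<String, String> regionsByStartKey = new TreeMap<>();

    RegionMap() {
        // The first region starts at the empty key, so every row key has an owner.
        regionsByStartKey.put("", "Region 1");
    }

    // Hypothetical split: rows from splitKey up to the next start key move to a new region.
    void split(String splitKey, String newRegionName) {
        regionsByStartKey.put(splitKey, newRegionName);
    }

    // Owner = region with the greatest start key that is <= rowKey.
    String regionFor(String rowKey) {
        Map.Entry<String, String> owner = regionsByStartKey.floorEntry(rowKey);
        return owner.getValue(); // never null: "" sorts before every key
    }

    public static void main(String[] args) {
        RegionMap regions = new RegionMap();
        regions.split("row5", "Region 2");
        System.out.println(regions.regionFor("row1"));  // Region 1
        System.out.println(regions.regionFor("row7"));  // Region 2
        System.out.println(regions.regionFor("row10")); // Region 1 ("row10" < "row5")
    }
}
```

Note the last lookup: keys sort lexicographically, not numerically, which is worth keeping in mind when designing row keys.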

Data organisation
A region stores its data in column families, each with its own set of columns, e.g.:
•  Column family: col1, col2, col3
•  Column family: col1, col2
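Column families are declared up front, while the columns inside a family can differ from row to row, so one family may hold col1, col2, col3 and another only col1, col2. A toy model of that sparseness, using nested sorted maps as a stand-in for HBase storage; the class, method and family names are made up for illustration.

```java
import java.util.Optional;
import java.util.Set;
import java.util.TreeMap;

// Toy model of sparse storage: row key -> column family -> column -> value.
// Families are fixed when the table is created; columns are not.
final class SparseTable {
    private final Set<String> families;
    private final TreeMap<String, TreeMap<String, TreeMap<String, String>>> rows = new TreeMap<>();

    SparseTable(Set<String> families) {
        this.families = Set.copyOf(families);
    }

    void put(String row, String family, String column, String value) {
        if (!families.contains(family)) {
            throw new IllegalArgumentException("unknown column family: " + family);
        }
        // Only columns that are actually written get stored.
        rows.computeIfAbsent(row, r -> new TreeMap<>())
            .computeIfAbsent(family, f -> new TreeMap<>())
            .put(column, value);
    }

    Optional<String> get(String row, String family, String column) {
        TreeMap<String, TreeMap<String, String>> byFamily = rows.get(row);
        if (byFamily == null || !byFamily.containsKey(family)) return Optional.empty();
        return Optional.ofNullable(byFamily.get(family).get(column));
    }

    // Columns actually stored for one row in one family; may differ between rows.
    Set<String> columns(String row, String family) {
        TreeMap<String, TreeMap<String, String>> byFamily = rows.get(row);
        if (byFamily == null || !byFamily.containsKey(family)) return Set.of();
        return byFamily.get(family).keySet();
    }

    public static void main(String[] args) {
        SparseTable t = new SparseTable(Set.of("cf1", "cf2"));
        t.put("row1", "cf1", "col1", "a");
        t.put("row1", "cf1", "col2", "b");
        t.put("row1", "cf1", "col3", "c");
        t.put("row2", "cf1", "col1", "d");
        t.put("row2", "cf1", "col2", "e");
        System.out.println(t.columns("row1", "cf1")); // [col1, col2, col3]
        System.out.println(t.columns("row2", "cf1")); // [col1, col2]
    }
}
```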

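Data organisation
Within a region, a value is addressed by row, column key and timestamp; each cell can keep several timestamped versions:

Timestamp | column1 | column2 | column3
t1        | v1@t1   | v1@t1   | v1@t1
t2        | v1@t2   | v1@t2   |
t3        | v1@t3   |         |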
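Each cell keeps its values as timestamped versions, newest first, and versions beyond a configured limit are dropped. The sketch below models a single cell with a `TreeMap` in reverse timestamp order; `VersionedCell` and `maxVersions` are hypothetical names, loosely mirroring the per-column-family limit on stored versions. Real HBase filters surplus versions on read and purges them during compactions, rather than at write time as this toy does.

```java
import java.util.Collections;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy model of one versioned cell (row, column) keeping at most maxVersions values.
final class VersionedCell {
    // Newest timestamp first, mirroring the newest-first order HBase returns versions in.
    private final NavigableMap<Long, String> versions = new TreeMap<>(Collections.reverseOrder());
    private final int maxVersions;

    VersionedCell(int maxVersions) {
        this.maxVersions = maxVersions;
    }

    void put(long timestamp, String value) {
        versions.put(timestamp, value);
        // With reverse ordering, the last entry is the oldest version.
        while (versions.size() > maxVersions) {
            versions.pollLastEntry();
        }
    }

    // Newest value, or null if the cell is empty.
    String latest() {
        return versions.isEmpty() ? null : versions.firstEntry().getValue();
    }

    // Value visible at time t: the newest version written at or before t.
    // In reverse order, ceilingEntry(t) is the largest timestamp <= t.
    String asOf(long t) {
        Map.Entry<Long, String> e = versions.ceilingEntry(t);
        return e == null ? null : e.getValue();
    }

    public static void main(String[] args) {
        VersionedCell cell = new VersionedCell(3);
        cell.put(1L, "v1@t1");
        cell.put(2L, "v1@t2");
        cell.put(3L, "v1@t3");
        System.out.println(cell.latest()); // v1@t3
        System.out.println(cell.asOf(2L)); // v1@t2
        cell.put(4L, "v1@t4");             // exceeds maxVersions, so t1 is dropped
        System.out.println(cell.asOf(1L)); // null
    }
}
```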

Integration testing?
•  Start a cluster locally?
•  Or use a remote one?

How to start hacking?
Grab Hadoop
http://hadoop.apache.org/

and HBase
http://hbase.apache.org/

Then spend an eon learning more than you ever wanted to know about plumbing.

How to start hacking?
Better (faster) way:

Grab a VM/packages from

Pro tip
Don’t run HBase on Windows, or face problems.

It’s doable (http://hbase.apache.org/docs/r0.20.6/cygwin.html), but VMs are faster!

How to start hacking?
The situation will improve, since

Modes
Develop with
•  Local mode
o  single instance, single JVM

Then
•  Pseudo-distributed
o  multiple instances, single machine

For production
•  Distributed mode
o  many nodes
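What actually changes between the modes is mostly configuration. The sketch below gathers the relevant settings into `java.util.Properties` just to show which keys differ; in a real setup they live in `hbase-site.xml`. The keys (`hbase.cluster.distributed`, `hbase.rootdir`, `hbase.zookeeper.quorum`) are standard HBase settings, while the `HBaseMode` enum and all host names, ports and paths are placeholders.

```java
import java.util.Properties;

// Sketch of the settings that tell the three run modes apart.
// Keys are standard hbase-site.xml properties; every value is a placeholder.
enum HBaseMode {
    LOCAL, PSEUDO_DISTRIBUTED, DISTRIBUTED;

    Properties settings() {
        Properties p = new Properties();
        switch (this) {
            case LOCAL:
                // Single JVM, data on the local filesystem.
                p.setProperty("hbase.cluster.distributed", "false");
                p.setProperty("hbase.rootdir", "file:///tmp/hbase");
                break;
            case PSEUDO_DISTRIBUTED:
                // Separate daemons on one machine, data in a local HDFS;
                // the ZooKeeper quorum is assumed to stay at its localhost default.
                p.setProperty("hbase.cluster.distributed", "true");
                p.setProperty("hbase.rootdir", "hdfs://localhost:8020/hbase");
                break;
            case DISTRIBUTED:
                // Many nodes sharing one HDFS and a ZooKeeper ensemble.
                p.setProperty("hbase.cluster.distributed", "true");
                p.setProperty("hbase.rootdir", "hdfs://namenode.example.com:8020/hbase");
                p.setProperty("hbase.zookeeper.quorum",
                        "zk1.example.com,zk2.example.com,zk3.example.com");
                break;
        }
        return p;
    }

    public static void main(String[] args) {
        for (HBaseMode mode : values()) {
            System.out.println(mode + ": " + mode.settings());
        }
    }
}
```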

One more
Befriend some admins; you will need them.

Example from X
•  Customer-provided user data
•  Schema varying between customers
o  kept in an RDBMS
•  Data in HBase
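One way to read this setup: the RDBMS says which fields a customer's records may have, and HBase stores the records themselves, each row holding only the fields it actually uses. A toy sketch with maps standing in for both stores; the `customerId:userId` row key and all class, method and customer names are assumptions for illustration, not details from the original system.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Toy version of the split: per-customer schemas in a relational store
// (stubbed as a map), user records in a sparse, HBase-like table.
final class CustomerData {
    // Stand-in for the RDBMS table holding each customer's field definitions.
    private final Map<String, Set<String>> schemaByCustomer = new HashMap<>();
    // Stand-in for the HBase table: row key "customerId:userId" -> (field -> value).
    private final TreeMap<String, Map<String, String>> table = new TreeMap<>();

    void defineSchema(String customerId, Set<String> fields) {
        schemaByCustomer.put(customerId, fields);
    }

    void put(String customerId, String userId, String field, String value) {
        Set<String> fields = schemaByCustomer.get(customerId);
        if (fields == null || !fields.contains(field)) {
            throw new IllegalArgumentException(field + " is not in the schema of " + customerId);
        }
        table.computeIfAbsent(customerId + ":" + userId, k -> new HashMap<>()).put(field, value);
    }

    Map<String, String> get(String customerId, String userId) {
        return table.getOrDefault(customerId + ":" + userId, Map.of());
    }

    public static void main(String[] args) {
        CustomerData data = new CustomerData();
        data.defineSchema("acme", Set.of("email", "plan"));
        data.defineSchema("globex", Set.of("email", "region", "score"));
        data.put("acme", "u1", "plan", "gold");
        data.put("globex", "u1", "score", "42");
        System.out.println(data.get("acme", "u1"));   // {plan=gold}
        System.out.println(data.get("globex", "u1")); // {score=42}
    }
}
```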

Example from Facebook
HBase drives Facebook Messages:
•  Key: UserId
•  Column: Word
•  Version: MessageId

See http://www.infoq.com/presentations/HBase-at-Facebook for more details.
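Read this way, each user's row is a small inverted index: the cell (UserId, Word) holds, as its versions, the IDs of that user's messages containing the word, so a search is a single-row lookup. A toy model, assuming numeric message IDs that grow over time (so newest version means newest message) and naive whitespace tokenisation; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.NavigableSet;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy model of the schema: row = userId, column = word, versions = messageIds.
final class MessageSearchIndex {
    private final Map<String, Map<String, NavigableSet<Long>>> rows = new TreeMap<>();

    // Index one message: each word becomes a column in the user's row,
    // and the message id is stored as a new "version" of that cell.
    void index(String userId, long messageId, String text) {
        Map<String, NavigableSet<Long>> row = rows.computeIfAbsent(userId, u -> new TreeMap<>());
        for (String word : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (word.isEmpty()) continue;
            row.computeIfAbsent(word, w -> new TreeSet<Long>(Collections.reverseOrder()))
               .add(messageId);
        }
    }

    // Single-row lookup: ids of this user's messages containing the word, newest first.
    List<Long> search(String userId, String word) {
        Map<String, NavigableSet<Long>> row = rows.get(userId);
        if (row == null) return List.of();
        NavigableSet<Long> ids = row.get(word.toLowerCase(Locale.ROOT));
        return ids == null ? List.of() : new ArrayList<>(ids);
    }

    public static void main(String[] args) {
        MessageSearchIndex idx = new MessageSearchIndex();
        idx.index("alice", 1L, "lunch tomorrow");
        idx.index("alice", 2L, "hbase talk tomorrow");
        idx.index("bob", 3L, "tomorrow works");
        System.out.println(idx.search("alice", "tomorrow")); // [2, 1]
        System.out.println(idx.search("alice", "hbase"));    // [2]
    }
}
```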

When to use HBase?
•  Lots of key/value data
•  Need good scalability
•  Need good query times with random access
•  Data analytics

What is HBase poor at?
•  transactions
•  relying on indexes
•  security

Useful links
Brewer’s CAP theorem
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.20.1495&rep=rep1&type=pdf

Google BigTable
http://labs.google.com/papers/bigtable-osdi06.pdf

DZone Refcardz
http://refcardz.dzone.com/refcardz/getting-started-apache-hadoop
http://refcardz.dzone.com/refcardz/deploying-hadoop
