Dancing With The Elephant
We will discuss
• Introduction to Hadoop
• HBase: Definition, Storage Model, Use Cases
• Basic Data Access from shell
• Hands-on with HBase API
What is Hadoop
• Framework for distributed processing of large datasets (Big Data)
• Two core components: HDFS + MapReduce
• HDFS (data):
  ▪ Distributed filesystem responsible for storing data across the cluster
  ▪ Provides replication on cheap commodity hardware
  ▪ Runs as NameNode and DataNode processes
• MapReduce (processing):
  ▪ May be covered in a future session
HBase: What
• A sparse, distributed, persistent, multidimensional sorted map (as defined in Google's Bigtable paper)
• Distributed NoSQL database built on top of HDFS
RDBMS Woes (with massive data)
• Scaling is Hard and Expensive
• Relational features and secondary indexes often have to be turned off to scale
• Quick reads are hard at larger table sizes (e.g., 500 GB)
• Single points of failure
• Schema changes
HBase: Why
• Scalable: just add nodes as your data grows
• Distributed: leverages the advantages of Hadoop's HDFS
• Built on top of Hadoop: being part of the ecosystem, it can be integrated with many tools
• High performance for reads and writes
  ▪ Short-circuit reads
  ▪ Single reads: 1 to 10 ms; scans: hundreds of rows in 10 ms
• Schema-less columns: only the column families are fixed in the table schema
• Production-ready where data is on the order of petabytes
HBase: Storage Model 1
HTable
• Tables are split into regions
• Region: a contiguous range of row keys, [start, end), stored in sorted order
• Regions split as the table grows (the region size can be configured)
• The table schema defines the column families
• (Table, RowKey, ColumnFamily, ColumnName, Timestamp) → Value
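Because each region covers a half-open key range [start, end), finding the region that owns a row key is a "greatest start key not above the row key" lookup. The sketch below models that idea with a plain Java TreeMap. It is an illustration only: the class name, region names and split points are made up, and it is not how the HBase client is implemented.

```java
import java.util.Map;
import java.util.TreeMap;

// Illustrative model only: region names and split points are hypothetical.
class RegionLookupSketch {
    // Maps each region's start key to a region name.
    // The empty string is the start key of the first region.
    private final TreeMap<String, String> regionsByStartKey = new TreeMap<>();

    void addRegion(String startKey, String regionName) {
        regionsByStartKey.put(startKey, regionName);
    }

    // A region covers [startKey, nextStartKey), so the owner is the region
    // with the greatest start key that is <= the row key.
    String regionFor(String rowKey) {
        Map.Entry<String, String> entry = regionsByStartKey.floorEntry(rowKey);
        if (entry == null) {
            throw new IllegalStateException("no region covers row key: " + rowKey);
        }
        return entry.getValue();
    }

    public static void main(String[] args) {
        RegionLookupSketch table = new RegionLookupSketch();
        table.addRegion("", "region-1");   // ["", "g")
        table.addRegion("g", "region-2");  // ["g", "p")
        table.addRegion("p", "region-3");  // ["p", end of table)

        System.out.println(table.regionFor("apple")); // region-1
        System.out.println(table.regionFor("g"));     // region-2 (start key is inclusive)
        System.out.println(table.regionFor("zebra")); // region-3
    }
}
```

Splitting a region in this model amounts to adding one more start key, which is why lookups stay cheap as the table grows.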
HTable (Data Structure)
• SortedMap(
    RowKey, List(
      SortedMap(
        Column, List(
          Value, Timestamp
        )
      )
    )
  )
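The nested map above can be approximated in plain Java with sorted maps. The sketch below is a simplified in-memory model, not HBase code: the class name, the "family:qualifier" string used as the column key, and the choice to keep versions newest-first are assumptions made for illustration.

```java
import java.util.Comparator;
import java.util.NavigableMap;
import java.util.TreeMap;

// Simplified in-memory model of the logical layout; not the HBase implementation.
class CellStoreSketch {
    // rowKey -> column ("family:qualifier") -> timestamp (newest first) -> value
    private final NavigableMap<String, NavigableMap<String, NavigableMap<Long, String>>> rows =
            new TreeMap<>();

    void put(String rowKey, String column, long timestamp, String value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>())
            .computeIfAbsent(column, k -> new TreeMap<>(Comparator.<Long>reverseOrder()))
            .put(timestamp, value);
    }

    // Returns the newest version of a cell, or null when the cell was never written.
    // Absent cells take no space at all, which is what "sparse" means here.
    String getLatest(String rowKey, String column) {
        NavigableMap<String, NavigableMap<Long, String>> columns = rows.get(rowKey);
        if (columns == null) {
            return null;
        }
        NavigableMap<Long, String> versions = columns.get(column);
        if (versions == null) {
            return null;
        }
        return versions.firstEntry().getValue();
    }

    public static void main(String[] args) {
        CellStoreSketch store = new CellStoreSketch();
        store.put("row1", "colfam1:qual1", 1L, "val1");
        store.put("row1", "colfam1:qual1", 2L, "val1-updated");

        System.out.println(store.getLatest("row1", "colfam1:qual1")); // val1-updated
        System.out.println(store.getLatest("row1", "colfam1:qual2")); // null
    }
}
```

Writing the same cell again does not overwrite the old value in this model; it adds a newer timestamped version, and reads return the newest one by default.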
HBase: Data Read/Write
• Get: Random read
• Scan: Sequential read
• Put: Write/Update
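The three operations map naturally onto a sorted map: Get is a point lookup, Scan walks a key range in order, and Put inserts or replaces. The following is a toy model under that assumption, with a hypothetical class name and one string value per row; real HBase rows hold many cells and versions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Toy model: one value per row, kept in row-key order.
class RowAccessSketch {
    private final TreeMap<String, String> rows = new TreeMap<>();

    // Put: writes a new row or updates an existing one.
    void put(String rowKey, String value) {
        rows.put(rowKey, value);
    }

    // Get: random read of a single row by key (null when absent).
    String get(String rowKey) {
        return rows.get(rowKey);
    }

    // Scan: sequential read of the row keys in [startRow, stopRow), in sorted order.
    List<String> scan(String startRow, String stopRow) {
        return new ArrayList<>(rows.subMap(startRow, true, stopRow, false).keySet());
    }

    public static void main(String[] args) {
        RowAccessSketch table = new RowAccessSketch();
        table.put("row3", "c");
        table.put("row1", "a");
        table.put("row2", "b");
        table.put("row1", "a2"); // same key again: an update, not a second row

        System.out.println(table.get("row1"));          // a2
        System.out.println(table.scan("row1", "row3")); // [row1, row2]
    }
}
```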
HBase: Data Access Clients
• Demo of HBase shell
• Java API
HBase: API
• Connection
• DDL
• DML
• Filters
• Hands-On
HBase: API
• Configuration: holds the details of where to find the cluster, plus tunable settings.
• HConnection: represents a connection to the cluster.
• HBaseAdmin: handles DDL operations (create, list, drop, alter).
• HTable (HTableInterface): a handle on a single HBase table; sends commands to the table (Put, Get, Scan, Delete, Increment).
HBase: API:DDL
Group name: ddl (Data Definition Language)
Commands: alter, create, describe, disable, drop, enable, exists, is_disabled, is_enabled, list
HBase: API:DDL
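// Needs org.apache.hadoop.conf.Configuration and, from org.apache.hadoop.hbase:
// HBaseConfiguration, HTableDescriptor, HColumnDescriptor, client.HBaseAdmin, util.Bytes
Configuration conf = HBaseConfiguration.create(); // reads hbase-site.xml from the classpath
// Kept from the slide. Note: 60010 is usually the master web UI port, and clients
// normally find the cluster through ZooKeeper (hbase.zookeeper.quorum).
conf.set("hbase.master", "localhost:60010");
HBaseAdmin hbase = new HBaseAdmin(conf);
// Describe a table with two column families
HTableDescriptor desc = new HTableDescriptor("testtable");
HColumnDescriptor meta = new HColumnDescriptor(Bytes.toBytes("colfam1"));
HColumnDescriptor prefix = new HColumnDescriptor(Bytes.toBytes("colfam2"));
desc.addFamily(meta);
desc.addFamily(prefix);
hbase.createTable(desc);
hbase.close();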
HBase: API:DML
Group name: dml (Data Manipulation Language)
Commands: count, delete, deleteall, get, get_counter, incr, put, scan, truncate
HBase: API:DML PUT
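Configuration conf = HBaseConfiguration.create();
HTable table = new HTable(conf, "testtable");
// One Put can carry several cells for the same row
Put put = new Put(Bytes.toBytes("row1"));
put.add(Bytes.toBytes("colfam1"), Bytes.toBytes("qual1"), Bytes.toBytes("val1"));
put.add(Bytes.toBytes("colfam1"), Bytes.toBytes("qual2"), Bytes.toBytes("val2"));
table.put(put);
table.close(); // flushes buffered writes and releases resources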
HBase: API:DML GET
Configuration conf = HBaseConfiguration.create();
HTable table = new HTable(conf, "testtable");
// Fetch a single column of one row
Get get = new Get(Bytes.toBytes("row1"));
get.addColumn(Bytes.toBytes("colfam1"), Bytes.toBytes("qual1"));
Result result = table.get(get);
byte[] val = result.getValue(Bytes.toBytes("colfam1"), Bytes.toBytes("qual1"));
System.out.println("Value: " + Bytes.toString(val));
table.close();
HBase: API:DML SCAN
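// Full-table scan; "table" is an HTable opened as in the GET example
Scan scan1 = new Scan();
ResultScanner scanner1 = table.getScanner(scan1);
try {
    for (Result res : scanner1) {
        System.out.println(res);
    }
} finally {
    scanner1.close(); // always release the scanner's resources
}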
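A scan returns rows in row-key order, and HBase compares row keys as raw bytes rather than as numbers or locale-aware strings. The sketch below shows what that ordering means for keys with numeric suffixes, using only the Java standard library (Java 9+ for Arrays.compareUnsigned). The class and method names are hypothetical, and no HBase classes are involved.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Demonstrates unsigned lexicographic byte ordering of row keys.
class RowKeyOrderSketch {
    // Returns a copy of the keys sorted the way a byte-wise comparison orders them.
    static List<String> sortAsBytes(List<String> keys) {
        List<String> sorted = new ArrayList<>(keys);
        sorted.sort((a, b) -> Arrays.compareUnsigned(
                a.getBytes(StandardCharsets.UTF_8),
                b.getBytes(StandardCharsets.UTF_8)));
        return sorted;
    }

    public static void main(String[] args) {
        // "row10" sorts before "row2" because '1' (0x31) is less than '2' (0x32).
        System.out.println(sortAsBytes(Arrays.asList("row2", "row10", "row1")));
        // prints [row1, row10, row2]

        // Zero-padding keeps byte order and numeric order aligned.
        System.out.println(sortAsBytes(Arrays.asList("row02", "row10", "row01")));
        // prints [row01, row02, row10]
    }
}
```

When scan order should follow numeric order, a common design choice is fixed-width, zero-padded key components as in the second call.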
Other Projects around HBase
• SQL Layer: Phoenix, Hive, Impala
• Object Persistence: Lily, Kundera
FollowUp
• Part 2:
  ▪ Building a key-value data store in HBase
  ▪ Challenges we faced in SMART
• {Rahul, vinay}@briotribes.com
Shoutout To
HBase: Usecase (Facebook)
• Facebook Messaging:
  ▪ Titan
  ▪ 1.5 M ops per second at peak
  ▪ 6B+ messages per day
  ▪ 16 columns per operation across different families
• Facebook Insights:
  ▪ Puma
  ▪ Provides developers and Page owners with metrics about their content
  ▪ > 1 M counter increments per second