Apache HBase: Crazy dances on the elephant back
Roman Nikitchenko, 16.10.2014
YARN 
www.vitech.com.ua 2
FIRST EVER DATA OS
A 10,000-node computer...
Recent technology changes are focused on higher scale, better resource usage and control, lower MTTR, higher security, redundancy, and fault tolerance.
● Hadoop is an open source framework for big data, covering both distributed storage and processing.
● Hadoop is reliable and fault tolerant without relying on hardware for these properties.
● Hadoop has unique horizontal scalability. It currently scales from a single computer up to thousands of cluster nodes.
Why Hadoop?
What is HADOOP INDEED?
[Figure: an equation-style graphic built from repeated "BIG DATA" blocks combined with "=", "+" and "x MAX".]
HBase motivation
Beware...
● Hadoop is designed for throughput, not for latency.
● HDFS blocks are expected to be large. Lots of small files are a problem.
● Write once, read many times ideology.
● MapReduce is not very flexible, and neither is any database built on top of it.
● How about realtime?
HBase motivation
BUT WE OFTEN NEED... LATENCY, SPEED and all Hadoop properties.
Agenda
● HBASE as is: architecture, data model, features.
● Something special: but we are always special, aren't we?
● INTEGRATION: it's not only about HBase.
MANIFEST 
● Open source implementation of Google BigTable with its own place in the infrastructure.
● Limited but strict ACID guarantees.
● Realtime, low latency, linear scalability.
● Distributed, reliable and fault tolerant.
● Natural integration with Hadoop infrastructure.
● Really good for massive scans.
● Server side user operations.
● No SQL at all.
● Secondary indexing is pretty complex.
High layer applications 
Resource management 
YARN 
Distributed file system 
KEY USERS 
HBase: the story begins with ...
Timeline (2006 to 2014 and beyond):
● 2006: Google BigTable paper is published. HBase development starts.
● 2007: First code is released as part of Hadoop 0.15. Focus is on offline, crawl data storage.
● 2008: HBase goes OLTP (online transaction processing). 0.20 is the first performance release.
● 2010: HBase becomes an Apache top-level project.
● November 2010: Facebook chooses HBase to implement its new messaging platform.
● HBase 0.92 is considered the production-ready release.
HBase: it is NoSQL
Loose data structure
Book: title, author, pages, price
Ball: color, size, material, price
Toy car: color, type, radio control, price

Kind      Price  Title  Author  Pages  Color  Size  Material  Type  Radio control
Book      +      +      +       +
Ball      +                            +      +     +
Toy car   +                            +                      +     +

Book #1: Kind, Price, Title, Author, Pages
Book #2: Kind, Price, Title, Author
Ball #1: Kind, Price, Color, Size, Material
Toy car #1: Price, Color, Type, Radio control
● Data looks like tables with a large number of columns.
● The set of columns can vary from row to row.
● No table modification is needed to add a column to a row.
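The loose structure can be pictured with plain Python dictionaries. This is only a toy sketch, not the HBase API: the row keys, the values and the extra "brand" column are made up for illustration.

```python
# Toy model of the catalogue slide: each row stores only the columns it has.
# All keys and values below are illustrative assumptions.
rows = {
    "book-1": {"kind": "Book", "price": 12.5, "title": "Some title",
               "author": "Some author", "pages": 320},
    "book-2": {"kind": "Book", "price": 8.0, "title": "Other title",
               "author": "Other author"},
    "ball-1": {"kind": "Ball", "price": 3.0, "color": "red",
               "size": "M", "material": "rubber"},
    "car-1": {"price": 25.0, "color": "blue", "type": "sedan",
              "radio_control": True},
}

# Adding a new column to one row needs no change to any other row.
rows["ball-1"]["brand"] = "Acme"


def columns_used(table):
    """Return the set of column names that appear in at least one row."""
    return set().union(*(row.keys() for row in table.values()))


all_columns = columns_used(rows)
```

The union of columns grows as rows are written, while each row keeps only its own cells.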
Logical data model
● Data is placed in tables.
● Tables are split into regions based on row key ranges.
● Every table row is identified by a unique row key.
● Every row consists of columns.
● Columns are grouped into families.
[Diagram: a Table split into Regions; each Row has a Key and column families (Family #1, Family #2, ...), each holding columns.]
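Splitting a table by row key ranges means a row key can be mapped to its region with a binary search over the region start keys. A minimal sketch, assuming made-up split points and region names (this is not how the HBase client is implemented):

```python
from bisect import bisect_right

# Hypothetical split points: region i holds keys in
# [start_keys[i], start_keys[i + 1]); the last region is open-ended.
start_keys = ["", "g", "n", "t"]
region_names = ["region-0", "region-1", "region-2", "region-3"]


def locate_region(row_key):
    """Find the region whose key range contains row_key."""
    index = bisect_right(start_keys, row_key) - 1
    return region_names[index]
```

Because the ranges are sorted and do not overlap, every row key belongs to exactly one region.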
Real data model
● Data is stored in HFiles.
● Families are stored on disk in separate files.
● Row keys are indexed in memory.
● Each stored column entry includes key, qualifier, value and timestamp.
● No column limit.
● Storage is block based.
● Delete is just another marker record.
● Periodic compaction is required.
[Diagram: Table, Region, Row with Key, Family #1, Family #2, ...; each family maps to its own HFile.]
HFile: family #1
Row key  Column  Value  TS
...      ...     ...    ...
HFile: family #2
Row key  Column  Value  TS
...      ...     ...    ...
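The "row key, column, value, timestamp" layout can be sketched as a sorted list of tuples where the newest version of a cell comes first. This is a simplified illustration with invented data, not the actual HFile format.

```python
# Each stored cell is (row key, column qualifier, timestamp, value).
# The data is invented for illustration.
cells = [
    ("row1", "title", 1, "Draft"),
    ("row1", "title", 2, "Final"),
    ("row1", "price", 1, "10"),
    ("row2", "title", 1, "Other"),
]

# Keep cells sorted by row key and qualifier, newest timestamp first.
cells.sort(key=lambda c: (c[0], c[1], -c[2]))


def latest(row_key, qualifier):
    """Return the newest value for a cell, or None if it was never written."""
    for row, col, _ts, value in cells:
        if row == row_key and col == qualifier:
            return value  # first match is the newest because of the sort order
    return None
```

A linear search is used here only to keep the sketch short.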
HBase: infrastructure view
● Zookeeper coordinates distributed elements and is the primary contact point for clients.
● Master server keeps metadata and manages data distribution over region servers.
● Region servers manage data table regions.
● Clients communicate directly with region servers for data.
● Clients locate the master through ZooKeeper, then the needed regions through the master.
[Diagram: Client, Zookeeper, Master (META DATA) and four region servers (RS).]
Together with HDFS
● Zookeeper coordinates distributed elements and is the primary contact point for clients.
● Master server keeps metadata and manages data distribution over region servers.
● Region servers manage data table regions.
● Actual data storage service, including replication, is on HDFS data nodes.
● Clients communicate directly with region servers for data.
● Clients locate the master through ZooKeeper, then the needed regions through the master.
[Diagram: Client, Zookeeper, Master (META DATA), NameNode and three racks, each with two region servers (RS) placed next to data nodes (DN).]
KEY OPERATIONS
● GET: get a data element by key: rows, columns.
● PUT: no difference whether we add data or replace existing data.
● SCAN: massive GET with a key range.
● DELETE: delete a single object.
BATCH OPERATIONS ARE POSSIBLE
CLOSER VIEW
● The actual write goes to a region server. The master is not involved.
● All write requests go to the WAL (write-ahead log) to provide recovery.
● The region server keeps a MemStore as temporary storage.
● Writes are flushed to disk (into an HFile) only when needed.
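The write path above can be sketched as a toy class: log first, buffer in memory, flush when the buffer is full. The class name, the size-based flush threshold and the data are assumptions made for this sketch; real region servers are far more involved.

```python
class ToyRegionServer:
    """Toy write path: log first, buffer in memory, flush when full."""

    def __init__(self, flush_threshold=3):
        self.wal = []        # append-only log used for recovery
        self.memstore = {}   # in-memory buffer of recent writes
        self.hfiles = []     # immutable sorted files "on disk"
        self.flush_threshold = flush_threshold

    def put(self, row_key, value):
        self.wal.append((row_key, value))  # 1. log the write
        self.memstore[row_key] = value     # 2. apply it in memory
        if len(self.memstore) >= self.flush_threshold:
            self.flush()                   # 3. flush only when needed

    def flush(self):
        # Write the buffer out as one sorted, immutable file.
        self.hfiles.append(sorted(self.memstore.items()))
        self.memstore = {}

    def get(self, row_key):
        if row_key in self.memstore:
            return self.memstore[row_key]
        for hfile in reversed(self.hfiles):  # newest file first
            for key, value in hfile:
                if key == row_key:
                    return value
        return None


server = ToyRegionServer(flush_threshold=3)
for key, value in [("b", 1), ("a", 2), ("c", 3), ("a", 4)]:
    server.put(key, value)
```

After the four puts, the first three rows sit in one flushed file and the newer value for "a" is still only in memory and in the log.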
CRUD: Put and Delete
WHY IS IT FAST?
Memory is used intensively. Writes are logged and cached in memory. Reads are just cached.
● The lower layer is a write-once filesystem (HDFS), so the PUT and DELETE paths are identical. A DELETE just adds another marker.
● Both PUT and DELETE requests are per row key. There is no row key range for DELETE.
● The actual deletion is performed during compactions.
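Delete-as-marker and compaction can be illustrated with a few lines of Python. This is a toy sketch over invented data; the `TOMBSTONE` name and the file layout are assumptions, not HBase internals.

```python
TOMBSTONE = object()  # marker record meaning "this row key was deleted"

# Immutable files, oldest first; a delete is written as one more record.
files = [
    [("a", 1), ("b", 2)],
    [("b", TOMBSTONE), ("c", 3)],
]


def read(row_key):
    """Newest record wins; a tombstone hides older values."""
    for records in reversed(files):
        for key, value in records:
            if key == row_key:
                return None if value is TOMBSTONE else value
    return None


def compact(all_files):
    """Merge files into one and physically drop deleted rows."""
    merged = {}
    for records in all_files:  # later files overwrite earlier ones
        merged.update(records)
    return [sorted((k, v) for k, v in merged.items() if v is not TOMBSTONE)]


before = read("b")      # hidden by the marker, but still stored in the old file
files = compact(files)  # now the old value and the marker are both gone
```

Reads give the same answer before and after compaction; only the stored data shrinks.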
CRUD: Get and Scan
Get operation is implemented through Scan.
● A Get operation is a simple data request by row key.
● A Scan operation is performed over a row key range, which can involve several table regions.
● Both Get and Scan can include client-supplied filters: expressions that are processed on the server side and can greatly reduce the result set, and therefore the traffic.
● Both Scan and Get operations can be performed on several column families.
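A range scan with a filter, and a Get expressed as a one-key Scan, can be sketched over a sorted list. The table contents and function names are illustrative assumptions; this is not the HBase client API.

```python
from bisect import bisect_left

# Rows kept sorted by row key, as inside a table region. Invented data.
table = [
    ("user01", {"city": "Kyiv", "age": 31}),
    ("user02", {"city": "Lviv", "age": 25}),
    ("user03", {"city": "Kyiv", "age": 42}),
    ("user04", {"city": "Odesa", "age": 19}),
]
keys = [key for key, _ in table]


def scan(start_key, stop_key, row_filter=None):
    """Yield rows with start_key <= key < stop_key that pass the filter."""
    i = bisect_left(keys, start_key)
    while i < len(table) and table[i][0] < stop_key:
        key, columns = table[i]
        if row_filter is None or row_filter(columns):
            yield key, columns
        i += 1


def get(row_key):
    """A Get expressed as a Scan over a one-key range."""
    return next(scan(row_key, row_key + "\x00"), None)


kyiv_rows = [key for key, _ in
             scan("user01", "user04", lambda c: c["city"] == "Kyiv")]
```

The filter runs where the data lives, so only matching rows would travel back to the caller.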
SERVER SIDE TRICKS 
● Coprocessors are a feature that allows extending HBase without modifying the product code.
● A RegionObserver can attach code to operations at region level.
● Similar functionality exists for the Master.
● Endpoints are the way to provide functionality equivalent to stored procedures.
● Taken together, the coprocessor infrastructure can provide a realtime distributed processing framework (a lightweight MapReduce).
Coprocessors: Region observer
● Region observers can be stacked.
● A region observer works like a hook on region operations.
[Diagram: the Client sends a Request to the Table; in each Region on each RegionServer the request passes through a stack of region observers; the Result returns to the Client.]
Coprocessors: Endpoints
● Direct communication via a separate protocol.
● Your commands can have an effect on table regions.
[Diagram: the Client sends a Request (RPC) to the Table; an Endpoint in each Region on each RegionServer returns a Response.]
WHY IS SERVER SIDE BLACK MAGIC?
● YOU ARE MODIFYING REGION SERVER OR MASTER CODE: ANY MISTAKE LEADS TO HELL
● JAVA CLASS LOADER REQUIRES SERVICE RESTART ON RELOAD: ANY MODIFICATION LEADS TO HELL
Integration with MapReduce 
INTEGRATION 
Integration with MapReduce
MAP+REDUCE + HBASE
● HBase provides a number of classes for native MapReduce integration. The main point is data locality.
● TableInputFormat allows massive MapReduce table processing (it maps a table with one region per mapper).
● HBase classes like Result (Get / Scan result) or Put (Put request) can be passed between MapReduce job stages.
● There is not much difference between MR1 and YARN here.
[Diagram: JobTracker, NameNode and HMaster (META DATA); TaskTracker, DataNode and RegionServer, often on a single node, so data is local.]
MAP REDUCE CLASSICS
● HBase table data is mapped. There is one mapper per table region, so mapped data is processed locally.
● After local (!) mapping, the data is reduced. This processing can be non-local, but it is much lighter.
● So we get almost 100% distributed, local data processing across the Hadoop cluster.
[Diagram: an HBase table with three table regions, one Mapper per region, feeding a Reducer.]
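The "one mapper per region, then a light reduce" flow can be sketched in plain Python. The regions, the word-count task and the function names are assumptions for illustration; a real job would use Hadoop MapReduce with TableInputFormat.

```python
from collections import Counter

# Hypothetical table split into three regions; each row is (row key, word).
regions = [
    [("r1", "hbase"), ("r2", "hadoop")],
    [("r3", "hbase"), ("r4", "hbase")],
    [("r5", "hdfs")],
]


def map_region(rows):
    """Mapper: runs next to one region and emits partial counts for it."""
    return Counter(word for _, word in rows)


def reduce_counts(partials):
    """Reducer: merges the small partial results from all mappers."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


partial_counts = [map_region(rows) for rows in regions]  # one mapper per region
word_counts = reduce_counts(partial_counts)
```

Only the small partial counters cross region boundaries, which is why the reduce step is the lighter one.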
BULK LOAD
● It is possible to load data into a table MUCH FASTER.
● HBase internal storage files (HFiles) are prepared.
● It is preferable to generate one HFile per table region. MapReduce can be used.
● Prepared HFiles are merged into table storage at maximum speed.
[Diagram: Data importers; Mappers and Reducers acting as HFile generators; one HFile per Table region.]
SECONDARY INDEX THROUGH COPROCESSORS
● HBase has no secondary indexing out of the box.
● A coprocessor (RegionObserver) is used to track Put and Delete operations and update the index table.
● Scan operations with an index column filter are intercepted and processed based on the index table content.
[Diagram: Client, Table with Regions, and Index table; a Put / Delete observer performs the index update; a Scan with filter triggers an index search.]
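The observer idea can be sketched as a toy table whose put and delete hooks keep a separate index in sync. The class and method names are hypothetical and this is not the HBase coprocessor API; it only shows the bookkeeping such an observer has to do.

```python
class IndexedTable:
    """Toy table whose put/delete hooks keep a separate index table in sync."""

    def __init__(self, indexed_column):
        self.indexed_column = indexed_column
        self.rows = {}   # main table: row key -> columns
        self.index = {}  # index table: column value -> set of row keys

    def _unindex(self, row_key):
        old = self.rows.get(row_key, {}).get(self.indexed_column)
        if old is not None:
            self.index[old].discard(row_key)
            if not self.index[old]:
                del self.index[old]

    def put(self, row_key, columns):
        self._unindex(row_key)  # hook: drop the stale index entry
        self.rows[row_key] = dict(columns)
        value = columns.get(self.indexed_column)
        if value is not None:
            self.index.setdefault(value, set()).add(row_key)

    def delete(self, row_key):
        self._unindex(row_key)  # hook: keep the index consistent
        self.rows.pop(row_key, None)

    def find_by_indexed_column(self, value):
        """Answer a filtered scan from the index instead of reading every row."""
        return sorted(self.index.get(value, set()))


table = IndexedTable("color")
table.put("ball-1", {"color": "red"})
table.put("car-1", {"color": "red"})
table.put("ball-1", {"color": "blue"})
table.delete("car-1")
```

Every overwrite has to remove the old index entry first, which is one reason secondary indexing is more complex than it looks.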
INDEX ALTERNATIVE: SOLR
● SOLR indexes documents. What is stored in the SOLR index is not what you index. SOLR is NOT A STORAGE, ONLY AN INDEX.
● But it can index ANYTHING. The search result is a document ID.
● An index update request is analyzed, tokenized, transformed... and the same goes for queries.
[Diagram: INDEX UPDATE and INDEX QUERY requests go to SOLR; search responses come back.]
● HBase handles users' online data change requests.
● The NGData Lily indexer handles the stream of changes and transforms them into SOLR index change requests.
● Indexes are built in SOLR, so HBase data is searchable.
HBase: Data and search integration
● Client: the user just puts (or deletes) data (data update).
● HBase cluster (HBase regions): REPLICATION. Replication can be set up at column family level.
● Lily HBase NRT indexer: translates data changes into SOLR index updates.
● SOLR cloud: finally provides search. It receives search requests (HTTP) and returns search responses.
● Apache Zookeeper does all coordination.
● HDFS serves as the low-level file system.
Questions and discussion 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and Insights
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demo
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examples
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdf
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdf
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a reality
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL Router
 

HBase, crazy dances on the elephant back.

  • 1. Apache HBase: crazy dances on the elephant back. Roman Nikitchenko, 16.10.2014
  • 3. First ever data OS: a 10,000-node computer... Recent technology changes are focused on higher scale: better resource usage and control, lower MTTR, higher security, redundancy, fault tolerance. www.vitech.com.ua
  • 4. Hadoop
      ● Hadoop is an open source framework for big data, covering both distributed storage and processing.
      ● Hadoop is reliable and fault tolerant without relying on hardware for these properties.
      ● Hadoop has unique horizontal scalability. Currently it scales from a single computer up to thousands of cluster nodes.
  • 5. Why Hadoop? What is Hadoop indeed? [Figure: repeated "BIG DATA" blocks combined with "=", "+" and "x MAX" labels.]
  • 6. HBase motivation. Beware...
      ● Hadoop is designed for throughput, not for latency.
      ● HDFS blocks are expected to be large. Lots of small files are a problem.
      ● Write once, read many times ideology.
      ● MapReduce is not very flexible, and neither is any database built on top of it.
      ● How about realtime?
  • 7. HBase motivation. But we often need... latency, speed and all Hadoop properties.
  • 8. Agenda
      ● HBase as is: architecture, data model, features.
      ● Something special: but we are always special, aren't we?
      ● Integration: it is not all about HBase.
  • 9. Manifest
      ● Open source implementation of Google BigTable with its appropriate place in the infrastructure.
      ● Limited but strict ACID guarantees.
      ● Realtime, low latency, linear scalability.
      ● Distributed, reliable and fault tolerant.
      ● Natural integration with Hadoop infrastructure.
      ● Really good for massive scans.
      ● Server side user operations.
      ● No SQL at all.
      ● Secondary indexing is pretty complex.
  • 10. High layer applications. Resource management: YARN. Distributed file system.
  • 12. HBase: the story begins with ... (timeline: 2006, 2007, 2008, 2009, 2010 ... 2014 ... future)
      ● 2006: Google BigTable paper is published. HBase development starts.
      ● 2007: First code is released as part of Hadoop 0.15. Focus is on offline, crawl data storage.
      ● 2008: HBase goes OLTP (online transaction processing). 0.20 is the first performance release.
      ● 2010: HBase becomes an Apache top-level project.
      ● November 2010: Facebook elected HBase to implement its new messaging platform.
      ● HBase 0.92 is considered a production-ready release.
  • 13. HBase: it is NoSQL. Loose data structure.
      Book: title, author, pages, price. Ball: color, size, material, price. Toy car: color, type, radio control, price.

      Kind    | Price | Title | Author | Pages | Color | Size | Material | Type | Radio control
      Book    |   +   |   +   |   +    |   +   |       |      |          |      |
      Ball    |   +   |       |        |       |   +   |  +   |    +     |      |
      Toy car |   +   |       |        |       |   +   |      |          |  +   |       +

      Book #1: Kind, Price, Title, Author, Pages
      Book #2: Kind, Price, Title, Author
      Ball #1: Kind, Price, Color, Size, Material
      Toy car #1: Price, Color, Type + Radio control

      ● Data looks like tables with a large number of columns.
      ● The set of columns can vary from row to row.
      ● No table modification is needed to add a column to a row.
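The loose row structure above can be illustrated with a small toy model. This is only a sketch: plain Python dicts stand in for rows, the `put` helper and all sample values are hypothetical, and nothing here is the HBase API.

```python
# Toy model only: plain dicts stand in for HBase rows; this is not the HBase API.
table = {}  # row key -> {column name: value}


def put(row_key, **columns):
    """Add or overwrite columns of one row; no table-level schema change is needed."""
    table.setdefault(row_key, {}).update(columns)


# Sample values are made up for illustration.
put("book#1", kind="book", price=12.5, title="Dune", author="F. Herbert", pages=412)
put("book#2", kind="book", price=9.0, title="Emma", author="J. Austen")  # no "pages"
put("ball#1", kind="ball", price=3.0, color="red", size=5, material="rubber")
put("toycar#1", price=20.0, color="blue", type="truck")  # no "kind"
put("toycar#1", radio_control=True)  # a new column appears for this row only

# The "table" is the union of all columns, but each row stores only what it has.
all_columns = sorted({name for row in table.values() for name in row})
for key, row in sorted(table.items()):
    print(key, sorted(row))
```

Each row carries its own set of columns, so adding `radio_control` to one toy car touches no other row.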
  • 14. Logical data model
      ● Data is placed in tables.
      ● Tables are split into regions based on row key ranges.
      ● Every table row is identified by a unique row key.
      ● Every row consists of columns.
      ● Columns are grouped into families.
      [Diagram: a table split into regions; each row has a row key plus Family #1, Family #2, ..., each family holding columns.]
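A minimal sketch of how a row key could be mapped to a region when regions are defined by row key ranges. The split points and region names are hypothetical, and the binary search is just one simple way to do the lookup, not a description of HBase internals.

```python
import bisect

# Hypothetical split points: region i holds keys in [start_keys[i], start_keys[i + 1]).
start_keys = ["", "g", "n", "t"]  # "" marks the start of the key space
region_names = ["region-0", "region-1", "region-2", "region-3"]


def locate_region(row_key):
    """Return the region whose key range contains row_key."""
    idx = bisect.bisect_right(start_keys, row_key) - 1
    return region_names[idx]


for key in ["apple", "g", "melon", "zebra"]:
    print(key, "->", locate_region(key))
```

Because keys are ordered, neighbouring row keys usually land in the same region, which is what makes key-range scans cheap.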
  • 15. Real data model
      ● Data is stored in HFiles.
      ● Families are stored on disk in separate files.
      ● Row keys are indexed in memory.
      ● A column includes key, qualifier, value and timestamp.
      ● No column limit.
      ● Storage is block based.
      ● Delete is just another marker record.
      ● Periodic compaction is required.
      [Diagram: a table region with Row Key, Family #1, Family #2, ...; one HFile per family (HFile: family #1, HFile: family #2), each with the layout Row key | Column | Value | TS.]
  • 16. HBase: infrastructure view
      ● ZooKeeper coordinates distributed elements and is the primary contact point for clients.
      ● Master server keeps metadata and manages data distribution over region servers.
      ● Region servers (RS) manage data table regions.
      ● Clients communicate directly with region servers for data.
      ● Clients locate the master through ZooKeeper, then the needed regions through the master.
      [Diagram: Client, ZooKeeper, Master with metadata, four region servers.]
  • 17. Together with HDFS
      ● ZooKeeper coordinates distributed elements and is the primary contact point for clients.
      ● Master server keeps metadata and manages data distribution over region servers.
      ● Region servers (RS) manage data table regions.
      ● Actual data storage service, including replication, is on HDFS data nodes (DN).
      ● Clients communicate directly with region servers for data.
      ● Clients locate the master through ZooKeeper, then the needed regions through the master.
      [Diagram: Client, ZooKeeper, Master with metadata, NameNode, and three racks, each holding region servers and data nodes.]
  • 18. Key operations
      ● PUT: no difference whether we add data or replace existing data.
      ● GET: get a data element by key: rows, columns.
      ● SCAN: massive GET with a key range.
      ● DELETE: delete a single object.
      Batch operations are possible.
  • 20. Write path
      ● The actual write goes to the region server. The master is not involved.
      ● All requests are written to the WAL (write-ahead log) to provide recovery.
      ● The region server keeps a MemStore as temporary storage.
      ● Only when needed is the write flushed to disk (into an HFile).
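The write path above can be sketched as a toy model: log first, then memory, then an occasional flush to an immutable sorted file. The class name, the `flush_threshold` parameter and the in-memory lists standing in for the WAL and HFiles are all assumptions made for illustration; real flushes are driven by MemStore size, not entry count.

```python
class ToyRegionServer:
    """Toy write path: WAL append, MemStore update, flush to an immutable 'HFile'."""

    def __init__(self, flush_threshold=3):
        self.wal = []            # append-only log of every write
        self.memstore = {}       # recent writes, kept in memory
        self.hfiles = []         # flushed, immutable, sorted key/value lists
        self.flushed_upto = 0    # WAL position already covered by flushed files
        self.flush_threshold = flush_threshold

    def put(self, row_key, value):
        self.wal.append((row_key, value))   # 1. log the write for recovery
        self.memstore[row_key] = value      # 2. apply it in memory
        if len(self.memstore) >= self.flush_threshold:
            self.flush()                    # 3. flush only when needed

    def flush(self):
        self.hfiles.append(sorted(self.memstore.items()))
        self.memstore = {}
        self.flushed_upto = len(self.wal)

    def recover(self):
        """Rebuild the MemStore after a crash by replaying unflushed WAL entries."""
        self.memstore = {}
        for row_key, value in self.wal[self.flushed_upto:]:
            self.memstore[row_key] = value


rs = ToyRegionServer(flush_threshold=3)
for key, value in [("a", 1), ("b", 2), ("c", 3), ("d", 4)]:
    rs.put(key, value)

flushed_files = len(rs.hfiles)       # one flush happened after the third put
rs.memstore = {}                     # simulate a crash losing in-memory state
rs.recover()
recovered = dict(rs.memstore)        # the unflushed write "d" comes back from the WAL
```

The point of the sketch is the ordering: because the log entry is written before the in-memory update, anything that was acknowledged but not yet flushed can be replayed.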
  • 21. CRUD: Put and Delete
      Why is it fast? Memory is intensively used. Writes are logged and cached in memory. Reads are just cached.
      ● The lower layer is a write-once filesystem (HDFS), so the PUT and DELETE paths are identical. DELETE is just another marker added.
      ● Both PUT and DELETE requests are per row key. There is no row key range for DELETE.
      ● The actual DELETE is performed during compactions.
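A toy sketch of why Put and Delete share one path: both only append a record, a delete being a marker, and data physically disappears only when files are merged. The names `TOMBSTONE`, `store_files` and `compact` are hypothetical, and the compaction here is simplified to a single merge that drops all markers.

```python
TOMBSTONE = object()   # delete marker; a hypothetical stand-in for a delete record

store_files = []       # each "file" is an immutable tuple of (row_key, ts, value)
clock = 0              # toy logical timestamp


def _append(row_key, value):
    """Put and Delete share this path: files are never modified, only added."""
    global clock
    clock += 1
    store_files.append(((row_key, clock, value),))


def put(row_key, value):
    _append(row_key, value)


def delete(row_key):
    _append(row_key, TOMBSTONE)


def get(row_key):
    """Newest record wins; a tombstone hides all older values."""
    cells = [c for f in store_files for c in f if c[0] == row_key]
    if not cells:
        return None
    newest = max(cells, key=lambda c: c[1])
    return None if newest[2] is TOMBSTONE else newest[2]


def compact():
    """Merge all files, keep the newest record per row, drop deleted rows."""
    latest = {}
    for f in store_files:
        for row_key, ts, value in f:
            if row_key not in latest or ts > latest[row_key][0]:
                latest[row_key] = (ts, value)
    merged = tuple(
        sorted((k, ts, v) for k, (ts, v) in latest.items() if v is not TOMBSTONE)
    )
    store_files[:] = [merged]


put("a", "v1")
put("b", "v2")
delete("a")
files_before = len(store_files)   # the delete added a file instead of removing data
value_a = get("a")                # hidden by the marker even before compaction
compact()
```

After `compact()` the old value of `"a"` and its marker are both gone, which mirrors the slide's point that the actual delete happens during compaction.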
  • 22. CRUD: Get and Scan
      The Get operation is implemented through Scan.
      ● A Get operation is a simple data request by row key.
      ● A Scan operation is performed over a row key range, which can involve several table regions.
      ● Both Get and Scan can include client filters: expressions that are processed on the server side and can significantly limit results and therefore traffic.
      ● Both Scan and Get operations can be performed on several column families.
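A small sketch of Get expressed as a one-row Scan, with a filter applied before results are returned. The sample rows, the `scan` and `get` helpers and the half-open `[start_row, stop_row)` convention are assumptions of this toy model, not the HBase client API.

```python
import bisect

# Toy sorted "table": row keys in lexicographic order, as a scan would see them.
rows = [
    ("user#001", {"name": "Ann", "age": 31}),
    ("user#002", {"name": "Bob", "age": 17}),
    ("user#003", {"name": "Cid", "age": 45}),
    ("video#001", {"title": "cats"}),
]
keys = [key for key, _ in rows]


def scan(start_row, stop_row, row_filter=None):
    """Yield rows with start_row <= key < stop_row that pass the optional filter."""
    lo = bisect.bisect_left(keys, start_row)
    hi = bisect.bisect_left(keys, stop_row)
    for key, columns in rows[lo:hi]:
        # The filter runs next to the data, so rejected rows are never returned.
        if row_filter is None or row_filter(columns):
            yield key, columns


def get(row_key):
    """Get as a scan over the smallest range that contains exactly row_key."""
    result = list(scan(row_key, row_key + "\0"))
    return result[0][1] if result else None


adults = [key for key, _ in scan("user#", "user$", lambda c: c.get("age", 0) >= 18)]
bob = get("user#002")
missing = get("user#999")
```

The range `"user#"` to `"user$"` covers every key with the `user#` prefix because `$` is the next character after `#`.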
  • 23. Server side tricks
      ● Coprocessors are a feature that allows extending HBase without modifying product code.
      ● A RegionObserver can attach code to operations at region level.
      ● Similar functionality exists for the Master.
      ● Endpoints are the way to provide functionality equivalent to stored procedures.
      ● Together, the coprocessor infrastructure can provide a realtime distributed processing framework (lightweight MapReduce).
  • 24. Coprocessors: Region observer
      ● Region observers can be stacked.
      ● A region observer works like a hook on region operations.
      [Diagram: client request and result pass through stacked region observers attached to table regions on RegionServers.]
  • 25. Coprocessors: Endpoints
      ● Direct communication via a separate protocol.
      ● Your commands can have an effect on table regions.
      [Diagram: client sends a request (RPC) to endpoints attached to table regions on RegionServers and receives a response.]
  • 26. Why is server side black magic?
      ● You are modifying region server or master code: any mistake leads to hell.
      ● The Java class loader requires a service restart on reload: any modification leads to hell.
  • 27. Integration: integration with MapReduce
  • 28. Integration with MapReduce: MapReduce + HBase
      ● HBase provides a number of classes for native MapReduce integration. The main point is data locality.
      ● TableInputFormat allows massive MapReduce table processing (it maps the table with one region per mapper).
      ● HBase classes like Result (Get / Scan result) or Put (Put request) can be passed between MapReduce job stages.
      ● There is not much difference between MR1 and YARN here.
      [Diagram: HMaster, JobTracker, NameNode, metadata; RegionServer, TaskTracker, DataNode. Labels: "Often single node", "so data is local".]
  • 29. MapReduce classics
      ● HBase table data is mapped. There is one mapper per table region, so mapped data is processed locally.
      ● After local (!) mapping, data is reduced. This can be non-local processing, but it is much lighter.
      ● So we get almost 100% distributed, local data processing across the Hadoop cluster.
      [Diagram: HBase table regions feed one mapper each; mappers feed a reducer.]
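The one-mapper-per-region pattern can be sketched with plain Python functions. This is a toy, single-process model: the `regions` dict, `map_region` and `reduce_counts` are hypothetical names, and no MapReduce or HBase classes are involved.

```python
from collections import Counter

# Hypothetical table split into two regions; each region is a list of rows.
regions = {
    "region-0": [("ball#1", {"kind": "ball"}), ("book#1", {"kind": "book"})],
    "region-1": [("book#2", {"kind": "book"}), ("toycar#1", {"kind": "toy car"})],
}


def map_region(rows):
    """Mapper: runs once per region, next to its data; counts rows per kind."""
    return Counter(columns["kind"] for _, columns in rows)


def reduce_counts(partials):
    """Reducer: merges the small per-region summaries into one result."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


partials = [map_region(rows) for rows in regions.values()]  # one mapper per region
totals = reduce_counts(partials)
```

Only the small `Counter` summaries would need to travel between nodes, which is why the non-local reduce step is the lighter one.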
  • 30. Bulk load
      ● It is possible to load data into a table MUCH FASTER.
      ● HBase internal storage files (HFiles) are prepared.
      ● It is preferable to generate one HFile per table region. MapReduce can be used.
      ● Prepared HFiles are merged with table storage at maximum speed.
      [Diagram: data importers (mappers) feed HFile generators (reducers); each generated HFile is loaded into its table region.]
  • 31. Secondary index through coprocessors
      ● HBase has no secondary indexing out of the box.
      ● A coprocessor (RegionObserver) is used to track Put and Delete operations and update the index table.
      ● Scan operations with an index column filter are intercepted and processed based on index table content.
      [Diagram: client, table regions and index table; a Put / Delete observer performs the index update, a Scan with filter performs the index search.]
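A toy sketch of the observer idea: hooks run before each Put and Delete and keep a separate index structure in sync. `ToyTable`, `IndexObserver`, and the `pre_put` / `pre_delete` hook names are hypothetical and only loosely inspired by the RegionObserver concept; a real index would live in its own HBase table.

```python
class ToyTable:
    """Toy table that calls registered observers before each mutation."""

    def __init__(self):
        self.rows = {}        # row key -> {column: value}
        self.observers = []   # observers can be stacked

    def put(self, row_key, columns):
        for observer in self.observers:
            observer.pre_put(self, row_key, columns)
        self.rows.setdefault(row_key, {}).update(columns)

    def delete(self, row_key):
        for observer in self.observers:
            observer.pre_delete(self, row_key)
        self.rows.pop(row_key, None)


class IndexObserver:
    """Maintains a value -> row keys index for one column."""

    def __init__(self, column):
        self.column = column
        self.index = {}

    def _unindex(self, table, row_key):
        old = table.rows.get(row_key, {}).get(self.column)
        if old is not None:
            self.index.get(old, set()).discard(row_key)

    def pre_put(self, table, row_key, columns):
        if self.column in columns:
            self._unindex(table, row_key)   # drop the stale entry first
            self.index.setdefault(columns[self.column], set()).add(row_key)

    def pre_delete(self, table, row_key):
        self._unindex(table, row_key)

    def lookup(self, value):
        """Index search: row keys whose indexed column equals value."""
        return sorted(self.index.get(value, set()))


table = ToyTable()
by_color = IndexObserver("color")
table.observers.append(by_color)

table.put("r1", {"color": "red"})
table.put("r2", {"color": "red"})
table.put("r3", {"color": "blue"})
red_first = by_color.lookup("red")
table.put("r1", {"color": "blue"})   # update moves r1 from "red" to "blue"
table.delete("r3")                   # delete removes r3 from the index
```

The client only ever calls `put` and `delete`; the index stays consistent because every mutation passes through the hook.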
  • 32. Index alternative: SOLR
      ● SOLR indexes documents. What is stored in the SOLR index is not what you index. SOLR is NOT A STORAGE, ONLY AN INDEX.
      ● But it can index ANYTHING. The search result is a document ID.
      ● An index update request is analyzed, tokenized, transformed... and the same goes for queries.
      [Diagram: index update and index query paths into SOLR; search responses returned.]
  • 33. HBase and SOLR together
      ● HBase handles online user requests that change data.
      ● The NGData Lily indexer handles the stream of changes and transforms them into SOLR index change requests.
      ● Indexes are built on SOLR, so HBase data is searchable.
  • 34. HBase: data and search integration
      ● Client: the user just puts (or deletes) data (data update).
      ● HBase cluster: HBase regions on top of HDFS, which serves as the low level file system.
      ● Replication: can be set up at column family level.
      ● Lily HBase NRT indexer: translates data changes into SOLR index updates.
      ● SOLR cloud: finally provides search. Search requests arrive over HTTP and search responses go back to the client.
      ● Apache ZooKeeper does all coordination.
  • 35. Questions and discussion