SlideShare une entreprise Scribd logo
1  sur  31
HBaseCon East 2016
HBase and Spark, state of the art
• Java Message Service => JMS
• Solutions Architect at Cloudera
• A bit of everything…
• Development
• Team/Project manager
• Architect
• O'Reilly author of Architecting HBase Applications
• International
• Worked from Paris to Los Angeles
• More than 100 flights per year
• HBase (and others) contributor
About Jean-Marc Spaggiari
HBaseCon East 2016
Overview
• Where we came from
• Examples of code
• Improvements that are coming up
• Spark Streaming Use case
HBaseCon East 2016
Source of Demand
• Demand started in the field
• Use Cases
• APIs access Gets, Puts, Scans
• MapReduce Mass Scans
• MapReduce Bulk Load
• MapReduce Smart gets and puts
• Spark has all but killed MapReduce
• Spark Streaming has grown in popularity
• Populating Aggregates
• Entity Centric-Time Series data store i.e. OpenTSDB
• Look ups for joins or mutations
HBaseCon East 2016
How it Started
• Started on GitHub
• Andrew Purtell started the effort to put into HBase
• Big call out to Sean B, Jon H, Ted Y, Ted M and Matteo B
• Components
• Normal Spark
• Spark Streaming
• Bulk Load
• SparkSQL
HBaseCon East 2016
Under the covers
HBaseCon East 2016
Driver
Walker Node
Configs
Executor
Static Space
Configs
HConnection
Tasks Tasks
Walker Node
Executor
Static Space
Configs
HConnection
Tasks Tasks
Key Addition: HBaseContext
Create an HBaseContext :
// An Hadoop/HBase Configuration object
val conf = HBaseConfiguration.create()
conf.addResource(new Path("/etc/hbase/conf/core-site.xml"))
conf.addResource(new Path("/etc/hbase/conf/hbase-site.xml"))
// sc is the Spark Context; hbase context corresponds to an HBase Connection
val hbaseContext = new HBaseContext(sc, conf)
HBaseCon East 2016
• Foreach
• Map
• BulkLoad
• BulkLoadThinRows
• BulkGet (aka Multiget)
• BulkDelete
• Most of them in both Java and Scala
Operations on the HBaseContext
HBaseCon East2016
Foreach
Read data in parallel for each partition and compute :
rdd.hbaseForeachPartition(hbaseContext, (it, conn) => {
// do something
val bufferedMutator = conn.getBufferedMutator(TableName.valueOf("t1"))
it.foreach(r => {
... // HBase API put/incr/append/cas calls
}
bufferedMutator.flush()
bufferedMutator.close()
})
HBaseCon East 2016
Foreach
Read data in parallel for each partition and compute :
hbaseContext.foreachPartition(keyValuesPuts,
new VoidFunction<Tuple2<Iterator<Put>, Connection>>() {
@Override
public void call(Tuple2<Iterator<Put>, Connection> t) throws Exception {
BufferedMutator mutator = t._2().getBufferedMutator(TABLE_NAME);
while (t._1().hasNext()) {
... // HBase API put/incr/append/cas calls
}
mutator.flush();
mutator.close();
} }); });
HBaseCon East 2016
Map
Take a dataset and map it in parallel for each partition to produce a new RDD or process it
val getRdd = rdd.hbaseMapPartitions(hbaseContext, (it, conn) => {
val table = conn.getTable(TableName.valueOf("t1"))
var res = mutable.MutableList[String]()
it.map( r => {
... // HBase API Scan Results
}
})
HBaseCon East 2016
BulkLoad
Bulk load a data set into Hbase (for all cases, generally wide tables) (Scala only)
rdd.hbaseBulkLoad(hbaseContext, tableName,
t => {
val rowKey = t._1
val fam:Array[Byte] = t._2._1
val qual = t._2._2
val value = t._2._3
val keyFamilyQualifier= new KeyFamilyQualifier(rowKey, fam, qual)
Seq((keyFamilyQualifier, value)).iterator
}, stagingFolder)
val load = new LoadIncrementalHFiles(config)
load.run(Array(stagingFolder, tableNameString))
HBaseCon East 2016
BulkLoad
Bulk load a data set into Hbase (for all cases, generally wide tables) (Scala only)
rdd.hbaseBulkLoad(hbaseContext, tableName,
t => {
val rowKey = t._1
val fam:Array[Byte] = t._2._1
val qual = t._2._2
val value = t._2._3
val keyFamilyQualifier= new KeyFamilyQualifier(rowKey, fam, qual)
Seq((keyFamilyQualifier, value)).iterator
}, stagingFolder)
val load = new LoadIncrementalHFiles(config)
load.run(Array(stagingFolder, tableNameString))
HBaseCon East 2016
Bulk load a data set into Hbase (for all cases, generally wide tables) (Scala only)
rdd.hbaseBulkLoad(hbaseContext, tableName,
t => {
val rowKey = t._1
val fam:Array[Byte] = t._2._1
val qual = t._2._2
val value = t._2._3
val keyFamilyQualifier= new KeyFamilyQualifier(rowKey, fam, qual)
Seq((keyFamilyQualifier, value)).iterator
}, stagingFolder)
val load = new LoadIncrementalHFiles(config)
load.run(Array(stagingFolder, tableNameString))
BulkLoad
HBaseCon East 2016
BulkLoadThinRows
Bulk load a data set into HBase (for skinny tables, <10k cols)
hbaseContext.bulkLoadThinRows[(String, Iterable[(Array[Byte], Array[Byte],
Array[Byte])])] (rdd, TableName.valueOf(tableName), t => {
val rowKey = Bytes.toBytes(t._1)
val familyQualifiersValues = new FamiliesQualifiersValues
t._2.foreach(f => {
val family:Array[Byte] = f._1
val qualifier = f._2
val value:Array[Byte] = f._3
familyQualifiersValues +=(family, qualifier, value)
})
(new ByteArrayWrapper(rowKey), familyQualifiersValues)
}, stagingFolder.getPath)
HBaseCon East 2016
BulkLoadThinRows
Bulk load a data set into HBase (for skinny tables, <10k cols)
hbaseContext.bulkLoadThinRows[(String, Iterable[(Array[Byte], Array[Byte],
Array[Byte])])] (rdd, TableName.valueOf(tableName), t => {
val rowKey = Bytes.toBytes(t._1)
val familyQualifiersValues = new FamiliesQualifiersValues
t._2.foreach(f => {
val family:Array[Byte] = f._1
val qualifier = f._2
val value:Array[Byte] = f._3
familyQualifiersValues +=(family, qualifier, value)
})
(new ByteArrayWrapper(rowKey), familyQualifiersValues)
}, stagingFolder.getPath)
HBaseCon East 2016
BulkPut
Parallelized HBase Multiput :
hbaseContext.bulkPut[(Array[Byte], Array[(Array[Byte], Array[Byte],
Array[Byte])])](rdd, tableName, (putRecord) => {
val put = new Put(putRecord._1)
putRecord._2.foreach((putValue) =>
put.add(putValue._1, putValue._2, putValue._3))
put
}
HBaseCon East 2016
BulkPut
Parallelized HBase Multiput :
hbaseContext.bulkPut(textFile, TABLE_NAME, new Function<String, Put>() {
@Override
public Put call(String v1) throws Exception {
String[] tokens = v1.split("|");
Put put = new Put(Bytes.toBytes(tokens[0]));
put.addColumn(Bytes.toBytes("segment"),
Bytes.toBytes(tokens[1]),
Bytes.toBytes(tokens[2]));
return put;
}
});
HBaseCon East 2016
BulkDelete
Parallelized HBase Multi-deletes
hbaseContext.bulkDelete[Array[Byte]](rdd, tableName,
putRecord => new Delete(putRecord),
4) // batch size
rdd.hbaseBulkDelete(hbaseContext, tableName,
putRecord => new Delete(putRecord),
4) // batch size
HBaseCon East 2016
Table RDD
How to materialize a table as a Spark RDD.
HBaseCon East 2016
Table RDD
How to materialize a table as a Spark RDD.
HBaseCon East 2016
Table RDD
How to materialize a table as a Spark RDD.
HBaseCon East 2016
Table RDD
How to materialize a table as a Spark RDD.
HBaseCon East 2016
What Improvements Have We Made?
▪ Combine Spark and HBase
• Spark Catalyst Engine for Query Plan and Optimization
• HBase for Fast Access KV Store
• Implement Standard External Data Source with Built-in Filter
▪ High Performance
• Data Locality: Move Computation to Data
• Partition Pruning: Task only Performed in RS Holding Requested Data
• Column Pruning / Predicate Pushdown: Reduce Network Overhead
▪ Full Fledged DataFrame Support
• Spark-SQL
• Integrated Language Query
▪ Run on Top of Existing HBase Table
• Native Support Java Primitive Types
▪ Still some work and improvements to be done
• HBASE-16638 Reduce the number of Connections
• HBASE-14217 Add Java access to Spark bulk load functionality
HBaseCon East 2016
Data frame + HBase
WIP... 2.0?
HBaseCon East 2016
Usage - Define the Catalog
def catalog = s"""{
|"table":{"namespace":"default", "name":"table1"},
|"rowkey":"key",
|"columns":{
|"col0":{"cf":"rowkey", "col":"key", "type":"string"},
|"col1":{"cf":"cf1", "col":"col1", "type":"boolean"},
|"col2":{"cf":"cf1", "col":"col2", "type":"string"}
|}
|}""".stripMargin
HBase table to dataframe table catalog mapping:
HBaseCon East 2016
Usage – Write to HBase
sc.parallelize(data)
.toDF
.write
.options(Map(HBaseTableCatalog.tableCatalog -> catalog,
HBaseTableCatalog.newTable -> "5"))
.format("org.apache.spark.sql.execution.datasources.hbase")
.save()
Export RDD into a new HBase table with DataFrame
HBaseCon East 2016
Usage– Construct DataFrame
def withCatalog(cat: String): DataFrame = {
sqlContext
.read
.options(Map(HBaseTableCatalog.tableCatalog->catalog))
.format("org.apache.spark.sql.execution.datasources.hbase")
.load()
}
Import RDD into a new HBase table with DataFrame
HBaseCon East 2016
Usage - Language Integrate Query/SQL
val df = withCatalog(catalog)
val s = df.filter((($"col0" <= "row050" && $"col0" > "row040") ||
$"col0" === "row005" ||
$"col0" === "row020" ||
$"col0" === "r20" ||
$"col0" <= "row005") &&
($"col2" === 1 ||
$"col2" === 42))
.select("col0", "col1", "col2")
s.show
Import RDD into a new HBase table with DataFrame
HBaseCon East 2016
Usage - Language Integrate Query/SQL
// Load the dataframe
val df = withCatalog(catalog)
//SQL example
df.registerTempTable("table")
sqlContext.sql("select count(col1) from table").show
Import RDD into a new HBase table with DataFrame
HBaseCon East 2016
Spark Streaming Example
KafkaProducer
Spark
Streaming
HBase SOLR
HBaseCon East 2016

Contenu connexe

Tendances

Apache Big Data EU 2015 - HBase
Apache Big Data EU 2015 - HBaseApache Big Data EU 2015 - HBase
Apache Big Data EU 2015 - HBaseNick Dimiduk
 
Apache Big Data EU 2015 - Phoenix
Apache Big Data EU 2015 - PhoenixApache Big Data EU 2015 - Phoenix
Apache Big Data EU 2015 - PhoenixNick Dimiduk
 
Using HBase Co-Processors to Build a Distributed, Transactional RDBMS - Splic...
Using HBase Co-Processors to Build a Distributed, Transactional RDBMS - Splic...Using HBase Co-Processors to Build a Distributed, Transactional RDBMS - Splic...
Using HBase Co-Processors to Build a Distributed, Transactional RDBMS - Splic...Chicago Hadoop Users Group
 
Apache Phoenix and Apache HBase: An Enterprise Grade Data Warehouse
Apache Phoenix and Apache HBase: An Enterprise Grade Data WarehouseApache Phoenix and Apache HBase: An Enterprise Grade Data Warehouse
Apache Phoenix and Apache HBase: An Enterprise Grade Data WarehouseJosh Elser
 
Dancing with the elephant h base1_final
Dancing with the elephant   h base1_finalDancing with the elephant   h base1_final
Dancing with the elephant h base1_finalasterix_smartplatf
 
HBaseCon 2015: Apache Phoenix - The Evolution of a Relational Database Layer ...
HBaseCon 2015: Apache Phoenix - The Evolution of a Relational Database Layer ...HBaseCon 2015: Apache Phoenix - The Evolution of a Relational Database Layer ...
HBaseCon 2015: Apache Phoenix - The Evolution of a Relational Database Layer ...HBaseCon
 
Apache Phoenix: Use Cases and New Features
Apache Phoenix: Use Cases and New FeaturesApache Phoenix: Use Cases and New Features
Apache Phoenix: Use Cases and New FeaturesHBaseCon
 
HBaseCon 2012 | Mignify: A Big Data Refinery Built on HBase - Internet Memory...
HBaseCon 2012 | Mignify: A Big Data Refinery Built on HBase - Internet Memory...HBaseCon 2012 | Mignify: A Big Data Refinery Built on HBase - Internet Memory...
HBaseCon 2012 | Mignify: A Big Data Refinery Built on HBase - Internet Memory...Cloudera, Inc.
 
Mapreduce over snapshots
Mapreduce over snapshotsMapreduce over snapshots
Mapreduce over snapshotsenissoz
 
Large-Scale Stream Processing in the Hadoop Ecosystem
Large-Scale Stream Processing in the Hadoop Ecosystem Large-Scale Stream Processing in the Hadoop Ecosystem
Large-Scale Stream Processing in the Hadoop Ecosystem DataWorks Summit/Hadoop Summit
 
October 2016 HUG: Architecture of an Open Source RDBMS powered by HBase and ...
October 2016 HUG: Architecture of an Open Source RDBMS powered by HBase and ...October 2016 HUG: Architecture of an Open Source RDBMS powered by HBase and ...
October 2016 HUG: Architecture of an Open Source RDBMS powered by HBase and ...Yahoo Developer Network
 
Building large scale transactional data lake using apache hudi
Building large scale transactional data lake using apache hudiBuilding large scale transactional data lake using apache hudi
Building large scale transactional data lake using apache hudiBill Liu
 
Apache Phoenix: Transforming HBase into a SQL Database
Apache Phoenix: Transforming HBase into a SQL DatabaseApache Phoenix: Transforming HBase into a SQL Database
Apache Phoenix: Transforming HBase into a SQL DatabaseDataWorks Summit
 
Five major tips to maximize performance on a 200+ SQL HBase/Phoenix cluster
Five major tips to maximize performance on a 200+ SQL HBase/Phoenix clusterFive major tips to maximize performance on a 200+ SQL HBase/Phoenix cluster
Five major tips to maximize performance on a 200+ SQL HBase/Phoenix clustermas4share
 
Apache HBase + Spark: Leveraging your Non-Relational Datastore in Batch and S...
Apache HBase + Spark: Leveraging your Non-Relational Datastore in Batch and S...Apache HBase + Spark: Leveraging your Non-Relational Datastore in Batch and S...
Apache HBase + Spark: Leveraging your Non-Relational Datastore in Batch and S...DataWorks Summit/Hadoop Summit
 
Hortonworks Technical Workshop: HBase and Apache Phoenix
Hortonworks Technical Workshop: HBase and Apache Phoenix Hortonworks Technical Workshop: HBase and Apache Phoenix
Hortonworks Technical Workshop: HBase and Apache Phoenix Hortonworks
 

Tendances (20)

Apache Big Data EU 2015 - HBase
Apache Big Data EU 2015 - HBaseApache Big Data EU 2015 - HBase
Apache Big Data EU 2015 - HBase
 
Apache Big Data EU 2015 - Phoenix
Apache Big Data EU 2015 - PhoenixApache Big Data EU 2015 - Phoenix
Apache Big Data EU 2015 - Phoenix
 
Using HBase Co-Processors to Build a Distributed, Transactional RDBMS - Splic...
Using HBase Co-Processors to Build a Distributed, Transactional RDBMS - Splic...Using HBase Co-Processors to Build a Distributed, Transactional RDBMS - Splic...
Using HBase Co-Processors to Build a Distributed, Transactional RDBMS - Splic...
 
Apache Phoenix and Apache HBase: An Enterprise Grade Data Warehouse
Apache Phoenix and Apache HBase: An Enterprise Grade Data WarehouseApache Phoenix and Apache HBase: An Enterprise Grade Data Warehouse
Apache Phoenix and Apache HBase: An Enterprise Grade Data Warehouse
 
April 2014 HUG : Apache Phoenix
April 2014 HUG : Apache PhoenixApril 2014 HUG : Apache Phoenix
April 2014 HUG : Apache Phoenix
 
Dancing with the elephant h base1_final
Dancing with the elephant   h base1_finalDancing with the elephant   h base1_final
Dancing with the elephant h base1_final
 
Spark + HBase
Spark + HBase Spark + HBase
Spark + HBase
 
HBaseCon 2015: Apache Phoenix - The Evolution of a Relational Database Layer ...
HBaseCon 2015: Apache Phoenix - The Evolution of a Relational Database Layer ...HBaseCon 2015: Apache Phoenix - The Evolution of a Relational Database Layer ...
HBaseCon 2015: Apache Phoenix - The Evolution of a Relational Database Layer ...
 
Apache Phoenix: Use Cases and New Features
Apache Phoenix: Use Cases and New FeaturesApache Phoenix: Use Cases and New Features
Apache Phoenix: Use Cases and New Features
 
HBaseCon 2012 | Mignify: A Big Data Refinery Built on HBase - Internet Memory...
HBaseCon 2012 | Mignify: A Big Data Refinery Built on HBase - Internet Memory...HBaseCon 2012 | Mignify: A Big Data Refinery Built on HBase - Internet Memory...
HBaseCon 2012 | Mignify: A Big Data Refinery Built on HBase - Internet Memory...
 
Mapreduce over snapshots
Mapreduce over snapshotsMapreduce over snapshots
Mapreduce over snapshots
 
Large-Scale Stream Processing in the Hadoop Ecosystem
Large-Scale Stream Processing in the Hadoop Ecosystem Large-Scale Stream Processing in the Hadoop Ecosystem
Large-Scale Stream Processing in the Hadoop Ecosystem
 
October 2016 HUG: Architecture of an Open Source RDBMS powered by HBase and ...
October 2016 HUG: Architecture of an Open Source RDBMS powered by HBase and ...October 2016 HUG: Architecture of an Open Source RDBMS powered by HBase and ...
October 2016 HUG: Architecture of an Open Source RDBMS powered by HBase and ...
 
Building large scale transactional data lake using apache hudi
Building large scale transactional data lake using apache hudiBuilding large scale transactional data lake using apache hudi
Building large scale transactional data lake using apache hudi
 
Apache Phoenix: Transforming HBase into a SQL Database
Apache Phoenix: Transforming HBase into a SQL DatabaseApache Phoenix: Transforming HBase into a SQL Database
Apache Phoenix: Transforming HBase into a SQL Database
 
Five major tips to maximize performance on a 200+ SQL HBase/Phoenix cluster
Five major tips to maximize performance on a 200+ SQL HBase/Phoenix clusterFive major tips to maximize performance on a 200+ SQL HBase/Phoenix cluster
Five major tips to maximize performance on a 200+ SQL HBase/Phoenix cluster
 
Apache HBase + Spark: Leveraging your Non-Relational Datastore in Batch and S...
Apache HBase + Spark: Leveraging your Non-Relational Datastore in Batch and S...Apache HBase + Spark: Leveraging your Non-Relational Datastore in Batch and S...
Apache HBase + Spark: Leveraging your Non-Relational Datastore in Batch and S...
 
Apache Hive on ACID
Apache Hive on ACIDApache Hive on ACID
Apache Hive on ACID
 
Apache Phoenix + Apache HBase
Apache Phoenix + Apache HBaseApache Phoenix + Apache HBase
Apache Phoenix + Apache HBase
 
Hortonworks Technical Workshop: HBase and Apache Phoenix
Hortonworks Technical Workshop: HBase and Apache Phoenix Hortonworks Technical Workshop: HBase and Apache Phoenix
Hortonworks Technical Workshop: HBase and Apache Phoenix
 

En vedette

Apache HBase 入門 (第2回)
Apache HBase 入門 (第2回)Apache HBase 入門 (第2回)
Apache HBase 入門 (第2回)tatsuya6502
 
Apache Spark streaming and HBase
Apache Spark streaming and HBaseApache Spark streaming and HBase
Apache Spark streaming and HBaseCarol McDonald
 
HBaseとSparkでセンサーデータを有効活用 #hbasejp
HBaseとSparkでセンサーデータを有効活用 #hbasejpHBaseとSparkでセンサーデータを有効活用 #hbasejp
HBaseとSparkでセンサーデータを有効活用 #hbasejpFwardNetwork
 
Apache HBase Internals you hoped you Never Needed to Understand
Apache HBase Internals you hoped you Never Needed to UnderstandApache HBase Internals you hoped you Never Needed to Understand
Apache HBase Internals you hoped you Never Needed to UnderstandJosh Elser
 
Free Code Friday - Spark Streaming with HBase
Free Code Friday - Spark Streaming with HBaseFree Code Friday - Spark Streaming with HBase
Free Code Friday - Spark Streaming with HBaseMapR Technologies
 
Apache HBase 入門 (第1回)
Apache HBase 入門 (第1回)Apache HBase 入門 (第1回)
Apache HBase 入門 (第1回)tatsuya6502
 

En vedette (6)

Apache HBase 入門 (第2回)
Apache HBase 入門 (第2回)Apache HBase 入門 (第2回)
Apache HBase 入門 (第2回)
 
Apache Spark streaming and HBase
Apache Spark streaming and HBaseApache Spark streaming and HBase
Apache Spark streaming and HBase
 
HBaseとSparkでセンサーデータを有効活用 #hbasejp
HBaseとSparkでセンサーデータを有効活用 #hbasejpHBaseとSparkでセンサーデータを有効活用 #hbasejp
HBaseとSparkでセンサーデータを有効活用 #hbasejp
 
Apache HBase Internals you hoped you Never Needed to Understand
Apache HBase Internals you hoped you Never Needed to UnderstandApache HBase Internals you hoped you Never Needed to Understand
Apache HBase Internals you hoped you Never Needed to Understand
 
Free Code Friday - Spark Streaming with HBase
Free Code Friday - Spark Streaming with HBaseFree Code Friday - Spark Streaming with HBase
Free Code Friday - Spark Streaming with HBase
 
Apache HBase 入門 (第1回)
Apache HBase 入門 (第1回)Apache HBase 入門 (第1回)
Apache HBase 入門 (第1回)
 

Similaire à HBaseConEast2016: HBase and Spark, State of the Art

H base introduction & development
H base introduction & developmentH base introduction & development
H base introduction & developmentShashwat Shriparv
 
Data Processing with Cascading Java API on Apache Hadoop
Data Processing with Cascading Java API on Apache HadoopData Processing with Cascading Java API on Apache Hadoop
Data Processing with Cascading Java API on Apache HadoopHikmat Dhamee
 
Big Data Everywhere Chicago: Unleash the Power of HBase Shell (Conversant)
Big Data Everywhere Chicago: Unleash the Power of HBase Shell (Conversant) Big Data Everywhere Chicago: Unleash the Power of HBase Shell (Conversant)
Big Data Everywhere Chicago: Unleash the Power of HBase Shell (Conversant) BigDataEverywhere
 
Cascading Map-Side Joins over HBase for Scalable Join Processing
Cascading Map-Side Joins over HBase for Scalable Join ProcessingCascading Map-Side Joins over HBase for Scalable Join Processing
Cascading Map-Side Joins over HBase for Scalable Join ProcessingAlexander Schätzle
 
Advance Map reduce - Apache hadoop Bigdata training by Design Pathshala
Advance Map reduce - Apache hadoop Bigdata training by Design PathshalaAdvance Map reduce - Apache hadoop Bigdata training by Design Pathshala
Advance Map reduce - Apache hadoop Bigdata training by Design PathshalaDesing Pathshala
 
Introduction to HBase - Phoenix HUG 5/14
Introduction to HBase - Phoenix HUG 5/14Introduction to HBase - Phoenix HUG 5/14
Introduction to HBase - Phoenix HUG 5/14Jeremy Walsh
 
Introduction to Apache HBase, MapR Tables and Security
Introduction to Apache HBase, MapR Tables and SecurityIntroduction to Apache HBase, MapR Tables and Security
Introduction to Apache HBase, MapR Tables and SecurityMapR Technologies
 
HBaseConAsia2018 Track2-4: HTAP DB-System: AsparaDB HBase, Phoenix, and Spark
HBaseConAsia2018 Track2-4: HTAP DB-System: AsparaDB HBase, Phoenix, and SparkHBaseConAsia2018 Track2-4: HTAP DB-System: AsparaDB HBase, Phoenix, and Spark
HBaseConAsia2018 Track2-4: HTAP DB-System: AsparaDB HBase, Phoenix, and SparkMichael Stack
 
Map-Reduce and Apache Hadoop
Map-Reduce and Apache HadoopMap-Reduce and Apache Hadoop
Map-Reduce and Apache HadoopSvetlin Nakov
 
Cassandra Summit 2014: Apache Spark - The SDK for All Big Data Platforms
Cassandra Summit 2014: Apache Spark - The SDK for All Big Data PlatformsCassandra Summit 2014: Apache Spark - The SDK for All Big Data Platforms
Cassandra Summit 2014: Apache Spark - The SDK for All Big Data PlatformsDataStax Academy
 
Работа с документами в JavaScript
Работа с документами в JavaScriptРабота с документами в JavaScript
Работа с документами в JavaScriptДмитрий Радыно
 
Staying Ahead of the Curve with Spring and Cassandra 4
Staying Ahead of the Curve with Spring and Cassandra 4Staying Ahead of the Curve with Spring and Cassandra 4
Staying Ahead of the Curve with Spring and Cassandra 4VMware Tanzu
 
Staying Ahead of the Curve with Spring and Cassandra 4 (SpringOne 2020)
Staying Ahead of the Curve with Spring and Cassandra 4 (SpringOne 2020)Staying Ahead of the Curve with Spring and Cassandra 4 (SpringOne 2020)
Staying Ahead of the Curve with Spring and Cassandra 4 (SpringOne 2020)Alexandre Dutra
 
Spark Streaming, Machine Learning and meetup.com streaming API.
Spark Streaming, Machine Learning and  meetup.com streaming API.Spark Streaming, Machine Learning and  meetup.com streaming API.
Spark Streaming, Machine Learning and meetup.com streaming API.Sergey Zelvenskiy
 
Nov. 4, 2011 o reilly webcast-hbase- lars george
Nov. 4, 2011 o reilly webcast-hbase- lars georgeNov. 4, 2011 o reilly webcast-hbase- lars george
Nov. 4, 2011 o reilly webcast-hbase- lars georgeO'Reilly Media
 
Data science at the command line
Data science at the command lineData science at the command line
Data science at the command lineSharat Chikkerur
 

Similaire à HBaseConEast2016: HBase and Spark, State of the Art (20)

H base introduction & development
H base introduction & developmentH base introduction & development
H base introduction & development
 
NoSQL & HBase overview
NoSQL & HBase overviewNoSQL & HBase overview
NoSQL & HBase overview
 
Data Processing with Cascading Java API on Apache Hadoop
Data Processing with Cascading Java API on Apache HadoopData Processing with Cascading Java API on Apache Hadoop
Data Processing with Cascading Java API on Apache Hadoop
 
Big Data Everywhere Chicago: Unleash the Power of HBase Shell (Conversant)
Big Data Everywhere Chicago: Unleash the Power of HBase Shell (Conversant) Big Data Everywhere Chicago: Unleash the Power of HBase Shell (Conversant)
Big Data Everywhere Chicago: Unleash the Power of HBase Shell (Conversant)
 
Cascading Map-Side Joins over HBase for Scalable Join Processing
Cascading Map-Side Joins over HBase for Scalable Join ProcessingCascading Map-Side Joins over HBase for Scalable Join Processing
Cascading Map-Side Joins over HBase for Scalable Join Processing
 
מיכאל
מיכאלמיכאל
מיכאל
 
Hspark index conf
Hspark index confHspark index conf
Hspark index conf
 
Advance Map reduce - Apache hadoop Bigdata training by Design Pathshala
Advance Map reduce - Apache hadoop Bigdata training by Design PathshalaAdvance Map reduce - Apache hadoop Bigdata training by Design Pathshala
Advance Map reduce - Apache hadoop Bigdata training by Design Pathshala
 
Introduction to HBase - Phoenix HUG 5/14
Introduction to HBase - Phoenix HUG 5/14Introduction to HBase - Phoenix HUG 5/14
Introduction to HBase - Phoenix HUG 5/14
 
Introduction to Apache HBase, MapR Tables and Security
Introduction to Apache HBase, MapR Tables and SecurityIntroduction to Apache HBase, MapR Tables and Security
Introduction to Apache HBase, MapR Tables and Security
 
HBaseConAsia2018 Track2-4: HTAP DB-System: AsparaDB HBase, Phoenix, and Spark
HBaseConAsia2018 Track2-4: HTAP DB-System: AsparaDB HBase, Phoenix, and SparkHBaseConAsia2018 Track2-4: HTAP DB-System: AsparaDB HBase, Phoenix, and Spark
HBaseConAsia2018 Track2-4: HTAP DB-System: AsparaDB HBase, Phoenix, and Spark
 
Map-Reduce and Apache Hadoop
Map-Reduce and Apache HadoopMap-Reduce and Apache Hadoop
Map-Reduce and Apache Hadoop
 
Cassandra Summit 2014: Apache Spark - The SDK for All Big Data Platforms
Cassandra Summit 2014: Apache Spark - The SDK for All Big Data PlatformsCassandra Summit 2014: Apache Spark - The SDK for All Big Data Platforms
Cassandra Summit 2014: Apache Spark - The SDK for All Big Data Platforms
 
Работа с документами в JavaScript
Работа с документами в JavaScriptРабота с документами в JavaScript
Работа с документами в JavaScript
 
20170126 big data processing
20170126 big data processing20170126 big data processing
20170126 big data processing
 
Staying Ahead of the Curve with Spring and Cassandra 4
Staying Ahead of the Curve with Spring and Cassandra 4Staying Ahead of the Curve with Spring and Cassandra 4
Staying Ahead of the Curve with Spring and Cassandra 4
 
Staying Ahead of the Curve with Spring and Cassandra 4 (SpringOne 2020)
Staying Ahead of the Curve with Spring and Cassandra 4 (SpringOne 2020)Staying Ahead of the Curve with Spring and Cassandra 4 (SpringOne 2020)
Staying Ahead of the Curve with Spring and Cassandra 4 (SpringOne 2020)
 
Spark Streaming, Machine Learning and meetup.com streaming API.
Spark Streaming, Machine Learning and  meetup.com streaming API.Spark Streaming, Machine Learning and  meetup.com streaming API.
Spark Streaming, Machine Learning and meetup.com streaming API.
 
Nov. 4, 2011 o reilly webcast-hbase- lars george
Nov. 4, 2011 o reilly webcast-hbase- lars georgeNov. 4, 2011 o reilly webcast-hbase- lars george
Nov. 4, 2011 o reilly webcast-hbase- lars george
 
Data science at the command line
Data science at the command lineData science at the command line
Data science at the command line
 

Plus de Michael Stack

hbaseconasia2019 HBase Table Monitoring and Troubleshooting System on Cloud
hbaseconasia2019 HBase Table Monitoring and Troubleshooting System on Cloudhbaseconasia2019 HBase Table Monitoring and Troubleshooting System on Cloud
hbaseconasia2019 HBase Table Monitoring and Troubleshooting System on CloudMichael Stack
 
hbaseconasia2019 Recent work on HBase at Pinterest
hbaseconasia2019 Recent work on HBase at Pinteresthbaseconasia2019 Recent work on HBase at Pinterest
hbaseconasia2019 Recent work on HBase at PinterestMichael Stack
 
hbaseconasia2019 Phoenix Practice in China Life Insurance Co., Ltd
hbaseconasia2019 Phoenix Practice in China Life Insurance Co., Ltdhbaseconasia2019 Phoenix Practice in China Life Insurance Co., Ltd
hbaseconasia2019 Phoenix Practice in China Life Insurance Co., LtdMichael Stack
 
hbaseconasia2019 HBase at Didi
hbaseconasia2019 HBase at Didihbaseconasia2019 HBase at Didi
hbaseconasia2019 HBase at DidiMichael Stack
 
hbaseconasia2019 The Practice in trillion-level Video Storage and billion-lev...
hbaseconasia2019 The Practice in trillion-level Video Storage and billion-lev...hbaseconasia2019 The Practice in trillion-level Video Storage and billion-lev...
hbaseconasia2019 The Practice in trillion-level Video Storage and billion-lev...Michael Stack
 
hbaseconasia2019 HBase at Tencent
hbaseconasia2019 HBase at Tencenthbaseconasia2019 HBase at Tencent
hbaseconasia2019 HBase at TencentMichael Stack
 
hbaseconasia2019 Spatio temporal Data Management based on Ali-HBase Ganos and...
hbaseconasia2019 Spatio temporal Data Management based on Ali-HBase Ganos and...hbaseconasia2019 Spatio temporal Data Management based on Ali-HBase Ganos and...
hbaseconasia2019 Spatio temporal Data Management based on Ali-HBase Ganos and...Michael Stack
 
hbaseconasia2019 Bridging the Gap between Big Data System Software Stack and ...
hbaseconasia2019 Bridging the Gap between Big Data System Software Stack and ...hbaseconasia2019 Bridging the Gap between Big Data System Software Stack and ...
hbaseconasia2019 Bridging the Gap between Big Data System Software Stack and ...Michael Stack
 
hbaseconasia2019 Pharos as a Pluggable Secondary Index Component
hbaseconasia2019 Pharos as a Pluggable Secondary Index Componenthbaseconasia2019 Pharos as a Pluggable Secondary Index Component
hbaseconasia2019 Pharos as a Pluggable Secondary Index ComponentMichael Stack
 
hbaseconasia2019 Phoenix Improvements and Practices on Cloud HBase at Alibaba
hbaseconasia2019 Phoenix Improvements and Practices on Cloud HBase at Alibabahbaseconasia2019 Phoenix Improvements and Practices on Cloud HBase at Alibaba
hbaseconasia2019 Phoenix Improvements and Practices on Cloud HBase at AlibabaMichael Stack
 
hbaseconasia2019 OpenTSDB at Xiaomi
hbaseconasia2019 OpenTSDB at Xiaomihbaseconasia2019 OpenTSDB at Xiaomi
hbaseconasia2019 OpenTSDB at XiaomiMichael Stack
 
hbaseconasia2019 BigData NoSQL System: ApsaraDB, HBase and Spark
hbaseconasia2019 BigData NoSQL System: ApsaraDB, HBase and Sparkhbaseconasia2019 BigData NoSQL System: ApsaraDB, HBase and Spark
hbaseconasia2019 BigData NoSQL System: ApsaraDB, HBase and SparkMichael Stack
 
hbaseconasia2019 Test-suite for Automating Data-consistency checks on HBase
hbaseconasia2019 Test-suite for Automating Data-consistency checks on HBasehbaseconasia2019 Test-suite for Automating Data-consistency checks on HBase
hbaseconasia2019 Test-suite for Automating Data-consistency checks on HBaseMichael Stack
 
hbaseconasia2019 Distributed Bitmap Index Solution
hbaseconasia2019 Distributed Bitmap Index Solutionhbaseconasia2019 Distributed Bitmap Index Solution
hbaseconasia2019 Distributed Bitmap Index SolutionMichael Stack
 
hbaseconasia2019 HBase Bucket Cache on Persistent Memory
hbaseconasia2019 HBase Bucket Cache on Persistent Memoryhbaseconasia2019 HBase Bucket Cache on Persistent Memory
hbaseconasia2019 HBase Bucket Cache on Persistent MemoryMichael Stack
 
hbaseconasia2019 The Procedure v2 Implementation of WAL Splitting and ACL
hbaseconasia2019 The Procedure v2 Implementation of WAL Splitting and ACLhbaseconasia2019 The Procedure v2 Implementation of WAL Splitting and ACL
hbaseconasia2019 The Procedure v2 Implementation of WAL Splitting and ACLMichael Stack
 
hbaseconasia2019 BDS: A data synchronization platform for HBase
hbaseconasia2019 BDS: A data synchronization platform for HBasehbaseconasia2019 BDS: A data synchronization platform for HBase
hbaseconasia2019 BDS: A data synchronization platform for HBaseMichael Stack
 
hbaseconasia2019 Further GC optimization for HBase 2.x: Reading HFileBlock in...
hbaseconasia2019 Further GC optimization for HBase 2.x: Reading HFileBlock in...hbaseconasia2019 Further GC optimization for HBase 2.x: Reading HFileBlock in...
hbaseconasia2019 Further GC optimization for HBase 2.x: Reading HFileBlock in...Michael Stack
 
hbaseconasia2019 HBCK2: Concepts, trends, and recipes for fixing issues in HB...
hbaseconasia2019 HBCK2: Concepts, trends, and recipes for fixing issues in HB...hbaseconasia2019 HBCK2: Concepts, trends, and recipes for fixing issues in HB...
hbaseconasia2019 HBCK2: Concepts, trends, and recipes for fixing issues in HB...Michael Stack
 
HBaseConAsia2019 Keynote
HBaseConAsia2019 KeynoteHBaseConAsia2019 Keynote
HBaseConAsia2019 KeynoteMichael Stack
 

Plus de Michael Stack (20)

hbaseconasia2019 HBase Table Monitoring and Troubleshooting System on Cloud
hbaseconasia2019 HBase Table Monitoring and Troubleshooting System on Cloudhbaseconasia2019 HBase Table Monitoring and Troubleshooting System on Cloud
hbaseconasia2019 HBase Table Monitoring and Troubleshooting System on Cloud
 
hbaseconasia2019 Recent work on HBase at Pinterest
hbaseconasia2019 Recent work on HBase at Pinteresthbaseconasia2019 Recent work on HBase at Pinterest
hbaseconasia2019 Recent work on HBase at Pinterest
 
hbaseconasia2019 Phoenix Practice in China Life Insurance Co., Ltd
hbaseconasia2019 Phoenix Practice in China Life Insurance Co., Ltdhbaseconasia2019 Phoenix Practice in China Life Insurance Co., Ltd
hbaseconasia2019 Phoenix Practice in China Life Insurance Co., Ltd
 
hbaseconasia2019 HBase at Didi
hbaseconasia2019 HBase at Didihbaseconasia2019 HBase at Didi
hbaseconasia2019 HBase at Didi
 
hbaseconasia2019 The Practice in trillion-level Video Storage and billion-lev...
hbaseconasia2019 The Practice in trillion-level Video Storage and billion-lev...hbaseconasia2019 The Practice in trillion-level Video Storage and billion-lev...
hbaseconasia2019 The Practice in trillion-level Video Storage and billion-lev...
 
hbaseconasia2019 HBase at Tencent
hbaseconasia2019 HBase at Tencenthbaseconasia2019 HBase at Tencent
hbaseconasia2019 HBase at Tencent
 
hbaseconasia2019 Spatio temporal Data Management based on Ali-HBase Ganos and...
hbaseconasia2019 Spatio temporal Data Management based on Ali-HBase Ganos and...hbaseconasia2019 Spatio temporal Data Management based on Ali-HBase Ganos and...
hbaseconasia2019 Spatio temporal Data Management based on Ali-HBase Ganos and...
 
hbaseconasia2019 Bridging the Gap between Big Data System Software Stack and ...
hbaseconasia2019 Bridging the Gap between Big Data System Software Stack and ...hbaseconasia2019 Bridging the Gap between Big Data System Software Stack and ...
hbaseconasia2019 Bridging the Gap between Big Data System Software Stack and ...
 
hbaseconasia2019 Pharos as a Pluggable Secondary Index Component
hbaseconasia2019 Pharos as a Pluggable Secondary Index Componenthbaseconasia2019 Pharos as a Pluggable Secondary Index Component
hbaseconasia2019 Pharos as a Pluggable Secondary Index Component
 
hbaseconasia2019 Phoenix Improvements and Practices on Cloud HBase at Alibaba
hbaseconasia2019 Phoenix Improvements and Practices on Cloud HBase at Alibabahbaseconasia2019 Phoenix Improvements and Practices on Cloud HBase at Alibaba
hbaseconasia2019 Phoenix Improvements and Practices on Cloud HBase at Alibaba
 
hbaseconasia2019 OpenTSDB at Xiaomi
hbaseconasia2019 OpenTSDB at Xiaomihbaseconasia2019 OpenTSDB at Xiaomi
hbaseconasia2019 OpenTSDB at Xiaomi
 
hbaseconasia2019 BigData NoSQL System: ApsaraDB, HBase and Spark
hbaseconasia2019 BigData NoSQL System: ApsaraDB, HBase and Sparkhbaseconasia2019 BigData NoSQL System: ApsaraDB, HBase and Spark
hbaseconasia2019 BigData NoSQL System: ApsaraDB, HBase and Spark
 
hbaseconasia2019 Test-suite for Automating Data-consistency checks on HBase
hbaseconasia2019 Test-suite for Automating Data-consistency checks on HBasehbaseconasia2019 Test-suite for Automating Data-consistency checks on HBase
hbaseconasia2019 Test-suite for Automating Data-consistency checks on HBase
 
hbaseconasia2019 Distributed Bitmap Index Solution
hbaseconasia2019 Distributed Bitmap Index Solutionhbaseconasia2019 Distributed Bitmap Index Solution
hbaseconasia2019 Distributed Bitmap Index Solution
 
hbaseconasia2019 HBase Bucket Cache on Persistent Memory
hbaseconasia2019 HBase Bucket Cache on Persistent Memoryhbaseconasia2019 HBase Bucket Cache on Persistent Memory
hbaseconasia2019 HBase Bucket Cache on Persistent Memory
 
hbaseconasia2019 The Procedure v2 Implementation of WAL Splitting and ACL
hbaseconasia2019 The Procedure v2 Implementation of WAL Splitting and ACLhbaseconasia2019 The Procedure v2 Implementation of WAL Splitting and ACL
hbaseconasia2019 The Procedure v2 Implementation of WAL Splitting and ACL
 
hbaseconasia2019 BDS: A data synchronization platform for HBase
hbaseconasia2019 BDS: A data synchronization platform for HBasehbaseconasia2019 BDS: A data synchronization platform for HBase
hbaseconasia2019 BDS: A data synchronization platform for HBase
 
hbaseconasia2019 Further GC optimization for HBase 2.x: Reading HFileBlock in...
hbaseconasia2019 Further GC optimization for HBase 2.x: Reading HFileBlock in...hbaseconasia2019 Further GC optimization for HBase 2.x: Reading HFileBlock in...
hbaseconasia2019 Further GC optimization for HBase 2.x: Reading HFileBlock in...
 
hbaseconasia2019 HBCK2: Concepts, trends, and recipes for fixing issues in HB...
hbaseconasia2019 HBCK2: Concepts, trends, and recipes for fixing issues in HB...hbaseconasia2019 HBCK2: Concepts, trends, and recipes for fixing issues in HB...
hbaseconasia2019 HBCK2: Concepts, trends, and recipes for fixing issues in HB...
 
HBaseConAsia2019 Keynote
HBaseConAsia2019 KeynoteHBaseConAsia2019 Keynote
HBaseConAsia2019 Keynote
 

Dernier

College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCall Girls in Nagpur High Profile
 
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...ranjana rawat
 
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service NashikCall Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service NashikCall Girls in Nagpur High Profile
 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Christo Ananth
 
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...Call Girls in Nagpur High Profile
 
Porous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingPorous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingrakeshbaidya232001
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
Extrusion Processes and Their Limitations
Extrusion Processes and Their LimitationsExtrusion Processes and Their Limitations
Extrusion Processes and Their Limitations120cr0395
 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSSIVASHANKAR N
 
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Dr.Costas Sachpazis
 
Introduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxIntroduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxupamatechverse
 
MANUFACTURING PROCESS-II UNIT-1 THEORY OF METAL CUTTING
MANUFACTURING PROCESS-II UNIT-1 THEORY OF METAL CUTTINGMANUFACTURING PROCESS-II UNIT-1 THEORY OF METAL CUTTING
MANUFACTURING PROCESS-II UNIT-1 THEORY OF METAL CUTTINGSIVASHANKAR N
 
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...roncy bisnoi
 
UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and workingUNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and workingrknatarajan
 
Coefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxCoefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxAsutosh Ranjan
 
UNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its PerformanceUNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its Performancesivaprakash250
 

Dernier (20)

College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
 
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINEDJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
 
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
 
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
 
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service NashikCall Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
 
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...
 
Porous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingPorous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writing
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
Extrusion Processes and Their Limitations
Extrusion Processes and Their LimitationsExtrusion Processes and Their Limitations
Extrusion Processes and Their Limitations
 
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
 
Roadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and RoutesRoadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and Routes
 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
 
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
 
Introduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxIntroduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptx
 
MANUFACTURING PROCESS-II UNIT-1 THEORY OF METAL CUTTING
MANUFACTURING PROCESS-II UNIT-1 THEORY OF METAL CUTTINGMANUFACTURING PROCESS-II UNIT-1 THEORY OF METAL CUTTING
MANUFACTURING PROCESS-II UNIT-1 THEORY OF METAL CUTTING
 
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
 
UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and workingUNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
 
Coefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxCoefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptx
 
UNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its PerformanceUNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its Performance
 

HBaseConEast2016: HBase and Spark, State of the Art

  • 1. HBaseCon East 2016 HBase and Spark, state of the art
  • 2. • Java Message Service => JMS • Solutions Architect at Cloudera • A bit of everything… • Development • Team/Project manager • Architect • O'Reilly author of Architecting HBase Applications • International • Worked from Paris to Los Angeles • More than 100 flights per year • HBase (and others) contributor About Jean-Marc Spaggiari HBaseCon East 2016
  • 3. Overview • Where we came from • Examples of code • Improvements that are coming up • Spark Streaming Use case HBaseCon East 2016
  • 4. Source of Demand • Demand started in the field • Use Cases • APIs access Gets, Puts, Scans • MapReduce Mass Scans • MapReduce Bulk Load • MapReduce Smart gets and puts • Spark has all but killed MapReduce • Spark Streaming has grown in popularity • Populating Aggregates • Entity Centric-Time Series data store i.e. OpenTSDB • Look ups for joins or mutations HBaseCon East 2016
  • 5. How it Started • Started on GitHub • Andrew Purtell started the effort to put into HBase • Big call out to Sean B, Jon H, Ted Y, Ted M and Matteo B • Components • Normal Spark • Spark Streaming • Bulk Load • SparkSQL HBaseCon East 2016
  • 6. Under the covers HBaseCon East 2016 Driver Walker Node Configs Executor Static Space Configs HConnection Tasks Tasks Walker Node Executor Static Space Configs HConnection Tasks Tasks
  • 7. Key Addition: HBaseContext Create an HBaseContext : // An Hadoop/HBase Configuration object val conf = HBaseConfiguration.create() conf.addResource(new Path("/etc/hbase/conf/core-site.xml")) conf.addResource(new Path("/etc/hbase/conf/hbase-site.xml")) // sc is the Spark Context; hbase context corresponds to an HBase Connection val hbaseContext = new HBaseContext(sc, conf) HBaseCon East 2016
  • 8. • Foreach • Map • BulkLoad • BulkLoadThinRows • BulkGet (aka Multiget) • BulkDelete • Most of them in both Java and Scala Operations on the HBaseContext HBaseCon East2016
  • 9. Foreach Read data in parallel for each partition and compute : rdd.hbaseForeachPartition(hbaseContext, (it, conn) => { // do something val bufferedMutator = conn.getBufferedMutator(TableName.valueOf("t1")) it.foreach(r => { ... // HBase API put/incr/append/cas calls } bufferedMutator.flush() bufferedMutator.close() }) HBaseCon East 2016
  • 10. Foreach Read data in parallel for each partition and compute : hbaseContext.foreachPartition(keyValuesPuts, new VoidFunction<Tuple2<Iterator<Put>, Connection>>() { @Override public void call(Tuple2<Iterator<Put>, Connection> t) throws Exception { BufferedMutator mutator = t._2().getBufferedMutator(TABLE_NAME); while (t._1().hasNext()) { ... // HBase API put/incr/append/cas calls } mutator.flush(); mutator.close(); } }); }); HBaseCon East 2016
  • 11. Map Take a dataset and map it in parallel for each partition to produce a new RDD or process it val getRdd = rdd.hbaseMapPartitions(hbaseContext, (it, conn) => { val table = conn.getTable(TableName.valueOf("t1")) var res = mutable.MutableList[String]() it.map( r => { ... // HBase API Scan Results } }) HBaseCon East 2016
  • 12. BulkLoad Bulk load a data set into Hbase (for all cases, generally wide tables) (Scala only) rdd.hbaseBulkLoad(hbaseContext, tableName, t => { val rowKey = t._1 val fam:Array[Byte] = t._2._1 val qual = t._2._2 val value = t._2._3 val keyFamilyQualifier= new KeyFamilyQualifier(rowKey, fam, qual) Seq((keyFamilyQualifier, value)).iterator }, stagingFolder) val load = new LoadIncrementalHFiles(config) load.run(Array(stagingFolder, tableNameString)) HBaseCon East 2016
  • 13. BulkLoad Bulk load a data set into Hbase (for all cases, generally wide tables) (Scala only) rdd.hbaseBulkLoad(hbaseContext, tableName, t => { val rowKey = t._1 val fam:Array[Byte] = t._2._1 val qual = t._2._2 val value = t._2._3 val keyFamilyQualifier= new KeyFamilyQualifier(rowKey, fam, qual) Seq((keyFamilyQualifier, value)).iterator }, stagingFolder) val load = new LoadIncrementalHFiles(config) load.run(Array(stagingFolder, tableNameString)) HBaseCon East 2016
  • 14. Bulk load a data set into Hbase (for all cases, generally wide tables) (Scala only) rdd.hbaseBulkLoad(hbaseContext, tableName, t => { val rowKey = t._1 val fam:Array[Byte] = t._2._1 val qual = t._2._2 val value = t._2._3 val keyFamilyQualifier= new KeyFamilyQualifier(rowKey, fam, qual) Seq((keyFamilyQualifier, value)).iterator }, stagingFolder) val load = new LoadIncrementalHFiles(config) load.run(Array(stagingFolder, tableNameString)) BulkLoad HBaseCon East 2016
  • 15. BulkLoadThinRows Bulk load a data set into HBase (for skinny tables, <10k cols) hbaseContext.bulkLoadThinRows[(String, Iterable[(Array[Byte], Array[Byte], Array[Byte])])] (rdd, TableName.valueOf(tableName), t => { val rowKey = Bytes.toBytes(t._1) val familyQualifiersValues = new FamiliesQualifiersValues t._2.foreach(f => { val family:Array[Byte] = f._1 val qualifier = f._2 val value:Array[Byte] = f._3 familyQualifiersValues +=(family, qualifier, value) }) (new ByteArrayWrapper(rowKey), familyQualifiersValues) }, stagingFolder.getPath) HBaseCon East 2016
  • 16. BulkLoadThinRows Bulk load a data set into HBase (for skinny tables, <10k cols) hbaseContext.bulkLoadThinRows[(String, Iterable[(Array[Byte], Array[Byte], Array[Byte])])] (rdd, TableName.valueOf(tableName), t => { val rowKey = Bytes.toBytes(t._1) val familyQualifiersValues = new FamiliesQualifiersValues t._2.foreach(f => { val family:Array[Byte] = f._1 val qualifier = f._2 val value:Array[Byte] = f._3 familyQualifiersValues +=(family, qualifier, value) }) (new ByteArrayWrapper(rowKey), familyQualifiersValues) }, stagingFolder.getPath) HBaseCon East 2016
  • 17. BulkPut Parallelized HBase Multiput : hbaseContext.bulkPut[(Array[Byte], Array[(Array[Byte], Array[Byte], Array[Byte])])](rdd, tableName, (putRecord) => { val put = new Put(putRecord._1) putRecord._2.foreach((putValue) => put.add(putValue._1, putValue._2, putValue._3)) put } HBaseCon East 2016
  • 18. BulkPut Parallelized HBase Multiput : hbaseContext.bulkPut(textFile, TABLE_NAME, new Function<String, Put>() { @Override public Put call(String v1) throws Exception { String[] tokens = v1.split("|"); Put put = new Put(Bytes.toBytes(tokens[0])); put.addColumn(Bytes.toBytes("segment"), Bytes.toBytes(tokens[1]), Bytes.toBytes(tokens[2])); return put; } }); HBaseCon East 2016
  • 19. BulkDelete Parallelized HBase Multi-deletes hbaseContext.bulkDelete[Array[Byte]](rdd, tableName, putRecord => new Delete(putRecord), 4) // batch size rdd.hbaseBulkDelete(hbaseContext, tableName, putRecord => new Delete(putRecord), 4) // batch size HBaseCon East 2016
  • 20. Table RDD How to materialize a table as a Spark RDD. HBaseCon East 2016
  • 21. Table RDD How to materialize a table as a Spark RDD. HBaseCon East 2016
  • 22. Table RDD How to materialize a table as a Spark RDD. HBaseCon East 2016
  • 23. Table RDD How to materialize a table as a Spark RDD. HBaseCon East 2016
  • 24. What Improvements Have We Made? ▪ Combine Spark and HBase • Spark Catalyst Engine for Query Plan and Optimization • HBase for Fast Access KV Store • Implement Standard External Data Source with Built-in Filter ▪ High Performance • Data Locality: Move Computation to Data • Partition Pruning: Task only Performed in RS Holding Requested Data • Column Pruning / Predicate Pushdown: Reduce Network Overhead ▪ Full Fledged DataFrame Support • Spark-SQL • Integrated Language Query ▪ Run on Top of Existing HBase Table • Native Support Java Primitive Types ▪ Still some work and improvements to be done • HBASE-16638 Reduce the number of Connections • HBASE-14217 Add Java access to Spark bulk load functionality HBaseCon East 2016
  • 25. Data frame + HBase WIP... 2.0? HBaseCon East 2016
  • 26. Usage - Define the Catalog def catalog = s"""{ |"table":{"namespace":"default", "name":"table1"}, |"rowkey":"key", |"columns":{ |"col0":{"cf":"rowkey", "col":"key", "type":"string"}, |"col1":{"cf":"cf1", "col":"col1", "type":"boolean"}, |"col2":{"cf":"cf1", "col":"col2", "type":"string"} |} |}""".stripMargin HBase table to dataframe table catalog mapping: HBaseCon East 2016
  • 27. Usage – Write to HBase sc.parallelize(data) .toDF .write .options(Map(HBaseTableCatalog.tableCatalog -> catalog, HBaseTableCatalog.newTable -> "5")) .format("org.apache.spark.sql.execution.datasources.hbase") .save() Export RDD into a new HBase table with DataFrame HBaseCon East 2016
  • 28. Usage– Construct DataFrame def withCatalog(cat: String): DataFrame = { sqlContext .read .options(Map(HBaseTableCatalog.tableCatalog->catalog)) .format("org.apache.spark.sql.execution.datasources.hbase") .load() } Import RDD into a new HBase table with DataFrame HBaseCon East 2016
  • 29. Usage - Language Integrate Query/SQL val df = withCatalog(catalog) val s = df.filter((($"col0" <= "row050" && $"col0" > "row040") || $"col0" === "row005" || $"col0" === "row020" || $"col0" === "r20" || $"col0" <= "row005") && ($"col2" === 1 || $"col2" === 42)) .select("col0", "col1", "col2") s.show Import RDD into a new HBase table with DataFrame HBaseCon East 2016
  • 30. Usage - Language Integrate Query/SQL // Load the dataframe val df = withCatalog(catalog) //SQL example df.registerTempTable("table") sqlContext.sql("select count(col1) from table").show Import RDD into a new HBase table with DataFrame HBaseCon East 2016