SlideShare a Scribd company logo
1 of 25
Apache Kylin on HBase
Shaofeng Shi | 史少锋
Apache Kylin Committer & PMC
August 17,2018
Extreme OLAP Engine for Big Data
Content 01
02
04
03
What is Apache Kylin
Apache Kylin introduction, architecture and relationship with others
Why Apache HBase
Key factors that Kylin selects HBase as the storage engine
How to use HBase in
OLAP
How Kylin works on HBase
Use Cases
Apache Kylin typical use cases
What is Apache Kylin01
What is Apache Kylin
OLAP engine for Big Data
Apache Kylin is an extreme fast OLAP engine for
big data.
What is Apache Kylin
Key characters
Ease of Use
No programing; User-friendly Web GUI;
Seamless BI Integration
JDBC/ODBC/REST API; SupportsTableau, MSTR, Qlik
Sense, Power BI, Excel and others
Real Interactive
Trillion rows data, 99% queries < 1.3 seconds,
from Meituan.com
ANSI-SQL
SQL on Hadoop, supports most ANSI SQL
query functions
Hadoop Native
Compute and store data with
MapReduce/Spark/HBase, fully scalable architecture;
MOLAP Cube
User can define a data model and pre-build in Kylin
with more than 10+ billions of raw data records
Apache Kylin Architecture
Native on Hadoop, Horizontal Scalable
BI Tools, Web App…
ANSI SQL
OLAP Cube
Why Kylin is fast
Pre-calculation + Random access
Convert SQL query to Cube visiting
Sort
Cuboid
Filter
Sort
Aggr
Filter
Tables
Join
O(1)
Aggregated Data
Offline build
Online
calculation
Join and aggregate data to Cube in offline
No join at query time Filter with index and do online
calculation within memory
Why Apache HBase02
Criteria of the Storage engine for Kylin
Hadoop Native
Wide Adoption
Low Latency
Easy to use API
High Capacity
Active User Community
Only HBase Can
Hadoop Native
Wide Adoption
Most Hadoop users are
running HBase;
Low Latency
Easy to use API
HBase is built on Hadoop
technologies; It Integrates well
with HDFS, MapReduce and
other components.
Block cache and Bloom Filters
for real-time queries.
High Capacity
Supports very large data
volume, TB to PB data in one
table.
Active User Community
HBase has many active
users, which provides many
good articles and best
practices.
HBase in Kylin
HBase acts four roles in Kylin
Massive Storage for Cube
Kylin persists OLAP Cube in
HBase, for low latency access.
Meta Store
Kylin uses HBase to persist its
metadata.
MPP for online calculation
Kylin pushes down calculations
to HBase region servers for
parallel computing.
HBase Master
HBase
Region Server
Metadata
Scan &
pushdown
…
HBase
Region Server
Coproces
sor
Coproces
sor
Cache
Kylin caches big lookup table in
HBase.
How to use HBase in OLAP03
How Cube be Built
Cube building process
Kylin uses MR or Spark to aggregate source data into Cube;
Calculate N-Dimension cuboid first, and then calculate N-1.
The typical algorithm is by-layer cubing;
0-D cuboid
1-D cuboid
2-D cuboid
3-D cuboid
4-D cuboid
Full Data
MR
MR
MR
MR
MR
How Cube be Built
The whole Cube building process
Cubing
Algorithm
Hive
Intermedia
te Hive
table
Dimension
Dictionary
HDFS
Files
HFile
HDFS
Files
HDFS
Files
HDFS
Files
Hive
Query
Build
Dimension
Dictionary
Build
Cube
Job
MR
Job
MR
Job
MR
Job
MR
Job
HBase
Table
…
Fast
Cubing
Bulk
Load
0 D(Base) CuboidN-1 D(Base) CuboidN D(Base) Cuboid
Layer
Cubing
Cube in HBase
Cube Structure
Cube is partitioned into multiple segments by time range
Each segment is a HBase Table
Table is pre-split into multiple regions
Cube is converted into HFile in batch job, and then bulk load to HBase
Seg 1 Seg 2 Seg N…Cube
Rowkey Value
Table 1
Rowkey Value
Table N
HBase Table Format
Rowkey + Value
Shard ID Cuboid ID
(long)
Dim 1 Dim 2 … Dim N Measure 1 … Measure N
Rowkey Value
Dimension values are encoded to bytes (via dictionary or others)
Measures are serialized into bytes
HBase Rowkey format: Shard ID (2 bytes) + Cuboid ID (8 bytes) + Dimensions
HBase Value: measures serialized bytes
Table is split into regions by Shard ID
User can group measures into 1 or multiple column families
2 bytes 8 bytes
How Cube be Persisted in HBase
Example: a Tiny Cube
2 dimensions and 1 measure; total 4 cuboids: 00,01,10,11
Please note the Shard ID is not appeared here
Source Table Dictionary Encoding
Aggregated Table (Cube) HBase KV Format
How Cube be Queried
Kylin parses SQL to get the dimension and measures;
Identify the Model and Cube;
Identify the Cuboid to scan;
Identify the Cube segments;
Leverage filter condition to narrow down scan range;
Send aggregation logic to HBase coprocessor, do storage-side filtering and
aggregation;
On each HBase RS returned, de-code and do final processing in Calcite
select city , sum(price)
from table where year=
1993 group by year
Dimension: city, year
Cuboid ID: 00000011
Filter: year=1993 (encoded value: 0)
Scan start: 00000011|0
Scan end: 00000011|1
Good and not-good of HBase for OLAP
 Native on Hadoop.
 Great performance
(when search pattern matches
row key design)
 High concurrency
HBase is a little complicated for OLAP scenario; HBase supports both massive write
+ read; while OLAP is read-only.
 Not a columnar storage
 No secondary index
 Downtime for upgrade
(update coprocessor need to
disable table first)
Use Cases04
1000+ Global Users
Telecom
• China
Mobile
• China
Telecom
• China
Unicom
• AT & T
• …
Internet
• eBay
• Yahoo! Japan
• Baidu 百度
• Meituan 美团
• NetEase 网易
• Expedia
• JD 京东
• VIP 唯品会
• 360
• TOUTIAO 头条
• …
Others
• MachineZone
• Glispa
• Inovex
• Adobe
• iFLY TEK科大
讯飞
• …
Manufacturing
• SAIC 上汽集
团
• Huawei 华为
• Lenovo 联想
• OPPO
• XiaoMi 小米
• VIVO
• MeiZu 魅族
• …
FSI
• CCB 建设银
行
• CMB 招商银
行
• SPDB 浦发银
行
• CPIC 太平洋
保险
• CITIC BANK
中信银行
• UnionPay 中
国银联
• HuaTai 华泰
证券
• GuoTaiJunAn
国泰君安证券
• …
Use Case – OLAP on Hadoop
Meituan: Top O2O company in China
Challenge
• Slow performance with previous MySQL
option Heavy development efforts with Hive
solution
• Huge resources for Hive job
• Analysts can’t access directly for data on
Hadoop
Solution
• Apache Kylin as core OLAP on
Hadoop solution
• SQL interface for internal users
• Active participate in open source
Kylin community
Supporting all Meituan business lines
973 Cubes, 8.9+ trillion rows, Cube size 971 TB
3,800,000 queries/day, TP90 < 1.2 second
(Data in 2018/08)
Use Case – Shopping Reporting
Yahoo! Japan: the most visited website in Japan
Our reporting system used
Impala as a backend database
previously. It took a long time
(about 60 sec) to show Web UI.
In order to lower the latency, we
moved to Apache Kylin.
Average latency < 1sec for most
cases
Thanks to low latency with Kylin,
we become possible to focus on
adding functions for users.
We provide a reporting system that
show statistics for store owners.
e. g. impressions, clicks and sales.
We Are Hiring
WeChat: Kyligence WeChat: Apache Kylin
Thanks

More Related Content

What's hot

Building an open data platform with apache iceberg
Building an open data platform with apache icebergBuilding an open data platform with apache iceberg
Building an open data platform with apache icebergAlluxio, Inc.
 
Hadoop, Pig, and Twitter (NoSQL East 2009)
Hadoop, Pig, and Twitter (NoSQL East 2009)Hadoop, Pig, and Twitter (NoSQL East 2009)
Hadoop, Pig, and Twitter (NoSQL East 2009)Kevin Weil
 
NoSQL Databases: Why, what and when
NoSQL Databases: Why, what and whenNoSQL Databases: Why, what and when
NoSQL Databases: Why, what and whenLorenzo Alberton
 
Introduction to Apache Hive(Big Data, Final Seminar)
Introduction to Apache Hive(Big Data, Final Seminar)Introduction to Apache Hive(Big Data, Final Seminar)
Introduction to Apache Hive(Big Data, Final Seminar)Takrim Ul Islam Laskar
 
Apache Kylin: Speed Up Cubing with Apache Spark with Luke Han and Shaofeng Shi
 Apache Kylin: Speed Up Cubing with Apache Spark with Luke Han and Shaofeng Shi Apache Kylin: Speed Up Cubing with Apache Spark with Luke Han and Shaofeng Shi
Apache Kylin: Speed Up Cubing with Apache Spark with Luke Han and Shaofeng ShiDatabricks
 
HBaseCon 2012 | HBase Schema Design - Ian Varley, Salesforce
HBaseCon 2012 | HBase Schema Design - Ian Varley, SalesforceHBaseCon 2012 | HBase Schema Design - Ian Varley, Salesforce
HBaseCon 2012 | HBase Schema Design - Ian Varley, SalesforceCloudera, Inc.
 
Data Engineer's Lunch #83: Strategies for Migration to Apache Iceberg
Data Engineer's Lunch #83: Strategies for Migration to Apache IcebergData Engineer's Lunch #83: Strategies for Migration to Apache Iceberg
Data Engineer's Lunch #83: Strategies for Migration to Apache IcebergAnant Corporation
 
Big Data PPT by Rohit Dubey
Big Data PPT by Rohit DubeyBig Data PPT by Rohit Dubey
Big Data PPT by Rohit DubeyRohit Dubey
 
MySQL InnoDB Cluster and NDB Cluster
MySQL InnoDB Cluster and NDB ClusterMySQL InnoDB Cluster and NDB Cluster
MySQL InnoDB Cluster and NDB ClusterMario Beck
 
Hive Tutorial | Hive Architecture | Hive Tutorial For Beginners | Hive In Had...
Hive Tutorial | Hive Architecture | Hive Tutorial For Beginners | Hive In Had...Hive Tutorial | Hive Architecture | Hive Tutorial For Beginners | Hive In Had...
Hive Tutorial | Hive Architecture | Hive Tutorial For Beginners | Hive In Had...Simplilearn
 
Debunking the Myths of HDFS Erasure Coding Performance
Debunking the Myths of HDFS Erasure Coding Performance Debunking the Myths of HDFS Erasure Coding Performance
Debunking the Myths of HDFS Erasure Coding Performance DataWorks Summit/Hadoop Summit
 
What is Hadoop | Introduction to Hadoop | Hadoop Tutorial | Hadoop Training |...
What is Hadoop | Introduction to Hadoop | Hadoop Tutorial | Hadoop Training |...What is Hadoop | Introduction to Hadoop | Hadoop Tutorial | Hadoop Training |...
What is Hadoop | Introduction to Hadoop | Hadoop Tutorial | Hadoop Training |...Edureka!
 
Building Robust Production Data Pipelines with Databricks Delta
Building Robust Production Data Pipelines with Databricks DeltaBuilding Robust Production Data Pipelines with Databricks Delta
Building Robust Production Data Pipelines with Databricks DeltaDatabricks
 

What's hot (20)

Building an open data platform with apache iceberg
Building an open data platform with apache icebergBuilding an open data platform with apache iceberg
Building an open data platform with apache iceberg
 
Kudu Deep-Dive
Kudu Deep-DiveKudu Deep-Dive
Kudu Deep-Dive
 
Key-Value NoSQL Database
Key-Value NoSQL DatabaseKey-Value NoSQL Database
Key-Value NoSQL Database
 
Hadoop, Pig, and Twitter (NoSQL East 2009)
Hadoop, Pig, and Twitter (NoSQL East 2009)Hadoop, Pig, and Twitter (NoSQL East 2009)
Hadoop, Pig, and Twitter (NoSQL East 2009)
 
NoSQL databases
NoSQL databasesNoSQL databases
NoSQL databases
 
NoSQL Databases: Why, what and when
NoSQL Databases: Why, what and whenNoSQL Databases: Why, what and when
NoSQL Databases: Why, what and when
 
Introduction to Apache Hive(Big Data, Final Seminar)
Introduction to Apache Hive(Big Data, Final Seminar)Introduction to Apache Hive(Big Data, Final Seminar)
Introduction to Apache Hive(Big Data, Final Seminar)
 
Apache Kylin: Speed Up Cubing with Apache Spark with Luke Han and Shaofeng Shi
 Apache Kylin: Speed Up Cubing with Apache Spark with Luke Han and Shaofeng Shi Apache Kylin: Speed Up Cubing with Apache Spark with Luke Han and Shaofeng Shi
Apache Kylin: Speed Up Cubing with Apache Spark with Luke Han and Shaofeng Shi
 
NoSql
NoSqlNoSql
NoSql
 
HBaseCon 2012 | HBase Schema Design - Ian Varley, Salesforce
HBaseCon 2012 | HBase Schema Design - Ian Varley, SalesforceHBaseCon 2012 | HBase Schema Design - Ian Varley, Salesforce
HBaseCon 2012 | HBase Schema Design - Ian Varley, Salesforce
 
Presto: SQL-on-anything
Presto: SQL-on-anythingPresto: SQL-on-anything
Presto: SQL-on-anything
 
Hadoop Tutorial For Beginners
Hadoop Tutorial For BeginnersHadoop Tutorial For Beginners
Hadoop Tutorial For Beginners
 
Data Engineer's Lunch #83: Strategies for Migration to Apache Iceberg
Data Engineer's Lunch #83: Strategies for Migration to Apache IcebergData Engineer's Lunch #83: Strategies for Migration to Apache Iceberg
Data Engineer's Lunch #83: Strategies for Migration to Apache Iceberg
 
Big Data PPT by Rohit Dubey
Big Data PPT by Rohit DubeyBig Data PPT by Rohit Dubey
Big Data PPT by Rohit Dubey
 
TiDB Introduction
TiDB IntroductionTiDB Introduction
TiDB Introduction
 
MySQL InnoDB Cluster and NDB Cluster
MySQL InnoDB Cluster and NDB ClusterMySQL InnoDB Cluster and NDB Cluster
MySQL InnoDB Cluster and NDB Cluster
 
Hive Tutorial | Hive Architecture | Hive Tutorial For Beginners | Hive In Had...
Hive Tutorial | Hive Architecture | Hive Tutorial For Beginners | Hive In Had...Hive Tutorial | Hive Architecture | Hive Tutorial For Beginners | Hive In Had...
Hive Tutorial | Hive Architecture | Hive Tutorial For Beginners | Hive In Had...
 
Debunking the Myths of HDFS Erasure Coding Performance
Debunking the Myths of HDFS Erasure Coding Performance Debunking the Myths of HDFS Erasure Coding Performance
Debunking the Myths of HDFS Erasure Coding Performance
 
What is Hadoop | Introduction to Hadoop | Hadoop Tutorial | Hadoop Training |...
What is Hadoop | Introduction to Hadoop | Hadoop Tutorial | Hadoop Training |...What is Hadoop | Introduction to Hadoop | Hadoop Tutorial | Hadoop Training |...
What is Hadoop | Introduction to Hadoop | Hadoop Tutorial | Hadoop Training |...
 
Building Robust Production Data Pipelines with Databricks Delta
Building Robust Production Data Pipelines with Databricks DeltaBuilding Robust Production Data Pipelines with Databricks Delta
Building Robust Production Data Pipelines with Databricks Delta
 

Similar to Apache Kylin on HBase: An Extreme OLAP Engine for Big Data

HBaseConAsia2018 Track2-2: Apache Kylin on HBase: Extreme OLAP for big data
HBaseConAsia2018  Track2-2: Apache Kylin on HBase: Extreme OLAP for big dataHBaseConAsia2018  Track2-2: Apache Kylin on HBase: Extreme OLAP for big data
HBaseConAsia2018 Track2-2: Apache Kylin on HBase: Extreme OLAP for big dataMichael Stack
 
Apache Kylin’s Performance Boost from Apache HBase
Apache Kylin’s Performance Boost from Apache HBaseApache Kylin’s Performance Boost from Apache HBase
Apache Kylin’s Performance Boost from Apache HBaseHBaseCon
 
Apache Kylin @ Big Data Europe 2015
Apache Kylin @ Big Data Europe 2015Apache Kylin @ Big Data Europe 2015
Apache Kylin @ Big Data Europe 2015Seshu Adunuthula
 
Apache Kylin: OLAP Engine on Hadoop - Tech Deep Dive
Apache Kylin: OLAP Engine on Hadoop - Tech Deep DiveApache Kylin: OLAP Engine on Hadoop - Tech Deep Dive
Apache Kylin: OLAP Engine on Hadoop - Tech Deep DiveXu Jiang
 
Building Enterprise OLAP on Hadoop for FSI
Building Enterprise OLAP on Hadoop for FSIBuilding Enterprise OLAP on Hadoop for FSI
Building Enterprise OLAP on Hadoop for FSILuke Han
 
Kylin and Druid Presentation
Kylin and Druid PresentationKylin and Druid Presentation
Kylin and Druid Presentationargonauts007
 
PCM18 (Big Data Analytics)
PCM18 (Big Data Analytics)PCM18 (Big Data Analytics)
PCM18 (Big Data Analytics)Stratebi
 
Bi on Big Data - Strata 2016 in London
Bi on Big Data - Strata 2016 in LondonBi on Big Data - Strata 2016 in London
Bi on Big Data - Strata 2016 in LondonDremio Corporation
 
Apache Kylin: Hadoop OLAP Engine, 2014 Dec
Apache Kylin: Hadoop OLAP Engine, 2014 DecApache Kylin: Hadoop OLAP Engine, 2014 Dec
Apache Kylin: Hadoop OLAP Engine, 2014 DecYang Li
 
Cloud-native Semantic Layer on Data Lake
Cloud-native Semantic Layer on Data LakeCloud-native Semantic Layer on Data Lake
Cloud-native Semantic Layer on Data LakeDatabricks
 
HBaseCon 2015: Apache Kylin - Extreme OLAP Engine for Hadoop
HBaseCon 2015: Apache Kylin - Extreme OLAP  Engine for HadoopHBaseCon 2015: Apache Kylin - Extreme OLAP  Engine for Hadoop
HBaseCon 2015: Apache Kylin - Extreme OLAP Engine for HadoopHBaseCon
 
ApacheKylin_HBaseCon2015
ApacheKylin_HBaseCon2015ApacheKylin_HBaseCon2015
ApacheKylin_HBaseCon2015Luke Han
 
Unlocking big data with Hadoop + MySQL
Unlocking big data with Hadoop + MySQLUnlocking big data with Hadoop + MySQL
Unlocking big data with Hadoop + MySQLRicky Setyawan
 
Big Data 2.0: YARN Enablement for Distributed ETL & SQL with Hadoop
Big Data 2.0: YARN Enablement for Distributed ETL & SQL with HadoopBig Data 2.0: YARN Enablement for Distributed ETL & SQL with Hadoop
Big Data 2.0: YARN Enablement for Distributed ETL & SQL with HadoopCaserta
 
There and back_again_oracle_and_big_data_16x9
There and back_again_oracle_and_big_data_16x9There and back_again_oracle_and_big_data_16x9
There and back_again_oracle_and_big_data_16x9Gleb Otochkin
 
[db tech showcase Tokyo 2017] C13:There and back again or how to connect Orac...
[db tech showcase Tokyo 2017] C13:There and back again or how to connect Orac...[db tech showcase Tokyo 2017] C13:There and back again or how to connect Orac...
[db tech showcase Tokyo 2017] C13:There and back again or how to connect Orac...Insight Technology, Inc.
 
Advanced Analytics and Big Data (August 2014)
Advanced Analytics and Big Data (August 2014)Advanced Analytics and Big Data (August 2014)
Advanced Analytics and Big Data (August 2014)Thomas W. Dinsmore
 
Advanced Analytics in Hadoop
Advanced Analytics in HadoopAdvanced Analytics in Hadoop
Advanced Analytics in HadoopAnalyticsWeek
 
Big Data and NoSQL for Database and BI Pros
Big Data and NoSQL for Database and BI ProsBig Data and NoSQL for Database and BI Pros
Big Data and NoSQL for Database and BI ProsAndrew Brust
 

Similar to Apache Kylin on HBase: An Extreme OLAP Engine for Big Data (20)

HBaseConAsia2018 Track2-2: Apache Kylin on HBase: Extreme OLAP for big data
HBaseConAsia2018  Track2-2: Apache Kylin on HBase: Extreme OLAP for big dataHBaseConAsia2018  Track2-2: Apache Kylin on HBase: Extreme OLAP for big data
HBaseConAsia2018 Track2-2: Apache Kylin on HBase: Extreme OLAP for big data
 
Apache Kylin’s Performance Boost from Apache HBase
Apache Kylin’s Performance Boost from Apache HBaseApache Kylin’s Performance Boost from Apache HBase
Apache Kylin’s Performance Boost from Apache HBase
 
Apache Kylin @ Big Data Europe 2015
Apache Kylin @ Big Data Europe 2015Apache Kylin @ Big Data Europe 2015
Apache Kylin @ Big Data Europe 2015
 
Apache Kylin: OLAP Engine on Hadoop - Tech Deep Dive
Apache Kylin: OLAP Engine on Hadoop - Tech Deep DiveApache Kylin: OLAP Engine on Hadoop - Tech Deep Dive
Apache Kylin: OLAP Engine on Hadoop - Tech Deep Dive
 
Building Enterprise OLAP on Hadoop for FSI
Building Enterprise OLAP on Hadoop for FSIBuilding Enterprise OLAP on Hadoop for FSI
Building Enterprise OLAP on Hadoop for FSI
 
Kylin and Druid Presentation
Kylin and Druid PresentationKylin and Druid Presentation
Kylin and Druid Presentation
 
PCM18 (Big Data Analytics)
PCM18 (Big Data Analytics)PCM18 (Big Data Analytics)
PCM18 (Big Data Analytics)
 
Bi on Big Data - Strata 2016 in London
Bi on Big Data - Strata 2016 in LondonBi on Big Data - Strata 2016 in London
Bi on Big Data - Strata 2016 in London
 
Apache Kylin: Hadoop OLAP Engine, 2014 Dec
Apache Kylin: Hadoop OLAP Engine, 2014 DecApache Kylin: Hadoop OLAP Engine, 2014 Dec
Apache Kylin: Hadoop OLAP Engine, 2014 Dec
 
Cloud-native Semantic Layer on Data Lake
Cloud-native Semantic Layer on Data LakeCloud-native Semantic Layer on Data Lake
Cloud-native Semantic Layer on Data Lake
 
HBaseCon 2015: Apache Kylin - Extreme OLAP Engine for Hadoop
HBaseCon 2015: Apache Kylin - Extreme OLAP  Engine for HadoopHBaseCon 2015: Apache Kylin - Extreme OLAP  Engine for Hadoop
HBaseCon 2015: Apache Kylin - Extreme OLAP Engine for Hadoop
 
ApacheKylin_HBaseCon2015
ApacheKylin_HBaseCon2015ApacheKylin_HBaseCon2015
ApacheKylin_HBaseCon2015
 
Unlocking big data with Hadoop + MySQL
Unlocking big data with Hadoop + MySQLUnlocking big data with Hadoop + MySQL
Unlocking big data with Hadoop + MySQL
 
Big Data 2.0: YARN Enablement for Distributed ETL & SQL with Hadoop
Big Data 2.0: YARN Enablement for Distributed ETL & SQL with HadoopBig Data 2.0: YARN Enablement for Distributed ETL & SQL with Hadoop
Big Data 2.0: YARN Enablement for Distributed ETL & SQL with Hadoop
 
There and back_again_oracle_and_big_data_16x9
There and back_again_oracle_and_big_data_16x9There and back_again_oracle_and_big_data_16x9
There and back_again_oracle_and_big_data_16x9
 
[db tech showcase Tokyo 2017] C13:There and back again or how to connect Orac...
[db tech showcase Tokyo 2017] C13:There and back again or how to connect Orac...[db tech showcase Tokyo 2017] C13:There and back again or how to connect Orac...
[db tech showcase Tokyo 2017] C13:There and back again or how to connect Orac...
 
Advanced Analytics and Big Data (August 2014)
Advanced Analytics and Big Data (August 2014)Advanced Analytics and Big Data (August 2014)
Advanced Analytics and Big Data (August 2014)
 
Advanced Analytics in Hadoop
Advanced Analytics in HadoopAdvanced Analytics in Hadoop
Advanced Analytics in Hadoop
 
Big Data and NoSQL for Database and BI Pros
Big Data and NoSQL for Database and BI ProsBig Data and NoSQL for Database and BI Pros
Big Data and NoSQL for Database and BI Pros
 
Hive_Pig.pptx
Hive_Pig.pptxHive_Pig.pptx
Hive_Pig.pptx
 

Recently uploaded

Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...OnePlan Solutions
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...harshavardhanraghave
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...MyIntelliSource, Inc.
 
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...gurkirankumar98700
 
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AISyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AIABDERRAOUF MEHENNI
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️Delhi Call girls
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providermohitmore19
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsJhone kinadey
 
How To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsHow To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsAndolasoft Inc
 
Clustering techniques data mining book ....
Clustering techniques data mining book ....Clustering techniques data mining book ....
Clustering techniques data mining book ....ShaimaaMohamedGalal
 
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...OnePlan Solutions
 
Diamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionDiamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionSolGuruz
 
Active Directory Penetration Testing, cionsystems.com.pdf
Active Directory Penetration Testing, cionsystems.com.pdfActive Directory Penetration Testing, cionsystems.com.pdf
Active Directory Penetration Testing, cionsystems.com.pdfCionsystems
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...ICS
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfkalichargn70th171
 
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️anilsa9823
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comFatema Valibhai
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxbodapatigopi8531
 

Recently uploaded (20)

Exploring iOS App Development: Simplifying the Process
Exploring iOS App Development: Simplifying the ProcessExploring iOS App Development: Simplifying the Process
Exploring iOS App Development: Simplifying the Process
 
Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
 
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
 
Microsoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdfMicrosoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdf
 
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AISyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial Goals
 
How To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsHow To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.js
 
Clustering techniques data mining book ....
Clustering techniques data mining book ....Clustering techniques data mining book ....
Clustering techniques data mining book ....
 
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
 
Diamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionDiamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with Precision
 
Active Directory Penetration Testing, cionsystems.com.pdf
Active Directory Penetration Testing, cionsystems.com.pdfActive Directory Penetration Testing, cionsystems.com.pdf
Active Directory Penetration Testing, cionsystems.com.pdf
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
 
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.com
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptx
 

Apache Kylin on HBase: An Extreme OLAP Engine for Big Data

  • 1. Apache Kylin on HBase Shaofeng Shi | 史少锋 Apache Kylin Committer & PMC August 17,2018 Extreme OLAP Engine for Big Data
  • 2. Content 01 02 04 03 What is Apache Kylin Apache Kylin introduction, architecture and relationship with others Why Apache HBase Key factors that Kylin selects HBase as the storage engine How to use HBase in OLAP How Kylin works on HBase Use Cases Apache Kylin typical use cases
  • 3. What is Apache Kylin01
  • 4. What is Apache Kylin OLAP engine for Big Data Apache Kylin is an extreme fast OLAP engine for big data.
  • 5. What is Apache Kylin Key characters Ease of Use No programing; User-friendly Web GUI; Seamless BI Integration JDBC/ODBC/REST API; SupportsTableau, MSTR, Qlik Sense, Power BI, Excel and others Real Interactive Trillion rows data, 99% queries < 1.3 seconds, from Meituan.com ANSI-SQL SQL on Hadoop, supports most ANSI SQL query functions Hadoop Native Compute and store data with MapReduce/Spark/HBase, fully scalable architecture; MOLAP Cube User can define a data model and pre-build in Kylin with more than 10+ billions of raw data records
  • 6. Apache Kylin Architecture Native on Hadoop, Horizontal Scalable BI Tools, Web App… ANSI SQL OLAP Cube
  • 7. Why Kylin is fast Pre-calculation + Random access Convert SQL query to Cube visiting Sort Cuboid Filter Sort Aggr Filter Tables Join O(1) Aggregated Data Offline build Online calculation Join and aggregate data to Cube in offline No join at query time Filter with index and do online calculation within memory
  • 9. Criteria of the Storage engine for Kylin Hadoop Native Wide Adoption Low Latency Easy to use API High Capacity Active User Community
  • 10. Only HBase Can Hadoop Native Wide Adoption Most Hadoop users are running HBase; Low Latency Easy to use API HBase is built on Hadoop technologies; It Integrates well with HDFS, MapReduce and other components. Block cache and Bloom Filters for real-time queries. High Capacity Supports very large data volume, TB to PB data in one table. Active User Community HBase has many active users, which provides many good articles and best practices.
  • 11. HBase in Kylin HBase acts four roles in Kylin Massive Storage for Cube Kylin persists OLAP Cube in HBase, for low latency access. Meta Store Kylin uses HBase to persist its metadata. MPP for online calculation Kylin pushes down calculations to HBase region servers for parallel computing. HBase Master HBase Region Server Metadata Scan & pushdown … HBase Region Server Coproces sor Coproces sor Cache Kylin caches big lookup table in HBase.
  • 12. How to use HBase in OLAP03
  • 13. How Cube be Built Cube building process Kylin uses MR or Spark to aggregate source data into Cube; Calculate N-Dimension cuboid first, and then calculate N-1. The typical algorithm is by-layer cubing; 0-D cuboid 1-D cuboid 2-D cuboid 3-D cuboid 4-D cuboid Full Data MR MR MR MR MR
  • 14. How Cube be Built The whole Cube building process Cubing Algorithm Hive Intermedia te Hive table Dimension Dictionary HDFS Files HFile HDFS Files HDFS Files HDFS Files Hive Query Build Dimension Dictionary Build Cube Job MR Job MR Job MR Job MR Job HBase Table … Fast Cubing Bulk Load 0 D(Base) CuboidN-1 D(Base) CuboidN D(Base) Cuboid Layer Cubing
  • 15. Cube in HBase Cube Structure Cube is partitioned into multiple segments by time range Each segment is a HBase Table Table is pre-split into multiple regions Cube is converted into HFile in batch job, and then bulk load to HBase Seg 1 Seg 2 Seg N…Cube Rowkey Value Table 1 Rowkey Value Table N
  • 16. HBase Table Format Rowkey + Value Shard ID Cuboid ID (long) Dim 1 Dim 2 … Dim N Measure 1 … Measure N Rowkey Value Dimension values are encoded to bytes (via dictionary or others) Measures are serialized into bytes HBase Rowkey format: Shard ID (2 bytes) + Cuboid ID (8 bytes) + Dimensions HBase Value: measures serialized bytes Table is split into regions by Shard ID User can group measures into 1 or multiple column families 2 bytes 8 bytes
  • 17. How Cube be Persisted in HBase Example: a Tiny Cube 2 dimensions and 1 measure; total 4 cuboids: 00,01,10,11 Please note the Shard ID is not appeared here Source Table Dictionary Encoding Aggregated Table (Cube) HBase KV Format
  • 18. How Cube be Queried Kylin parses SQL to get the dimension and measures; Identify the Model and Cube; Identify the Cuboid to scan; Identify the Cube segments; Leverage filter condition to narrow down scan range; Send aggregation logic to HBase coprocessor, do storage-side filtering and aggregation; On each HBase RS returned, de-code and do final processing in Calcite select city , sum(price) from table where year= 1993 group by year Dimension: city, year Cuboid ID: 00000011 Filter: year=1993 (encoded value: 0) Scan start: 00000011|0 Scan end: 00000011|1
  • 19. Good and not-good of HBase for OLAP  Native on Hadoop.  Great performance (when search pattern matches row key design)  High concurrency HBase is a little complicated for OLAP scenario; HBase supports both massive write + read; while OLAP is read-only.  Not a columnar storage  No secondary index  Downtime for upgrade (update coprocessor need to disable table first)
  • 21. 1000+ Global Users Telecom • China Mobile • China Telecom • China Unicom • AT & T • … Internet • eBay • Yahoo! Japan • Baidu 百度 • Meituan 美团 • NetEase 网易 • Expedia • JD 京东 • VIP 唯品会 • 360 • TOUTIAO 头条 • … Others • MachineZone • Glispa • Inovex • Adobe • iFLY TEK科大 讯飞 • … Manufacturing • SAIC 上汽集 团 • Huawei 华为 • Lenovo 联想 • OPPO • XiaoMi 小米 • VIVO • MeiZu 魅族 • … FSI • CCB 建设银 行 • CMB 招商银 行 • SPDB 浦发银 行 • CPIC 太平洋 保险 • CITIC BANK 中信银行 • UnionPay 中 国银联 • HuaTai 华泰 证券 • GuoTaiJunAn 国泰君安证券 • …
  • 22. Use Case – OLAP on Hadoop Meituan: Top O2O company in China Challenge • Slow performance with previous MySQL option Heavy development efforts with Hive solution • Huge resources for Hive job • Analysts can’t access directly for data on Hadoop Solution • Apache Kylin as core OLAP on Hadoop solution • SQL interface for internal users • Active participate in open source Kylin community Supporting all Meituan business lines 973 Cubes, 8.9+ trillion rows, Cube size 971 TB 3,800,000 queries/day, TP90 < 1.2 second (Data in 2018/08)
  • 23. Use Case – Shopping Reporting Yahoo! Japan: the most visited website in Japan Our reporting system used Impala as a backend database previously. It took a long time (about 60 sec) to show Web UI. In order to lower the latency, we moved to Apache Kylin. Average latency < 1sec for most cases Thanks to low latency with Kylin, we become possible to focus on adding functions for users. We provide a reporting system that show statistics for store owners. e. g. impressions, clicks and sales.
  • 24. We Are Hiring WeChat: Kyligence WeChat: Apache Kylin

Editor's Notes

  1. Kylin connects your business application with big data platform via SQL.
  2. “+”is used to separate Cuboid ID and dimensions; In real case there is no separator.