JackHare: a framework for SQL to NoSQL translation using MapReduce

Wu-Chun Chung · Hung-Pin Lin · Shih-Chang Chen · Mon-Fong Jiang · Yeh-Ching Chung
Received: 15 December 2012 / Accepted: 6 September 2013
© Springer Science+Business Media New York 2013

Presented by 康志強
2013.10.22

Outline
• Introduction
• Related work
• The JackHare framework architecture
• Unstructured data processing in HBase
• Experimental results
• Conclusions


Introduction
• Big Data problems (massive data)
– Data access speed
– Data merging: timeliness and correctness of data under parallel processing

• Hadoop MapReduce
– to process the massive data in parallel.

• Hadoop distributed file system
– difficult to update data frequently


Introduction
• HBase
– places data on a scale-out storage system
– manipulates changeable data in a transparent way
– the HBase interface is not user-friendly

• JackHare
– an API compliant with the ANSI-SQL and JDBC 4.0 specifications, used to operate Apache HBase
– uses the MapReduce framework to process unstructured data in HBase

Introduction
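• Data access speed
– 1990: a hard disk stored 1,370 MB, with a transfer speed of 4.4 MB/s
– Today: 1 TB, with a transfer speed of 100 MB/s
– Reading and writing data in parallel speeds things up

• Hadoop Distributed File System
– frequent data updates are difficult in such a file system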
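
As a rough worked example of the access-speed figures above, the sketch below computes how long a full sequential read of each drive would take. It assumes decimal units (1 TB = 1,000,000 MB), and the 100-drive parallel case is a hypothetical illustration, not a figure from the slides.

```python
# Time to stream an entire drive, using the capacities and rates quoted above.
MB_PER_TB = 1_000_000  # assumption: decimal units


def full_read_seconds(capacity_mb, transfer_mb_per_s):
    """Seconds needed to read the whole drive at the given transfer rate."""
    return capacity_mb / transfer_mb_per_s


drive_1990 = full_read_seconds(1_370, 4.4)            # 1990 drive
drive_today = full_read_seconds(1 * MB_PER_TB, 100)   # 1 TB drive

# Hypothetical: the same 1 TB spread evenly over 100 drives read in parallel.
drives = 100
parallel_today = full_read_seconds(1 * MB_PER_TB / drives, 100)

print(f"1990 drive, full read      : {drive_1990 / 60:.1f} minutes")
print(f"1 TB drive, full read      : {drive_today / 3600:.1f} hours")
print(f"1 TB over {drives} drives       : {parallel_today:.0f} seconds")
```

Capacity has grown far faster than transfer speed, so a single drive takes much longer to read in full than it did in 1990. Splitting the data across many drives brings the time back down, which is the motivation for parallel reads and writes.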


Introduction
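• Data merging
– Correctness

• MapReduce
– A distributed programming framework
– Map splits one job across multiple nodes
– Reduce combines the results from each node into the final result
– Data locality
– Can be used through high-level query languages (Pig, Hive)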
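
The Map and Reduce roles described above can be illustrated with a minimal single-process sketch. It is a plain-Python word count, not Hadoop code: the function names and the in-memory "shuffle" step are assumptions made for illustration, and each input string stands in for the chunk of data held by one node.

```python
from collections import defaultdict


def map_phase(chunk):
    """Map: turn one chunk of input into (key, value) pairs."""
    for word in chunk.split():
        yield word.lower(), 1


def shuffle(pairs):
    """Group all values that share a key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values of one key into a final result."""
    return key, sum(values)


def word_count(chunks):
    mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
    return dict(reduce_phase(k, vs) for k, vs in shuffle(mapped).items())


# Each string plays the role of the data stored on one node.
chunks = ["HBase stores data", "MapReduce reads HBase data"]
counts = word_count(chunks)
print(counts)
```

In a real cluster the map calls run on the nodes that already hold each chunk (data locality), and only the grouped intermediate pairs travel over the network.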

Introduction
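• HBase
– A distributed database built on top of HDFS
– Data values are accessed using the row and the column as the index
– Every value carries a timestamp, so the same cell can hold several values written at different times (versions)
– An HBase table consists of many rows and several column families
– Can serve as a data source or a storage sink for MapReduce programs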
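
To make the data model above concrete, here is a toy in-memory sketch of a table whose cells are addressed by row, column family and qualifier, and which keeps one value per timestamp. `ToyTable` and its methods are hypothetical names for illustration only; this is not the HBase client API, and the row key and values are made up.

```python
class ToyTable:
    """In-memory stand-in for an HBase-style table with versioned cells."""

    def __init__(self, column_families):
        self.column_families = set(column_families)
        # row key -> {"family:qualifier" -> {timestamp -> value}}
        self.rows = {}

    def put(self, row, family, qualifier, value, timestamp):
        if family not in self.column_families:
            raise KeyError(f"unknown column family: {family}")
        cell = self.rows.setdefault(row, {}).setdefault(f"{family}:{qualifier}", {})
        cell[timestamp] = value

    def get(self, row, family, qualifier, timestamp=None):
        """Return the newest value, or the newest one at or before `timestamp`."""
        versions = self.rows[row][f"{family}:{qualifier}"]
        eligible = [t for t in versions if timestamp is None or t <= timestamp]
        return versions[max(eligible)] if eligible else None


table = ToyTable(["info"])
table.put("wafer-001", "info", "status", "queued", timestamp=1)
table.put("wafer-001", "info", "status", "done", timestamp=2)

latest = table.get("wafer-001", "info", "status")
earlier = table.get("wafer-001", "info", "status", timestamp=1)
print(latest, earlier)
```

Writing a second value to the same cell does not overwrite the first one. Both versions remain readable by timestamp.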

Introduction
• NoSQL databases
• http://www.ithome.com.tw/itadm/article.php?c=63360&s=5


Introduction
• JackHare
– allows users to manipulate large-scale data with ANSI-SQL queries
– an API compliant with the ANSI-SQL and JDBC 4.0 specifications, used to operate Apache HBase
– uses the MapReduce framework to process unstructured data in HBase


Related work
• Pig
– Runs in an HDFS and MapReduce cluster environment
– Pig Latin: a simpler procedural language
– http://pig.apache.org/docs/r0.12.0/basic.html#nestedblock

• Hive
– Provides a SQL-like query language (HiveQL) for querying data
– Can manage data stored in HDFS
– https://cwiki.apache.org/confluence/display/Hive/Tutorial

Related work
• YSmart
– An SQL-to-MapReduce Translator
– http://ysmart.cse.ohio-state.edu/

• S2MART
– Smart SQL to Map-Reduce Translators


Related work
• HadoopDB
– An Architectural Hybrid of MapReduce and DBMS Technologies for Analytical Workloads
– Provides SQL queries through a translation layer called SQL-MR-SQL (SMS), based on Hive
– http://db.cs.yale.edu/hadoopdb/hadoopdb.html

• Clydesdale
– structured data processing on MapReduce
– focuses on processing the data fitting a star schema

Related work
• SQL queries are translated into MapReduce
• HBase
– Supports frequent data updates
– Retains the scalability and reliability of a NoSQL database


The JackHare framework architecture


• The user submits an ANSI-SQL query through a SQL client application.
• The compiler scans and parses the ANSI-SQL query.
• It looks up the related table name, column families and column qualifiers in HBase.
• It generates MapReduce code according to the query commands and metadata.
• The MapReduce job accesses HBase and is executed.
• The results are wrapped and returned from the back end.
• The returned results are shown in the SQL client application according to the RDB schema.
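
The first steps of this pipeline (parse the query, then look up the HBase table, column family and qualifier) can be sketched as follows. This is a minimal illustration under stated assumptions, not JackHare's compiler: the regular expression handles only a single-table SELECT with an optional equality filter, the `METADATA` mapping and its column names are hypothetical, and the function returns a plan dictionary where the real framework would generate MapReduce code.

```python
import re

# Hypothetical metadata: relational column -> HBase "family:qualifier".
METADATA = {
    "WAFER": {
        "wafer_id": "info:wafer_id",
        "lot_id": "info:lot_id",
        "defects": "stats:defects",
    },
}

# Assumption: only SELECT cols FROM table [WHERE col = 'value'] is supported.
QUERY_RE = re.compile(
    r"SELECT\s+(?P<cols>.+?)\s+FROM\s+(?P<table>\w+)"
    r"(?:\s+WHERE\s+(?P<col>\w+)\s*=\s*'(?P<val>[^']*)')?\s*;?\s*$",
    re.IGNORECASE,
)


def plan_query(sql):
    """Parse a tiny SQL subset and resolve its columns against the metadata."""
    match = QUERY_RE.match(sql.strip())
    if match is None:
        raise ValueError("unsupported query")
    table = match.group("table").upper()
    mapping = METADATA[table]
    columns = [c.strip().lower() for c in match.group("cols").split(",")]
    plan = {
        "table": table,
        "columns": [mapping[c] for c in columns],
        "filter": None,
    }
    if match.group("col"):
        plan["filter"] = (mapping[match.group("col").lower()], match.group("val"))
    return plan


plan = plan_query("SELECT wafer_id, defects FROM wafer WHERE lot_id = 'L001'")
print(plan)
```

The resulting plan names the HBase table to scan, the `family:qualifier` cells to read and the filter to apply, which is the information a generated MapReduce job would need.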


SQuirreL


Unstructured data processing in HBase
• Remap the data in a relational database to HBase
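
One way to picture this remapping is the sketch below, which turns a relational record into HBase-style cells. The scheme is an assumption made for illustration (primary key becomes the row key, every other column becomes a `family:qualifier` cell). The slides do not spell out the actual mapping, and the record, column names and family assignment are hypothetical.

```python
def remap_row(record, key_column, family_of):
    """Convert one relational record into (row key, column, value) cells.

    Assumption: the primary key becomes the row key and each remaining
    column is stored under the column family given by `family_of`.
    """
    row_key = str(record[key_column])
    return [
        (row_key, f"{family_of[column]}:{column}", str(value))
        for column, value in record.items()
        if column != key_column
    ]


# Hypothetical record from a WAFER table.
record = {"wafer_id": "W42", "lot_id": "L001", "defects": 3}
family_of = {"lot_id": "info", "defects": "stats"}

cells = remap_row(record, "wafer_id", family_of)
for cell in cells:
    print(cell)
```

Values are kept as strings here for readability. HBase itself stores uninterpreted bytes.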


HBase
• Analysis of SQL clauses
– SELECT, FROM and WHERE clauses
– Extended clauses
  • GROUP BY
  • HAVING
  • ORDER BY
  • JOIN
  • Aggregate functions
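
As an illustration of how some of the extended clauses listed above could map onto map and reduce steps, the sketch below evaluates a query of the form `SELECT lot_id, COUNT(*), SUM(defects) FROM WAFER GROUP BY lot_id HAVING COUNT(*) >= 2 ORDER BY lot_id` in plain Python. It is an assumption about one reasonable mapping, not the code JackHare generates, and the rows and the `defects` column are made-up sample data.

```python
from collections import defaultdict

# Made-up sample rows standing in for a WAFER table.
rows = [
    {"lot_id": "L1", "defects": 4},
    {"lot_id": "L2", "defects": 1},
    {"lot_id": "L1", "defects": 2},
    {"lot_id": "L3", "defects": 7},
    {"lot_id": "L2", "defects": 3},
]


def mapper(row):
    """GROUP BY: emit the grouping column as the key."""
    yield row["lot_id"], row["defects"]


def reducer(key, values):
    """Aggregate functions: COUNT(*) and SUM(defects) for one group."""
    return {"lot_id": key, "count": len(values), "total": sum(values)}


def run_query(rows, min_count):
    groups = defaultdict(list)
    for row in rows:
        for key, value in mapper(row):
            groups[key].append(value)
    reduced = [reducer(key, values) for key, values in groups.items()]
    kept = [r for r in reduced if r["count"] >= min_count]   # HAVING
    return sorted(kept, key=lambda r: r["lot_id"])           # ORDER BY


result = run_query(rows, min_count=2)
print(result)
```

The grouping key does the work of GROUP BY, the reducer computes the aggregates, and HAVING and ORDER BY are applied to the reduced output.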


Experimental results
• Experimental environment
– Two Intel Xeon L5640 CPUs, 24 GB RAM and a 3 TB hard disk
– 16-node virtual machine cluster on four physical machines
– Hadoop 0.20.203 (as of 15 October 2013, release 2.2.0 is available)
– HBase 0.92.0 (as of 2013-09-20, version 0.97.0-SNAPSHOT)
– Hive 0.9.0
– Java 1.6.0, maximum heap size 512 MB

• Experimental environment
– Node: two cores at 2 GHz with 4 GB RAM and 400 GB of storage space
– MySQL: two cores at 2 GHz, 4 GB RAM and an 800 GB hard disk
– Three tables: LOT, WAFER and DIE


• Results

