HBaseCon 2015: Trafodion - Integrating Operational SQL into HBase
1. HP © Copyright 2015 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.
Trafodion
Integrating Operational SQL into Hadoop
HBaseCon 2015, San Francisco
May 7th
2.
The most mature SQL open source RDBMS on Hadoop
Operational Heritage
• Sub-second response times
• High concurrency
• Full ACID distributed transaction management
• Mission critical availability
• Unparalleled scale before NoSQL
• ANSI SQL support
• UDFs
BI Heritage
• Parallel everything
• Sophisticated optimizer
• Enterprise level manageability
• Multi-temperature data
• Materialized Views & query rewrite
• OLAP & extensive function support
Open sourced on HBase
• Transaction management for Trafodion and HBase tables
• Data type and check enforcement
• Schema flexibility
• Optional row formats
• Integration of structured, semi-structured, & unstructured data
• Operational, historical, analytical deployments on single platform
20+ years in Tandem / NonStop OLTP + Neoview EDW capabilities on MPP architecture
3.
Layered Architecture
[Figure: layered architecture diagram, top to bottom]
• Client: User and ISV Operational Applications, JDBC / ODBC Driver
• Database Connectivity
• SQL: Master, CMP (Compiler and Optimizer), ESP (SQL Parallelism), DTM (Distributed Transaction Management), WMS (Workload Management), UDF, External Communication
• Storage Engines: HBase, HDFS, Hive
• Data: Trafodion Tables (Relational Schema), Native HBase Tables (KVS, Columnar), Native Hive Tables (Multi-Structured Data Store Integration)
4.
Process architecture
[Figure: identical nodes connected over TCP/IP. Labels on each node: Trafodion Node (DCS, EXE, ESP, CMP, DTM, UDF, WMS); Hadoop Data Node; HBase APIs; HBase Region Server; Hive/HDFS APIs; Trafodion Metadata; Trafodion Data; Hive Data; HDFS Data]
5.
Optimized execution plans based on statistics
Rule-driven and cost-based optimizer
Based on Cascades & Large Scope Rules
Parallel and non-parallel plans
Equal-height histogram stats
Join and aggregation variants
Subquery un-nesting
Optimized inner, left, right, outer joins
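As a rough illustration of the "equal-height histogram stats" item above, the sketch below builds a histogram whose buckets each hold about the same number of rows and uses it to estimate the selectivity of a `value <= x` predicate. The function names, the uniform-spread assumption inside a bucket, and the sample data are all assumptions made for this example; the slides do not describe how Trafodion builds or reads its histograms.

```python
from bisect import bisect_right


def build_equal_height_histogram(values, num_buckets):
    """Return (minimum value, bucket upper boundaries).

    Each bucket covers roughly len(values) / num_buckets rows.
    Hypothetical helper for illustration only.
    """
    if not 0 < num_buckets <= len(values):
        raise ValueError("need 1 <= num_buckets <= number of values")
    ordered = sorted(values)
    n = len(ordered)
    boundaries = []
    for b in range(1, num_buckets + 1):
        # Index of the last row that falls into bucket b.
        idx = (b * n) // num_buckets - 1
        boundaries.append(ordered[idx])
    return ordered[0], boundaries


def estimate_selectivity_le(histogram, x):
    """Estimate the fraction of rows with value <= x."""
    low, boundaries = histogram
    if x < low:
        return 0.0
    if x >= boundaries[-1]:
        return 1.0
    # Buckets whose upper boundary is <= x are fully covered.
    full = bisect_right(boundaries, x)
    lo = low if full == 0 else boundaries[full - 1]
    hi = boundaries[full]
    # Assumption: values are spread uniformly inside the partial bucket.
    partial = (x - lo) / (hi - lo) if hi > lo else 0.0
    return (full + partial) / len(boundaries)
```

A cost-based optimizer can feed estimates like these into its row-count predictions when it compares join orders and join methods.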
6.
Efficient data flow SQL execution
[Figure: operator tree in which two Scans feed a Join, which feeds a Group By]
• Nested, nested cache, merge, hybrid hash joins
• Eager & full aggregations incl. hash GROUP BYs
• Unions, sorts
• I/O operations (scan, update, delete, insert)
In-memory, data flow architecture
• Continuous data flow through in-memory queues
• Overflow to disk for hash and sort operations
Reduced data movement
Scheduler driven
Multi-threaded executor
Adaptive Segmentation
Skew Buster
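The Scan, Join, Group By flow on this slide can be sketched with Python generators, where each operator pulls rows from its children one at a time. This is a deliberately simplified, single-threaded model: the hash join here is a plain in-memory hash join (no overflow to disk, so not a hybrid hash join), and every function and column name is hypothetical rather than taken from Trafodion.

```python
from collections import defaultdict


def scan(table):
    """Scan operator: yields rows one at a time."""
    for row in table:
        yield row


def hash_join(build_rows, probe_rows, build_key, probe_key):
    """Build a hash table on one input, then stream the other through it."""
    table = defaultdict(list)
    for row in build_rows:
        table[row[build_key]].append(row)
    for row in probe_rows:
        # .get() avoids creating empty entries for keys with no match.
        for match in table.get(row[probe_key], []):
            yield {**match, **row}


def hash_group_by(rows, key, value):
    """Hash GROUP BY computing SUM(value) per key."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[key]] += row[value]
    return dict(totals)
```

Because the join yields rows as soon as they match, the group-by starts aggregating before the probe side has been fully scanned, which is the essence of a data flow design.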
7.
DOP features
• Varying degrees of parallelism
• Salting of rows for even data distribution
Expression evaluation
• Evaluated close to data
• Fastpaths, prefetch, pcode, LLVM
Scalability
• Parallel execution
• Scales out with Hadoop
Degree of parallelism optimization
[Figure: operator, partitioned, and pipeline parallelism, shown on a plan with a Master, a Join, a Group by, and two Scans]
• Support for co-located joins
• Repartitioning when necessary
• Inner child and outer child broadcasts
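The "salting of rows for even data distribution" item can be illustrated as follows: a hash of the key picks a partition number, which is prefixed to the row key so that sequential keys do not all land in one region. The hash function, the key layout, and the function names are assumptions made for this sketch; the slides do not show how Trafodion computes or stores its salt.

```python
import hashlib


def salt_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition number in [0, num_partitions)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions


def salted_row_key(key: str, num_partitions: int) -> str:
    """Prefix the key with its salt (hypothetical 'NN_key' layout)."""
    return f"{salt_for(key, num_partitions):02d}_{key}"
```

A point lookup recomputes the salt from the key, so it still touches one partition, while a range scan over the original key order has to visit every partition, which fits naturally with parallel execution.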
8.
Varying operational workloads
[Figure: a Client Application and Nodes 1, 2, ... n connected by Ethernet. Labels: Master, ESP (many per node), Multi-fragment, HBase, FILTERS, COPROCESSORS, HDFS]
Access optimizations
• Random (keyed), Multi-dimensional (MDAM)
Secondary index access
Row format optimizations
• HBase (column per cell), aligned (row per cell)
Reusable ESPs for parallelism
Cached SQL plans
Pushdown (filters + coprocessors)
Service persistence (via Zookeeper)
Automatic query resubmission
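The multi-dimensional (MDAM) access listed above can be pictured, in very simplified form, as follows: when an index is sorted on (a, b) and the query only restricts b, the scan visits each distinct value of a and probes directly for the matching b range, rather than reading every entry. This is a sketch of the general idea under that assumption, not Trafodion's implementation; the function name and the in-memory sorted list standing in for an index are hypothetical.

```python
from bisect import bisect_left, bisect_right


def mdam_scan(index, b_low, b_high):
    """Return entries with b_low <= b <= b_high, plus the number of probes.

    `index` is a list of (a, b) tuples sorted on (a, b).
    """
    leading = [a for a, _ in index]
    results = []
    probes = 0
    pos = 0
    while pos < len(index):
        a = index[pos][0]
        # End of the run of entries sharing this leading value.
        end = bisect_right(leading, a, pos)
        # Probe within the run for the requested range of b.
        start = bisect_left(index, (a, b_low), pos, end)
        stop = bisect_right(index, (a, b_high), pos, end)
        results.extend(index[start:stop])
        probes += 1
        pos = end  # skip directly to the next distinct leading value
    return results, probes
```

The approach pays off when the leading column has few distinct values, since the number of probes equals the number of distinct leading values rather than the number of rows.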
9.
YCSB operation speeds that approach HBase (within 20%)
Trafodion performance objective
Meets current objective, with max variance at 10.8%
[Chart: YCSB Singleton5050 (Workload A); Throughput (OPS) vs. Concurrency (Streams), 0 to 1,024; series: Traf 1.1, HBase]
10.
YCSB and Order Entry scale linearly!
Trafodion performance objective
Meets objective!
[Charts: throughput for Transactional Order Entry and for YCSB Selects, Updates, and 50/50]
11.
Trafodion Distributed Transaction Management …
1. Multiple row inserts, updates, and deletes to a table
2. Multiple table and SQL insert, update, and delete statements
3. Distributed multiple HBase region insert, update, delete transaction (2-phase commit)
4. Read-only transaction (eliminates commit overhead)
[Figure: Trafodion transactions over Tables A to C and Regions A to D, with callouts 1 to 4 matching the list above]
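Items 3 and 4 above can be sketched as a toy two-phase commit coordinator: every region touched by a transaction first votes in a prepare phase, and changes are applied only if all vote yes; a transaction with no writes skips the protocol entirely. The `Region` class and `run_transaction` function are hypothetical stand-ins, and the sketch omits the durable logging and recovery a real transaction manager needs.

```python
class Region:
    """Hypothetical participant standing in for one HBase region."""

    def __init__(self, name, fail_prepare=False):
        self.name = name
        self.fail_prepare = fail_prepare
        self.data = {}
        self.pending = {}

    def prepare(self, tx_id, writes):
        """Phase 1: stage the writes and vote yes (True) or no (False)."""
        if self.fail_prepare:
            return False
        self.pending[tx_id] = dict(writes)
        return True

    def commit(self, tx_id):
        """Phase 2: make the staged writes visible."""
        self.data.update(self.pending.pop(tx_id, {}))

    def abort(self, tx_id):
        """Discard any staged writes."""
        self.pending.pop(tx_id, None)


def run_transaction(tx_id, writes_by_region):
    """writes_by_region is a list of (Region, {key: value}) pairs."""
    participants = [(r, w) for r, w in writes_by_region if w]
    if not participants:
        # Read-only transaction: nothing to prepare or commit.
        return "read-only"
    prepared = []
    for region, writes in participants:
        if region.prepare(tx_id, writes):
            prepared.append(region)
        else:
            for p in prepared:
                p.abort(tx_id)
            return "aborted"
    for region in prepared:
        region.commit(tx_id)
    return "committed"
```

Either all regions apply the transaction's writes or none do, which is the atomicity guarantee the slide attributes to multi-region transactions.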
12.
Scalable Architecture, implemented using HBase coprocessors
Transaction Distributed Process Management …
[Figure: Nodes 1, 2, ... n with the same layout. Labels on each node: several SQL Processes, each with a Transaction Manager Library and a Resource Manager Library; a Transaction Manager; an HBase Region Server (HBase trx Region Server) with HBase Regions, a Trx Region Endpoint coproc, and a TLOG]
13.
Minimum distributed transaction management overhead (within 20%)
Trafodion transaction performance objective
Meets current objective, with max variance at 11.3%
Order Entry: multi-statement transactional workload
• 5 transaction types (New Orders, Payments, Order Status, Delivery, and Stock Level checks)
• On average, about 20 statements per transaction
[Chart: OrderEntry; Throughput (TPM) vs. Concurrency (Streams), 0 to 1,024; series: Traf 1.1, Autocommit]
14.
Trafodion manageability overview
• Performant capture and publishing of query statistics
– Threshold driven
– Aggregation
• Events logged using log4cpp/log4j
• Client access via ODBC/JDBC, REST API, or HPdsm
[Figure: Trafodion Instance. Labels: Database Administrator; ODBC/JDBC; REST API; Publications from Trafodion Subsystems; Query Statistics; Events; Repository; Session; Query; AGGR Query; Log4cpp/log4j; Log Files]
15.
High availability and data integrity: Features & Testing
Hadoop, HDFS, HBase
• Name Node Redundancy
• HBase Replication (asynchronous)
• HDFS Replication (data block copies)
• HBase Snapshot
• Zookeeper
Trafodion
• Persistent connectivity services
• Automatic Query Retry
• Efficient fully distributed transaction recovery
• Backup and Restore utilities
• Extensive HBase / Trafodion HA testing
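The "Automatic Query Retry" item can be pictured as a wrapper that re-runs a query when it fails for a transient reason, such as a lost connection. The sketch below is a generic retry loop with exponential backoff; `TransientError`, `run_with_retry`, and the retry policy are assumptions for illustration, since the slides do not say which failures Trafodion retries or how often.

```python
import time


class TransientError(Exception):
    """Hypothetical marker for failures that are worth retrying."""


def run_with_retry(execute, max_attempts=3, base_delay=0.1, sleep=time.sleep):
    """Call execute(); on TransientError, wait and try again.

    The delay doubles after each failed attempt. The last error is
    re-raised once max_attempts is exhausted.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return execute()
        except TransientError as err:
            last_error = err
            if attempt < max_attempts:
                sleep(base_delay * 2 ** (attempt - 1))
    raise last_error
```

Retrying is only safe for statements that can be repeated without side effects, or inside a transaction that was rolled back, which is one reason retry and transaction recovery appear together on this slide.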
16.
Demo Screenshot: Operational Reporting Queries
Query: List of all products, some product info, current specials, a summary of their ratings and reviews
• Nested Join for keyed lookup into Trafodion
• Parallel scan of larger Trafodion tables
• Cache of previous lookups into Trafodion
17.
Demo Screenshot: Interoperability (Trafodion & Hive/HDFS)
Load data from Trafodion tables to Hive table with insert-select statement
Source data is detailed order information obtained by joining multiple Trafodion tables
• Parallel Join: Trafodion tables acting as source
• Parallel insert into Hive: Hive table is the target
18.
Demo Screenshot: UDFs: User Defined Functions
19.
Demo Screenshot: Query Monitoring