Drill into Drill – How Providing Flexibility and Performance is Possible
1.
© 2014 MapR Technologies 1
Drilling Into Drill: Flexiperf
Jacques Nadeau, Architect and VP Apache Drill
February 19, 2015
2.
“Drill isn’t just about SQL-on-Hadoop. It’s about SQL-on-
pretty-much-anything, immediately, and without formality.”
-Andrew Brust, GigaOM Research, Dec 2014
3.
Agenda
• What?
– SQL like mom made
– Punk SQL
• How?
– Flexibility
– Performance
• Who & When
5.
SoH Table Stakes: Warehousing and Business Intelligence
ANSI Syntax
• SELECT, FROM, WHERE,
JOIN, HAVING, ORDER BY,
WITH, CTAS, OVER*,
ROLLUP*, CUBE*, ALL,
EXISTS, ANY, IN, SOME
• VarChar, Int, BigInt, Decimal,
VarBinary, Timestamp, Float,
Double, etc.
• Subqueries, scalar subqueries*,
partition pruning, CTE
Interactive SQL Workloads
• Data warehouse offload
• Tableau, ODBC, JDBC
• TPC-H & TPC-DS-like
workloads
*alpha or imminent
Standard Hadoop Tools
• Supports Hive SerDes
• Supports Hive UDFs
• Supports Hive Metastore
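The Hive integration above means existing metastore tables are queryable directly. A minimal sketch, assuming a storage plugin registered as `hive` and an illustrative table `orders` (both names are hypothetical, not from the deck):

```sql
-- Query a table registered in the Hive metastore through Drill's hive plugin
-- (plugin name `hive` and table `orders` are illustrative)
SELECT o_orderstatus, COUNT(*) AS order_count
FROM hive.`default`.orders
GROUP BY o_orderstatus;
```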
7.
Punk SQL: SQL for a Hadoop World
Modern Syntax
• Path based queries and
wildcards
– select * from /my/logs/
– select * from /revenue/*/q2
• Modern data types
– Any, Map, Array (JSON)
• Complex Functions and
Relational Operators
– FLATTEN, kvgen,
convert_from, convert_to,
repeated_count, etc
New Workloads
• JSON Sensor analytics
• Complex data analysis
• Alternative DSLs
New Ways to Work
• Query without prep
• Workspaces without admin
intervention
• Expose query as MapReduce
• Expose query as Spark RDD
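The complex functions listed above can be sketched against a hypothetical JSON file (path and field names are illustrative): `FLATTEN` emits one output row per array element, and `kvgen` turns a map's key/value pairs into a queryable array of records.

```sql
-- FLATTEN: one output row per element of the hobbies array
SELECT t.name.first AS first_name, FLATTEN(t.hobbies) AS hobby
FROM dfs.`/data/people.json` t;

-- kvgen: convert a map column into an array of {key, value} records,
-- then FLATTEN it into rows
SELECT FLATTEN(kvgen(t.attributes)) AS attr
FROM dfs.`/data/people.json` t;
```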
9.
Flexibility
your tool should
be flexible…
so you don’t have to be
10.
Flexibility is a Vision, Usability, a Religion
• Deployment
• Data Model
• Schema
• Security
• Access Methods
11.
Supporting the Changing Roles of Big Data
Data Dev Circa 2000
1. Developer comes up with
requirements
2. DBA defines tables
3. DBA defines indices
4. DBA defines FK relationships
5. Developer stores data
6. BI builds reports
7. Analyst views reports
8. DBA adds materialized views
Data Today
1. Developer builds app,
defines schema, stores
data
2. Analyst queries data
3. Data engineer fixes
performance problems or
fills functionality gaps
12.
Deployment: Traditional Hadoop Query Engines
[Diagram: two typical SQL-on-Hadoop deployments, "SoH X" and "SoH Y". Each combines DFS/MR or EXEC worker nodes with a constellation of special-purpose services: a MySQL-backed metadata service (plus replica), gateway services, a catalog service, a state service, and authorization services.]
13.
Challenges
• Many “special” nodes
• Each service may need special handling
– HA process
– Backup process
• Each system requires you to set up a database on top of which
you run your database
• No easy way to run on your laptop for experimentation
– Some solutions don’t even support non-Linux environments
14.
Drill Deployment is Easy
• Single daemon for all purposes
• No special considerations for scaling or availability
• With or without DFS
• Works with other data systems (Mongo; Cassandra & JDBC coming soon)
• Runs on Linux, Mac or Windows
• No separate database
• JSON everything
• Access via HTTP, Java, C, JDBC, ODBC, CLI
[Diagram: three deployment modes: single/CLI/embedded (one Drillbit), distributed (a cluster of Drillbits), and distributed on Hadoop (Drillbits co-located with DFS nodes).]
15.
Drill Provides A Flexible Data Model
[Diagram: formats arranged along two axes, fixed schema ↔ schema-less and flat ↔ complex: CSV/TSV (fixed, flat), Parquet/Avro (fixed, complex), HBase (schema-less, flat), JSON/BSON (schema-less, complex). Flexibility increases toward schema-less, complex data.]

RDBMS/SQL-on-Hadoop table:

Name      Gender  Age
Michael   M       6
Jennifer  F       3

Apache Drill table:

{
  name: {
    first: Michael,
    last: Smith
  },
  hobbies: [ski, soccer],
  district: Los Altos
}
{
  name: {
    first: Jennifer,
    last: Gates
  },
  hobbies: [sing],
  preschool: CCLC
}
16.
Leave Your Data Where it is. Access it Centrally & Uniformly.
• Drill is storage agnostic
• Interacts with storage through plugins
• Storage plugins expose optimizer rules
– Optimizer rules work directly on logical operations to expose maximum capabilities
• Reference multiple Hive, HBase, MongoDB, DFS, etc. systems
17.
Leverage that Massive Scalable Redundant Infrastructure
Single Store for Data and Metadata
• HDFS is already your single canonical store
• Don’t create a secondary metadata store
Avoid Metadata Management and synchronization
• Store metadata inline
– If you can’t, store it next to files
• Move directories around at will
• Delete things at will
18.
Flexibility in how you describe your data
• Drill doesn’t require schema; it detects file types based on
– extensions
– magic bytes (e.g. PAR1)
– system settings
• Query can be planned on any file, anywhere
• Data types are determined as data arrives
• Some formats have known schema
– If they don’t, you can expose them as such through views
– Views are simply JSON files that define view SQL
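Where a format carries no schema of its own, a view can impose one, as the bullets above describe. A minimal sketch, assuming an illustrative CSV log directory and hypothetical column positions (Drill persists the definition as a small JSON view file in the workspace):

```sql
-- Impose types on schema-free CSV logs via a view
-- (workspace dfs.tmp, path, and column positions are illustrative)
CREATE VIEW dfs.tmp.typed_logs AS
SELECT CAST(columns[0] AS TIMESTAMP) AS log_time,
       CAST(columns[1] AS INT)       AS errorLevel,
       columns[2]                    AS message
FROM dfs.`/logs/csv`;
```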
19.
SELECT timestamp, message
FROM dfs1.logs.`AppServerLogs/2014/Jan/`
WHERE errorLevel > 2

In Apache Drill, everything looks like a table:
- `dfs1` is a cluster (storage plugin instance): DFS, HBase, or a Hive metastore
- `logs` is a workspace: typically a subdirectory
- `AppServerLogs/2014/Jan/` is a table: a pathname, an HBase table, or a Hive table
20.
Query a File or a Directory Tree
-- Dynamic queries on files
select errorLevel, count(*)
from dfs.logs.`/AppServerLogs/2014/Jan/part0001.parquet`
group by errorLevel;

-- Dynamic queries on an entire directory tree
select errorLevel, count(*) as TotalErrors
from dfs.logs.`/AppServerLogs`
group by errorLevel;
21.
Directories are Implicit Partitions
-- Direct
SELECT dir0, SUM(amount)
FROM sales
WHERE dir1 IN ('q1', 'q3')
GROUP BY dir0

-- View
CREATE VIEW sales_q2 AS
SELECT dir0 AS year, amount
FROM sales
WHERE dir1 = 'q2'
sales
├── 2014
│ ├── q1
│ ├── q2
│ ├── q3
│ └── q4
└── 2015
└── q1
22.
Access Hard to Reach Data with Nominal Configuration
-- embedded JSON value inside column donutjson, inside
-- column-family cf1 of an HBase table donuts
SELECT
  d.name, COUNT(d.fillings)
FROM (
  SELECT convert_from(cf1.donutjson, 'JSON') AS d
  FROM hbase.donuts)
GROUP BY d.name;
23.
On the fly binary data interpretation
CONVERT_FROM and CONVERT_TO allow access to common Hadoop encodings:
• boolean, byte, byte_be, tinyint, tinyint_be, smallint, smallint_be, int, int_be, bigint, bigint_be, float, double, int_hadoopv, bigint_hadoopv, date_epoch_be, date_epoch, time_epoch_be, time_epoch, utf8, utf16, json
• E.g. CONVERT_FROM(mydata, 'INT_HADOOPV') => internal INT format
• E.g. CONVERT_FROM(mydata, 'JSON') => UTF-8 JSON to internal complex object
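A hedged sketch of the pattern above, decoding a big-endian integer stored as raw bytes in an HBase column (the table `inventory`, column family `f`, and qualifier `qty` are hypothetical):

```sql
-- Decode a big-endian int stored as bytes in an HBase column
SELECT CONVERT_FROM(t.f.qty, 'INT_BE') AS qty
FROM hbase.`inventory` t;

-- Encode a value back to big-endian bytes for writing
SELECT CONVERT_TO(123, 'INT_BE') FROM (VALUES(1));
```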
24.
Secure your data without an extra service
• Drill Views
• Ownership chaining with
configurable delegation TTL
• Leverages existing HDFS
ACLs
• Complete security solution
without additional services or
software
[Diagram: a two-step ownership chain. Frank queries MaskedSales.view (owner: Cindy), which selects from GrossSales.view (owner: dba), which reads RawSales.parquet (owner: dba). Frank has file permission only on the masked view; the dba's read access is delegated down the chain.]
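The ownership chain in the diagram can be sketched in SQL. Names mirror the diagram; the `dfs.views` workspace and column names are hypothetical. The dba owns the raw data and the full view, Cindy owns the masking view, and Frank needs file permission only on Cindy's view:

```sql
-- dba: full view over the raw Parquet data
CREATE VIEW dfs.views.GrossSales AS
SELECT region, customer, amount
FROM dfs.`/secure/RawSales.parquet`;

-- Cindy: masked aggregate view over the dba's view
CREATE VIEW dfs.views.MaskedSales AS
SELECT region, SUM(amount) AS total
FROM dfs.views.GrossSales
GROUP BY region;

-- Frank: queries only the masked view; reads resolve via ownership chaining
SELECT * FROM dfs.views.MaskedSales;
```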
26.
Performance is about data management, not just wall clock
• How quickly can you incorporate new data
• What types of prep or ETL you need to do
• What you can do without having to read data & calculate
statistics
– Remember, many users never read all their data
• Time to first result
• Time to last result
• Managing Scarce Resources
• Concurrency
27.
Core Drill Architectural Goals
• Go fast when you don’t know anything
– And do “the right thing”
• Go faster when you do know things
28.
An Optimistic, Pipelined Purpose-Built DAG Engine
• Three-Level DAG
• Major Fragments (phases)
• Minor Fragments (threads)
• Operators (in-thread operations)
> explain plan for select * from customer limit 5;
…
00-00 Screen
00-01 SelectionVectorRemover
00-02 Limit(fetch=[5])
00-03 UnionExchange
01-01 ProducerConsumer
01-02 Scan(groupscan=[ParquetGroupScan […
29.
Each phase (MajorFragment) gets parallelized (MinorFragments)
[Diagram: major fragment 1 runs as minor fragments 1:0 through 1:3, all feeding the single root minor fragment 0:0.]
30.
Parallel Calcite: Extensible Cost-based Optimization
• Volcano-inspired cost based optimizer
– CPU, IO, memory, network (data locality)
• Pluggable rules per storage subsystem
• Exchanges provide parallelization:
– Hash, Ordered, Broadcast and Merging
• Join and aggregation planning
– Merge Join vs Hash Join
– Partition-based join vs Broadcast-based join
– Streaming Aggregation vs Hash Aggregation
• Relies heavily on the awesome Apache
Calcite project
[Diagram: the query flows into the optimizer, which consults pluggable rule sets such as HBase rules and file system rules.]
31.
Statistics
• Different Storage systems have different capabilities
• All: Basic estimation of row count
– Allows the first 70% of critical optimization decisions for most Hadoop
workloads
– Either exact numbers or approximations based on best effort
• Some file formats
– Better than basic stats in file, such as Parquet
• Substantial effort to make certain formats ‘nearly native’
– Drill adding support for extended attributes for enhanced file-held
statistics
32.
Reading Data Quickly, Moving Data Quickly
• Highly Optimized Native Drill Readers:
– Vectorized Parquet, Text/CSV, JSON
– Also works with all Hive supported formats
• Drill supports partition pruning
– Adding direct physical property exposure soon for highly optimized
cases
• Drill parallelizes to maximum level format allows
– Also balances data locality and maximum parallelization
• Bespoke Asynchronous Zero-Copy RPC Layer
– Built specifically for Drill’s internal data format
33.
Parquet: The Format for the next decade
Apache Drill & Apache Parquet communities working together
• Better Ecosystem integration and decentralized metadata
– Self Description capabilities
– Ecosystem support for logical data types
• Enhanced Performance
– New vectorized reading
– Enhanced memory pooling and management
– Indexing*
– Better metadata positioning (for improved page pruning)*
– Enhanced vectorization and late materialization reading*
*In progress
34.
Value Vectors & Record Batches: Drill’s In-memory Columnar Work Units
• Random access: sort without copy or
restructuring
• Fully specified in memory shredded complex
data structures
• Remove serialization or copy overhead at
node boundaries
• Spool to disk as necessary
• Interact with data in Java or C without copy
or overhead
[Diagram: a Drillbit holds record batches in memory; memory overflow spools to disk.]
35.
Runtime Compilation and Multi-phased planning
• Drill does best effort initial query parsing, planning and validation
– Where Drill doesn’t understand data, it provides support for ANY type,
allowing late type binding.
• At execution time, individual nodes do secondary pass
– Schema <--> Query parsing and validation
– Type casting, coercion and promotion
– Compilation based on schema requirements
• As schema changes, Drill supports recompilation of each
operator as necessary
36.
Runtime Compilation Pattern
[Diagram: CodeModel-generated code and precompiled bytecode templates (plus UDFs brought in via source code transformation) are combined through bytecode merging, then run through custom bytecode optimization and Janino compilation to produce the loaded class.]
37.
Advanced Compilation Techniques
• Optimization based on observation and assembly
• Drill does a number of pre-machine-code-compilation
optimizations to ensure efficient execution
• Some examples:
– Removal of type and bounds checking
– Direct micro pointers for in-record-batch references
– Little endian data formats
– Bytecode-level scalar replacement
38.
Enhanced Interfaces for high-speed Storage and UDF
• All built-in functions in Drill are simply UDFs
• UDF interface works directly with
– compilation engine
– bytecode rewriting algorithms
• Storage Interface is a columnar vectorized transfer interface
– Allows storage system to determine best transformation approach
– Adding support for deferred materialization
– Support for multiple target languages with minimal overhead
39.
Drill Does Vectorization & Supports Columnar Functions
• Drill often operates on more than one record at a time
– Word-sized manipulations
– SIMD instructions
– Manually coded algorithms
• Columnar Functions Improve Many Operations
– Bitmaps allow lightning fast null-checks and reduction in branching
– Type demotion to reduce memory and CPU computation overhead
– Direct Conversions where possible
40.
Drill provides advanced query telemetry
• What happened during query for all three levels of DAG
execution
• Each profile is stored as JSON file for easy review, sharing and
backup (at end of query execution)
• Profiles can be analyzed using Drill, allows:
– easy longitudinal analysis of workload
– multi-tenancy performance analysis
– impact of configuration changes to benchmark workloads
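Since each profile is itself a JSON file, the longitudinal analysis mentioned above can be done with Drill itself. A hedged sketch; the profile directory and field names are illustrative and depend on how the profile store is configured:

```sql
-- Query saved query profiles as ordinary JSON
-- (path and fields are hypothetical)
SELECT t.queryId, t.`user`, t.`end` - t.`start` AS duration_ms
FROM dfs.`/var/log/drill/profiles` t
ORDER BY duration_ms DESC
LIMIT 10;
```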
41.
A color-coded visual layout and Gantt-style timing chart are provided
42.
Memory Efficiency
• Drill’s in-memory representation is designed to minimize memory overhead
• Custom implementations of columnar-aware data structures, including hash tables, sort operations, etc.
– For example, entry overhead for the hash table is 8 bytes per value
– Pointer overhead for sort is 2 to 4 bytes per entry
• Adding support for compressed columnar representation to further improve compactness
43.
Scale and Concurrency
• Drill’s execution model leverages both local and remote exchanges for
changes in parallelization
– Muxing Local, Demuxing Local, Broadcast, Hash to Merge, Hash to Random, Ordered
Partition, Single Merge, Union
• All operations can be parallelized at the thread and node level
• Thread count and parallelization are influenced by data size, query phasing
and system load
– Administrators have basic queuing control to manage workload
• All threads are independent and pipelined, all run in a single process per
node
• Node-to-node communication is multiplexed and push-based, with sub-socket back-pressure support
• Testing has proceeded up to 150 nodes. Target is 1000 nodes by GA.
– Drill adapts data transfer size, buffers, muxing and other operations based on query scale to minimize n² multiplier effects
44.
Who & When
Office Hours
• Now, table B
Apache: join in!
getdrill.org
0.8 next week
1.0 in a month or two
@INTJESUS
@ApacheDrill
@MAPR
GigaOM Research, Sector Roadmap: Hadoop/Data Warehouse Interoperability