®
© 2014 MapR Technologies 1
Apache Drill
Andy Pernsteiner
2014-08-20 : Pittsburgh HUG
APACHE DRILL
• Pioneering Data Agility for Hadoop
• Apache open source project
• Scale-out execution engine for low-latency queries
• Unified SQL-based API for analytics & operational applications
40+ contributors
150+ years of experience building databases and distributed systems
Active Drill Community
•  Large community, growing rapidly
–  35-40 contributors, 16 committers
–  Microsoft, LinkedIn, Oracle, Facebook, Visa, Lucidworks, Hortonworks, Concurrent, many universities
•  In 2014
–  over 20 meet-ups, many more coming soon
–  2 hackathons, with 40+ participants
•  Encourage you to join, learn, contribute and have fun …
Why Drill?
Hadoop as an augmentation for the EDW: why?
Consolidating multiple schemas is very hard.
Why? With schema-on-write, the ways data can be retrieved are fixed in advance.
Silos make analysis very difficult
•  How do I identify a unique {customer, trade} across data sets?
•  How can I guarantee the lack of anomalous behavior if I can’t see all data?
Why Hadoop
SQL is here to stay
YOU CAN’T HANDLE REAL SQL
SQL
select * from A
where exists (
select 1 from B where B.b < 100 );
•  Did you know Apache Hive cannot compute it?
–  The same goes for related engines, e.g. Impala, Spark/Shark
Self-described Data
select cf.month, cf.year
from hbase.table1;
•  Did you know standard SQL engines cannot handle the query above?
•  Nor can Hive or related engines such as Impala and Shark
•  The reason: there is no metastore definition available for the table
Rethink SQL for Big Data
Preserve
•  ANSI SQL
–  Familiar and ubiquitous
•  Performance
–  Interactive nature crucial for BI/Analytics
•  One technology
–  Painful to manage different technologies
•  Enterprise ready
–  System-of-record, HA, DR, Security, Multi-tenancy, …
Invent
•  Flexible data-model
–  Allow schemas to evolve rapidly
–  Support semi-structured data types
•  Agility
–  Self-service possible when developer and DBA are the same person
•  Scalability
–  In all dimensions: data, speed, schemas, processes, management
Distance to Data
[Diagram: two paths from the business to the data]
•  MapReduce: Business (analysts, developers) → “Plumbing” development → Data
•  Hive and other SQL-on-Hadoop: Business (analysts, developers) → Modeling and transformations → Data
Existing approaches require a middleman (IT)
Real-World Data Modeling and Transformations
Distance to Data
[Diagram: three paths from the business to the data]
•  MapReduce: Business (analysts, developers) → “Plumbing” development → Data
•  Hive and other SQL-on-Hadoop: Business (analysts, developers) → Modeling and transformations → Data
•  Data Agility: Business (analysts, developers) → Data
Existing approaches require a middleman (IT)
Why Improve Distance to Data?
Improve time to value
•  Enable rapid data exploration and application development
•  IT should provide a valuable service without “getting in the way”
Reduce the burden on IT
•  Can’t add DBAs to keep up with the exponential data growth
•  Minimize “unnecessary work” so IT can focus on value-added activities and become a partner to the business users
Self-Service Data Exploration
Evolution Towards Self-Service Data Exploration
| | Traditional BI w/ RDBMS | Self-Service BI w/ RDBMS | SQL-on-Hadoop | Self-Service Data Exploration |
| Data Modeling and Transformation | IT-driven | IT-driven | IT-driven | Optional |
| Data Visualization | IT-driven | Self-service | Self-service | Self-service |
Zero-day analytics
(1) Self-Describing Data is Ubiquitous
Flat files in DFS
•  Complex data (Thrift, Avro, protobuf)
•  Columnar data (Parquet, ORC)
•  Loosely defined (JSON)
•  Traditional files (CSV, TSV)
Data stored in NoSQL stores
•  Relational-like (rows, columns)
•  Sparse data (NoSQL maps)
•  Embedded blobs (JSON)
•  Document stores (nested objects)
{
  name: {
    first: Michael,
    last: Smith
  },
  hobbies: [ski, soccer],
  district: Los Altos
}
{
  name: {
    first: Jennifer,
    last: Gates
  },
  hobbies: [sing],
  preschool: CCLC
}
(2) Drill’s Data Model is Flexible
[Diagram: data formats placed on two axes, fixed schema to schema-less and flat to complex, with flexibility increasing along both. Formats shown: HBase, JSON, BSON, CSV, TSV, Parquet, Avro]
RDBMS/SQL-on-Hadoop table
| Name | Gender | Age |
| Michael | M | 6 |
| Jennifer | F | 3 |
Apache Drill table
{
  name: {
    first: Michael,
    last: Smith
  },
  hobbies: [ski, soccer],
  district: Los Altos
}
{
  name: {
    first: Jennifer,
    last: Gates
  },
  hobbies: [sing],
  preschool: CCLC
}
(3) Drill Supports Schema Discovery On-The-Fly
Schema Declared In Advance (SCHEMA ON WRITE, SCHEMA BEFORE READ)
•  Fixed schema
•  Leverage schema in centralized repository (Hive Metastore)
Schema Discovered On-The-Fly (SCHEMA ON THE FLY)
•  Fixed schema, evolving schema or schema-less
•  Leverage schema in centralized repository or self-describing data
Seamless integration with Apache Hive
•  Low latency queries on Hive tables
•  Support for 100s of Hive file formats
•  Ability to reuse Hive UDFs
•  Support for multiple Hive Metastores in a single query
Apache Drill: Self Service SQL for Big data
AGILITY: INSTANT INSIGHTS TO BIG DATA
•  Direct queries on self describing data
•  No schemas or ETL required
FLEXIBILITY: ONE INTERFACE FOR HADOOP & NOSQL
•  Query HBase and other NoSQL stores
•  Use SQL to natively operate on complex data types (such as JSON)
FAMILIARITY: EXISTING SKILLS & TECHNOLOGIES
•  Leverage ANSI SQL skills and BI tools
•  Plug-n-play with Hive schema, file formats, UDFs
Enterprise Hadoop from MapR
[Diagram: Drill shown within the APACHE HADOOP ECOSYSTEM (Storm, Shark, Accumulo, Sentry, Spark, Impala, HBase, MapReduce, Hue, Solr, YARN, Flume, Cascading, Pig, Sqoop, Hive/Stinger/Tez, Whirr, Oozie, Mahout, Zookeeper), on top of the MapR Data Platform and Management]
MapR Data Platform: Enterprise-grade, Inter-operability, Multi-tenancy, Security, Operational
Interactive SQL-on-Hadoop options
| | Drill 1.0 | Hive 0.13 w/ Tez | Impala 1.x | Shark 0.9 |
| Latency | Low | Medium | Low | Medium |
| Files | Yes (all Hive file formats, plus JSON, Text, …) | Yes (all Hive file formats) | Yes (Parquet, Sequence, …) | Yes (all Hive file formats) |
| HBase/M7 | Yes | Yes, perf issues | Yes, with issues | Yes, perf issues |
| Schema | Hive or schema-less | Hive | Hive | Hive |
| SQL support | ANSI SQL | HiveQL | HiveQL (subset) | HiveQL |
| Client support | ODBC/JDBC | ODBC/JDBC | ODBC/JDBC | ODBC/JDBC |
| Hive compat | High | High | Low | High |
| Large datasets | Yes | Yes | Limited | Limited |
| Nested data | Yes | Limited | No | Limited |
| Concurrency | High | Limited | Medium | Limited |
Underneath the Covers
Storage config
Basic Process
[Diagram: a query arrives at one of three Drillbits; each Drillbit has a Distributed Cache and sits over DFS/HBase; ZooKeeper coordinates the Drillbits]
1. Query comes to any Drillbit (JDBC, ODBC, CLI, protobuf)
2. Drillbit generates execution plan based on query optimization & locality
3. Fragments are farmed to individual nodes
4. Result is returned to driving node
Stages of Query Planning
SQL Query → Parser → Logical Planner (heuristic and cost based) → Physical Planner (cost based) → Foreman → plan fragments sent to Drillbits
Quick Tour
Self-Service Data Exploration with Apache Drill
Zero to Results in 2 Minutes (3 Commands)
Install:
$ tar xzf apache-drill.tar.gz

Launch shell (embedded mode):
$ apache-drill/bin/sqlline -u jdbc:drill:zk=local

Query:
0: jdbc:drill:zk=local>
SELECT count(*) AS incidents, columns[1] AS category
FROM dfs.`/tmp/SFPD_Incidents_-_Previous_Three_Months.csv`
GROUP BY columns[1]
ORDER BY incidents DESC;

Results:
+------------+------------+
| incidents | category |
+------------+------------+
| 8372 | LARCENY/THEFT |
| 4247 | OTHER OFFENSES |
| 3765 | NON-CRIMINAL |
| 2502 | ASSAULT |
...
35 rows selected (0.847 seconds)
Data Sources
select timestamp, message
from dfs1.logs.`AppServerLogs/2014/Jan/p001.parquet`
where errorLevel > 2

dfs1: this is a cluster in Apache Drill
-  DFS
-  HBase
-  Hive meta-store
logs: a work-space
-  Typically a sub-directory
`AppServerLogs/2014/Jan/p001.parquet`: a table
-  pathnames
-  HBase table
-  Hive table
Data Source is in the Query
SELECT timestamp, message
FROM dfs1.logs.`AppServerLogs/2014/Jan/p001.parquet`
WHERE errorLevel > 2

A storage engine instance
-  DFS
-  HBase
-  Hive Metastore/HCatalog
A workspace
-  Sub-directory
-  Hive database
-  HBase namespace
A table
-  pathnames
-  HBase table
-  Hive table
Data Sources
•  JSON
•  CSV
•  ORC (i.e., all Hive types)
•  Parquet
•  HBase tables
•  … can combine them

SELECT USERS.name, USERS.emails.work
FROM dfs.logs.`/data/logs` LOGS,
     dfs.users.`/profiles.json` USERS
WHERE LOGS.uid = USERS.uid
  AND LOGS.errorLevel > 5;
Files and trees
-- Dynamic queries on files
select errorLevel, count(*)
from dfs.logs.`/AppServerLogs/2014/Jan/part0001.parquet`
group by errorLevel;

-- Dynamic queries on entire directory tree
select errorLevel, count(*) as TotalErrors
from dfs.logs.`/AppServerLogs`
group by errorLevel;
More with Trees
Use pathname elements as variables in your query…
-- Query some partitions: How many errors per level by month from 2012?
SELECT errorLevel, dir1, count(*)
FROM dfs.logs.`/AppServerLogs`
WHERE dir0 >= 2012
GROUP BY errorLevel, dir1;

-- Even more control: How many sales by month in Q4 from 2012 on?
SELECT count(*) as sales, dir0, dir1
FROM dfs.logs.`/transactionlogs`
WHERE dir0 >= 2012 and dir1 >= 10 and purch_flag = true
GROUP BY dir0, dir1;
Works with HBase and Embedded Blobs
-- Query an HBase table directly (no schemas)
SELECT cf1.month, cf1.year
FROM hbase.table1;

-- Embedded JSON value inside column profileBlob inside column family cf1 of the HBase table users
SELECT profile.name, count(profile.children)
FROM (
  SELECT CONVERT_FROM(cf1.profileBlob, 'json') AS profile
  FROM hbase.users
)
Combine Data Sources on the Fly
-- Join log directory with JSON file (user profiles) to identify the name and email address for anyone associated with an error message.
SELECT DISTINCT users.name, users.emails.work
FROM dfs.logs.`/data/logs` logs,
     dfs.users.`/profiles.json` users
WHERE logs.uid = users.id AND
      logs.errorLevel > 5;

-- Join a Hive table and an HBase table (without Hive metadata) to determine the number of tweets per user
SELECT users.name, count(*) as tweetCount
FROM hive.social.tweets tweets,
     hbase.users users
WHERE tweets.userId = convert_from(users.rowkey, 'UTF-8')
GROUP BY users.name;
Use ANSI SQL with no modifications
-- TPC-H standard query 4
SELECT
  o.o_orderpriority, count(*) AS order_count
FROM orders o
WHERE o.o_orderdate >= date '1996-10-01'
  AND o.o_orderdate < date '1996-10-01' + interval '3' month
  AND EXISTS(
    SELECT * FROM lineitem l
    WHERE l.l_orderkey = o.o_orderkey
      AND l.l_commitdate < l.l_receiptdate
  )
GROUP BY o.o_orderpriority
ORDER BY o.o_orderpriority;
Demo
Drill resources
WIKI: https://cwiki.apache.org/confluence/display/DRILL/Apache+Drill+Wiki
Drill in 10 minutes: https://cwiki.apache.org/confluence/display/DRILL/Apache+Drill+in+10+Minutes
Apache page: http://incubator.apache.org/drill/
Thank You
@mapr maprtech
tshiran@mapr.com
MapRTechnologies
maprtech
mapr-technologies
Query Execution
[Diagram: query execution inside a Drillbit. Components shown: JDBC Endpoint, ODBC Endpoint, RPC Endpoint; SQL Parser, Pig Parser, HiveQL Parser; LogicalPlan, Optimizer, PhysicalPlan; Foreman, Scheduler, Operators; Distributed Cache; StorageEngine Interface to HDFS, HBase, Mongo, Cassandra]
A Query engine that is…
•  Columnar/Vectorized
•  Optimistic/pipelined
•  Runtime compilation
•  Late binding
•  Extensible
Columnar representation
[Diagram: a table with columns A, B, C, D, E; on disk, each column is stored separately]
Columnar Encoding
•  Values in a column are stored next to one another
–  Better compression
–  Range-map: save min-max, can skip if not present
•  Only retrieve columns participating in query
•  Aggregations can be performed without decoding
[Diagram: columns A, B, C, D, E stored separately on disk]
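The range-map bullet can be illustrated with a small Python sketch. It is not Drill's implementation: `ColumnChunk` and `scan_equal` are hypothetical names, and the chunk layout is assumed for illustration. Each chunk of a column keeps its min and max, so a scan can skip chunks that cannot contain the value being searched for.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ColumnChunk:
    """Hypothetical chunk of one column, with a min-max range-map."""
    values: List[int]

    def __post_init__(self) -> None:
        # The range-map: computed once when the chunk is written.
        self.min_value = min(self.values)
        self.max_value = max(self.values)


def scan_equal(chunks: List[ColumnChunk], target: int) -> Tuple[List[int], int]:
    """Return matching values and the number of chunks actually scanned."""
    hits: List[int] = []
    scanned = 0
    for chunk in chunks:
        # Skip the chunk when the target lies outside its [min, max] range.
        if target < chunk.min_value or target > chunk.max_value:
            continue
        scanned += 1
        hits.extend(v for v in chunk.values if v == target)
    return hits, scanned


chunks = [ColumnChunk([1, 3, 5]), ColumnChunk([10, 12, 15]), ColumnChunk([20, 25, 30])]
hits, scanned = scan_equal(chunks, 12)
print(hits, scanned)
```

Only the middle chunk is read here; the other two are ruled out by their min-max values alone.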
Run-length-encoding & Sum
•  Dataset encoded as <val> <run-length>:
–  2, 4 (four 2s)
–  8, 10 (ten 8s)
•  Goal: sum all the records
•  Normally:
–  Decompress: 2, 2, 2, 2, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8
–  Add: 2 + 2 + 2 + 2 + 8 + 8 + 8 + 8 + 8 + 8 + 8 + 8 + 8 + 8
•  Optimized work: 2 * 4 + 8 * 10
–  Less memory, fewer operations
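The same arithmetic as a minimal Python sketch, using the slide's two runs. The function names are hypothetical and this is not Drill's code; it only contrasts decoding before summing with summing directly on the encoded runs.

```python
from typing import List, Tuple

Run = Tuple[int, int]  # (value, run_length)


def rle_decode(runs: List[Run]) -> List[int]:
    """Expand every run into its individual values."""
    out: List[int] = []
    for value, length in runs:
        out.extend([value] * length)
    return out


def sum_decoded(runs: List[Run]) -> int:
    """Normal path: materialize all values, then add them one by one."""
    return sum(rle_decode(runs))


def sum_encoded(runs: List[Run]) -> int:
    """Optimized path: one multiplication per run, no decoding."""
    return sum(value * length for value, length in runs)


runs = [(2, 4), (8, 10)]  # four 2s, ten 8s
print(sum_decoded(runs), sum_encoded(runs))  # both are 2 * 4 + 8 * 10
```

The decoded path touches 14 values; the encoded path touches 2 runs.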
Bit-packed Dictionary Sort
•  Dataset encoded with a dictionary and bit-positions:
–  Dictionary: [Rupert, Bill, Larry] {0, 1, 2}
–  Values: [1,0,1,2,1,2,1,0]
•  Normal work
–  Decompress & store: Bill, Rupert, Bill, Larry, Bill, Larry, Bill, Rupert
–  Sort: ~24 comparisons of variable width strings
•  Optimized work
–  Sort dictionary: {Bill: 1, Larry: 2, Rupert: 0}
–  Sort bit-packed values
–  Work: max 3 string comparisons, ~24 comparisons of fixed-width dictionary bits
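A minimal Python sketch of the idea, using the slide's dictionary and values. It is an illustration under one assumption that the slide does not state: after sorting the dictionary, the codes are remapped to the new positions so that integer order matches string order. The variable names are hypothetical.

```python
dictionary = ["Rupert", "Bill", "Larry"]   # code 0, 1, 2
codes = [1, 0, 1, 2, 1, 2, 1, 0]           # dictionary-encoded column

# Normal work: decode to strings, then sort variable-width strings.
decoded = [dictionary[c] for c in codes]
sorted_normal = sorted(decoded)

# Optimized work, step 1: sort the (small) dictionary once.
order = sorted(range(len(dictionary)), key=lambda c: dictionary[c])
sorted_dictionary = [dictionary[c] for c in order]

# Step 2 (assumed remapping): old code -> rank in the sorted dictionary.
rank = {old_code: new_code for new_code, old_code in enumerate(order)}
remapped = [rank[c] for c in codes]

# Step 3: sort fixed-width integers instead of strings.
sorted_codes = sorted(remapped)

# Decode only at the end, if strings are needed at all.
sorted_optimized = [sorted_dictionary[c] for c in sorted_codes]
print(sorted_optimized)
```

String comparisons happen only while sorting the three dictionary entries; the eight column values are compared as small integers.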
Drill 4-value semantics
•  SQL’s 3-valued semantics
–  True
–  False
–  Unknown
•  Drill adds a fourth
–  Repeated
Vectorization
•  Drill operates on more than one record at a time
–  Word-sized manipulations
–  SIMD-like instructions
•  GCC, LLVM and JVM all do various optimizations automatically
–  Manually code algorithms
•  Logical Vectorization
–  Bitmaps allow lightning fast null-checks
–  Avoid branching to speed CPU pipeline
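The bitmap point can be sketched in Python as follows. This is only an illustration, not Drill's value-vector code: the layout (one validity bit per value, packed into an integer) and the function names are assumptions, and Python itself will not show the CPU-level benefit.

```python
from typing import List, Optional


def sum_with_branch(values: List[Optional[int]]) -> int:
    """Per-value null check: one branch for every record."""
    total = 0
    for v in values:
        if v is not None:
            total += v
    return total


def sum_with_bitmap(data: List[int], validity: int) -> int:
    """Bit i of `validity` is 1 when data[i] is a real value, 0 when null.
    The null check becomes arithmetic (multiply by the bit), not a branch."""
    total = 0
    for i, v in enumerate(data):
        total += v * ((validity >> i) & 1)
    return total


def all_valid(validity: int, count: int) -> bool:
    """One word-sized comparison answers 'any nulls in this batch?'."""
    return validity == (1 << count) - 1


values = [5, None, 7, 3]
data = [5, 0, 7, 3]      # null slot holds a placeholder
validity = 0b1101        # bit 1 is 0: the second value is null
print(sum_with_branch(values), sum_with_bitmap(data, validity))
```

When `all_valid` returns true for a batch, null handling can be skipped for the whole batch at once.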
Runtime Compilation is Faster
•  JIT is smart, but more gains with runtime compilation
•  Janino: Java-based Java compiler
From http://bit.ly/16Xk32x
Drill compiler
[Diagram: CodeModel generates code → Janino compiles runtime byte-code; precompiled byte-code templates; merge byte-code of the two classes → loaded class]
Optimistic
[Chart: speed vs. check-pointing, comparing "No need to checkpoint" with "Checkpoint frequently"; Apache Drill is marked on the chart; axis ticks run from 0 to 160]
Optimistic Execution
•  Recovery code trivial
–  Running instances discard the failed query’s intermediate state
•  Pipelining possible
–  Send results as soon as batch is large enough
–  Requires barrier-less decomposition of query
Batches of Values
•  Value vectors
–  List of values, with same schema
–  With the 4-value semantics for each value
•  Shipped around in batches
–  max 256k bytes in a batch
–  max 64K rows in a batch
•  RPC designed for multiple replies to a request
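A rough Python sketch of shipping a column in batches under the 64K-row limit above. It is a simplification with hypothetical names: only the row limit is enforced, the 256k-byte limit is left out, and a plain list stands in for a value vector.

```python
from typing import Iterator, List

MAX_ROWS_PER_BATCH = 64 * 1024  # row limit from the slide


def split_into_batches(column: List[int],
                       max_rows: int = MAX_ROWS_PER_BATCH) -> Iterator[List[int]]:
    """Yield consecutive slices of the column, each at most max_rows long."""
    for start in range(0, len(column), max_rows):
        yield column[start:start + max_rows]


column = list(range(150_000))
batches = list(split_into_batches(column))
print([len(b) for b in batches])
```

Because each batch is produced independently, a receiver can start working on the first batch before the last one exists, which is what makes multiple replies per request useful.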
Pipelining
•  Record batches are pipelined between nodes
–  ~256kB usually
•  Unit of work for Drill
–  Operators work on a batch
•  Operator reconfiguration happens at batch boundaries
[Diagram: record batches flowing between three Drillbits]
Pipelining Record Batches
[Diagram: the same Drillbit component diagram as on the Query Execution slide]
Pipelining
•  Random access: sort without copy or restructuring
•  Avoids serialization/deserialization
•  Off-heap (no GC woes when lots of memory)
•  Full specification + off-heap + batch
–  Enables C/C++ operators (fast!)
•  Read/write to disk
–  when data larger than memory
[Diagram: Drillbit memory overflow uses disk]
Cost-based Optimization
•  Using Optiq, an extensible framework
•  Pluggable rules, and cost model
•  Rules for distributed plan generation
•  Insert Exchange operator into physical plan
•  Optiq enhanced to explore parallel query plans
•  Pluggable cost model
–  CPU, IO, memory, network cost (data locality)
–  Storage engine features (HDFS vs Hive vs HBase)
[Diagram: Query Optimizer with pluggable rules and a pluggable cost model]
Distributed Plan Cost
•  Operators have distribution property
–  Hash, Broadcast, Singleton, …
•  Exchange operator to enforce distributions
–  Hash: HashToRandomExchange
–  Broadcast: BroadcastExchange
–  Singleton: UnionExchange, SingleMergeExchange
•  Enumerate all, use cost to pick best
–  Merge Join vs Hash Join
–  Partition-based join vs Broadcast-based join
–  Streaming Aggregation vs Hash Aggregation
–  Aggregation in one phase or two phases: partial local aggregation followed by final aggregation
[Diagram: example plan over three data sources using HashToRandomExchange, Sort and Streaming-Aggregation]

 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...amitlee9823
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...amitlee9823
 
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...amitlee9823
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Researchmichael115558
 
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceDelhi Call girls
 
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...amitlee9823
 
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...only4webmaster01
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% SecurePooja Nehwal
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionfulawalesam
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxolyaivanovalion
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysismanisha194592
 
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...amitlee9823
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Valters Lauzums
 
Probability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsProbability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsJoseMangaJr1
 

Dernier (20)

5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
 
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
 
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
Predicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science ProjectPredicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science Project
 
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
 
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
 
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
 
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interaction
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
Probability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsProbability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter Lessons
 

2014 08-20-pit-hug

  • 1. ® © 2014 MapR Technologies. Apache Drill. Andy Pernsteiner. 2014-08-20: Pittsburgh HUG
  • 2. APACHE DRILL
      • Pioneering Data Agility for Hadoop
      • Apache open source project
      • Scale-out execution engine for low-latency queries
      • Unified SQL-based API for analytics & operational applications
      • 40+ contributors; 150+ years of experience building databases and distributed systems
  • 3. Active Drill Community
      • Large community, growing rapidly
        – 35-40 contributors, 16 committers
        – Microsoft, LinkedIn, Oracle, Facebook, Visa, Lucidworks, Hortonworks, Concurrent, many universities
      • In 2014
        – over 20 meet-ups, many more coming soon
        – 2 hackathons, with 40+ participants
      • Encourage you to join, learn, contribute and have fun …
  • 4. Why Drill?
  • 5. Hadoop as an augmentation for EDW: why?
  • 6.
  • 7.
  • 8. Consolidating multiple schemas is very hard. Why? With schema-on-write, retrieval is predetermined.
  • 9. Silos make analysis very difficult
      • How do I identify a unique {customer, trade} across data sets?
      • How can I guarantee the lack of anomalous behavior if I can’t see all data?
  • 10. Why Hadoop
  • 11. SQL is here to stay
  • 12. YOU CAN’T HANDLE REAL SQL
  • 13. SQL
      select * from A
      where exists (
        select 1 from B where B.b < 100 );
      • Did you know Apache Hive cannot compute it?
        – e.g., Hive, Impala, Spark/Shark
  • 14. Self-described Data
      select cf.month, cf.year
      from hbase.table1;
      • Did you know normal SQL cannot handle the above?
      • Nor can Hive and its variants like Impala, Shark?
      • Because there’s no meta-store definition available
  • 15. Rethink SQL for Big Data
      Preserve
      • ANSI SQL
        – Familiar and ubiquitous
      • Performance
        – Interactive nature crucial for BI/Analytics
      • One technology
        – Painful to manage different technologies
      • Enterprise ready
        – System-of-record, HA, DR, Security, Multi-tenancy, …
      Invent
      • Flexible data-model
        – Allow schemas to evolve rapidly
        – Support semi-structured data types
      • Agility
        – Self-service possible when developer and DBA is same
      • Scalability
        – In all dimensions: data, speed, schemas, processes, management
  • 16. Distance to Data (diagram)
      • MapReduce: “plumbing” development between Data and Business (analysts, developers)
      • Hive and other SQL-on-Hadoop: modeling and transformations between Data and Business (analysts, developers)
      • Existing approaches require a middleman (IT)
  • 17. Real-World Data Modeling and Transformations
  • 18.
  • 19. Distance to Data (diagram)
      • MapReduce: “plumbing” development between Data and Business (analysts, developers)
      • Hive and other SQL-on-Hadoop: modeling and transformations between Data and Business (analysts, developers)
      • Data Agility: Data and Business (analysts, developers)
      • Existing approaches require a middleman (IT)
  • 20. Why Improve Distance to Data?
      • Enable rapid data exploration and application development
      • IT should provide a valuable service without “getting in the way”
      • Can’t add DBAs to keep up with the exponential data growth
      • Minimize “unnecessary work” so IT can focus on value-added activities and become a partner to the business users
      Column headings: Improve time to value; Reduce the burden on IT
  • 21. Self-Service Data Exploration
  • 22. Evolution Towards Self-Service Data Exploration
      Approach: Data Modeling and Transformation / Data Visualization
      • Traditional BI w/ RDBMS: IT-driven / IT-driven
      • Self-Service BI w/ RDBMS: IT-driven / Self-service
      • SQL-on-Hadoop: IT-driven / Self-service
      • Self-Service Data Exploration: Optional / Self-service
      Zero-day analytics
  • 23. (1) Self-Describing Data is Ubiquitous
      Flat files in DFS
      • Complex data (Thrift, Avro, protobuf)
      • Columnar data (Parquet, ORC)
      • Loosely defined (JSON)
      • Traditional files (CSV, TSV)
      Data stored in NoSQL stores
      • Relational-like (rows, columns)
      • Sparse data (NoSQL maps)
      • Embedded blobs (JSON)
      • Document stores (nested objects)
      {
        name: {
          first: Michael,
          last: Smith
        },
        hobbies: [ski, soccer],
        district: Los Altos
      }
      {
        name: {
          first: Jennifer,
          last: Gates
        },
        hobbies: [sing],
        preschool: CCLC
      }
  • 24. (2) Drill’s Data Model is Flexible
      (diagram: formats HBase, JSON, BSON, CSV, TSV, Parquet, Avro placed on two flexibility axes, fixed schema to schema-less and flat to complex)
      RDBMS/SQL-on-Hadoop table:
        Name      Gender  Age
        Michael   M       6
        Jennifer  F       3
      Apache Drill table:
      {
        name: {
          first: Michael,
          last: Smith
        },
        hobbies: [ski, soccer],
        district: Los Altos
      }
      {
        name: {
          first: Jennifer,
          last: Gates
        },
        hobbies: [sing],
        preschool: CCLC
      }
  • 25. (3) Drill Supports Schema Discovery On-The-Fly
      Schema Declared In Advance
      • Fixed schema
      • Leverage schema in centralized repository (Hive Metastore)
      Schema Discovered On-The-Fly
      • Fixed schema, evolving schema or schema-less
      • Leverage schema in centralized repository or self-describing data
      SCHEMA ON WRITE; SCHEMA BEFORE READ; SCHEMA ON THE FLY
  • 26. Seamless integration with Apache Hive
      • Low latency queries on Hive tables
      • Support for 100s of Hive file formats
      • Ability to reuse Hive UDFs
      • Support for multiple Hive Metastores in a single query
  • 27. Apache Drill: Self Service SQL for Big data
      AGILITY: INSTANT INSIGHTS TO BIG DATA
      • Direct queries on self describing data
      • No schemas or ETL required
      FLEXIBILITY: ONE INTERFACE FOR HADOOP & NOSQL
      • Query HBase and other NoSQL stores
      • Use SQL to natively operate on complex data types (such as JSON)
      FAMILIARITY: EXISTING SKILLS & TECHNOLOGIES
      • Leverage ANSI SQL skills and BI tools
      • Plug-n-play with Hive schema, file formats, UDFs
  • 28. Enterprise Hadoop from MapR (diagram)
      • Layers: Management; MapR Data Platform; APACHE HADOOP ECOSYSTEM
      • Ecosystem components: Storm, Shark, Accumulo, Sentry, Spark, Impala, HBase, MapReduce, Hue, Solr, YARN, Flume, Cascading, Pig, Sqoop, Hive/Stinger/Tez, Whirr, Oozie, Mahout, Zookeeper, Drill
      • Enterprise-grade, Inter-operability, Multi-tenancy, Security, Operational
  • 29. Interactive SQL-on-Hadoop options
      Columns: Drill 1.0 | Hive 0.13 w/ Tez | Impala 1.x | Shark 0.9
      • Latency: Low | Medium | Low | Medium
      • Files: Yes (all Hive file formats, plus JSON, Text, …) | Yes (all Hive file formats) | Yes (Parquet, Sequence, …) | Yes (all Hive file formats)
      • HBase/M7: Yes | Yes, perf issues | Yes, with issues | Yes, perf issues
      • Schema: Hive or schema-less | Hive | Hive | Hive
      • SQL support: ANSI SQL | HiveQL | HiveQL (subset) | HiveQL
      • Client support: ODBC/JDBC | ODBC/JDBC | ODBC/JDBC | ODBC/JDBC
      • Hive compat: High | High | Low | High
      • Large datasets: Yes | Yes | Limited | Limited
      • Nested data: Yes | Limited | No | Limited
      • Concurrency: High | Limited | Medium | Limited
  • 30. Underneath the Covers
  • 31. Storage config
  • 32. Basic Process
      (diagram: ZooKeeper; three nodes, each with DFS/HBase, a Drillbit and a Distributed Cache; an incoming Query)
      1. Query comes to any Drillbit (JDBC, ODBC, CLI, protobuf)
      2. Drillbit generates execution plan based on query optimization & locality
      3. Fragments are farmed to individual nodes
      4. Result is returned to driving node
  • 33. Stages of Query Planning
      (diagram labels: SQL Query, Parser, Logical Planner, Physical Planner, Foreman; plan fragments sent to Drillbits; planner annotations: heuristic and cost based, cost based)
  • 34. Quick Tour: Self-Service Data Exploration with Apache Drill
  • 35. Zero to Results in 2 Minutes (3 Commands)
      Install:
        $ tar xzf apache-drill.tar.gz
      Launch shell (embedded mode):
        $ apache-drill/bin/sqlline -u jdbc:drill:zk=local
      Query:
        0: jdbc:drill:zk=local>
        SELECT count(*) AS incidents, columns[1] AS category
        FROM dfs.`/tmp/SFPD_Incidents_-_Previous_Three_Months.csv`
        GROUP BY columns[1]
        ORDER BY incidents DESC;
      Results:
        +------------+------------+
        | incidents | category |
        +------------+------------+
        | 8372 | LARCENY/THEFT |
        | 4247 | OTHER OFFENSES |
        | 3765 | NON-CRIMINAL |
        | 2502 | ASSAULT |
        ...
        35 rows selected (0.847 seconds)
  • 36. Data Sources
      select timestamp, message
      from dfs1.logs.`AppServerLogs/2014/Jan/p001.parquet`
      where errorLevel > 2
      • dfs1: this is a cluster in Apache Drill
        – DFS
        – HBase
        – Hive meta-store
      • logs: a work-space
        – Typically a sub-directory
      • `AppServerLogs/2014/Jan/p001.parquet`: a table
        – pathnames
        – HBase table
        – Hive table
  • 37. Data Source is in the Query
      SELECT timestamp, message
      FROM dfs1.logs.`AppServerLogs/2014/Jan/p001.parquet`
      WHERE errorLevel > 2
      • A storage engine instance
        – DFS
        – HBase
        – Hive Metastore/HCatalog
      • A workspace
        – Sub-directory
        – Hive database
        – HBase namespace
      • A table
        – pathnames
        – HBase table
        – Hive table
  • 38. Data Sources
      • JSON
      • CSV
      • ORC (i.e., all Hive types)
      • Parquet
      • HBase tables
      • … can combine them
      select USERS.name, USERS.emails.work
      from dfs.logs.`/data/logs` LOGS,
           dfs.users.`/profiles.json` USERS
      where LOGS.uid = USERS.uid and LOGS.errorLevel > 5;
  • 39. Files and trees
      // Dynamic queries on files
      select errorLevel, count(*)
      from dfs.logs.`/AppServerLogs/2014/Jan/part0001.parquet`
      group by errorLevel;
      // Dynamic queries on entire directory tree
      select errorLevel, count(*) as TotalErrors
      from dfs.logs.`/AppServerLogs`
      group by errorLevel;
  • 40. More with Trees
      Use pathname elements as variables in your query…
      # Query some partitions: How many errors per level by month from 2012?
      SELECT errorLevel, count(*)
      FROM dfs.logs.`/AppServerLogs`
      WHERE dirs[1] >= 2012
      GROUP BY errorLevel, dirs[2];
      # Even more control: How many sales by month in Q4 from 2012 on?
      SELECT count(*) as sales, dir0, dir1
      FROM dfs.logs.`/transactionlogs`
      WHERE dir0 >= 2012 and dir1 >=9 and purch_flag=true
      GROUP BY dir0, dir1;
  • 41. Works with HBase and Embedded Blobs
      # Query an HBase table directly (no schemas)
      SELECT cf1.month, cf1.year
      FROM hbase.table1;
      # Embedded JSON value inside column profileBlob inside column family cf1 of the HBase table users
      SELECT profile.name, count(profile.children)
      FROM (
        SELECT CONVERT_FROM(cf1.profileBlob, 'json') AS profile
        FROM hbase.users
      )
  • 42. Combine Data Sources on the Fly
      # Join log directory with JSON file (user profiles) to identify the name and email address for anyone associated with an error message.
      SELECT DISTINCT users.name, users.emails.work
      FROM dfs.logs.`/data/logs` logs,
           dfs.users.`/profiles.json` users
      WHERE logs.uid = users.id AND
            logs.errorLevel > 5;
      # Join a Hive table and an HBase table (without Hive metadata) to determine the number of tweets per user
      SELECT users.name, count(*) as tweetCount
      FROM hive.social.tweets tweets,
           hbase.users users
      WHERE tweets.userId = convert_from(users.rowkey, 'UTF-8')
      GROUP BY tweets.userId;
  • 43. Use ANSI SQL with no modifications
      # TPC-H standard query 4
      SELECT
        o.o_orderpriority, count(*) AS order_count
      FROM orders o
      WHERE o.o_orderdate >= date '1996-10-01'
        AND o.o_orderdate < date '1996-10-01' + interval '3' month
        AND EXISTS(
          SELECT * FROM lineitem l
          WHERE l.l_orderkey = o.o_orderkey
            AND l.l_commitdate < l.l_receiptdate
        )
      GROUP BY o.o_orderpriority
      ORDER BY o.o_orderpriority;
  • 44. Demo
  • 45. Drill resources
      • WIKI: https://cwiki.apache.org/confluence/display/DRILL/Apache+Drill+Wiki
      • Drill in 10 minutes: https://cwiki.apache.org/confluence/display/DRILL/Apache+Drill+in+10+Minutes
      • Apache page: http://incubator.apache.org/drill/
  • 46. Thank You
      @mapr, maprtech, tshiran@mapr.com, MapRTechnologies, maprtech, mapr-technologies
  • 47. Underneath the Covers
  • 48. Basic Process
      (diagram: ZooKeeper; three nodes, each with DFS/HBase, a Drillbit and a Distributed Cache; an incoming Query)
      1. Query comes to any Drillbit (JDBC, ODBC, CLI, protobuf)
      2. Drillbit generates execution plan based on query optimization & locality
      3. Fragments are farmed to individual nodes
      4. Result is returned to driving node
  • 49. Stages of Query Planning
      (diagram labels: SQL Query, Parser, Logical Planner, Physical Planner, Foreman; plan fragments sent to Drillbits; planner annotations: heuristic and cost based, cost based)
  • 50. Query Execution
      (diagram components: SQL Parser, Pig Parser, HiveQL Parser, Logical Plan, Optimizer, Physical Plan, Scheduler, Foreman, Operators, Storage Engine Interface, HDFS, HBase, Mongo, Cassandra, Distributed Cache, RPC Endpoint, JDBC Endpoint, ODBC Endpoint)
  • 51. A Query engine that is…
      • Columnar/Vectorized
      • Optimistic/pipelined
      • Runtime compilation
      • Late binding
      • Extensible
  • 52. Columnar representation
      (diagram: columns A, B, C, D, E and their layout on disk)
  • 53. Columnar Encoding
      • Values in a column are stored next to one another
        – Better compression
        – Range-map: save min-max, can skip if not present
      • Only retrieve columns participating in query
      • Aggregations can be performed without decoding
      (diagram: columns A, B, C, D, E on disk)
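The range-map bullet on slide 53 can be illustrated with a small sketch. This is not Drill's implementation; `ColumnChunk` and `scan_equals` are hypothetical names invented here, and the sketch assumes integer values and an equality filter. Each chunk of a column keeps its minimum and maximum, so a scan can skip chunks whose range cannot contain the value being searched for.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ColumnChunk:
    """Hypothetical chunk of one column, with min/max saved at write time."""
    values: List[int]

    def __post_init__(self) -> None:
        self.min_value = min(self.values)
        self.max_value = max(self.values)


def scan_equals(chunks: List[ColumnChunk], target: int) -> Tuple[List[int], int]:
    """Return matching values and the number of chunks actually read."""
    matches: List[int] = []
    chunks_read = 0
    for chunk in chunks:
        # Range-map check: skip the chunk if target cannot be present.
        if target < chunk.min_value or target > chunk.max_value:
            continue
        chunks_read += 1
        matches.extend(v for v in chunk.values if v == target)
    return matches, chunks_read


chunks = [ColumnChunk([1, 3, 5]), ColumnChunk([10, 12, 12]), ColumnChunk([20, 25])]
matches, chunks_read = scan_equals(chunks, 12)
print(matches, chunks_read)
```

Only the middle chunk is read for the value 12; the other two are ruled out by their saved ranges alone.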
  • 54. Run-length-encoding & Sum
      • Dataset encoded as <val> <run-length>:
        – 2, 4 (four 2’s)
        – 8, 10 (ten 8’s)
      • Goal: sum all the records
      • Normally:
        – Decompress: 2, 2, 2, 2, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8
        – Add: 2 + 2 + 2 + 2 + 8 + 8 + 8 + 8 + 8 + 8 + 8 + 8 + 8 + 8
      • Optimized work: 2 * 4 + 8 * 10
        – Less memory, fewer operations
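The arithmetic on slide 54 can be checked with a short sketch. The function names `rle_sum` and `rle_decode` are made up for illustration and are not Drill APIs; runs are assumed to be `(value, run_length)` pairs as on the slide.

```python
from typing import List, Tuple

Run = Tuple[int, int]  # (value, run_length), as on the slide


def rle_decode(runs: List[Run]) -> List[int]:
    """The 'normal' path: materialize every record before adding."""
    out: List[int] = []
    for value, length in runs:
        out.extend([value] * length)
    return out


def rle_sum(runs: List[Run]) -> int:
    """The optimized path: one multiply per run, no decompression."""
    return sum(value * length for value, length in runs)


runs = [(2, 4), (8, 10)]
total = rle_sum(runs)
print(total)  # 2 * 4 + 8 * 10 = 88
```

The optimized path touches two runs instead of fourteen records, which is the "less memory, fewer operations" point on the slide.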
  • 55. Bit-packed Dictionary Sort
      • Dataset encoded with a dictionary and bit-positions:
        – Dictionary: [Rupert, Bill, Larry] {0, 1, 2}
        – Values: [1,0,1,2,1,2,1,0]
      • Normal work
        – Decompress & store: Bill, Rupert, Bill, Larry, Bill, Larry, Bill, Rupert
        – Sort: ~24 comparisons of variable width strings
      • Optimized work
        – Sort dictionary: {Bill: 1, Larry: 2, Rupert: 0}
        – Sort bit-packed values
        – Work: max 3 string comparisons, ~24 comparisons of fixed-width dictionary bits
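A minimal sketch of the optimized path on slide 55, assuming the dictionary is a list indexed by code and the values are plain integers rather than truly bit-packed. `sort_dictionary_encoded` is a hypothetical helper, not a Drill function. Strings are compared only while sorting the small dictionary; the data itself is ordered by fixed-width integer ranks.

```python
from typing import List


def sort_dictionary_encoded(dictionary: List[str], codes: List[int]) -> List[int]:
    """Sort encoded values by the order of their dictionary strings."""
    # Sort the dictionary once: order[i] is the code of the i-th smallest string.
    order = sorted(range(len(dictionary)), key=lambda code: dictionary[code])
    # rank[code] is the sort position of that code's string.
    rank = [0] * len(dictionary)
    for position, code in enumerate(order):
        rank[code] = position
    # Sorting the data now compares small integers only.
    return sorted(codes, key=lambda code: rank[code])


dictionary = ["Rupert", "Bill", "Larry"]
codes = [1, 0, 1, 2, 1, 2, 1, 0]
sorted_codes = sort_dictionary_encoded(dictionary, codes)
decoded = [dictionary[code] for code in sorted_codes]
print(sorted_codes)
print(decoded)
```

Decoding is only done here to show the result; an engine could keep the values encoded until output.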
  • 56. Drill 4-value semantics
      • SQL’s 3-valued semantics
        – True
        – False
        – Unknown
      • Drill adds fourth
        – Repeated
  • 57. Vectorization
      • Drill operates on more than one record at a time
        – Word-sized manipulations
        – SIMD-like instructions
      • GCC, LLVM and JVM all do various optimizations automatically
        – Manually code algorithms
      • Logical Vectorization
        – Bitmaps allow lightning fast null-checks
        – Avoid branching to speed CPU pipeline
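The bitmap null-check idea on slide 57 can be sketched as follows. This is an illustration under assumptions, not Drill's value-vector code: the validity bitmap is modeled as a Python integer with one bit per value, and `pack_validity`, `has_nulls` and `count_nulls` are hypothetical names. The point is that one word-sized comparison answers "does this batch contain any nulls?" without branching on each value.

```python
from typing import List, Optional


def pack_validity(values: List[Optional[int]]) -> int:
    """Build a bitmap with bit i set when values[i] is not null."""
    bitmap = 0
    for i, value in enumerate(values):
        if value is not None:
            bitmap |= 1 << i
    return bitmap


def has_nulls(bitmap: int, count: int) -> bool:
    """Single comparison against the all-ones mask for `count` values."""
    return bitmap != (1 << count) - 1


def count_nulls(bitmap: int, count: int) -> int:
    """Number of nulls = values minus set bits (a population count)."""
    return count - bin(bitmap).count("1")


batch = [1, None, 3, 4]
bitmap = pack_validity(batch)
print(bin(bitmap), has_nulls(bitmap, len(batch)), count_nulls(bitmap, len(batch)))
```

When `has_nulls` is false, an operator could take a fast path that skips null handling for the whole batch.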
  • 58. Runtime Compilation is Faster
      • JIT is smart, but more gains with runtime compilation
      • Janino: Java-based Java compiler
      From http://bit.ly/16Xk32x
  • 59. Drill compiler
      (diagram labels: CodeModel generates code; Janino compiles runtime byte-code; Precompiled byte-code templates; Merge byte-code of the two classes; Loaded class)
  • 60. Optimistic
      (chart: Speed vs. check-pointing; labels: No need to checkpoint, Checkpoint frequently, Apache Drill)
  • 61. Optimistic Execution
      • Recovery code trivial
        – Running instances discard the failed query’s intermediate state
      • Pipelining possible
        – Send results as soon as batch is large enough
        – Requires barrier-less decomposition of query
  • 62. Batches of Values
      • Value vectors
        – List of values, with same schema
        – With the 4-value semantics for each value
      • Shipped around in batches
        – max 256k bytes in a batch
        – max 64K rows in a batch
      • RPC designed for multiple replies to a request
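As a rough illustration of the two batch limits on slide 62, the sketch below cuts a stream of rows into batches capped by both row count and byte size. It is an assumption-laden model rather than Drill's batching logic: `make_batches` is a hypothetical helper, rows are represented only by their sizes in bytes, and the defaults simply mirror the slide's numbers.

```python
from typing import List


def make_batches(row_sizes: List[int],
                 max_rows: int = 64 * 1024,
                 max_bytes: int = 256 * 1024) -> List[List[int]]:
    """Group rows into batches that respect both caps."""
    batches: List[List[int]] = []
    current: List[int] = []
    current_bytes = 0
    for size in row_sizes:
        # Close the current batch if adding this row would break either cap.
        if current and (len(current) >= max_rows or current_bytes + size > max_bytes):
            batches.append(current)
            current, current_bytes = [], 0
        current.append(size)
        current_bytes += size
    if current:
        batches.append(current)
    return batches


by_rows = make_batches([100] * 10, max_rows=4, max_bytes=1000)
by_bytes = make_batches([600, 600, 600], max_rows=10, max_bytes=1000)
print([len(b) for b in by_rows], [len(b) for b in by_bytes])
```

A sender following this scheme can ship each batch as soon as it closes, which fits the slide's note that the RPC layer allows multiple replies to one request.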
  • 63. Pipelining
      • Record batches are pipelined between nodes
        – ~256kB usually
      • Unit of work for Drill
        – Operators work on a batch
      • Operator reconfiguration happens at batch boundaries
      (diagram: three Drillbits)
  • 64. Pipelining Record Batches
      (diagram components: SQL Parser, Pig Parser, HiveQL Parser, Logical Plan, Optimizer, Physical Plan, Scheduler, Foreman, Operators, Storage Engine Interface, HDFS, HBase, Mongo, Cassandra, Distributed Cache, RPC Endpoint, JDBC Endpoint, ODBC Endpoint)
  • 65. DISK Pipelining
      • Random access: sort without copy or restructuring
      • Avoids serialization/deserialization
      • Off-heap (no GC woes when lots of memory)
      • Full specification + off-heap + batch
        – Enables C/C++ operators (fast!)
      • Read/write to disk
        – when data larger than memory
      (diagram: Drillbit memory overflow uses disk)
  • 66. Cost-based Optimization
      • Using Optiq, an extensible framework
      • Pluggable rules, and cost model
      • Rules for distributed plan generation
      • Insert Exchange operator into physical plan
      • Optiq enhanced to explore parallel query plans
      • Pluggable cost model
        – CPU, IO, memory, network cost (data locality)
        – Storage engine features (HDFS vs Hive vs HBase)
      (diagram: Query Optimizer with pluggable rules and pluggable cost model)
  • 67. Distributed Plan Cost
      • Operators have distribution property
        – Hash, Broadcast, Singleton, …
      • Exchange operator to enforce distributions
        – Hash: HashToRandomExchange
        – Broadcast: BroadcastExchange
        – Singleton: UnionExchange, SingleMergeExchange
      • Enumerate all, use cost to pick best
        – Merge Join vs Hash Join
        – Partition-based join vs Broadcast-based join
        – Streaming Aggregation vs Hash Aggregation
        – Aggregation in one phase or two phases: partial local aggregation followed by final aggregation
      (diagram: Data, Sort, Streaming-Aggregation, HashToRandomExchange)
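The two-phase option in the last bullet of slide 67 (partial local aggregation followed by final aggregation) can be sketched with a simple count-per-key example. This is only a model of the idea: the node lists stand in for data local to each Drillbit, and `partial_aggregate` and `final_aggregate` are hypothetical names, not Drill operators.

```python
from collections import Counter
from typing import Dict, Iterable, List


def partial_aggregate(local_rows: Iterable[str]) -> Counter:
    """Phase 1: each node counts its own rows per key."""
    partial: Counter = Counter()
    for key in local_rows:
        partial[key] += 1
    return partial


def final_aggregate(partials: List[Counter]) -> Dict[str, int]:
    """Phase 2: merge the small per-node partial results."""
    total: Counter = Counter()
    for partial in partials:
        total.update(partial)  # adds counts key by key
    return dict(total)


node_data = [["a", "b", "a"], ["b", "b"], ["c", "a"]]
partials = [partial_aggregate(rows) for rows in node_data]
result = final_aggregate(partials)
print(result)
```

Only one partial count per key per node crosses the exchange, instead of every row, which is the kind of trade-off a cost model would weigh when choosing between one phase and two.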