Confidential © 2014 Actian Corporation1
Highest Performing SQL-in-Hadoop
Announcing “Project Vortex”
Peter Boncz Database Systems Researcher & Actian Chief Technical Advisor
MonetDB architect & Vectorwise founder
Hadoop Summit - San Jose, June 3 2014
Agenda
■ History of Vectorwise and Vector Processing
■ Survey of SQL-on-Hadoop Approaches
■ Vortex (Actian Vector) Architecture
■ Benchmark Results
■ Roadmap
Vector Database Processing Timeline
■ 1994, MonetDB: column-store pioneer. Start of new wave of analytic DBMS.
■ 2004, X100 engine: vector execution model. High-performance DBMS using CPU cache optimizations.
Typical RDBMS: tuple-at-a-time iterator
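Query: SELECT name, salary*.19 AS tax FROM employee WHERE age < 25
[Diagram: PROJECT pulls from SELECT, which pulls from SCAN, one tuple per next() call. The tuple (carl, 20, 10000) passes the age < 25 filter and is projected to a tax of 1900; the tuple (john, 40, 30000) is filtered out.]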
“Vectorized Query Execution”
Vector contains
data of multiple
tuples (~100)
All primitives are
“vectorized”
Effect: far fewer Iterator.next() and primitive calls.
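The difference in call counts can be sketched in Python. This is a toy model under stated assumptions, not the X100 engine: the function names, the call counter, and the vector size of 100 are illustrative, and Python adds interpretation overhead that a real vectorized engine avoids.

```python
# Toy comparison of tuple-at-a-time and vector-at-a-time execution for
#   SELECT name, salary*.19 AS tax FROM employee WHERE age < 25
# All names are illustrative assumptions; this is not the X100 engine.

VECTOR_SIZE = 100  # assumed; the slide says "~100" tuples per vector


def tuple_at_a_time(rows):
    """Pull tuples one by one through SCAN -> SELECT -> PROJECT.

    Returns (result, calls), where calls counts every tuple handed from
    one operator to the next, i.e. one next() call each.
    """
    calls = {"next": 0}

    def scan():
        for row in rows:
            calls["next"] += 1
            yield row

    def select(child):
        for name, age, salary in child:
            if age < 25:
                calls["next"] += 1
                yield name, age, salary

    def project(child):
        for name, _age, salary in child:
            calls["next"] += 1
            yield name, salary * 0.19

    return list(project(select(scan()))), calls["next"]


def vector_at_a_time(names, ages, salaries):
    """Process column slices of up to VECTOR_SIZE tuples per operator call."""
    calls = {"next": 0}
    result = []
    for start in range(0, len(ages), VECTOR_SIZE):
        stop = start + VECTOR_SIZE
        # SCAN: one call hands over a slice of each column.
        calls["next"] += 1
        name_vec = names[start:stop]
        age_vec = ages[start:stop]
        salary_vec = salaries[start:stop]
        # SELECT: one call produces a selection vector of qualifying positions.
        calls["next"] += 1
        selected = [i for i, age in enumerate(age_vec) if age < 25]
        # PROJECT: one call computes the tax for all selected positions.
        calls["next"] += 1
        result.extend((name_vec[i], salary_vec[i] * 0.19) for i in selected)
    return result, calls["next"]
```

With 1,000 rows of which half qualify, this toy model counts 2,000 calls in the tuple-at-a-time version and 30 in the vectorized one, while both return the same rows.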
Why Vectors Are Better
Column slices to represent in-flow data
NOT: Vertical is a better table storage layout than horizontal (though we still think it often is)
RATIONALE:
- Simple array operations are well-supported by compilers
- No record layout complexities
- SIMD friendly layout
- Assumed cache-resident
Vectorized “Primitives” (basic methods)
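/* Multiply each value of a float column by a float constant. */
int
map_mul_flt_col_flt_val (
    float *res,
    const float *col,
    float val, int n)
{
    for (int i = 0; i < n; i++)
        res[i] = col[i] * val;
    return n;  /* assumed convention: number of tuples produced */
}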
Many primitives take just 1-6 cycles per tuple.
High IPC. Get SIMD out of the box.
No instruction or data cache misses.
10-100x faster than tuple-at-a-time.
Vector Database Processing Timeline
■ 1994, MonetDB: column-store pioneer. Start of new wave of analytic DBMS.
■ 2004, X100 engine: vector execution model. High-performance DBMS using CPU cache optimizations.
■ 2010, Vectorwise: Actian launches 1st commercial vector processing DBMS. Vectorwise blows the top off TPC benchmarks.
PhD thesis of Spyros Blanas (2013)
The relational industry is trying to adopt vector processing...
Actian Vector: Often copied, never surpassed.
Vector Database Processing Timeline
■ 1994, MonetDB: column-store pioneer. Start of new wave of analytic DBMS.
■ 2004, X100 engine: vector execution model. High-performance DBMS using CPU cache optimizations.
■ 2010, Vectorwise: Actian launches 1st commercial vector processing DBMS. Vectorwise blows the top off TPC benchmarks.
■ 2012, SQL on Hadoop: introduction of SQL access to Hadoop data. Immature, not optimized, not enterprise-ready.
Hive gets it too!
Vector Database Processing Timeline
■ 1994, MonetDB: column-store pioneer. Start of new wave of analytic DBMS.
■ 2004, X100 engine: vector execution model. High-performance DBMS using CPU cache optimizations.
■ 2010, Vectorwise: Actian launches 1st commercial vector processing DBMS. Vectorwise blows the top off TPC benchmarks.
■ 2012, SQL on Hadoop: introduction of SQL access to Hadoop data. Immature, not optimized, not enterprise-ready.
■ 2014, Actian introduces “Project Vortex”: Vectorwise built natively into Hadoop. Highest performing, SQL compliant DBMS running inside Hadoop.
Big Data processing pipelines on Hadoop
■ Unstructured → Structured
■ Unstructured: Data Mining, Pattern Matching (MapReduce)
■ Structured: Cleaner data, bulk loads into warehouse
■ Do we have to buy/manage two clusters??
1. Hadoop/MapReduce
2. MPP SQL warehouse
The case for SQL on Hadoop:
■ Reduced hardware cost (1 cluster)
■ Agile: no more copying data between Hadoop and SQL
■ Broaden access to Hadoop data through a wealth of SQL apps
■ Standardize cluster admin skills on Hadoop (human resources)
SQL on Hadoop
Vendor Approaches to “SQL on Hadoop”
“outside Hadoop”: SQL Outside Hadoop
■ MPP DB → need 2 clusters
■ Connector approach (data copy)
“wrapped legacy”: Mature but Limited/Slow
■ Slow legacy query engine (e.g. PostgreSQL)
■ Limited HDFS integration (no deletes, updates)
“from scratch”: Integrated but Immature
■ Immature/poor optimizers + engines
■ Incomplete SQL support, no delete/updates, I18N, security, workload mgmt, access control?
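“SQL on Hadoop” Vendor Landscape
[Chart: SQL Maturity (performance + features) plotted against Hadoop Integration, with axis marks Low, High, and Native. The three approaches “outside Hadoop”, “wrapped legacy”, and “from scratch” are positioned on the chart; one region is labelled “Most Mature & Integrated SQL”.]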
“Project Vortex”: Actian Vector in Hadoop
First industry-strength analytical RDBMS “made for Hadoop”
Key Features
■ Compressed vector data formats work natively on HDFS: HDFS (append-only) and compressed columnar storage are friends
■ The most efficient query engine on the market: vectorized, leading single-server TPC-H for years
■ Easily configurable and maintainable MPP system: relies solely on Hadoop for system administration
■ Very high bulk-load performance: partitioned table support and fully parallel loading
■ Full SQL functionality: incl. access control, analytic/window functions, complete SQL APIs
■ Mature query optimizer: enhanced with advanced distributed parallel execution for scale-up/out
“Project Vortex”: Actian Vector in Hadoop
Hadoop Features in Development:
■ Automatic HDFS block placement: leveraging replication, always HDFS shortcut reads, also after nodes fail
■ Direct querying on Hadoop data formats: Text, Parquet, ORCfile
■ Support for full fine-grained trickle updates (insert/delete/modify): thanks to patented delta update structure (Positional Delta Trees)
■ YARN integration: co-existence of MapReduce and DBMS, avoiding stragglers
■ Elastic resource management: workload-driven scaling up & down in 40 steps from 2.5% to 100%
Project Vortex: Architecture
Single SQL frontend connect point
■ Does not store any data
■ Can be outside Hadoop cluster
■ Can be an existing Vector installation
■ Many “worker” data nodes (X100 backend) on Hadoop cluster
■ This collection of compute nodes is called the “worker set”
■ MPI communications, all-to-all
Worker Set
■ Subset of Hadoop cluster, can be shrunk/enlarged without data copy
■ Compute nodes in the worker set should have roughly equal resources
■ Any can coordinate query execution (session master)
Vortex Architecture
[Diagram: the SQL frontend sends a query plan to the session master, one of the X100 backend processes running on the Vortex “worker-set”. The worker set is a subset of the Hadoop data nodes, shown alongside YARN and the name node. The backends exchange data through all-to-all MPI communications. Actian Director (for management) connects through SQL.]
Project Vortex: Storage
Data Format
■ Vector native compressed data formats with fast decompression
■ MinMax indexes stored separately (they allow skipping data blocks without reading them)
■ HDFS block placement: we decide where the replicas are
■ Tables are either hash-partitioned or global (i.e. non-partitioned)
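How a MinMax index lets a scan skip blocks can be sketched as follows. This is a minimal illustration in Python, assuming fixed-size blocks and a simple range predicate; the function names and block size are hypothetical and say nothing about Actian Vector's actual on-disk format.

```python
# Hypothetical sketch of MinMax-based block skipping. The block layout and
# function names are assumptions made for illustration only.


def build_minmax(column, block_size):
    """Return one (min, max) pair per block of block_size values."""
    return [
        (min(column[i:i + block_size]), max(column[i:i + block_size]))
        for i in range(0, len(column), block_size)
    ]


def blocks_to_read(minmax, low, high):
    """Indexes of blocks whose [min, max] range overlaps low <= x <= high."""
    return [
        b for b, (block_min, block_max) in enumerate(minmax)
        if block_max >= low and block_min <= high
    ]


def scan_range(column, minmax, block_size, low, high):
    """Read only the blocks that may contain matches, then filter them."""
    matches = []
    for b in blocks_to_read(minmax, low, high):
        block = column[b * block_size:(b + 1) * block_size]
        matches.extend(v for v in block if low <= v <= high)
    return matches
```

The index is most selective when the column is sorted or clustered, because each block then covers a narrow value range.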
Global File System
■ All I/O is through HDFS
■ Achieved in an append-only file system!
■ Any worker can read any table partition
■ Responsibility for handling partitions is decided at session start
■ Optimization algorithm assigns partitions to nodes that have the file local
■ 100% HDFS “shortcut reads”, also when the node that wrote the partition is down
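The assignment of partitions to workers that hold a local replica can be illustrated with a small greedy sketch in Python. The slides do not describe the optimization algorithm itself, so the greedy rule, the function name, and the node names below are assumptions for illustration only.

```python
# Hypothetical greedy sketch: give each partition to the least-loaded live
# worker that holds a local HDFS replica. The real optimization algorithm is
# not described in the slides; this only illustrates the goal.


def assign_partitions(replicas, workers):
    """Map each partition to a responsible worker.

    replicas: dict of partition -> set of nodes holding a replica
    workers:  list of live worker nodes
    """
    load = {w: 0 for w in workers}
    assignment = {}

    def local_candidates(partition):
        return [w for w in workers if w in replicas[partition]]

    # Place the partitions with the fewest local options first.
    order = sorted(replicas, key=lambda p: (len(local_candidates(p)), p))
    for partition in order:
        # Fall back to any worker (a remote read) if no live replica holder exists.
        candidates = local_candidates(partition) or workers
        chosen = min(candidates, key=lambda w: (load[w], w))
        assignment[partition] = chosen
        load[chosen] += 1
    return assignment


# Assumed example layout: six partitions, each replicated on two of three nodes.
REPLICAS = {
    "p1": {"n1", "n2"}, "p2": {"n2", "n3"}, "p3": {"n1", "n3"},
    "p4": {"n1", "n2"}, "p5": {"n2", "n3"}, "p6": {"n1", "n3"},
}
```

In this example layout, calling `assign_partitions(REPLICAS, ["n2", "n3"])` after node n1 has failed still gives every partition a worker with a local replica, which is the property the "shortcut reads" bullet describes.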
Vortex Architecture
[Diagram: the SQL frontend sends a query plan to the session master, one of the X100 backend processes running on the Vortex “worker-set”, shown alongside YARN and the name node. Partitions p1 to p6 of a partitioned table, a global table (g), and the write ahead log (WAL) each appear on three data nodes. Backends use HDFS “shortcut reads” and pass HDFS block placement hints, and exchange data through all-to-all MPI communications. Actian Director (for management) connects through SQL.]
Project Vortex: Minimizing Network Traffic
Storage
■ Co-located partitions (local partitioned hash-joins)
■ Replicated tables (local shared-HashTable hash-joins)
■ Co-partitioned clustered indexes (local merge-joins)
■ MinMax indexes for predicate pushdown (correlates over merge-joins)
Parallel Cost Model
■ Distributed joins, distributed query optimizer considers:
■ Both key-partitioned and shared (broadcast) HashJoin
■ Local broadcast HashJoin for replicated tables
■ Distributed GroupBy, distributed query optimizer considers:
■ Both key-partitioned and global re-aggregated GroupBy
■ Local early aggregation followed by partitioned aggregation
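The last strategy, local early aggregation followed by partitioned aggregation, can be sketched in Python for a distributed SUM ... GROUP BY. The worker lists, the CRC-based owner function, and the shipped-row counter are assumptions made for illustration; they are not taken from the Vortex optimizer.

```python
# Hypothetical sketch of "local early aggregation followed by partitioned
# aggregation" for SUM(value) GROUP BY key. Worker lists stand in for nodes.
import zlib
from collections import Counter


def owner(key, n_workers):
    """Worker responsible for a group key (assumed stable hash partitioning)."""
    return zlib.crc32(key.encode("utf-8")) % n_workers


def local_preaggregate(rows):
    """Step 1: each worker sums its own rows per key, with no network traffic."""
    partial = Counter()
    for key, value in rows:
        partial[key] += value
    return partial


def early_then_partitioned_sum(rows_per_worker):
    """Step 2: ship one partial sum per (worker, key) to the key's owner.

    Returns (totals, shipped), where shipped counts rows sent over the network.
    """
    n_workers = len(rows_per_worker)
    inboxes = [Counter() for _ in range(n_workers)]
    shipped = 0
    for rows in rows_per_worker:
        for key, subtotal in local_preaggregate(rows).items():
            inboxes[owner(key, n_workers)][key] += subtotal
            shipped += 1
    totals = {}
    for inbox in inboxes:
        totals.update(inbox)  # each key lives in exactly one inbox
    return totals, shipped
```

The saving grows with the number of rows per group on each worker: only one partial sum per worker and key crosses the network instead of every input row.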
Project Vortex: Resource Management
YARN integration
■ Ask YARN which nodes are less busy, when enlarging the worker set
■ Inform YARN of our usage (CPU, memory) to prevent overload
■ Placeholder processes to decrease and increase YARN resources
Workload management
■ Workload monitoring to gradually determine Hadoop footprint
■ Choose (# cores, RAM) for each query, given the current footprint
■ Choose to involve all or just the minimal subset of workers
Elasticity
■ Scale down to minimal subset of nodes, one core each
■ Scale up to all nodes, all cores
Vortex Architecture
[Diagram: the same architecture with resource management added. The Vortex “worker-set” ranges from a minimal YARN footprint to a maximal YARN footprint, and Hadoop & Vortex resource info is exchanged with YARN. The SQL frontend sends a query plan to the session master, one of the X100 backend processes on the data nodes. Partitions p1 to p6 of a partitioned table, a global table (g), and the write ahead log (WAL) each appear on three data nodes. Backends use HDFS “shortcut reads” and HDFS block placement hints, and exchange data through all-to-all MPI communications. Actian Director (for management) connects through SQL.]
Project Vortex: Data Ingestion
Bulk-load
■ Fast Parallel Loader, executes in parallel on all worker nodes
■ SQL COMBINE statement to add and remove data in bulk
■ Text and Parquet readers including nested records (under development)
Updates (DML)
■ Support for Insert, Modify, Delete, Upsert
■ Modify, Delete, Upsert use Positional Delta Trees (PDTs)
■ Changes get sent to the session master, which writes the Write Ahead Log (WAL)
■ At startup, workers only load PDTs for their partitions from WAL
■ Partitioned Tables partition DML to all nodes in worker set
■ Replicated Tables execute DML on the session master
■ Session master broadcasts all PDT changes to all worker nodes
Positional Delta Trees (PDTs)
INSERT INTO inventory VALUES ('Berlin', 'table', 'Y', 10);
INSERT INTO inventory VALUES ('Berlin', 'cloth', 'Y', 20);
INSERT INTO inventory VALUES ('Berlin', 'chair', 'Y', 5);
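[Diagram: the PDT after the three inserts. Each leaf entry holds (SID, type, value): (0, ins, (Berlin, chair, Y, 5)), (0, ins, (Berlin, cloth, Y, 20)), and (0, ins, (Berlin, table, Y, 10)). The internal node holds a separator SID of 0 and the ∆ counts 2 and 1.]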
TABLE0
SID  STORE   PROD   NEW  QTY  RID
0    London  chair  N    30   0
1    London  stool  N    10   1
2    London  table  N    20   2
3    Paris   rug    N    1    3
4    Paris   stool  N    5    4
“Positional Update Handling in Column Stores” – SIGMOD 2010
PDTs enable fine-grained updates on append-only data (HDFS)
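The idea of keeping updates in a positional delta and merging them during a scan can be sketched in Python. This is a deliberate simplification: it uses a flat sorted list instead of a tree, handles inserts only, and all names are hypothetical; the actual PDT structure is described in the SIGMOD 2010 paper cited above.

```python
# Simplified, hypothetical sketch of positional deltas: inserts are kept in
# a side structure keyed by stable ID (SID) and merged in during a scan, so
# the append-only stable table is never rewritten. A real PDT is a tree and
# also handles deletes and modifies; this flat list does neither.

# TABLE0 from the slide, in SID order, sorted on (STORE, PROD).
STABLE = [
    ("London", "chair", "N", 30),
    ("London", "stool", "N", 10),
    ("London", "table", "N", 20),
    ("Paris", "rug", "N", 1),
    ("Paris", "stool", "N", 5),
]


def insert(delta, stable, row):
    """Record an insert in the delta; the stable table stays untouched.

    The SID is the position of the first stable row that sorts after the
    new row on (STORE, PROD).
    """
    key = row[:2]
    sid = 0
    while sid < len(stable) and stable[sid][:2] < key:
        sid += 1
    delta.append((sid, row))
    delta.sort(key=lambda entry: (entry[0], entry[1][:2]))


def merged_scan(stable, delta):
    """Return (RID, row) pairs for the current table image."""
    rows = []
    d = 0
    for sid, stable_row in enumerate(stable):
        # Emit inserts positioned before this stable row.
        while d < len(delta) and delta[d][0] == sid:
            rows.append(delta[d][1])
            d += 1
        rows.append(stable_row)
    rows.extend(row for _sid, row in delta[d:])  # inserts past the last row
    return list(enumerate(rows))
```

After the three Berlin inserts from the slide, every delta entry carries SID 0, and a merged scan returns the Berlin rows at RIDs 0 to 2 with London chair moving to RID 3, while the stable table still holds its original five rows.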
Vortex vs Impala: how much faster?
Background on the “Impala Subset” of the TPC-DS benchmark can be found here:
http://blog.cloudera.com/blog/2014/01/impala-performance-dbms-class-speed/
Both Executed on the Same Hardware and Software Environment:
5 nodes: 16 cores, 32 threads, 2.4 GHz, 64 GB RAM, 2x1 TB drives, 2x10 Gb Ethernet.
Non-Disclosure – Under Embargo Until Public Launch Date: June 3, 2014
[Bar chart: speedup of Vortex over Impala per query (q3, q7, q19, q27, q34, q42, q43, q46, q52, q53, q55, q59, q63, q65, q68, q73, q79, q89, q98), with the y-axis marked from 5x to 25x. Avg: 14x faster.]
Vortex vs. other “native” Products
Young systems (Hive, Impala, Presto)
■ Significantly lower performance
■ Incomplete SQL (window functions, correlated subqueries, views)
■ No trickle updates (or just bulk load), not always ACID
■ Immature Query Optimizer, authentication access control, I18N, workload
management, APIs, validated SQL apps
Vortex
■ Ultimate SQL on Hadoop Performance
■ The fastest analytical query engine in town comes to Hadoop
■ Lots of Parallel Query optimization (min. network bandwidth usage)
■ Superior Hadoop Integration
■ Optimized HDFS block placement
■ YARN integration, Elasticity
“Project Vortex” Timeline
Actian Vector in Hadoop - Preview Edition Available
■ Send request to info@actian.com
End of June: initial release
■ Good performance on medium-sized clusters
■ Core Actian DataFlow integration
Fall 2014: second release
■ Trickle update functionality
■ Performance and scalability optimizations
■ HDFS block placement
■ YARN dynamic resource management
Visit the Actian booth #P6 in the expo area!
■ Get a copy of the Project Vortex Technical White Paper
■ See a live product demo of Vortex vs Impala
■ Meet the Actian “Vortex” developers
Learn More…
Win a signed technical book!
■ signing @16:00
Get a Big Data T-shirt!
Acknowledgements
homepages.cwi.nl/~boncz/msc/2012-AndreiCosteaAdrianIonescu.pdf
Adrian Ionescu
Andrei Costea
(plus the extended Actian Vector team)
www.actian.com
facebook.com/actiancorp
@actiancorp
Thank You

Contenu connexe

Tendances

Scaling Data Science on Big Data
Scaling Data Science on Big DataScaling Data Science on Big Data
Scaling Data Science on Big DataDataWorks Summit
 
2017 OpenWorld Keynote for Data Integration
2017 OpenWorld Keynote for Data Integration2017 OpenWorld Keynote for Data Integration
2017 OpenWorld Keynote for Data IntegrationJeffrey T. Pollock
 
How to Use Apache Zeppelin with HWX HDB
How to Use Apache Zeppelin with HWX HDBHow to Use Apache Zeppelin with HWX HDB
How to Use Apache Zeppelin with HWX HDBHortonworks
 
Treat your enterprise data lake indigestion: Enterprise ready security and go...
Treat your enterprise data lake indigestion: Enterprise ready security and go...Treat your enterprise data lake indigestion: Enterprise ready security and go...
Treat your enterprise data lake indigestion: Enterprise ready security and go...DataWorks Summit
 
Accelerating Big Data Insights
Accelerating Big Data InsightsAccelerating Big Data Insights
Accelerating Big Data InsightsDataWorks Summit
 
Meetup Oracle Database MAD_BCN: 1.2 Oracle Database 18c (autonomous database)
Meetup Oracle Database MAD_BCN: 1.2 Oracle Database 18c (autonomous database)Meetup Oracle Database MAD_BCN: 1.2 Oracle Database 18c (autonomous database)
Meetup Oracle Database MAD_BCN: 1.2 Oracle Database 18c (autonomous database)avanttic Consultoría Tecnológica
 
Google Cloud Dataproc - Easier, faster, more cost-effective Spark and Hadoop
Google Cloud Dataproc - Easier, faster, more cost-effective Spark and HadoopGoogle Cloud Dataproc - Easier, faster, more cost-effective Spark and Hadoop
Google Cloud Dataproc - Easier, faster, more cost-effective Spark and Hadoophuguk
 
Discover HDP 2.1: Using Apache Ambari to Manage Hadoop Clusters
Discover HDP 2.1: Using Apache Ambari to Manage Hadoop Clusters Discover HDP 2.1: Using Apache Ambari to Manage Hadoop Clusters
Discover HDP 2.1: Using Apache Ambari to Manage Hadoop Clusters Hortonworks
 
How to Operationalise Real-Time Hadoop in the Cloud
How to Operationalise Real-Time Hadoop in the CloudHow to Operationalise Real-Time Hadoop in the Cloud
How to Operationalise Real-Time Hadoop in the CloudAttunity
 
Built-In Security for the Cloud
Built-In Security for the CloudBuilt-In Security for the Cloud
Built-In Security for the CloudDataWorks Summit
 
Scaling self service on Hadoop
Scaling self service on HadoopScaling self service on Hadoop
Scaling self service on HadoopDataWorks Summit
 
Logical Data Warehouse: How to Build a Virtualized Data Services Layer
Logical Data Warehouse: How to Build a Virtualized Data Services LayerLogical Data Warehouse: How to Build a Virtualized Data Services Layer
Logical Data Warehouse: How to Build a Virtualized Data Services LayerDataWorks Summit
 
YARN - Past, Present, & Future
YARN - Past, Present, & FutureYARN - Past, Present, & Future
YARN - Past, Present, & FutureDataWorks Summit
 
Microsoft Data Platform Airlift 2017 Rui Quintino Machine Learning with SQL S...
Microsoft Data Platform Airlift 2017 Rui Quintino Machine Learning with SQL S...Microsoft Data Platform Airlift 2017 Rui Quintino Machine Learning with SQL S...
Microsoft Data Platform Airlift 2017 Rui Quintino Machine Learning with SQL S...Rui Quintino
 
IlOUG Tech Days 2016 - Big Data for Oracle Developers - Towards Spark, Real-T...
IlOUG Tech Days 2016 - Big Data for Oracle Developers - Towards Spark, Real-T...IlOUG Tech Days 2016 - Big Data for Oracle Developers - Towards Spark, Real-T...
IlOUG Tech Days 2016 - Big Data for Oracle Developers - Towards Spark, Real-T...Mark Rittman
 
Breaching the 100TB Mark with SQL Over Hadoop
Breaching the 100TB Mark with SQL Over HadoopBreaching the 100TB Mark with SQL Over Hadoop
Breaching the 100TB Mark with SQL Over HadoopDataWorks Summit
 
Ingesting Data at Blazing Speed Using Apache Orc
Ingesting Data at Blazing Speed Using Apache OrcIngesting Data at Blazing Speed Using Apache Orc
Ingesting Data at Blazing Speed Using Apache OrcDataWorks Summit
 
Data in the Cloud Crash Course
Data in the Cloud Crash CourseData in the Cloud Crash Course
Data in the Cloud Crash CourseDataWorks Summit
 
Format Wars: from VHS and Beta to Avro and Parquet
Format Wars: from VHS and Beta to Avro and ParquetFormat Wars: from VHS and Beta to Avro and Parquet
Format Wars: from VHS and Beta to Avro and ParquetDataWorks Summit
 

Tendances (20)

Scaling Data Science on Big Data
Scaling Data Science on Big DataScaling Data Science on Big Data
Scaling Data Science on Big Data
 
2017 OpenWorld Keynote for Data Integration
2017 OpenWorld Keynote for Data Integration2017 OpenWorld Keynote for Data Integration
2017 OpenWorld Keynote for Data Integration
 
How to Use Apache Zeppelin with HWX HDB
How to Use Apache Zeppelin with HWX HDBHow to Use Apache Zeppelin with HWX HDB
How to Use Apache Zeppelin with HWX HDB
 
Treat your enterprise data lake indigestion: Enterprise ready security and go...
Treat your enterprise data lake indigestion: Enterprise ready security and go...Treat your enterprise data lake indigestion: Enterprise ready security and go...
Treat your enterprise data lake indigestion: Enterprise ready security and go...
 
Accelerating Big Data Insights
Accelerating Big Data InsightsAccelerating Big Data Insights
Accelerating Big Data Insights
 
Meetup Oracle Database MAD_BCN: 1.2 Oracle Database 18c (autonomous database)
Meetup Oracle Database MAD_BCN: 1.2 Oracle Database 18c (autonomous database)Meetup Oracle Database MAD_BCN: 1.2 Oracle Database 18c (autonomous database)
Meetup Oracle Database MAD_BCN: 1.2 Oracle Database 18c (autonomous database)
 
Google Cloud Dataproc - Easier, faster, more cost-effective Spark and Hadoop
Google Cloud Dataproc - Easier, faster, more cost-effective Spark and HadoopGoogle Cloud Dataproc - Easier, faster, more cost-effective Spark and Hadoop
Google Cloud Dataproc - Easier, faster, more cost-effective Spark and Hadoop
 
Discover HDP 2.1: Using Apache Ambari to Manage Hadoop Clusters
Discover HDP 2.1: Using Apache Ambari to Manage Hadoop Clusters Discover HDP 2.1: Using Apache Ambari to Manage Hadoop Clusters
Discover HDP 2.1: Using Apache Ambari to Manage Hadoop Clusters
 
How to Operationalise Real-Time Hadoop in the Cloud
How to Operationalise Real-Time Hadoop in the CloudHow to Operationalise Real-Time Hadoop in the Cloud
How to Operationalise Real-Time Hadoop in the Cloud
 
Built-In Security for the Cloud
Built-In Security for the CloudBuilt-In Security for the Cloud
Built-In Security for the Cloud
 
Apache Eagle: Secure Hadoop in Real Time
Apache Eagle: Secure Hadoop in Real TimeApache Eagle: Secure Hadoop in Real Time
Apache Eagle: Secure Hadoop in Real Time
 
Scaling self service on Hadoop
Scaling self service on HadoopScaling self service on Hadoop
Scaling self service on Hadoop
 
Logical Data Warehouse: How to Build a Virtualized Data Services Layer
Logical Data Warehouse: How to Build a Virtualized Data Services LayerLogical Data Warehouse: How to Build a Virtualized Data Services Layer
Logical Data Warehouse: How to Build a Virtualized Data Services Layer
 
YARN - Past, Present, & Future
YARN - Past, Present, & FutureYARN - Past, Present, & Future
YARN - Past, Present, & Future
 
Microsoft Data Platform Airlift 2017 Rui Quintino Machine Learning with SQL S...
Microsoft Data Platform Airlift 2017 Rui Quintino Machine Learning with SQL S...Microsoft Data Platform Airlift 2017 Rui Quintino Machine Learning with SQL S...
Microsoft Data Platform Airlift 2017 Rui Quintino Machine Learning with SQL S...
 
IlOUG Tech Days 2016 - Big Data for Oracle Developers - Towards Spark, Real-T...
IlOUG Tech Days 2016 - Big Data for Oracle Developers - Towards Spark, Real-T...IlOUG Tech Days 2016 - Big Data for Oracle Developers - Towards Spark, Real-T...
IlOUG Tech Days 2016 - Big Data for Oracle Developers - Towards Spark, Real-T...
 
Breaching the 100TB Mark with SQL Over Hadoop
Breaching the 100TB Mark with SQL Over HadoopBreaching the 100TB Mark with SQL Over Hadoop
Breaching the 100TB Mark with SQL Over Hadoop
 
Ingesting Data at Blazing Speed Using Apache Orc
Ingesting Data at Blazing Speed Using Apache OrcIngesting Data at Blazing Speed Using Apache Orc
Ingesting Data at Blazing Speed Using Apache Orc
 
Data in the Cloud Crash Course
Data in the Cloud Crash CourseData in the Cloud Crash Course
Data in the Cloud Crash Course
 
Format Wars: from VHS and Beta to Avro and Parquet
Format Wars: from VHS and Beta to Avro and ParquetFormat Wars: from VHS and Beta to Avro and Parquet
Format Wars: from VHS and Beta to Avro and Parquet
 

Similaire à Actian Vector on Hadoop: First Industrial-strength DBMS to Truly Leverage Hadoop

Modernize Your Existing EDW with IBM Big SQL & Hortonworks Data Platform
Modernize Your Existing EDW with IBM Big SQL & Hortonworks Data PlatformModernize Your Existing EDW with IBM Big SQL & Hortonworks Data Platform
Modernize Your Existing EDW with IBM Big SQL & Hortonworks Data PlatformHortonworks
 
Big Data Education Webcast: Introducing DMX and DMX-h Release 8
Big Data Education Webcast: Introducing DMX and DMX-h Release 8Big Data Education Webcast: Introducing DMX and DMX-h Release 8
Big Data Education Webcast: Introducing DMX and DMX-h Release 8Precisely
 
Vmware Serengeti - Based on Infochimps Ironfan
Vmware Serengeti - Based on Infochimps IronfanVmware Serengeti - Based on Infochimps Ironfan
Vmware Serengeti - Based on Infochimps IronfanJim Kaskade
 
Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014
Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014
Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014cdmaxime
 
Applications on Hadoop
Applications on HadoopApplications on Hadoop
Applications on Hadoopmarkgrover
 
Big Data Customer Education Webcast: The Latest Advancements in Syncsort DMX ...
Big Data Customer Education Webcast: The Latest Advancements in Syncsort DMX ...Big Data Customer Education Webcast: The Latest Advancements in Syncsort DMX ...
Big Data Customer Education Webcast: The Latest Advancements in Syncsort DMX ...Precisely
 
Big SQL Competitive Summary - Vendor Landscape
Big SQL Competitive Summary - Vendor LandscapeBig SQL Competitive Summary - Vendor Landscape
Big SQL Competitive Summary - Vendor LandscapeNicolas Morales
 
Running Hadoop as Service in AltiScale Platform
Running Hadoop as Service in AltiScale PlatformRunning Hadoop as Service in AltiScale Platform
Running Hadoop as Service in AltiScale PlatformInMobi Technology
 
Real-Time Data Loading from MySQL to Hadoop
Real-Time Data Loading from MySQL to HadoopReal-Time Data Loading from MySQL to Hadoop
Real-Time Data Loading from MySQL to HadoopContinuent
 
Hadoop Now, Next and Beyond
Hadoop Now, Next and BeyondHadoop Now, Next and Beyond
Hadoop Now, Next and BeyondDataWorks Summit
 
Replication in real-time from Oracle and MySQL into data warehouses and analy...
Replication in real-time from Oracle and MySQL into data warehouses and analy...Replication in real-time from Oracle and MySQL into data warehouses and analy...
Replication in real-time from Oracle and MySQL into data warehouses and analy...Continuent
 
Real-time Data Loading from Oracle and MySQL to Data Warehouses, Analytics
Real-time Data Loading from Oracle and MySQL to Data Warehouses, AnalyticsReal-time Data Loading from Oracle and MySQL to Data Warehouses, Analytics
Real-time Data Loading from Oracle and MySQL to Data Warehouses, AnalyticsContinuent
 
Replication in real-time from Oracle and MySQL into data warehouses and analy...
Replication in real-time from Oracle and MySQL into data warehouses and analy...Replication in real-time from Oracle and MySQL into data warehouses and analy...
Replication in real-time from Oracle and MySQL into data warehouses and analy...Continuent
 
Introduction to the Hadoop EcoSystem
Introduction to the Hadoop EcoSystemIntroduction to the Hadoop EcoSystem
Introduction to the Hadoop EcoSystemShivaji Dutta
 
TDC2017 | POA Trilha BigData - IBM BigSQL - Engine de consulta de dados de al...
TDC2017 | POA Trilha BigData - IBM BigSQL - Engine de consulta de dados de al...TDC2017 | POA Trilha BigData - IBM BigSQL - Engine de consulta de dados de al...
TDC2017 | POA Trilha BigData - IBM BigSQL - Engine de consulta de dados de al...tdc-globalcode
 
AWS Partner Webcast - Hadoop in the Cloud: Unlocking the Potential of Big Dat...
AWS Partner Webcast - Hadoop in the Cloud: Unlocking the Potential of Big Dat...AWS Partner Webcast - Hadoop in the Cloud: Unlocking the Potential of Big Dat...
AWS Partner Webcast - Hadoop in the Cloud: Unlocking the Potential of Big Dat...Amazon Web Services
 
Couchbase and Apache Spark
Couchbase and Apache SparkCouchbase and Apache Spark
Couchbase and Apache SparkMatt Ingenthron
 
Storage and-compute-hdfs-map reduce
Storage and-compute-hdfs-map reduceStorage and-compute-hdfs-map reduce
Storage and-compute-hdfs-map reduceChris Nauroth
 

Similaire à Actian Vector on Hadoop: First Industrial-strength DBMS to Truly Leverage Hadoop (20)

Modernize Your Existing EDW with IBM Big SQL & Hortonworks Data Platform
Modernize Your Existing EDW with IBM Big SQL & Hortonworks Data PlatformModernize Your Existing EDW with IBM Big SQL & Hortonworks Data Platform
Modernize Your Existing EDW with IBM Big SQL & Hortonworks Data Platform
 
Big Data Education Webcast: Introducing DMX and DMX-h Release 8
Big Data Education Webcast: Introducing DMX and DMX-h Release 8Big Data Education Webcast: Introducing DMX and DMX-h Release 8
Big Data Education Webcast: Introducing DMX and DMX-h Release 8
 
Vmware Serengeti - Based on Infochimps Ironfan
Vmware Serengeti - Based on Infochimps IronfanVmware Serengeti - Based on Infochimps Ironfan
Vmware Serengeti - Based on Infochimps Ironfan
 
Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014
Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014
Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014
 
Applications on Hadoop
Applications on HadoopApplications on Hadoop
Applications on Hadoop
 
Big Data Customer Education Webcast: The Latest Advancements in Syncsort DMX ...
Big Data Customer Education Webcast: The Latest Advancements in Syncsort DMX ...Big Data Customer Education Webcast: The Latest Advancements in Syncsort DMX ...
Big Data Customer Education Webcast: The Latest Advancements in Syncsort DMX ...
 
Big SQL Competitive Summary - Vendor Landscape
Big SQL Competitive Summary - Vendor LandscapeBig SQL Competitive Summary - Vendor Landscape
Big SQL Competitive Summary - Vendor Landscape
 
Running Hadoop as Service in AltiScale Platform
Running Hadoop as Service in AltiScale PlatformRunning Hadoop as Service in AltiScale Platform
Running Hadoop as Service in AltiScale Platform
 
Real-Time Data Loading from MySQL to Hadoop
Real-Time Data Loading from MySQL to HadoopReal-Time Data Loading from MySQL to Hadoop
Real-Time Data Loading from MySQL to Hadoop
 
Hadoop Now, Next and Beyond
Hadoop Now, Next and BeyondHadoop Now, Next and Beyond
Hadoop Now, Next and Beyond
 
Replication in real-time from Oracle and MySQL into data warehouses and analy...
Replication in real-time from Oracle and MySQL into data warehouses and analy...Replication in real-time from Oracle and MySQL into data warehouses and analy...
Replication in real-time from Oracle and MySQL into data warehouses and analy...
 
Real-time Data Loading from Oracle and MySQL to Data Warehouses, Analytics
Real-time Data Loading from Oracle and MySQL to Data Warehouses, AnalyticsReal-time Data Loading from Oracle and MySQL to Data Warehouses, Analytics
Real-time Data Loading from Oracle and MySQL to Data Warehouses, Analytics
 
Replication in real-time from Oracle and MySQL into data warehouses and analy...
Replication in real-time from Oracle and MySQL into data warehouses and analy...Replication in real-time from Oracle and MySQL into data warehouses and analy...
Replication in real-time from Oracle and MySQL into data warehouses and analy...
 
Hortonworks.bdb
Hortonworks.bdbHortonworks.bdb
Hortonworks.bdb
 
Introduction to the Hadoop EcoSystem
Introduction to the Hadoop EcoSystemIntroduction to the Hadoop EcoSystem
Introduction to the Hadoop EcoSystem
 
TDC2017 | POA Trilha BigData - IBM BigSQL - Engine de consulta de dados de al...
TDC2017 | POA Trilha BigData - IBM BigSQL - Engine de consulta de dados de al...TDC2017 | POA Trilha BigData - IBM BigSQL - Engine de consulta de dados de al...
TDC2017 | POA Trilha BigData - IBM BigSQL - Engine de consulta de dados de al...
 
AWS Partner Webcast - Hadoop in the Cloud: Unlocking the Potential of Big Dat...
AWS Partner Webcast - Hadoop in the Cloud: Unlocking the Potential of Big Dat...AWS Partner Webcast - Hadoop in the Cloud: Unlocking the Potential of Big Dat...
AWS Partner Webcast - Hadoop in the Cloud: Unlocking the Potential of Big Dat...
 
Munich HUG 21.11.2013
Munich HUG 21.11.2013Munich HUG 21.11.2013
Munich HUG 21.11.2013
 
Couchbase and Apache Spark
Couchbase and Apache SparkCouchbase and Apache Spark
Couchbase and Apache Spark
 
Storage and-compute-hdfs-map reduce
Storage and-compute-hdfs-map reduceStorage and-compute-hdfs-map reduce
Storage and-compute-hdfs-map reduce
 


Actian Vector on Hadoop: First Industrial-strength DBMS to Truly Leverage Hadoop

  • 1. Confidential © 2014 Actian Corporation. Highest Performing SQL-in-Hadoop: Announcing “Project Vortex”. Peter Boncz, Database Systems Researcher & Actian Chief Technical Advisor, MonetDB architect & Vectorwise founder. Hadoop Summit, San Jose, June 3 2014.
  • 2. Agenda: History of Vectorwise and Vector Processing; Survey of SQL-on-Hadoop Approaches; Vortex (Actian Vector) Architecture; Benchmark Results; Roadmap.
  • 3. Vector Database Processing Timeline. 1994, MonetDB: column-store pioneer; start of a new wave of analytic DBMS. 2004, X100 engine: vector execution model; high-performance DBMS using CPU cache optimizations.
  • 4. Typical RDBMS: tuple-at-a-time iterator. Query: SELECT name, salary*.19 AS tax FROM employee WHERE age < 25. [Figure: SCAN, SELECT and PROJECT operators in a pipeline, each pulling one tuple per next() call; sample tuples (john, 40, 30000) and (carl, 20, 10000); carl's tuple leaves PROJECT with tax 1900.]
  • 5. “Vectorized Query Execution”. A vector contains the data of multiple tuples (~100). All primitives are “vectorized”. Effect: far fewer Iterator.next() and primitive calls.
  • 6. Why Vectors Are Better. Column slices represent in-flow data. The claim is NOT that vertical is a better table storage layout than horizontal (though we still think it often is). RATIONALE: simple array operations are well supported by compilers; no record layout complexities; SIMD-friendly layout; vectors are assumed cache-resident.
  • 7. Vectorized “Primitives” (basic methods): int map_mul_flt_col_flt_val(float *res, float *col, float val, int n) { for (int i = 0; i < n; i++) res[i] = col[i] * val; return n; } Many primitives take just 1-6 cycles per tuple. High IPC. Get SIMD out of the box. No instruction or data cache misses. 10-100x faster than tuple-at-a-time.
  • 8. Vector Database Processing Timeline. 1994, MonetDB: column-store pioneer; start of a new wave of analytic DBMS. 2004, X100 engine: vector execution model; high-performance DBMS using CPU cache optimizations. 2010, Vectorwise: Actian launches the 1st commercial vector processing DBMS; Vectorwise blows the top off TPC benchmarks.
  • 9. PhD thesis of Spyros Blanas (2013)
  • 10. The relational industry is trying to adopt vector processing...
  • 11. Actian Vector: often copied, never surpassed.
  • 12. Vector Database Processing Timeline. 1994, MonetDB: column-store pioneer; start of a new wave of analytic DBMS. 2004, X100 engine: vector execution model; high-performance DBMS using CPU cache optimizations. 2010, Vectorwise: Actian launches the 1st commercial vector processing DBMS; Vectorwise blows the top off TPC benchmarks. 2012, SQL on Hadoop: introduction of SQL access to Hadoop data; immature, not optimized, not enterprise-ready.
  • 13. Hive gets it too!
  • 14. Vector Database Processing Timeline. 1994, MonetDB: column-store pioneer; start of a new wave of analytic DBMS. 2004, X100 engine: vector execution model; high-performance DBMS using CPU cache optimizations. 2010, Vectorwise: Actian launches the 1st commercial vector processing DBMS; Vectorwise blows the top off TPC benchmarks. 2012, SQL on Hadoop: introduction of SQL access to Hadoop data; immature, not optimized, not enterprise-ready. 2014, Actian introduces “Project Vortex”: Vectorwise built natively into Hadoop; highest performing, SQL compliant DBMS running inside Hadoop.
  • 15. SQL on Hadoop. Big Data processing pipelines on Hadoop ■ Unstructured → Structured ■ Unstructured: Data Mining, Pattern Matching (MapReduce) ■ Structured: Cleaner data, bulk loads into warehouse ■ Do we have to buy/manage two clusters? 1. Hadoop/MapReduce 2. MPP SQL warehouse. The case for SQL on Hadoop: ■ Reduced hardware cost (1 cluster) ■ Agile: no more copying data between Hadoop and SQL ■ Broaden access to Hadoop data through a wealth of SQL apps ■ Standardize cluster admin skills on Hadoop (human resources)
  • 16. Vendor Approaches to “SQL on Hadoop”. “Outside Hadoop” (SQL Outside Hadoop) ■ MPP DB → need 2 clusters ■ Connector approach (data copy). “Wrapped legacy” (Mature but Limited/Slow) ■ Slow legacy query engine (e.g. PostgreSQL) ■ Limited HDFS integration (no deletes, updates). “From scratch” (Integrated but Immature) ■ Immature/poor optimizers + engines ■ Incomplete SQL support, no delete/updates, I18N, security, workload mgmt, access control?
  • 17. “SQL on Hadoop” Vendor Landscape. [Chart: the “outside Hadoop”, “wrapped legacy” and “from scratch” approaches plotted by SQL Maturity (performance + features) against Hadoop Integration; axis labels Low, High, Native; one region is labeled “Most Mature & Integrated SQL”.]
  • 18. “Project Vortex”: Actian Vector in Hadoop. First industry-strength analytical RDBMS “made for Hadoop”. Key Features ■ Compressed vector data formats work natively on HDFS: HDFS (append-only) and compressed columnar storage are friends ■ The most efficient query engine on the market: vectorized, leading single-server TPC-H for years ■ Easily configurable and maintainable MPP system: relies solely on Hadoop for system administration ■ Very high bulk-load performance: partitioned table support and fully parallel loading ■ Full SQL functionality: incl. access control, analytic/window functions, complete SQL APIs ■ Mature query optimizer: enhanced with advanced distributed parallel execution for scale-up/out
  • 19. “Project Vortex”: Actian Vector in Hadoop. Hadoop Features in Development ■ Automatic HDFS block placement: leveraging replication, always HDFS shortcut reads, also after nodes fail ■ Direct querying on Hadoop data formats: Text, Parquet, ORCfile ■ Support for full fine-grained trickle updates (insert/delete/modify): thanks to patented delta update structure (Positional Delta Trees) ■ YARN integration: co-existence of MapReduce and DBMS, avoiding stragglers ■ Elastic resource management: workload-driven scaling up & down in 40 steps from 2.5% to 100%
  • 20. Project Vortex: Architecture. Single SQL frontend connect point ■ Does not store any data ■ Can be outside Hadoop cluster ■ Can be an existing Vector installation ■ Many “worker” data nodes (X100 backend) on Hadoop cluster ■ This collection of compute nodes is called the “worker set” ■ MPI communications, all-to-all. Worker Set ■ Subset of Hadoop cluster, can be shrunk/enlarged without data copy ■ Compute nodes in worker set should have roughly equal resources ■ Any node can coordinate query execution (session master)
  • 21. Vortex Architecture. [Diagram labels: SQL; Actian Director for Management; SQL frontend; query plan; session master; X100 backend processes running on the worker set; data nodes; Vortex “worker set”; YARN; name node; all-to-all MPI data communications.]
  • 22. Project Vortex: Storage. Data Format ■ Vector native compressed data formats with fast decompression ■ MinMax indexes stored separately (they allow scans to skip reading data blocks) ■ HDFS block placement: we decide where the replicas are ■ Tables are either hash-partitioned or global (i.e. non-partitioned). Global File System ■ All I/O is through HDFS ■ Achieved in an append-only file system! ■ Any worker can read any table partition ■ Responsibility for handling partitions is decided at session start ■ Optimization algorithm assigns partitions to nodes that have the file local ■ 100% HDFS “shortcut reads”, also when the node that wrote the partition is down
  • 23. Vortex Architecture. [Diagram labels: SQL; Actian Director for Management; SQL frontend; query plan; session master; X100 backend processes running on the worker set; data nodes holding replicas of partitions p1 to p6 of a partitioned table, copies of a global table (g) and write ahead logs (WAL); HDFS “shortcut reads”; HDFS block placement hints; Vortex “worker set”; YARN; name node; all-to-all MPI data communications.]
  • 24. Project Vortex: Minimizing Network Traffic. Storage ■ Co-located partitions (local partitioned hash-joins) ■ Replicated tables (local shared-HashTable hash-joins) ■ Co-partitioned clustered indexes (local merge-joins) ■ MinMax indexes for predicate pushdown (correlates over merge-joins). Parallel Cost Model ■ Distributed joins, distributed query optimizer considers: ■ Both key-partitioned and shared (broadcast) HashJoin ■ Local broadcast HashJoin for replicated tables ■ Distributed GroupBy, distributed query optimizer considers: ■ Both key-partitioned and global re-aggregated GroupBy ■ Local early aggregation followed by partitioned aggregation
  • 25. Project Vortex: Resource Management. YARN integration ■ Ask YARN which nodes are less busy, when enlarging the worker set ■ Inform YARN of our usage (CPU, memory) to prevent overload ■ Placeholder processes to decrease and increase YARN resources. Workload management ■ Workload monitoring to gradually determine Hadoop footprint ■ Choose (# cores, RAM) for each query, given the current footprint ■ Choose to involve all or just the minimal subset of workers. Elasticity ■ Scale down to minimal subset of nodes, one core each ■ Scale up to all nodes, all cores
  • 26. Vortex Architecture. [Diagram labels: as on slide 23, plus minimal YARN footprint; maximal YARN footprint; Hadoop & Vortex resource info.]
  • 27. Project Vortex: Data Ingestion. Bulk-load ■ Fast Parallel Loader, executes in parallel on all worker nodes ■ SQL COMBINE statement to add and remove data in bulk ■ Text and Parquet readers including nested records (under development). Updates (DML) ■ Support for Insert, Modify, Delete, Upsert ■ Modify, Delete, Upsert use Positional Delta Trees (PDTs) ■ Changes get sent to the master, which emits the Write Ahead Log (WAL) ■ At startup, workers only load PDTs for their partitions from the WAL ■ Partitioned Tables partition DML to all nodes in worker set ■ Replicated Tables execute DML on the session master ■ Session master broadcasts all PDT changes to all worker nodes
  • 28. Positional Delta Trees (PDTs). INSERT INTO inventory VALUES('Berlin', 'table', Y, 10); INSERT INTO inventory VALUES('Berlin', 'cloth', Y, 20); INSERT INTO inventory VALUES('Berlin', 'chair', Y, 5). [Figure: TABLE0 with columns SID, STORE, PROD, NEW, QTY, RID and rows (0, London, chair, N, 30, 0), (1, London, stool, N, 10, 1), (2, London, table, N, 20, 2), (3, Paris, rug, N, 1, 3), (4, Paris, stool, N, 5, 4); a PDT whose internal node holds SID and ∆ fields and whose leaves hold three entries of type ins at SID 0: (Berlin, chair, Y, 5), (Berlin, cloth, Y, 20), (Berlin, table, Y, 10).] “Positional Update Handling in Column Stores”, SIGMOD 2010. PDTs enable fine-grained updates on append-only data (HDFS).
  • 29. Vortex vs Impala: how much faster? Background to the “Impala subset” of the TPC-DS benchmark can be found here: http://blog.cloudera.com/blog/2014/01/impala-performance-dbms-class-speed/ Both executed on the same hardware and software environment: 5 nodes, each with 16 cores, 32 threads, 2.4GHz, 64GB RAM, 2x1TB drives, 2x10Gb Ethernet. Non-Disclosure: Under Embargo Until Public Launch Date: June 3, 2014. [Chart: per-query speedup for q3, q7, q19, q27, q34, q42, q43, q46, q52, q53, q55, q59, q63, q65, q68, q73, q79, q89, q98; axis from 5x to 25x; Avg: 14x faster.]
  • 30. Vortex vs. other “native” Products. Young systems (Hive, Impala, Presto) ■ Significantly lower performance ■ Incomplete SQL (window functions, correlated subqueries, views) ■ No trickle updates (or just bulk load), not always ACID ■ Immature Query Optimizer, authentication access control, I18N, workload management, APIs, validated SQL apps. Vortex ■ Ultimate SQL on Hadoop Performance: the fastest analytical query engine in town comes to Hadoop; lots of Parallel Query optimization (min. network bandwidth usage) ■ Superior Hadoop Integration: optimized HDFS block placement; YARN integration, Elasticity
  • 31. “Project Vortex” Timeline. Actian Vector in Hadoop, Preview Edition available ■ Send request to info@actian.com. End of June: initial release ■ Good performance on medium-sized clusters ■ Core Actian DataFlow integration. Fall 2014: second release ■ Trickle update functionality ■ Performance and scalability optimizations ■ HDFS block placement ■ YARN dynamic resource management
  • 32. Learn More… Visit the Actian booth #P6 in the expo area! ■ Get a copy of the Project Vortex Technical White Paper ■ See a live product demo of Vortex vs Impala ■ Meet the Actian “Vortex” developers. Win a signed technical book! ■ Signing @16:00. Get a Big Data T-shirt!
  • 33. Acknowledgements: Adrian Ionescu, Andrei Costea (plus the extended Actian Vector team). homepages.cwi.nl/~boncz/msc/2012-AndreiCosteaAdrianIonescu.pdf
  • 34. Thank You. www.actian.com, facebook.com/actiancorp, @actiancorp
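
The contrast drawn on slides 4 to 7 between tuple-at-a-time iteration and vectorized primitives can be sketched in C for the example query (SELECT name, salary*.19 AS tax FROM employee WHERE age < 25). This is only an illustration under stated assumptions, not Actian's code: the function names imitate the naming pattern of the slide's primitive, and the selection-vector argument `sel` is a hypothetical way of passing the qualifying positions from the filter to the projection.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical selection primitive: stores in sel the positions i for which
 * col[i] < val and returns how many positions qualified.
 * sel must have room for n entries. */
size_t select_lt_int_col_int_val(size_t *sel, const int *col, int val, size_t n)
{
    assert(sel != NULL && col != NULL);
    size_t k = 0;
    for (size_t i = 0; i < n; i++) {
        sel[k] = i;           /* always store; k <= i, so this stays in bounds */
        k += (col[i] < val);  /* advance only when the predicate holds */
    }
    return k;
}

/* Hypothetical map primitive: res[j] = col[sel[j]] * val for the k selected
 * positions. One call handles a whole vector, so the per-call overhead is
 * paid once per vector rather than once per tuple. */
void map_mul_flt_col_flt_val_sel(float *res, const float *col, float val,
                                 const size_t *sel, size_t k)
{
    assert(res != NULL && col != NULL && sel != NULL);
    for (size_t j = 0; j < k; j++)
        res[j] = col[sel[j]] * val;
}
```

With the two sample tuples from slide 4 (ages 40 and 20, salaries 30000 and 10000), the selection keeps only position 1 and the map then computes a single tax value for it. Both loops are plain array loops with no per-tuple function calls, which is the property the slides credit for compiler-friendly, SIMD-friendly execution.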

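
Slide 22 says MinMax indexes let a scan avoid reading data blocks. A minimal sketch of that idea in C, assuming one (min, max) pair per block and a range predicate lo <= x <= hi; the `minmax_t` type and the `blocks_to_read` function are hypothetical names for illustration and say nothing about Vector's actual index format.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-block summary: smallest and largest value in the block. */
typedef struct {
    int min;
    int max;
} minmax_t;

/* Marks in must_read which blocks can contain values in [lo, hi] and returns
 * how many blocks were marked. A block is skipped when its [min, max] range
 * lies entirely below lo or entirely above hi. */
size_t blocks_to_read(const minmax_t *idx, size_t nblocks, int lo, int hi,
                      unsigned char *must_read)
{
    assert(idx != NULL && must_read != NULL);
    size_t count = 0;
    for (size_t b = 0; b < nblocks; b++) {
        must_read[b] = (unsigned char)(idx[b].max >= lo && idx[b].min <= hi);
        count += must_read[b];
    }
    return count;
}
```

Because the decision uses only the small index, the data blocks that fail the test are never fetched from HDFS. The check is conservative: a marked block may still contain no matching value, so the scan must still apply the predicate to the blocks it reads.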
Editor's notes

  1. internationalization
  2. internationalization
  3. Execution: Subset of TPC-DS as chosen by Impala. Data size is 3TB (SF3000). Executed on 5-node “rushcluster” in Austin. Both Impala and Vector numbers are on the same hardware. Comparison with Impala: Verified that Impala plans are sensible. Currently observed average speedup is 11x. Optimal query plans (manually written) give us 16x speedup. These are real numbers! We executed manual plans directly. Changes in the cost model would get us to this performance. Performance improvements: Cost model changes will get us to 16x speedup. Pipeline of query execution changes: well into H2, estimated to get us 2x improvement. So, estimated speedup vs Impala would be ~30x (no guarantees). Planning to run TPC-H SF1000 and SF3000. With all planned improvements (end of the year) we should be able to beat the EXASOL cluster numbers.
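
The ~30x figure in note 3 appears to be the product of the two factors it lists. As a worked check, assuming the cost-model gain and the execution-pipeline gain compose multiplicatively (the note does not say so explicitly):

```latex
\[
  \underbrace{16\times}_{\text{manual plans / cost model}}
  \;\cdot\;
  \underbrace{2\times}_{\text{execution pipeline changes}}
  \;=\; 32\times \;\approx\; 30\times
\]
```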