It takes two to tango! Is SQL-on-Hadoop the next big step?
Big Data Crunching A Retrospective
Three Phases
What was it like before Hadoop?
The Phylogenetic Tree of Elephants
Partitioned or Sharded RDBMSs
Data Warehouses
Massively Parallel Databases
Tech before Hadoop
Massively Parallel Databases
Shared Nothing Architecture
Hadoop - Early days
Acceptance Life Cycle
Acceptance
Exploration
Resistance
Complementary over Competitive
Split by Structure
What’s the best way to answer questions that span these two worlds?
Can we interface SQL atop Hadoop?
Can we combine the strengths of parallel databases with those of Hadoop?
SQL-on-Hadoop : Technology
Distributed Query Processing
Cloudera’s Impala
MapR-supported Apache Drill, and more
Split Query Processing
Microsoft Polybase
Hadapt
SQL-on-Hadoop : Technical Approaches
Faster Hive
Hortonworks’ Stinger initiative
Qubole’s Hive-on-the-Cloud
Distributed Query Processing
Cloudera Impala : Architecture
Clients: Impala Shell, JDBC/ODBC Client, SQL Tools
Data Nodes, each running an Impala Daemon (Query Planning, Query Coordination, Query Execution)
State Store, Metadata Catalog, HDFS Name Node
Unified Metadata Store
Life Cycle of an Impala Query
Clients: Impala Shell, JDBC/ODBC Client, SQL Tools
Data Nodes, each running an Impala Daemon
State Store, Metadata Catalog, HDFS Name Node
Steps: Parse Query, Plan and Optimize, Coordinate Execution
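The three steps named on this slide (parse, plan and optimize, coordinate execution) can be sketched as a toy model. This is only an illustration under stated assumptions, not Impala's implementation: the `Daemon` class, the one-line query grammar, and the sample rows are all hypothetical, and real planning and optimization are far richer than building a single predicate.

```python
from dataclasses import dataclass


@dataclass
class Daemon:
    """Hypothetical stand-in for one daemon co-located with a data node."""
    name: str
    local_rows: list  # rows stored on this daemon's data node

    def execute_fragment(self, predicate):
        # Query Execution: scan only local data and return a partial count.
        return sum(1 for row in self.local_rows if predicate(row))


def parse(query):
    # Parse Query: accepts only the toy form "COUNT WHERE <column> > <number>".
    tokens = query.split()
    if len(tokens) != 5 or tokens[:2] != ["COUNT", "WHERE"] or tokens[3] != ">":
        raise ValueError("unsupported query: " + query)
    return {"column": tokens[2], "threshold": float(tokens[4])}


def plan(ast):
    # Plan and Optimize: turn the parsed query into a per-node predicate.
    column, threshold = ast["column"], ast["threshold"]
    return lambda row: row[column] > threshold


def coordinate(daemons, query):
    # Coordinate Execution: the daemon that received the query hands the
    # fragment to every daemon and merges the partial results.
    predicate = plan(parse(query))
    return sum(d.execute_fragment(predicate) for d in daemons)


daemons = [
    Daemon("node-1", [{"amount": 5}, {"amount": 20}]),
    Daemon("node-2", [{"amount": 30}]),
    Daemon("node-3", [{"amount": 1}, {"amount": 50}, {"amount": 12}]),
]
total = coordinate(daemons, "COUNT WHERE amount > 10")
print(total)
```

Each daemon touches only the rows on its own node, and only the small partial counts travel back to the coordinator, which is the point of running the query engine next to the data.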
Split Query Processing
Polybase + PDW : Architecture
Clients: ADO.NET, JDBC/ODBC Client, OLEDB
PDW Cluster
Control Node: PDW Engine Service, DMS Controller, Loader Manager, SQL Server
Compute Nodes: SQL Server, Data Move Service, HDFS Bridge
Hadoop Cluster
Name Node, Job Tracker
Data Nodes, each with a Task Tracker
SQL Server PDW : Architecture
Register the Hadoop Cluster with PDW
CREATE HADOOP_CLUSTER GSL_CLUSTER WITH
(namenode='hadoop-head', namenode_port=9000,
jobtracker='hadoop-head', jobtracker_port=9010);
Map HDFS File to External Tables in PDW
CREATE EXTERNAL TABLE hdfsCustomer
( c_custkey bigint not null,
c_name varchar(25) not null,
c_address varchar(40) not null,
c_nationkey integer not null,
c_phone char(15) not null,
c_acctbal decimal(15,2) not null,
c_mktsegment char(10) not null,
c_comment varchar(117) not null)
WITH (LOCATION='/tpch1gb/customer.tbl',
FORMAT_OPTIONS (EXTERNAL_CLUSTER = GSL_CLUSTER,
EXTERNAL_FILEFORMAT = TEXT_FORMAT));
Life Cycle of a Split Query
Clients: ADO.NET, JDBC/ODBC Client, OLEDB
PDW Cluster
Control Node: Engine Service, DMS Controller, Loader Manager, SQL Server
Compute Nodes: SQL Server, Data Move Service, HDFS Bridge
Hadoop Cluster
Name Node, Job Tracker
Data Nodes, each with a Task Tracker
Plan
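One way to picture a split query is a join between customer rows that live in HDFS and order rows that live in the database, where the planner chooses whether the filter on the HDFS data runs on the Hadoop side or after import. The sketch below is a toy model under stated assumptions, not the PDW or Polybase implementation: `run_on_hadoop`, `import_via_bridge`, and `split_query` are hypothetical helpers, the order columns `o_custkey` and `o_totalprice` are assumed names, and the sample rows are invented.

```python
def run_on_hadoop(hdfs_rows, predicate):
    """Stand-in for a job submitted to the Hadoop cluster: filter in place."""
    return [row for row in hdfs_rows if predicate(row)]


def import_via_bridge(rows):
    """Stand-in for moving rows from HDFS into the database compute nodes."""
    return list(rows)


def split_query(hdfs_customers, pdw_orders, min_balance, push_down):
    """Join HDFS-resident customers with database-resident orders.

    Returns (joined rows, number of customer rows moved across the bridge).
    """
    def wanted(customer):
        return customer["c_acctbal"] >= min_balance

    if push_down:
        # Hadoop side filters first, so only matching rows cross the bridge.
        moved = import_via_bridge(run_on_hadoop(hdfs_customers, wanted))
        customers = moved
    else:
        # Every row crosses the bridge; the database side filters.
        moved = import_via_bridge(hdfs_customers)
        customers = [c for c in moved if wanted(c)]

    # Database side: hash join on the customer key.
    by_key = {c["c_custkey"]: c for c in customers}
    joined = [
        (by_key[o["o_custkey"]]["c_name"], o["o_totalprice"])
        for o in pdw_orders
        if o["o_custkey"] in by_key
    ]
    return joined, len(moved)


hdfs_customers = [
    {"c_custkey": 1, "c_name": "Customer#1", "c_acctbal": 711.56},
    {"c_custkey": 2, "c_name": "Customer#2", "c_acctbal": 121.65},
    {"c_custkey": 3, "c_name": "Customer#3", "c_acctbal": 7498.12},
]
pdw_orders = [
    {"o_custkey": 1, "o_totalprice": 100.0},
    {"o_custkey": 3, "o_totalprice": 250.0},
    {"o_custkey": 2, "o_totalprice": 75.0},
]

pushed, moved_pushed = split_query(hdfs_customers, pdw_orders, 500.0, push_down=True)
pulled, moved_pulled = split_query(hdfs_customers, pdw_orders, 500.0, push_down=False)
print(pushed == pulled, moved_pushed, moved_pulled)
```

Both strategies return the same answer; what differs is how many rows cross between the two clusters, which is the trade-off a split query planner would have to weigh.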
SQL-on-Hadoop : The Technology
Faster Hive
Distributed Query Processors
Split Query Processors
SQL-on-Hadoop or Map Reduce?
</presentation>
More on
www.systemswemake.com
Follow : @systems_we_make
