It takes two to tango! Is SQL-on-Hadoop the next big step?
Big Data Crunching: A Retrospective
Three Phases
What was it like before Hadoop?
The Phylogenetic Tree of Elephants
Tech before Hadoop
Partitioned or Sharded RDBMSs
Data Warehouses
Massively Parallel Databases
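Partitioned or sharded RDBMSs spread the rows of one logical table across several independent database instances, usually by hashing a key. The following is a minimal sketch, not any particular product's scheme. It assumes hash partitioning on the TPC-H customer key and a hypothetical shard count of 3. Real systems also use range or list partitioning.

```python
import zlib
from collections import defaultdict


def shard_for(key, num_shards):
    """Map a sharding key to a shard number with a stable hash.

    zlib.crc32 returns the same value in every process, unlike Python's
    built-in hash() for strings, so every client routes a key identically.
    """
    return zlib.crc32(str(key).encode("utf-8")) % num_shards


def partition(rows, key_column, num_shards):
    """Split one logical table into per-shard row lists."""
    shards = defaultdict(list)
    for row in rows:
        shards[shard_for(row[key_column], num_shards)].append(row)
    return dict(shards)


if __name__ == "__main__":
    customers = [{"c_custkey": k, "c_name": f"Customer#{k:09d}"} for k in range(1, 11)]
    shards = partition(customers, "c_custkey", num_shards=3)  # hypothetical shard count
    for shard_id in sorted(shards):
        print(shard_id, [row["c_custkey"] for row in shards[shard_id]])
    # A point lookup only has to visit the single shard that owns the key.
    print("custkey 7 lives on shard", shard_for(7, 3))
```

Modulo placement keeps the sketch short, but it moves most keys whenever the shard count changes. That rebalancing cost is one of the operational pains of manual sharding.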
Massively Parallel Databases: Shared-Nothing Architecture
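In a shared-nothing MPP database, each node has its own CPU, memory and disk and works only on its local partition. A grouped aggregate is answered in two phases: every node aggregates locally, then a coordinator merges the small partial results. The minimal sketch below assumes in-memory lists stand in for the nodes' local partitions and a thread pool stands in for the nodes themselves.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def local_aggregate(partition):
    """Phase 1: a node sums only the rows it owns; nothing is shared."""
    totals = Counter()
    for segment, balance in partition:
        totals[segment] += balance
    return totals


def merge(partials):
    """Phase 2: a coordinator adds up the small per-node partial sums."""
    result = Counter()
    for partial in partials:
        result.update(partial)  # Counter.update adds values instead of replacing them
    return dict(result)


def parallel_sum_by_group(partitions):
    """Roughly SELECT segment, SUM(balance) ... GROUP BY segment over all nodes."""
    with ThreadPoolExecutor(max_workers=max(1, len(partitions))) as pool:
        partials = list(pool.map(local_aggregate, partitions))
    return merge(partials)


if __name__ == "__main__":
    node_partitions = [
        [("BUILDING", 100), ("MACHINERY", 50)],  # rows stored on node 0
        [("BUILDING", 25)],                      # rows stored on node 1
        [("MACHINERY", 5), ("AUTOMOBILE", 70)],  # rows stored on node 2
    ]
    print(parallel_sum_by_group(node_partitions))
```

Only one partial sum per group per node crosses the network, which is why shared-nothing designs can scale by adding nodes.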
Hadoop - Early days
Acceptance Life Cycle
Resistance
Exploration
Acceptance
Complementary over Competitive
Split by Structure
What’s the best way to answer questions that span these two worlds?
Can we interface SQL atop Hadoop?
Can we combine the strengths of parallel databases with those of Hadoop?
SQL-on-Hadoop : Technology
SQL-on-Hadoop : Technical Approaches
Faster Hive
Hortonworks’ Stinger initiative
Qubole’s Hive-on-the-Cloud
Distributed Query Processing
Cloudera’s Impala
MapR-supported Apache Drill, and more
Split Query Processing
Microsoft Polybase
Hadapt
Distributed Query Processing
Cloudera Impala : Architecture
[Diagram: clients (Impala Shell, JDBC/ODBC client, SQL tools) connect to Impala daemons co-located with the HDFS Data Nodes. Each daemon performs query planning, query coordination and query execution. The State Store, Metadata Catalog and HDFS Name Node form a unified metadata store.]
Life Cycle of an Impala Query
[Diagram: a client sends a query to one Impala daemon. That daemon parses the query, plans and optimizes it, and coordinates its execution across the Impala daemons on the other Data Nodes, using the State Store, Metadata Catalog and HDFS Name Node.]
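The life cycle above (parse, plan and optimize, coordinate execution) follows a scatter-gather pattern. The daemon that receives the query acts as coordinator and sends a plan fragment to the daemon on each Data Node. Each daemon scans only its local data, and the coordinator merges the partial results. The toy model below illustrates that flow only and does not use Impala's API. The "query" is a hypothetical (column, threshold) pair instead of parsed SQL, and plain Python objects stand in for the daemons.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class PlanFragment:
    """Hypothetical plan fragment: count rows whose column exceeds a threshold."""
    column: str
    threshold: float


class Daemon:
    """Toy stand-in for an Impala daemon co-located with one HDFS Data Node."""

    def __init__(self, local_rows):
        self.local_rows = local_rows  # rows held in this node's local blocks

    def execute(self, fragment):
        # Scan local data only and return a partial count to the coordinator.
        return sum(1 for row in self.local_rows if row[fragment.column] > fragment.threshold)


class Coordinator:
    """The daemon that received the query: plans it, fans it out, merges results."""

    def __init__(self, daemons):
        self.daemons = daemons

    def plan(self, column, threshold):
        # Stands in for parsing, metadata lookup and optimization.
        return PlanFragment(column, threshold)

    def run(self, column, threshold):
        fragment = self.plan(column, threshold)
        with ThreadPoolExecutor(max_workers=max(1, len(self.daemons))) as pool:
            partial_counts = list(pool.map(lambda d: d.execute(fragment), self.daemons))
        return sum(partial_counts)


if __name__ == "__main__":
    daemons = [
        Daemon([{"c_acctbal": 500.0}, {"c_acctbal": 1500.0}]),
        Daemon([{"c_acctbal": 2500.0}]),
        Daemon([]),
    ]
    # Roughly: SELECT COUNT(*) FROM customer WHERE c_acctbal > 1000
    print(Coordinator(daemons).run("c_acctbal", 1000.0))  # prints 2
```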
Split Query Processing
Polybase + PDW : Architecture
[Diagram: clients (ADO.NET, JDBC/ODBC client, OLEDB) connect to the SQL Server PDW cluster. Its Control Node runs the PDW Engine Service, DMS Controller, Loader Manager and SQL Server. Each of its Compute Nodes runs SQL Server and a Data Move Service with an HDFS Bridge. Alongside sits a Hadoop cluster with a Name Node, a Job Tracker, and Data Nodes running Task Trackers.]
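In PDW, tables are distributed across the Compute Nodes. The Data Move Service (DMS) moves rows between nodes when a query needs them co-located, for example to join two tables that are not distributed on the join key. Below is a minimal Python sketch of such a repartition (shuffle) join. It assumes hash redistribution on the join key. The node lists and the crc32 placement function are illustrative stand-ins, not DMS internals.

```python
import zlib
from collections import defaultdict


def node_for(key, num_nodes):
    """Stable hash placement of a join key onto one of num_nodes Compute Nodes."""
    return zlib.crc32(str(key).encode("utf-8")) % num_nodes


def shuffle(partitions, key, num_nodes):
    """Redistribute rows so that rows with equal keys end up on the same node."""
    moved = [[] for _ in range(num_nodes)]
    for partition in partitions:
        for row in partition:
            moved[node_for(row[key], num_nodes)].append(row)
    return moved


def local_hash_join(left_rows, right_rows, left_key, right_key):
    """Ordinary hash join, run independently on each node after the shuffle."""
    index = defaultdict(list)
    for r_row in right_rows:
        index[r_row[right_key]].append(r_row)
    return [{**l_row, **r_row} for l_row in left_rows for r_row in index.get(l_row[left_key], [])]


def shuffle_join(left_parts, right_parts, left_key, right_key, num_nodes):
    left = shuffle(left_parts, left_key, num_nodes)
    right = shuffle(right_parts, right_key, num_nodes)
    joined = []
    for node in range(num_nodes):
        joined.extend(local_hash_join(left[node], right[node], left_key, right_key))
    return joined


if __name__ == "__main__":
    # customer is distributed on c_custkey, orders on o_orderkey, so the rows
    # needed for the join on c_custkey = o_custkey are not co-located.
    customer = [[{"c_custkey": 1, "c_name": "A"}], [{"c_custkey": 2, "c_name": "B"}]]
    orders = [[{"o_orderkey": 10, "o_custkey": 2}],
              [{"o_orderkey": 11, "o_custkey": 1}, {"o_orderkey": 12, "o_custkey": 2}]]
    result = shuffle_join(customer, orders, "c_custkey", "o_custkey", num_nodes=2)
    for row in sorted(result, key=lambda r: r["o_orderkey"]):
        print(row)
```

When one side of the join is small, copying it to every node (a broadcast) is usually cheaper than shuffling both sides.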
Register the Hadoop Cluster with PDW
CREATE HADOOP_CLUSTER GSL_CLUSTER WITH
(namenode='hadoop-head', namenode_port=9000,
 jobtracker='hadoop-head', jobtracker_port=9010);
Map HDFS File to External Tables in PDW
CREATE EXTERNAL TABLE hdfsCustomer
( c_custkey     bigint        not null,
  c_name        varchar(25)   not null,
  c_address     varchar(40)   not null,
  c_nationkey   integer       not null,
  c_phone       char(15)      not null,
  c_acctbal     decimal(15,2) not null,
  c_mktsegment  char(10)      not null,
  c_comment     varchar(117)  not null)
WITH (LOCATION='/tpch1gb/customer.tbl',
      FORMAT_OPTIONS (EXTERNAL_CLUSTER = GSL_CLUSTER,
                      EXTERNAL_FILEFORMAT = TEXT_FORMAT));
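Conceptually, an external table is a schema laid over a file that stays in HDFS. Each time the table is queried, the engine reads the raw text and converts every field to its declared column type. The Python sketch below shows that "schema on read" step for hdfsCustomer. It makes three assumptions: the file uses TPC-H dbgen's pipe-delimited layout with a trailing '|' (the DDL does not state a delimiter), a list of text lines stands in for the HDFS file, and the sample row is made up.

```python
from decimal import Decimal

# Column names and Python converters mirroring the hdfsCustomer DDL.
CUSTOMER_SCHEMA = [
    ("c_custkey", int),      # bigint
    ("c_name", str),         # varchar(25)
    ("c_address", str),      # varchar(40)
    ("c_nationkey", int),    # integer
    ("c_phone", str),        # char(15)
    ("c_acctbal", Decimal),  # decimal(15,2)
    ("c_mktsegment", str),   # char(10)
    ("c_comment", str),      # varchar(117)
]


def read_external_table(lines, schema=CUSTOMER_SCHEMA, delimiter="|"):
    """Apply a schema to raw delimited text at read time ("schema on read")."""
    for line_no, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if not line:
            continue
        fields = line.split(delimiter)
        if fields[-1] == "":  # assumed dbgen-style trailing delimiter
            fields.pop()
        if len(fields) != len(schema):
            raise ValueError(f"line {line_no}: expected {len(schema)} fields, got {len(fields)}")
        yield {name: convert(value) for (name, convert), value in zip(schema, fields)}


if __name__ == "__main__":
    # Made-up sample row in the assumed pipe-delimited layout.
    sample = ["1|Customer#000000001|Sample Street 1|15|25-989-741-2988|711.56|BUILDING|sample comment|\n"]
    for row in read_external_table(sample):
        print(row)
```

Because nothing is loaded ahead of time, a malformed row surfaces as an error at query time rather than at load time. That is the usual trade-off of querying files in place.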
Life Cycle of a Split Query
[Diagram: clients (ADO.NET, JDBC/ODBC client, OLEDB), the PDW Control Node and Compute Nodes, and the Hadoop cluster. The Control Node turns a client query into a plan that can submit work to the Hadoop Job Tracker and read HDFS data into the Compute Nodes through their HDFS Bridges.]
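At the heart of split query processing is a cost-based choice. For each HDFS-resident part of a plan, the optimizer decides between two options: push the work (for example a selective filter) to Hadoop as MapReduce jobs, or import the data into PDW through the HDFS Bridge and process it there. The toy model below illustrates only that trade-off. Its cost formula and all constants (MapReduce start-up cost, per-GB scan and transfer costs) are hypothetical assumptions, not Polybase's actual cost model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CostModel:
    """Hypothetical unit costs in arbitrary units; not Polybase's real model."""
    mr_job_startup: float = 30.0     # fixed overhead of launching a MapReduce job
    hadoop_scan_per_gb: float = 1.0  # filtering one GB inside Hadoop
    transfer_per_gb: float = 2.0     # moving one GB from HDFS into PDW
    pdw_scan_per_gb: float = 0.5     # filtering one GB inside PDW


def import_cost(input_gb, m):
    """Import everything through the HDFS Bridge, then filter in PDW."""
    return input_gb * (m.transfer_per_gb + m.pdw_scan_per_gb)


def pushdown_cost(input_gb, selectivity, m):
    """Filter with a MapReduce job, then import only the qualifying rows."""
    return (m.mr_job_startup
            + input_gb * m.hadoop_scan_per_gb
            + input_gb * selectivity * m.transfer_per_gb)


def choose_plan(input_gb, selectivity, m=CostModel()):
    if not 0.0 <= selectivity <= 1.0:
        raise ValueError("selectivity must be between 0 and 1")
    if pushdown_cost(input_gb, selectivity, m) < import_cost(input_gb, m):
        return "push to Hadoop"
    return "import into PDW"


if __name__ == "__main__":
    for gb, sel in [(1, 0.01), (1000, 0.01), (1000, 1.0)]:
        print(f"{gb:>5} GB, selectivity {sel}: {choose_plan(gb, sel)}")
```

With these made-up numbers, small inputs stay in PDW because the fixed job start-up cost dominates. Large, selective inputs are pushed to Hadoop. Non-selective filters are imported, because pushing them saves no transfer.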
SQL-on-Hadoop : The Technology
Faster Hive
Distributed Query Processors
Split Query Processors
SQL-on-Hadoop or MapReduce?
</presentation>
More on www.systemswemake.com
Follow : @systems_we_make