Hadoop Hive Talk At IIT-Delhi
Joydeep Sen Sarma
Talk at the CS department of IIT Delhi, 04/02/09.
1.
Hadoop and Hive
Large-Scale Data Processing Using Commodity Hardware/Software
Joydeep Sen Sarma
2.
3.
4.
5.
Looks like this
Figure: a cluster of nodes, each with its own disks, connected by links labeled 1 Gigabit and 4-8 Gigabit. Node = DataNode + Map-Reduce.
6.
7.
In pictures
Figure: HDFS. A NameNode and a Secondary NameNode (each with disks and 32GB RAM) and several DataNodes. A DFS Client calls getLocations on the NameNode and receives the block locations in reply.
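The figure shows the central split in HDFS: the NameNode holds only metadata, and the DFS Client asks it for block locations (getLocations) before reading the data directly from DataNodes. Below is a toy, in-memory sketch of that read path, assuming a fixed block size and a simple round-robin replica placement. Every class and function name is hypothetical; this is not the HDFS client API.

```python
from dataclasses import dataclass, field


@dataclass
class DataNode:
    name: str
    blocks: dict = field(default_factory=dict)  # block_id -> bytes stored on this node


class NameNode:
    """Toy NameNode: stores metadata only (which blocks form a file, and where they live)."""

    def __init__(self):
        self.files = {}  # path -> list of (block_id, [DataNode, ...])

    def get_locations(self, path):
        return self.files[path]


def write_file(namenode, datanodes, path, data, block_size=4, replication=2):
    """Split data into blocks and place each block on `replication` DataNodes (round-robin, assumed)."""
    layout = []
    for index, start in enumerate(range(0, len(data), block_size)):
        block_id = f"{path}#{index}"
        targets = [datanodes[(index + r) % len(datanodes)] for r in range(replication)]
        for node in targets:
            node.blocks[block_id] = data[start:start + block_size]
        layout.append((block_id, targets))
    namenode.files[path] = layout


def read_file(namenode, path):
    """DFS client: ask the NameNode for locations, then read each block from a DataNode."""
    return b"".join(replicas[0].blocks[block_id]
                    for block_id, replicas in namenode.get_locations(path))


namenode = NameNode()
datanodes = [DataNode(f"dn{i}") for i in range(6)]
write_file(namenode, datanodes, "/hive/clicks/part-0", b"hello hadoop world")
print(read_file(namenode, "/hive/clicks/part-0"))  # b'hello hadoop world'
```

Because the client only fetches metadata from the NameNode, the bulk data transfer is spread across the DataNodes rather than funneled through one machine.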
8.
9.
10.
Map/Reduce Data Flow
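A minimal single-process sketch of the map, shuffle/sort and reduce phases, assuming a hash partitioner. The names `map_reduce`, `wc_mapper` and `wc_reducer` are hypothetical, and word count is used only as a familiar example; a real Hadoop job distributes each phase across the cluster.

```python
from collections import defaultdict


def map_reduce(splits, mapper, reducer, num_reducers=2):
    """Single-process simulation of map -> shuffle/sort -> reduce."""
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for split in splits:                      # Map: each input split is processed independently
        for key, value in mapper(split):
            # Shuffle: the key's hash picks a reducer, so all values for a key meet in one place
            partitions[hash(key) % num_reducers][key].append(value)
    results = []
    for groups in partitions:                 # Reduce: each reducer walks its keys in sorted order
        for key in sorted(groups):
            results.extend(reducer(key, groups[key]))
    return results


def wc_mapper(lines):
    for line in lines:
        for word in line.split():
            yield word, 1


def wc_reducer(word, counts):
    yield word, sum(counts)


print(sorted(map_reduce([["a b a"], ["b c"]], wc_mapper, wc_reducer)))
# [('a', 2), ('b', 2), ('c', 1)]
```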
11.
12.
13.
HIVE: Components
Figure: Hive components.
  Hive CLI (DDL, Queries, Browsing) and Mgmt. Web UI
  Hive QL (Parser, Planner, Execution)
  MetaStore and Thrift API
  SerDe (Thrift, Jute, JSON, ...)
  running on Map Reduce and HDFS
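The SerDe (serializer/deserializer) layer is what lets Hive read and write rows in several on-disk formats (Thrift, Jute, JSON, ...). Below is a minimal sketch of the idea, assuming a simple delimited-text format with Hive's default Ctrl-A field delimiter. The class and method names are hypothetical; this is not Hive's Java SerDe interface.

```python
FIELD_DELIM = "\x01"  # Ctrl-A, the default field delimiter for Hive text tables


class DelimitedTextSerDe:
    """Hypothetical toy SerDe: converts one line of text to a typed row and back."""

    def __init__(self, columns):
        self.columns = columns  # list of (column_name, python_type) pairs

    def deserialize(self, line):
        values = line.rstrip("\n").split(FIELD_DELIM)
        return {name: typ(value) for (name, typ), value in zip(self.columns, values)}

    def serialize(self, row):
        return FIELD_DELIM.join(str(row[name]) for name, _ in self.columns)


serde = DelimitedTextSerDe([("userid", int), ("age", int), ("gender", str)])
row = serde.deserialize("111\x0125\x01female\n")
print(row)  # {'userid': 111, 'age': 25, 'gender': 'female'}
```

Swapping in a different SerDe (for example one that parses JSON) changes how bytes become rows without touching the query layers above it.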
14.
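Data Model
Figure: how a Hive table maps onto HDFS and the MetaStore.
  Tables: e.g. clicks, stored in HDFS under /hive/clicks
  Logical partitioning: one directory per partition value, e.g. /hive/clicks/ds=2008-03-25
  Hash partitioning (bucketing): files within a partition, e.g. /hive/clicks/ds=2008-03-25/0 … (#Buckets=32)
  MetaStore (schema library): schema, partitioning columns and bucketing info for each table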
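A sketch of how a row could be routed to a file under this layout: the partition column value picks the directory, and a hash of the bucketing column, modulo the number of buckets, picks the file. Assumptions: CRC32 stands in for Hive's own hash function, the bucketing column is a hypothetical userid, and files are named by bucket number as on the slide.

```python
import zlib

NUM_BUCKETS = 32  # "#Buckets=32" on the slide


def bucket_for(value, num_buckets=NUM_BUCKETS):
    # Assumption: CRC32 stands in for Hive's own hash function; only "hash mod #buckets" matters here.
    return zlib.crc32(str(value).encode("utf-8")) % num_buckets


def hdfs_path(table, ds, bucket_value):
    # The partition column becomes a directory (ds=...); the bucket number becomes the file name.
    return f"/hive/{table}/ds={ds}/{bucket_for(bucket_value)}"


print(hdfs_path("clicks", "2008-03-25", 111))
```

Because the hash is deterministic, all rows with the same bucketing value land in the same file, which is what makes bucket-level sampling and bucketed joins possible.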
15.
16.
17.
18.
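Hive QL – Join in Map Reduce
page_view and user are joined on userid to produce pv_users.
Input: page_view
  pageid  userid  time
  1       111     9:08:01
  2       111     9:08:13
  1       222     9:08:14
Input: user
  userid  age  gender
  111     25   female
  222     32   male
Map (key = userid; value = <table tag, column>, with tag 1 = page_view carrying pageid, tag 2 = user carrying age):
  page_view: 111 <1, 1>, 111 <1, 2>, 222 <1, 1>
  user:      111 <2, 25>, 222 <2, 32>
Shuffle/Sort:
  reducer 1: 111 <1, 1>, 111 <1, 2>, 111 <2, 25>
  reducer 2: 222 <1, 1>, 222 <2, 32>
Reduce output: pv_users
  reducer 1:  pageid  age
              1       25
              2       25
  reducer 2:  pageid  age
              1       32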
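The join above can be sketched as a single-process reduce-side join on the slide's own data: each map output is tagged with its source table, the shuffle groups records by userid, and each reducer pairs every page_view record with every user record for that key. Function names are hypothetical; this illustrates the technique, not Hive's implementation.

```python
from collections import defaultdict

page_view = [(1, 111, "9:08:01"), (2, 111, "9:08:13"), (1, 222, "9:08:14")]  # (pageid, userid, time)
user = [(111, 25, "female"), (222, 32, "male")]                              # (userid, age, gender)


def map_phase():
    # Key on the join column; tag values with their source table (1 = page_view, 2 = user)
    for pageid, userid, _time in page_view:
        yield userid, (1, pageid)
    for userid, age, _gender in user:
        yield userid, (2, age)


def shuffle(pairs):
    # Group all values by key and hand keys to the reducer in sorted order
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(groups):
    # Within one key, emit the cross product of page_view rows and user rows
    for _userid, values in groups.items():
        pageids = [v for tag, v in values if tag == 1]
        ages = [v for tag, v in values if tag == 2]
        for pageid in pageids:
            for age in ages:
                yield pageid, age


pv_users = list(reduce_phase(shuffle(map_phase())))
print(pv_users)  # [(1, 25), (2, 25), (1, 32)]
```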
19.
20.
21.
22.
23.
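Hive QL – Group By in Map Reduce
pv_users is read by two map tasks and aggregated to count rows per (pageid, age).
Map input: pv_users
  split 1:  pageid  age
            1       25
            2       25
  split 2:  pageid  age
            1       32
            2       25
Map output (key = <pageid, age>, value = 1):
  map 1: <1,25> 1, <2,25> 1
  map 2: <1,32> 1, <2,25> 1
Shuffle/Sort:
  reducer 1: <1,25> 1, <1,32> 1
  reducer 2: <2,25> 1, <2,25> 1
Reduce output:
  reducer 1:  pageid  age  count
              1       25   1
              1       32   1
  reducer 2:  pageid  age  count
              2       25   2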
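The same group-by, sketched in a single process with the slide's two map splits and two reducers. The partitioner here is a hypothetical one chosen to reproduce the slide's reducer assignment (pageid 1 to the first reducer, pageid 2 to the second); any partitioner works as long as equal keys always go to the same reducer.

```python
from collections import defaultdict

# pv_users rows as split across the two map tasks on the slide
splits = [[(1, 25), (2, 25)], [(1, 32), (2, 25)]]
NUM_REDUCERS = 2


def mapper(rows):
    # Emit key <pageid, age> with value 1 for every row
    for pageid, age in rows:
        yield (pageid, age), 1


def partition(key):
    # Hypothetical partitioner that mirrors the slide: pageid 1 -> reducer 0, pageid 2 -> reducer 1
    pageid, _age = key
    return (pageid - 1) % NUM_REDUCERS


def group_by_count():
    reducer_inputs = [defaultdict(list) for _ in range(NUM_REDUCERS)]
    for split in splits:
        for key, one in mapper(split):
            reducer_inputs[partition(key)][key].append(one)
    output = []
    for groups in reducer_inputs:
        # Each reducer sums the 1s for each of its keys, in sorted key order
        for (pageid, age), ones in sorted(groups.items()):
            output.append((pageid, age, sum(ones)))
    return output


print(group_by_count())  # [(1, 25, 1), (1, 32, 1), (2, 25, 2)]
```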
24.
25.
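Hive QL – Group By with Distinct in Map Reduce
Counts distinct userids per pageid in two Map Reduce stages.
Input: page_view
  split 1:  pageid  userid  time
            1       111     9:08:01
            2       111     9:08:13
  split 2:  pageid  userid  time
            1       222     9:08:14
            2       111     9:08:20
Stage 1, Shuffle and Sort (key = <pageid, userid>):
  reducer 1: <1,111>, <2,111>, <2,111>
  reducer 2: <1,222>
Stage 1, Reduce (partial distinct counts):
  reducer 1:  pageid  count
              1       1
              2       1
  reducer 2:  pageid  count
              1       1
Stage 2, Map Reduce (sum the partial counts):
  pageid  count
  1       2
  2       1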
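Why two stages: the same <pageid, userid> pair can appear in several map splits (user 111 views pageid 2 twice here), so stage 1 sorts on <pageid, userid> to make duplicates adjacent and counts each pair once. Partitioning stage 1 on userid (an assumption consistent with the slide, where all of user 111's keys reach one reducer) means a user contributes to at most one partial count per pageid, so stage 2 only has to add. A single-process sketch with hypothetical function names:

```python
from collections import defaultdict

# page_view rows as split across the two map tasks on the slide: (pageid, userid, time)
splits = [
    [(1, 111, "9:08:01"), (2, 111, "9:08:13")],
    [(1, 222, "9:08:14"), (2, 111, "9:08:20")],
]
NUM_REDUCERS = 2


def stage1():
    """Map emits <pageid, userid>; partitioning on userid sends each user to exactly one reducer."""
    reducer_inputs = [[] for _ in range(NUM_REDUCERS)]
    for split in splits:
        for pageid, userid, _time in split:
            reducer_inputs[userid % NUM_REDUCERS].append((pageid, userid))
    partial_counts = []
    for keys in reducer_inputs:
        counts = defaultdict(int)
        previous = None
        for key in sorted(keys):          # after sorting, duplicate keys are adjacent
            if key != previous:           # count each <pageid, userid> pair only once
                counts[key[0]] += 1
            previous = key
        partial_counts.extend(counts.items())
    return partial_counts


def stage2(partial_counts):
    """Second Map Reduce job: sum the partial distinct counts per pageid."""
    totals = defaultdict(int)
    for pageid, count in partial_counts:
        totals[pageid] += count
    return sorted(totals.items())


print(stage2(stage1()))  # [(1, 2), (2, 1)]
```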
26.
27.
28.
29.
30.
31.
32.
Data Warehousing at Facebook Today
Figure: Web Servers, Scribe Servers, Filers, the Hive on Hadoop Cluster, Oracle RAC and Federated MySQL.
33.
34.
In Pictures
35.
36.
37.
Speaker notes
Offline and near-real-time data processing, not online.