Improving MySQL performance with Hadoop
Sagar Jauhari
Presented at Java One & Oracle Develop 2012.
1.
Copyright © 2012,
Oracle and/or its affiliates. All rights reserved.
2.
Improving MySQL Performance with Hadoop
Sagar Jauhari, Manish Kumar
3.
India: May 03 – May 04, 2012
San Francisco: September 30 – October 4, 2012
4.
Program Agenda
● Introduction
● Inside Hadoop!
● Integration with MySQL
● Facebook's usage of MySQL & Hadoop
● Twitter's usage of MySQL & Hadoop
5.
Introduction: MySQL
● 12 million product installations
● 65,000 downloads each day
● Part of the rapidly growing open source LAMP stack
● MySQL Commercial Editions available
6.
Introduction: Hadoop
● Highly scalable distributed framework
  ○ Yahoo! has a 4000 node cluster!
● Extremely powerful in terms of computation
  ○ Sorts a TB of random integers in 62 seconds!
7.
Introduction: Hadoop is ...
● A scalable system for data storage and processing
● Fault tolerant
● Parallelizes data processing across many nodes
● Leverages its distributed file system (HDFS) to cheaply and reliably replicate chunks of data
8.
Introduction: Who uses Hadoop?
● Yahoo:
  ■ Ad systems and web search.
● Facebook:
  ■ Reporting/analytics and machine learning.
● Twitter:
  ■ Data warehousing, data analysis.
● Netflix:
  ■ Movie recommendation algorithm uses Hive (which uses Hadoop, HDFS & MapReduce underneath)
9.
Introduction: MySQL vs Hadoop
                | MySQL                       | Hadoop
Data capacity   | TB+ (may require sharding)  | PB+
Data per query  | GB?                         | PB+
Read/write      | Random read/write           | Sequential scans, append-only
Query language  | SQL                         | Java MapReduce, scripting languages, HiveQL
Transactions    | Yes                         | No
Indexes         | Yes                         | No
Latency         | Sub-second (hopefully)      | Minutes to hours
Data structure  | Structured                  | Structured or unstructured
Courtesy: Leveraging Hadoop to Augment MySQL Deployments, Sarah Sproehnle, Cloudera, 2010
10.
Inside Hadoop
A shallow Deep Dive
11.
Inside Hadoop: HDFS
● A distributed, scalable, and portable file system written in Java
● A Hadoop HDFS instance typically has a single name-node; a cluster of data-nodes forms the HDFS cluster.
[Diagram labels: Name Node; Map / Reduce Workers]
12.
Inside Hadoop: HDFS
● Uses the TCP/IP layer for communication
● Stores large files across multiple machines
● Single name node stores metadata in-memory.
[Diagram labels: Name Node; HDFS; Map / Reduce Workers]
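As a rough illustration of the last two points (large files are split across machines, and a single name node keeps the metadata in memory), the Python sketch below models a name node as an in-memory dictionary that maps each file to its blocks and to the data nodes holding them. The block size, replication factor, round-robin placement, and every name used (`NameNode`, `add_file`, `locate`, `dn1` ...) are assumptions made for illustration; this is not HDFS's actual placement policy or API.

```python
from itertools import cycle


class NameNode:
    """Toy name node: keeps file -> block -> data-node metadata in memory."""

    def __init__(self, data_nodes, block_size=64, replication=3):
        # block_size (in bytes) and replication are illustrative values only.
        self.data_nodes = list(data_nodes)
        self.block_size = block_size
        self.replication = min(replication, len(self.data_nodes))
        self.metadata = {}  # file name -> list of (block index, [data nodes])
        self._next = cycle(range(len(self.data_nodes)))

    def add_file(self, name, size):
        """Split a file of `size` bytes into blocks and assign replicas."""
        n_blocks = max(1, -(-size // self.block_size))  # ceiling division
        blocks = []
        for index in range(n_blocks):
            start = next(self._next)
            # Round-robin placement on consecutive, distinct data nodes.
            replicas = [
                self.data_nodes[(start + k) % len(self.data_nodes)]
                for k in range(self.replication)
            ]
            blocks.append((index, replicas))
        self.metadata[name] = blocks
        return blocks

    def locate(self, name):
        """Answer a client lookup purely from the in-memory metadata."""
        return self.metadata[name]


name_node = NameNode(["dn1", "dn2", "dn3", "dn4"])
layout = name_node.add_file("logs.txt", size=200)
for index, replicas in layout:
    print(f"block {index}: {replicas}")
```

Because every lookup is served from one in-memory structure, the sketch also hints at why a single name node is both fast and a potential bottleneck.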
13.
Inside Hadoop: HDFS
14.
Inside Hadoop: Map Reduce
● Design Goals
  ○ Scalability
  ○ Cost efficiency
● Implementation
  ○ User jobs are executed as 'map' and 'reduce' functions
  ○ Work distribution and fault tolerance are managed by the framework
[Diagram: Input → Map → Shuffle and sort → Reduce → Output]
15.
Inside Hadoop: Map Reduce
● Map
  ○ A MapReduce job splits the input data into independent chunks
  ○ Each chunk is processed by a map task, in parallel with the others
  ○ Generic key-value computation
16.
Inside Hadoop: Map Reduce
● Reduce
  ○ Data from data nodes is merge sorted so that the key-value pairs for a given key are contiguous
  ○ The merged data is read sequentially and the values are passed to the reduce method with an iterator reading the input file until the next key value is encountered
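The reduce-side behaviour described above can be sketched in Python: sorted runs of (key, value) pairs are merged so that equal keys become contiguous, and each key's values are then handed to a reduce function through an iterator that stops at the next key. Here `heapq.merge` and `itertools.groupby` stand in for Hadoop's merge sort and value iterator; the sample runs and the function name `reduce_sum` are made up for illustration.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def reduce_sum(key, values):
    """Example reduce function: sums the values seen for one key."""
    return key, sum(values)


# Hypothetical sorted map outputs fetched from three different nodes.
run_a = [("hadoop", 1), ("mysql", 1)]
run_b = [("hadoop", 1), ("hive", 1)]
run_c = [("pig", 1), ("sqoop", 1)]

# Merge the sorted runs; pairs with the same key end up next to each other.
merged = heapq.merge(run_a, run_b, run_c, key=itemgetter(0))

# groupby yields one iterator per key that ends when the next key starts.
results = [
    reduce_sum(key, (value for _, value in group))
    for key, group in groupby(merged, key=itemgetter(0))
]
print(results)
```

Since the merged stream is consumed lazily, the reduce function never needs the full list of values for a key in memory at once, which is the point of sorting before reducing.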
17.
Inside Hadoop: Map Reduce
[Diagram: Input → Map → Shuffle and sort → Reduce → Output]
Input words: Hadoop, MySQL, Hive, Sqoop, Pig, Hadoop
Output (word, count): Hadoop 2, MySQL 1, Hive 1, Sqoop 1, Pig 1
18.
Inside Hadoop: How does Hadoop use Map-Reduce?
● Framework consists of a single master JobTracker and one slave TaskTracker per cluster-node.
● Master
  ○ Schedules the jobs' component tasks on the slaves
  ○ Monitors the jobs
  ○ Re-executes the failed tasks
● Slave
  ○ Executes the tasks as directed by the master.
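The master/slave split above can be pictured with a small single-process simulation: a toy master hands tasks to workers, records what happened, and re-queues any task whose worker failed. This is only a sketch under stated assumptions; `make_worker`, `run_job`, `TaskFailed`, the round-robin scheduling, and the always-failing `node2` are all hypothetical and do not reflect the real JobTracker or TaskTracker interfaces.

```python
from collections import deque


class TaskFailed(Exception):
    """Raised by a worker that cannot complete its task."""


def make_worker(name, healthy=True):
    """Return a hypothetical worker: runs the task or fails."""
    def execute(func, arg):
        if not healthy:
            raise TaskFailed(f"{name} failed")
        return func(arg)
    execute.name = name
    return execute


def run_job(func, chunks, workers, max_attempts=3):
    """Toy master: schedules tasks, monitors them, re-executes failures."""
    pending = deque((task_id, chunk, 0) for task_id, chunk in enumerate(chunks))
    results, log = {}, []
    slot = 0
    while pending:
        task_id, chunk, attempt = pending.popleft()
        worker = workers[slot % len(workers)]  # simple round-robin scheduling
        slot += 1
        try:
            results[task_id] = worker(func, chunk)
            log.append((task_id, worker.name, "ok"))
        except TaskFailed:
            log.append((task_id, worker.name, "failed"))
            if attempt + 1 >= max_attempts:
                raise RuntimeError(f"task {task_id} failed {max_attempts} times")
            pending.append((task_id, chunk, attempt + 1))  # retry in a later slot
    return results, log


workers = [make_worker("node1"), make_worker("node2", healthy=False), make_worker("node3")]
results, log = run_job(lambda chunk: len(chunk.split()), ["a b", "c d e", "f"], workers)
print(results)
for entry in log:
    print(entry)
```

In this run the task first given to `node2` fails and is retried on `node1`; the caller only ever sees the final results, which is the isolation from individual task failures that the framework provides.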
19.
Inside Hadoop: Why Map Reduce?
● Language support
  ○ Java, PHP, Hive, Pig, Python, Wukong (Ruby), Rhipe (R).
● Scales horizontally
● Programmer is isolated from individual failed tasks
  ○ Tasks are restarted on another node
20.
Inside Hadoop: Map Reduce Limitations
● Not a good fit for problems that exhibit task-driven parallelism.
● Requires a particular form of input: a set of (key, value) pairs.
● A lot of MapReduce applications end up sharing data one way or another.
21.
Integration with MySQL
Leveraging Hadoop to Improve MySQL performance
22.
Integration with MySQL
● The benefits of MySQL to developers are the speed, reliability, data integrity and scalability it provides.
● It can successfully process large amounts of data (in petabytes).
● But for applications that require massively parallel processing, we may need the benefits of a parallel processing system such as Hadoop.
23.
Integration with MySQL
Image Source: Leveraging Hadoop to Augment MySQL Deployments, Sarah Sproehnle, Cloudera, 2010
24.
Integration with MySQL: Problem Statement
Word Count Problem
● In a large set of documents, find the number of occurrences of each word.
25.
Integration with MySQL: Word count problem
[Diagram: Input → Map → Shuffle and sort → Reduce → Output]
Input words: Hadoop, MySQL, Hive, Sqoop, Pig, Hadoop
Output (word, count): Hadoop 2, MySQL 1, Hive 1, Sqoop 1, Pig 1
26.
Integration with MySQL: Mapping
Key and value represent a row of data: the key is the byte offset, the value is a line.
Map (key, value):
  foreach (word in the value)
    output (word, 1)
Intermediate output:
  <word1>, 1
  <word2>, 1
  <word3>, 1
27.
Integration with MySQL: Reducing
Hadoop aggregates the keys and calls reduce for each unique key:
  <word1>, (1,1,1,1,1,1…1)
  <word2>, (1,1,1)
  <word3>, (1,1,1,1,1,1)
Reduce (key, list):
  sum the list
  output (key, sum)
Final result:
  <word1>, 45823
  <word2>, 1204
  <word3>, 2693
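The mapping and reducing steps for the word count problem can be put together in a short, single-process Python sketch. It mirrors the pseudocode on the slides (map emits `(word, 1)`, the framework groups by key, reduce sums the list), but it does not use Hadoop at all: the function names `map_words`, `shuffle`, `reduce_counts` and `word_count` are invented here, and the in-memory `shuffle` is only a stand-in for the framework's shuffle-and-sort phase.

```python
from collections import defaultdict


def map_words(offset, line):
    """Map: key is the byte offset (unused here), value is one line of text."""
    for word in line.split():
        yield word, 1


def shuffle(pairs):
    """Group intermediate (word, 1) pairs by key, as the framework would."""
    grouped = defaultdict(list)
    for word, count in pairs:
        grouped[word].append(count)
    return grouped


def reduce_counts(word, counts):
    """Reduce: sum the list of ones collected for a single word."""
    return word, sum(counts)


def word_count(text):
    """Run map, shuffle and reduce over every line of `text`."""
    intermediate = []
    offset = 0
    for line in text.splitlines(keepends=True):
        intermediate.extend(map_words(offset, line))
        offset += len(line.encode("utf-8"))  # byte offset of the next line
    grouped = shuffle(intermediate)
    return dict(reduce_counts(word, counts) for word, counts in grouped.items())


counts = word_count("Hadoop MySQL Hive\nSqoop Pig Hadoop\n")
print(counts)
```

On a real cluster the map calls would run in parallel on separate chunks of the input and the reduce calls on separate keys; the sketch only shows how the data flows between the two functions.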
28.
Integration with MySQL
Demo
29.
Integration with MySQL
Video
30.
Facebook's usage of MySQL & Hadoop
● Facebook collects terabytes of data every day from around 800 million users.
● MySQL handles pretty much every user interaction: likes, shares, status updates, alerts, requests, etc.
● Hadoop/Hive Warehouse
  – 4800 cores, 2 petabytes (July 2009)
  – 4800 cores, 12 petabytes (Sept 2009)
● Hadoop Archival Store
  – 200 TB
31.
Facebook's usage of MySQL & Hadoop: Hive
● Data warehouse system for Hadoop.
● Facilitates easy data summarization.
● Hive translates HiveQL to MapReduce code.
● Querying
  ○ Provides a mechanism to project structure onto this data
  ○ Allows querying the data using a SQL-like language called HiveQL
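To make "projecting structure onto data" more concrete, here is a small Python sketch of the idea: raw delimited text is given a schema only at read time, and a grouped aggregate is then computed in the map-then-reduce shape that a HiveQL `GROUP BY` query would be translated into. The sample lines, the schema, and the helper names `project` and `sum_by` are assumptions for illustration; nothing here calls Hive itself.

```python
from collections import defaultdict

# Hypothetical raw log lines as they might sit in HDFS: tab-separated text.
RAW_LINES = [
    "alice\tlike\t3",
    "bob\tshare\t1",
    "alice\tshare\t2",
]

# Assumed table schema, applied only when the data is read.
SCHEMA = [("user", str), ("action", str), ("n", int)]


def project(line, schema=SCHEMA):
    """Turn one raw line into a typed row by applying the schema on read."""
    fields = line.split("\t")
    return {name: cast(field) for (name, cast), field in zip(schema, fields)}


def sum_by(rows, key, value):
    """Rough equivalent of: SELECT key, SUM(value) FROM rows GROUP BY key."""
    totals = defaultdict(int)
    for row in rows:
        # Map side would emit (row[key], row[value]); reduce side sums per key.
        totals[row[key]] += row[value]
    return dict(totals)


rows = [project(line) for line in RAW_LINES]
totals = sum_by(rows, key="user", value="n")
print(totals)
```

The stored files never change; only the schema applied while reading them does, which is why the same data can be queried under different table definitions.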
32.
Facebook's usage of MySQL & Hadoop
Image Source: Leveraging Hadoop to Augment MySQL Deployments, Sarah Sproehnle, Cloudera, 2010
33.
Hive vs SQL
                     | RDBMS                                                  | Hive
Language             | SQL-92 standard (maybe)                                | Subset of SQL-92 plus Hive-specific extensions
Update capabilities  | INSERT, UPDATE and DELETE                              | INSERT but not UPDATE or DELETE
Transactions         | Yes                                                    | No
Latency              | Sub-second                                             | Minutes or more
Indexes              | Any number of indexes, very important for performance  | No indexes, data is always scanned (in parallel)
Data size            | TBs                                                    | PBs
Data per query       | GBs                                                    | PBs
Image Source: Leveraging Hadoop to Augment MySQL Deployments, Sarah Sproehnle, Cloudera, 2010
34.
Hadoop Implementation At Twitter
● > 12 terabytes of new data per day!
● Most stored data is LZO compressed
● Uses Scribe to write logs to Hadoop
  ○ Scribe: a log collection framework created and open-sourced by Facebook.
● Hadoop used for data warehousing, data analysis.
35.
References
● Leveraging Hadoop to Augment MySQL Deployments - Sarah Sproehnle, Cloudera
● http://engineering.twitter.com/2010/04/hadoop-at-twitter.html
● http://semanticvoid.com
● http://michael-noll.com
● http://hadoop.apache.org/
36.
Legal Disclaimer
● All other products, company names, brand names, trademarks and logos are the property of their respective owners.
38.
Thank You