SlideShare a Scribd company logo
1 of 33
Cloud Infrastructure:
  GFS & MapReduce
          Andrii Vozniuk
  Used slides of: Jeff Dean, Ed Austin


Data Management in the Cloud
            EPFL
      February 27, 2012
Outline
•   Motivation
•   Problem Statement
•   Storage: Google File System (GFS)
•   Processing: MapReduce
•   Benchmarks
•   Conclusions
Motivation
• Huge amounts of data to store and process
• Example @2004:
  – 20+ billion web pages x 20KB/page = 400+ TB
  – Reading from one disc 30-35 MB/s
     • Four months just to read the web
     • 1000 hard drives just to store the web
     • Even more complicated if we want to process data
• Exp. growth. The solution should be scalable.
Motivation
• Buy super fast, ultra reliable hardware?
  – Ultra expensive
  – Controlled by third party
  – Internals can be hidden and proprietary
  – Hard to predict scalability
  – Fails less often, but still fails!
  – No suitable solution on the market
Motivation
• Use commodity hardware? Benefits:
   –   Commodity machines offer much better perf/$
   –   Full control on and understanding of internals
   –   Can be highly optimized for their workloads
   –   Really smart people can do really smart things

• Not that easy:
   – Fault tolerance: something breaks all the time
   – Applications development
   – Debugging, Optimization, Locality
   – Communication and coordination
   – Status reporting, monitoring
• Handle all these issues for every problem you want to solve
Problem Statement
• Develop a scalable distributed file system for
  large data-intensive applications running on
  inexpensive commodity hardware
• Develop a tool for processing large data sets in
  parallel on inexpensive commodity hardware
• Develop the both in coordination for optimal
  performance
Google Cluster Environment



          Cloud                  Datacenters




Servers       Racks   Clusters
Google Cluster Environment
• @2009:
   –   200+ clusters
   –   1000+ machines in many of them
   –   4+ PB File systems
   –   40GB/s read/write load
   –   Frequent HW failures
   –   100s to 1000s active jobs (1 to 1000 tasks)
• Cluster is 1000s of machines
   – Stuff breaks: for 1000 – 1 per day, for 10000 – 10 per day
   – How to store data reliably with high throughput?
   – How to make it easy to develop distributed applications?
Google Technology Stack
Google Technology Stack




                   Focus of
                   this talk
GF S          Google File System*
•   Inexpensive commodity hardware
•   High Throughput > Low Latency
•   Large files (multi GB)
•   Multiple clients
•   Workload
    – Large streaming reads
    – Small random writes
    – Concurrent append to the same file
* The Google File System. S. Ghemawat, H. Gobioff, S. Leung. SOSP, 2003
GF S                      Architecture




•   User-level process running on commodity Linux machines
•   Consists of Master Server and Chunk Servers
•   Files broken into chunks (typically 64 MB), 3x redundancy (clusters, DCs)
•   Data transfers happen directly between clients and Chunk Servers
GF S                  Master Node
• Centralization for simplicity
• Namespace and metadata management
• Managing chunks
   –   Where they are (file<-chunks, replicas)
   –   Where to put new
   –   When to re-replicate (failure, load-balancing)
   –   When and what to delete (garbage collection)
• Fault tolerance
   –   Shadow masters
   –   Monitoring infrastructure outside of GFS
   –   Periodic snapshots
   –   Mirrored operations log
GF S              Master Node
• Metadata is in memory – it’s fast!
  – A 64 MB chunk needs less than 64B metadata => for 640
    TB less than 640MB
• Asks Chunk Servers when
  – Master starts
  – Chunk Server joins the cluster
• Operation log
  – Is used for serialization of concurrent operations
  – Replicated
  – Respond to client only when log is flushed locally and
    remotely
GF S          Chunk Servers
• 64MB chunks as Linux files
  – Reduce size of the master‘s datastructure
  – Reduce client-master interaction
  – Internal fragmentation => allocate space lazily
  – Possible hotspots => re-replicate
• Fault tolerance
  – Heart-beat to the master
  – Something wrong => master inits replication
GF S                 Mutation Order
                             Current lease holder?
    Write request


                        3a. data          identity of primary
                                          location of replicas
                                          (cached by client)

Operation completed                Operation completed
or Error report         3b. data
                                          Primary assigns # to mutations
                                          Applies it
                                          Forwards write request

                        3c. data

                                   Operation completed
GF S         Control & Data Flow
• Decouple control flow and data flow
• Control flow
  – Master -> Primary -> Secondaries
• Data flow
  – Carefully picked chain of Chunk Servers
       • Forward to the closest first
       • Distance estimated based on IP
  – Fully utilize outbound bandwidth
  – Pipelining to exploit full-duplex links
GF S     Other Important Things
• Snapshot operation – make a copy very fast
• Smart chunks creation policy
  – Below-average disk utilization, limited # of recent
• Smart re-replication policy
  – Under replicated first
  – Chunks that are blocking client
  – Live files first (rather than deleted)
• Rebalance and GC periodically

          How to process data stored in GFS?
M M M

 R   R                MapReduce*
• A simple programming model applicable to many
  large-scale computing problems
• Divide and conquer strategy
• Hide messy details in MapReduce runtime library:
     –   Automatic parallelization
     –   Load balancing
     –   Network and disk transfer optimizations
     –   Fault tolerance
     –   Part of the stack: improvements to core library benefit
         all users of library
*MapReduce: Simplified Data Processing on Large Clusters.
J. Dean, S. Ghemawat. OSDI, 2004
M M M

R   R           Typical problem
• Read a lot of data. Break it into the parts
• Map: extract something important from each part
• Shuffle and Sort
• Reduce: aggregate, summarize, filter, transform Map results
• Write the results
• Chain, Cascade

• Implement Map and Reduce to fit the problem
M M M

R   R   Nice Example
M M M

R   R     Other suitable examples
•   Distributed grep
•   Distributed sort (Hadoop MapReduce won TeraSort)
•   Term-vector per host
•   Document clustering
•   Machine learning
•   Web access log stats
•   Web link-graph reversal
•   Inverted index construction
•   Statistical machine translation
M M M

R   R                        Model
• Programmer specifies two primary methods
    – Map(k,v) -> <k’,v’>*
    – Reduce(k’, <v’>*) -> <k’,v’>*
• All v’ with same k’ are reduced together, in order
• Usually also specify how to partition k’
    – Partition(k’, total partitions) -> partition for k’
        • Often a simple hash of the key
        • Allows reduce operations for different k’ to be parallelized
M M M

R   R                   Code
Map
    map(String key, String value):
      // key: document name
      // value: document contents
      for each word w in value:
          EmitIntermediate(w, "1");
Reduce
    reduce(String key, Iterator values):
      // key: a word
      // values: a list of counts
      int result = 0;
      for each v in values:
          result += ParseInt(v);
      Emit(AsString(result))
M M M

R   R                 Architecture




        • One master, many workers
        • Infrastructure manages scheduling and distribution
        • User implements Map and Reduce
M M M

R   R   Architecture
M M M

R   R   Architecture
M M M

R   R   Architecture




        Combiner = local Reduce
M M M

R   R               Important things
•   Mappers scheduled close to data
•   Chunk replication improves locality
•   Reducers often run on same machine as mappers
•   Fault tolerance
    –   Map crash – re-launch all task of machine
    –   Reduce crash – repeat crashed task only
    –   Master crash – repeat whole job
    –   Skip bad records
• Fighting ‘stragglers’ by launching backup tasks
• Proven scalability
• September 2009 Google ran 3,467,000 MR Jobs averaging
  488 machines per Job
• Extensively used in Yahoo and Facebook with Hadoop
GF S   Benchmarks: GFS
M M M

R   R   Benchmarks: MapReduce
Conclusions
“We believe we get tremendous competitive advantage
by essentially building our own infrastructure”
-- Eric Schmidt

• GFS & MapReduce
   – Google achieved their goals
   – A fundamental part of their stack
• Open source implementations
   – GFS  Hadoop Distributed FS (HDFS)
   – MapReduce  Hadoop MapReduce
Thank your for your attention!
   Andrii.Vozniuk@epfl.ch

More Related Content

What's hot

The Google File System (GFS)
The Google File System (GFS)The Google File System (GFS)
The Google File System (GFS)Romain Jacotin
 
Google File System
Google File SystemGoogle File System
Google File Systemguest2cb4689
 
advanced Google file System
advanced Google file Systemadvanced Google file System
advanced Google file Systemdiptipan
 
Cluster based storage - Nasd and Google file system - advanced operating syst...
Cluster based storage - Nasd and Google file system - advanced operating syst...Cluster based storage - Nasd and Google file system - advanced operating syst...
Cluster based storage - Nasd and Google file system - advanced operating syst...Antonio Cesarano
 
Replication, Durability, and Disaster Recovery
Replication, Durability, and Disaster RecoveryReplication, Durability, and Disaster Recovery
Replication, Durability, and Disaster RecoverySteven Francia
 
google file system
google file systemgoogle file system
google file systemdiptipan
 
Google file system
Google file systemGoogle file system
Google file systemDhan V Sagar
 
Google File Systems
Google File SystemsGoogle File Systems
Google File SystemsAzeem Mumtaz
 
Google file system GFS
Google file system GFSGoogle file system GFS
Google file system GFSzihad164
 
Google File System
Google File SystemGoogle File System
Google File Systemnadikari123
 
Teoria efectului defectului hardware: GoogleFS
Teoria efectului defectului hardware: GoogleFSTeoria efectului defectului hardware: GoogleFS
Teoria efectului defectului hardware: GoogleFSAsociatia ProLinux
 
MapReduce:Simplified Data Processing on Large Cluster Presented by Areej Qas...
MapReduce:Simplified Data Processing on Large Cluster  Presented by Areej Qas...MapReduce:Simplified Data Processing on Large Cluster  Presented by Areej Qas...
MapReduce:Simplified Data Processing on Large Cluster Presented by Areej Qas...areej qasrawi
 

What's hot (20)

The Google File System (GFS)
The Google File System (GFS)The Google File System (GFS)
The Google File System (GFS)
 
Google file system
Google file systemGoogle file system
Google file system
 
Google File System
Google File SystemGoogle File System
Google File System
 
Google File System
Google File SystemGoogle File System
Google File System
 
advanced Google file System
advanced Google file Systemadvanced Google file System
advanced Google file System
 
Cluster based storage - Nasd and Google file system - advanced operating syst...
Cluster based storage - Nasd and Google file system - advanced operating syst...Cluster based storage - Nasd and Google file system - advanced operating syst...
Cluster based storage - Nasd and Google file system - advanced operating syst...
 
Google file system
Google file systemGoogle file system
Google file system
 
Replication, Durability, and Disaster Recovery
Replication, Durability, and Disaster RecoveryReplication, Durability, and Disaster Recovery
Replication, Durability, and Disaster Recovery
 
Google File System
Google File SystemGoogle File System
Google File System
 
google file system
google file systemgoogle file system
google file system
 
Google file system
Google file systemGoogle file system
Google file system
 
Google File Systems
Google File SystemsGoogle File Systems
Google File Systems
 
Google file system GFS
Google file system GFSGoogle file system GFS
Google file system GFS
 
Google File System
Google File SystemGoogle File System
Google File System
 
Google file system
Google file systemGoogle file system
Google file system
 
Unit 2.pptx
Unit 2.pptxUnit 2.pptx
Unit 2.pptx
 
Teoria efectului defectului hardware: GoogleFS
Teoria efectului defectului hardware: GoogleFSTeoria efectului defectului hardware: GoogleFS
Teoria efectului defectului hardware: GoogleFS
 
Gfs介绍
Gfs介绍Gfs介绍
Gfs介绍
 
MapReduce:Simplified Data Processing on Large Cluster Presented by Areej Qas...
MapReduce:Simplified Data Processing on Large Cluster  Presented by Areej Qas...MapReduce:Simplified Data Processing on Large Cluster  Presented by Areej Qas...
MapReduce:Simplified Data Processing on Large Cluster Presented by Areej Qas...
 
Anatomy of file write in hadoop
Anatomy of file write in hadoopAnatomy of file write in hadoop
Anatomy of file write in hadoop
 

Similar to Cloud infrastructure. Google File System and MapReduce - Andrii Vozniuk

HDFS_architecture.ppt
HDFS_architecture.pptHDFS_architecture.ppt
HDFS_architecture.pptvijayapraba1
 
Distributed Data processing in a Cloud
Distributed Data processing in a CloudDistributed Data processing in a Cloud
Distributed Data processing in a Cloudelliando dias
 
(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...
(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...
(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...Reynold Xin
 
Map reduce - simplified data processing on large clusters
Map reduce - simplified data processing on large clustersMap reduce - simplified data processing on large clusters
Map reduce - simplified data processing on large clustersCleverence Kombe
 
Distributed Computing with Apache Hadoop: Technology Overview
Distributed Computing with Apache Hadoop: Technology OverviewDistributed Computing with Apache Hadoop: Technology Overview
Distributed Computing with Apache Hadoop: Technology OverviewKonstantin V. Shvachko
 
MapReduce presentation
MapReduce presentationMapReduce presentation
MapReduce presentationVu Thi Trang
 
In memory grids IMDG
In memory grids IMDGIn memory grids IMDG
In memory grids IMDGPrateek Jain
 
Large scale computing with mapreduce
Large scale computing with mapreduceLarge scale computing with mapreduce
Large scale computing with mapreducehansen3032
 
Writing Scalable Software in Java
Writing Scalable Software in JavaWriting Scalable Software in Java
Writing Scalable Software in JavaRuben Badaró
 
Apache Tez : Accelerating Hadoop Query Processing
Apache Tez : Accelerating Hadoop Query ProcessingApache Tez : Accelerating Hadoop Query Processing
Apache Tez : Accelerating Hadoop Query ProcessingBikas Saha
 
Apache Hadoop YARN - The Future of Data Processing with Hadoop
Apache Hadoop YARN - The Future of Data Processing with HadoopApache Hadoop YARN - The Future of Data Processing with Hadoop
Apache Hadoop YARN - The Future of Data Processing with HadoopHortonworks
 
Introduction to Hadoop and Big Data
Introduction to Hadoop and Big DataIntroduction to Hadoop and Big Data
Introduction to Hadoop and Big DataJoe Alex
 
Graphene – Microsoft SCOPE on Tez
Graphene – Microsoft SCOPE on Tez Graphene – Microsoft SCOPE on Tez
Graphene – Microsoft SCOPE on Tez DataWorks Summit
 
Hadoop performance optimization tips
Hadoop performance optimization tipsHadoop performance optimization tips
Hadoop performance optimization tipsSubhas Kumar Ghosh
 
Hadoop-Quick introduction
Hadoop-Quick introductionHadoop-Quick introduction
Hadoop-Quick introductionSandeep Singh
 

Similar to Cloud infrastructure. Google File System and MapReduce - Andrii Vozniuk (20)

try
trytry
try
 
Hadoop
HadoopHadoop
Hadoop
 
HDFS_architecture.ppt
HDFS_architecture.pptHDFS_architecture.ppt
HDFS_architecture.ppt
 
Distributed Data processing in a Cloud
Distributed Data processing in a CloudDistributed Data processing in a Cloud
Distributed Data processing in a Cloud
 
(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...
(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...
(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...
 
Map reduce - simplified data processing on large clusters
Map reduce - simplified data processing on large clustersMap reduce - simplified data processing on large clusters
Map reduce - simplified data processing on large clusters
 
Distributed Computing with Apache Hadoop: Technology Overview
Distributed Computing with Apache Hadoop: Technology OverviewDistributed Computing with Apache Hadoop: Technology Overview
Distributed Computing with Apache Hadoop: Technology Overview
 
MapReduce presentation
MapReduce presentationMapReduce presentation
MapReduce presentation
 
In memory grids IMDG
In memory grids IMDGIn memory grids IMDG
In memory grids IMDG
 
Hadoop - Introduction to HDFS
Hadoop - Introduction to HDFSHadoop - Introduction to HDFS
Hadoop - Introduction to HDFS
 
MapReduce.pptx
MapReduce.pptxMapReduce.pptx
MapReduce.pptx
 
Map reducecloudtech
Map reducecloudtechMap reducecloudtech
Map reducecloudtech
 
Large scale computing with mapreduce
Large scale computing with mapreduceLarge scale computing with mapreduce
Large scale computing with mapreduce
 
Writing Scalable Software in Java
Writing Scalable Software in JavaWriting Scalable Software in Java
Writing Scalable Software in Java
 
Apache Tez : Accelerating Hadoop Query Processing
Apache Tez : Accelerating Hadoop Query ProcessingApache Tez : Accelerating Hadoop Query Processing
Apache Tez : Accelerating Hadoop Query Processing
 
Apache Hadoop YARN - The Future of Data Processing with Hadoop
Apache Hadoop YARN - The Future of Data Processing with HadoopApache Hadoop YARN - The Future of Data Processing with Hadoop
Apache Hadoop YARN - The Future of Data Processing with Hadoop
 
Introduction to Hadoop and Big Data
Introduction to Hadoop and Big DataIntroduction to Hadoop and Big Data
Introduction to Hadoop and Big Data
 
Graphene – Microsoft SCOPE on Tez
Graphene – Microsoft SCOPE on Tez Graphene – Microsoft SCOPE on Tez
Graphene – Microsoft SCOPE on Tez
 
Hadoop performance optimization tips
Hadoop performance optimization tipsHadoop performance optimization tips
Hadoop performance optimization tips
 
Hadoop-Quick introduction
Hadoop-Quick introductionHadoop-Quick introduction
Hadoop-Quick introduction
 

More from Andrii Vozniuk

Enhancing Social Media Platforms for Educational and Humanitarian Knowledge S...
Enhancing Social Media Platforms for Educational and Humanitarian Knowledge S...Enhancing Social Media Platforms for Educational and Humanitarian Knowledge S...
Enhancing Social Media Platforms for Educational and Humanitarian Knowledge S...Andrii Vozniuk
 
Embedded interactive learning analytics dashboards with Elasticsearch and Kib...
Embedded interactive learning analytics dashboards with Elasticsearch and Kib...Embedded interactive learning analytics dashboards with Elasticsearch and Kib...
Embedded interactive learning analytics dashboards with Elasticsearch and Kib...Andrii Vozniuk
 
Interactive learning analytics dashboards with ELK (Elasticsearch Logstash Ki...
Interactive learning analytics dashboards with ELK (Elasticsearch Logstash Ki...Interactive learning analytics dashboards with ELK (Elasticsearch Logstash Ki...
Interactive learning analytics dashboards with ELK (Elasticsearch Logstash Ki...Andrii Vozniuk
 
Combining content analytics and activity tracking to mine user interests and ...
Combining content analytics and activity tracking to mine user interests and ...Combining content analytics and activity tracking to mine user interests and ...
Combining content analytics and activity tracking to mine user interests and ...Andrii Vozniuk
 
TPC-DS performance evaluation for JAQL and PIG queries - Andrii Vozniuk, Serg...
TPC-DS performance evaluation for JAQL and PIG queries - Andrii Vozniuk, Serg...TPC-DS performance evaluation for JAQL and PIG queries - Andrii Vozniuk, Serg...
TPC-DS performance evaluation for JAQL and PIG queries - Andrii Vozniuk, Serg...Andrii Vozniuk
 
Contextual learning analytics apps to create awareness in blended inquiry lea...
Contextual learning analytics apps to create awareness in blended inquiry lea...Contextual learning analytics apps to create awareness in blended inquiry lea...
Contextual learning analytics apps to create awareness in blended inquiry lea...Andrii Vozniuk
 
Graspeo: a Social Media Platform for Knowledge Management in NGOs - Andrii Vo...
Graspeo: a Social Media Platform for Knowledge Management in NGOs - Andrii Vo...Graspeo: a Social Media Platform for Knowledge Management in NGOs - Andrii Vo...
Graspeo: a Social Media Platform for Knowledge Management in NGOs - Andrii Vo...Andrii Vozniuk
 
Towards portable learning analytics dashboards - Andrii Vozniuk, Sten Govaert...
Towards portable learning analytics dashboards - Andrii Vozniuk, Sten Govaert...Towards portable learning analytics dashboards - Andrii Vozniuk, Sten Govaert...
Towards portable learning analytics dashboards - Andrii Vozniuk, Sten Govaert...Andrii Vozniuk
 
AngeLA: Putting the teacher in control of student privacy in the online class...
AngeLA: Putting the teacher in control of student privacy in the online class...AngeLA: Putting the teacher in control of student privacy in the online class...
AngeLA: Putting the teacher in control of student privacy in the online class...Andrii Vozniuk
 
Scheduling in distributed systems - Andrii Vozniuk
Scheduling in distributed systems - Andrii VozniukScheduling in distributed systems - Andrii Vozniuk
Scheduling in distributed systems - Andrii VozniukAndrii Vozniuk
 
Symbolic Reasoning and Concrete Execution - Andrii Vozniuk
Symbolic Reasoning and Concrete Execution - Andrii Vozniuk Symbolic Reasoning and Concrete Execution - Andrii Vozniuk
Symbolic Reasoning and Concrete Execution - Andrii Vozniuk Andrii Vozniuk
 

More from Andrii Vozniuk (11)

Enhancing Social Media Platforms for Educational and Humanitarian Knowledge S...
Enhancing Social Media Platforms for Educational and Humanitarian Knowledge S...Enhancing Social Media Platforms for Educational and Humanitarian Knowledge S...
Enhancing Social Media Platforms for Educational and Humanitarian Knowledge S...
 
Embedded interactive learning analytics dashboards with Elasticsearch and Kib...
Embedded interactive learning analytics dashboards with Elasticsearch and Kib...Embedded interactive learning analytics dashboards with Elasticsearch and Kib...
Embedded interactive learning analytics dashboards with Elasticsearch and Kib...
 
Interactive learning analytics dashboards with ELK (Elasticsearch Logstash Ki...
Interactive learning analytics dashboards with ELK (Elasticsearch Logstash Ki...Interactive learning analytics dashboards with ELK (Elasticsearch Logstash Ki...
Interactive learning analytics dashboards with ELK (Elasticsearch Logstash Ki...
 
Combining content analytics and activity tracking to mine user interests and ...
Combining content analytics and activity tracking to mine user interests and ...Combining content analytics and activity tracking to mine user interests and ...
Combining content analytics and activity tracking to mine user interests and ...
 
TPC-DS performance evaluation for JAQL and PIG queries - Andrii Vozniuk, Serg...
TPC-DS performance evaluation for JAQL and PIG queries - Andrii Vozniuk, Serg...TPC-DS performance evaluation for JAQL and PIG queries - Andrii Vozniuk, Serg...
TPC-DS performance evaluation for JAQL and PIG queries - Andrii Vozniuk, Serg...
 
Contextual learning analytics apps to create awareness in blended inquiry lea...
Contextual learning analytics apps to create awareness in blended inquiry lea...Contextual learning analytics apps to create awareness in blended inquiry lea...
Contextual learning analytics apps to create awareness in blended inquiry lea...
 
Graspeo: a Social Media Platform for Knowledge Management in NGOs - Andrii Vo...
Graspeo: a Social Media Platform for Knowledge Management in NGOs - Andrii Vo...Graspeo: a Social Media Platform for Knowledge Management in NGOs - Andrii Vo...
Graspeo: a Social Media Platform for Knowledge Management in NGOs - Andrii Vo...
 
Towards portable learning analytics dashboards - Andrii Vozniuk, Sten Govaert...
Towards portable learning analytics dashboards - Andrii Vozniuk, Sten Govaert...Towards portable learning analytics dashboards - Andrii Vozniuk, Sten Govaert...
Towards portable learning analytics dashboards - Andrii Vozniuk, Sten Govaert...
 
AngeLA: Putting the teacher in control of student privacy in the online class...
AngeLA: Putting the teacher in control of student privacy in the online class...AngeLA: Putting the teacher in control of student privacy in the online class...
AngeLA: Putting the teacher in control of student privacy in the online class...
 
Scheduling in distributed systems - Andrii Vozniuk
Scheduling in distributed systems - Andrii VozniukScheduling in distributed systems - Andrii Vozniuk
Scheduling in distributed systems - Andrii Vozniuk
 
Symbolic Reasoning and Concrete Execution - Andrii Vozniuk
Symbolic Reasoning and Concrete Execution - Andrii Vozniuk Symbolic Reasoning and Concrete Execution - Andrii Vozniuk
Symbolic Reasoning and Concrete Execution - Andrii Vozniuk
 

Recently uploaded

EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 

Recently uploaded (20)

EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 

Cloud infrastructure. Google File System and MapReduce - Andrii Vozniuk

  • 1. Cloud Infrastructure: GFS & MapReduce Andrii Vozniuk Used slides of: Jeff Dean, Ed Austin Data Management in the Cloud EPFL February 27, 2012
  • 2. Outline • Motivation • Problem Statement • Storage: Google File System (GFS) • Processing: MapReduce • Benchmarks • Conclusions
  • 3. Motivation • Huge amounts of data to store and process • Example @2004: – 20+ billion web pages x 20KB/page = 400+ TB – Reading from one disc 30-35 MB/s • Four months just to read the web • 1000 hard drives just to store the web • Even more complicated if we want to process data • Exp. growth. The solution should be scalable.
  • 4. Motivation • Buy super fast, ultra reliable hardware? – Ultra expensive – Controlled by third party – Internals can be hidden and proprietary – Hard to predict scalability – Fails less often, but still fails! – No suitable solution on the market
  • 5. Motivation • Use commodity hardware? Benefits: – Commodity machines offer much better perf/$ – Full control on and understanding of internals – Can be highly optimized for their workloads – Really smart people can do really smart things • Not that easy: – Fault tolerance: something breaks all the time – Applications development – Debugging, Optimization, Locality – Communication and coordination – Status reporting, monitoring • Handle all these issues for every problem you want to solve
  • 6. Problem Statement • Develop a scalable distributed file system for large data-intensive applications running on inexpensive commodity hardware • Develop a tool for processing large data sets in parallel on inexpensive commodity hardware • Develop the both in coordination for optimal performance
  • 7. Google Cluster Environment Cloud Datacenters Servers Racks Clusters
  • 8. Google Cluster Environment • @2009: – 200+ clusters – 1000+ machines in many of them – 4+ PB File systems – 40GB/s read/write load – Frequent HW failures – 100s to 1000s active jobs (1 to 1000 tasks) • Cluster is 1000s of machines – Stuff breaks: for 1000 – 1 per day, for 10000 – 10 per day – How to store data reliably with high throughput? – How to make it easy to develop distributed applications?
  • 10. Google Technology Stack Focus of this talk
  • 11. GF S Google File System* • Inexpensive commodity hardware • High Throughput > Low Latency • Large files (multi GB) • Multiple clients • Workload – Large streaming reads – Small random writes – Concurrent append to the same file * The Google File System. S. Ghemawat, H. Gobioff, S. Leung. SOSP, 2003
  • 12. GF S Architecture • User-level process running on commodity Linux machines • Consists of Master Server and Chunk Servers • Files broken into chunks (typically 64 MB), 3x redundancy (clusters, DCs) • Data transfers happen directly between clients and Chunk Servers
  • 13. GF S Master Node • Centralization for simplicity • Namespace and metadata management • Managing chunks – Where they are (file<-chunks, replicas) – Where to put new – When to re-replicate (failure, load-balancing) – When and what to delete (garbage collection) • Fault tolerance – Shadow masters – Monitoring infrastructure outside of GFS – Periodic snapshots – Mirrored operations log
  • 14. GF S Master Node • Metadata is in memory – it’s fast! – A 64 MB chunk needs less than 64B metadata => for 640 TB less than 640MB • Asks Chunk Servers when – Master starts – Chunk Server joins the cluster • Operation log – Is used for serialization of concurrent operations – Replicated – Respond to client only when log is flushed locally and remotely
  • 15. GF S Chunk Servers • 64MB chunks as Linux files – Reduce size of the master‘s datastructure – Reduce client-master interaction – Internal fragmentation => allocate space lazily – Possible hotspots => re-replicate • Fault tolerance – Heart-beat to the master – Something wrong => master inits replication
  • 16. GF S Mutation Order Current lease holder? Write request 3a. data identity of primary location of replicas (cached by client) Operation completed Operation completed or Error report 3b. data Primary assigns # to mutations Applies it Forwards write request 3c. data Operation completed
  • 17. GF S Control & Data Flow • Decouple control flow and data flow • Control flow – Master -> Primary -> Secondaries • Data flow – Carefully picked chain of Chunk Servers • Forward to the closest first • Distance estimated based on IP – Fully utilize outbound bandwidth – Pipelining to exploit full-duplex links
  • 18. GF S Other Important Things • Snapshot operation – make a copy very fast • Smart chunks creation policy – Below-average disk utilization, limited # of recent • Smart re-replication policy – Under replicated first – Chunks that are blocking client – Live files first (rather than deleted) • Rebalance and GC periodically How to process data stored in GFS?
  • 19. M M M R R MapReduce* • A simple programming model applicable to many large-scale computing problems • Divide and conquer strategy • Hide messy details in MapReduce runtime library: – Automatic parallelization – Load balancing – Network and disk transfer optimizations – Fault tolerance – Part of the stack: improvements to core library benefit all users of library *MapReduce: Simplified Data Processing on Large Clusters. J. Dean, S. Ghemawat. OSDI, 2004
  • 20. M M M R R Typical problem • Read a lot of data. Break it into the parts • Map: extract something important from each part • Shuffle and Sort • Reduce: aggregate, summarize, filter, transform Map results • Write the results • Chain, Cascade • Implement Map and Reduce to fit the problem
  • 21. M M M R R Nice Example
  • 22. M M M R R Other suitable examples • Distributed grep • Distributed sort (Hadoop MapReduce won TeraSort) • Term-vector per host • Document clustering • Machine learning • Web access log stats • Web link-graph reversal • Inverted index construction • Statistical machine translation
  • 23. M M M R R Model • Programmer specifies two primary methods – Map(k,v) -> <k’,v’>* – Reduce(k’, <v’>*) -> <k’,v’>* • All v’ with same k’ are reduced together, in order • Usually also specify how to partition k’ – Partition(k’, total partitions) -> partition for k’ • Often a simple hash of the key • Allows reduce operations for different k’ to be parallelized
  • 24. M M M R R Code Map map(String key, String value): // key: document name // value: document contents for each word w in value: EmitIntermediate(w, "1"); Reduce reduce(String key, Iterator values): // key: a word // values: a list of counts int result = 0; for each v in values: result += ParseInt(v); Emit(AsString(result))
  • 25. M M M R R Architecture • One master, many workers • Infrastructure manages scheduling and distribution • User implements Map and Reduce
  • 26. M M M R R Architecture
  • 27. M M M R R Architecture
  • 28. M M M R R Architecture Combiner = local Reduce
  • 29. M M M R R Important things • Mappers scheduled close to data • Chunk replication improves locality • Reducers often run on same machine as mappers • Fault tolerance – Map crash – re-launch all task of machine – Reduce crash – repeat crashed task only – Master crash – repeat whole job – Skip bad records • Fighting ‘stragglers’ by launching backup tasks • Proven scalability • September 2009 Google ran 3,467,000 MR Jobs averaging 488 machines per Job • Extensively used in Yahoo and Facebook with Hadoop
  • 30. GF S Benchmarks: GFS
  • 31. M M M R R Benchmarks: MapReduce
  • 32. Conclusions “We believe we get tremendous competitive advantage by essentially building our own infrastructure” -- Eric Schmidt • GFS & MapReduce – Google achieved their goals – A fundamental part of their stack • Open source implementations – GFS  Hadoop Distributed FS (HDFS) – MapReduce  Hadoop MapReduce
  • 33. Thank your for your attention! Andrii.Vozniuk@epfl.ch