Cloud Infrastructure:
  GFS & MapReduce
          Andrii Vozniuk
  Used slides of: Jeff Dean, Ed Austin


Data Management in the Cloud
            EPFL
      February 27, 2012
Outline
•   Motivation
•   Problem Statement
•   Storage: Google File System (GFS)
•   Processing: MapReduce
•   Benchmarks
•   Conclusions
Motivation
• Huge amounts of data to store and process
• Example @2004:
  – 20+ billion web pages x 20KB/page = 400+ TB
  – Reading from one disk: 30-35 MB/s
     • Four months just to read the web
     • 1000 hard drives just to store the web
     • Even more complicated if we want to process data
• Exponential growth: the solution must be scalable.
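The back-of-the-envelope numbers above can be checked in a few lines. This is only a sketch: it assumes decimal units (1 TB = 10^12 bytes), 30-day months, and a drive capacity of 400 GB, which is not stated on the slide but is what the 1000-drive figure implies.

```python
# Back-of-the-envelope check of the 2004 web-size estimate.
PAGES = 20e9          # 20+ billion web pages
PAGE_BYTES = 20e3     # 20 KB per page
DISK_RATE = 32.5e6    # midpoint of 30-35 MB/s, in bytes per second
DRIVE_BYTES = 400e9   # assumed capacity per drive (not from the slide)

total_bytes = PAGES * PAGE_BYTES
read_seconds = total_bytes / DISK_RATE
read_months = read_seconds / (30 * 24 * 3600)
drives_needed = total_bytes / DRIVE_BYTES

print(f"corpus size: {total_bytes / 1e12:.0f} TB")
print(f"sequential read from one disk: {read_months:.1f} months")
print(f"drives needed just for storage: {drives_needed:.0f}")
```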
Motivation
• Buy super fast, ultra reliable hardware?
  – Ultra expensive
  – Controlled by third party
  – Internals can be hidden and proprietary
  – Hard to predict scalability
  – Fails less often, but still fails!
  – No suitable solution on the market
Motivation
• Use commodity hardware? Benefits:
   –   Commodity machines offer much better perf/$
   –   Full control over, and understanding of, internals
   –   Can be highly optimized for their workloads
   –   Really smart people can do really smart things

• Not that easy:
   – Fault tolerance: something breaks all the time
   – Applications development
   – Debugging, Optimization, Locality
   – Communication and coordination
   – Status reporting, monitoring
• All of these issues must be handled for every problem you want to solve
Problem Statement
• Develop a scalable distributed file system for
  large data-intensive applications running on
  inexpensive commodity hardware
• Develop a tool for processing large data sets in
  parallel on inexpensive commodity hardware
• Develop both in coordination for optimal
  performance
Google Cluster Environment
[Figure: servers -> racks -> clusters -> datacenters -> cloud]
Google Cluster Environment
• @2009:
   –   200+ clusters
   –   1000+ machines in many of them
   –   4+ PB File systems
   –   40GB/s read/write load
   –   Frequent HW failures
   –   100s to 1000s active jobs (1 to 1000 tasks)
• A cluster is 1000s of machines
   – Stuff breaks: about 1 failure per day with 1000 machines, 10 per day with 10000
   – How to store data reliably with high throughput?
   – How to make it easy to develop distributed applications?
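The failure figures above amount to roughly one failure per 1000 machine-days. A small sketch, assuming machines fail independently at that constant rate (a simplification, not something the slides claim):

```python
# Failure rate implied by the slide: about 1 failure per 1000 machine-days.
FAILURES_PER_MACHINE_DAY = 1 / 1000

def expected_failures_per_day(machines: int) -> float:
    """Average number of machine failures per day in a cluster."""
    return machines * FAILURES_PER_MACHINE_DAY

def prob_at_least_one_failure(machines: int) -> float:
    """Chance that at least one machine fails on a given day."""
    return 1 - (1 - FAILURES_PER_MACHINE_DAY) ** machines

for size in (1000, 10000):
    print(size, expected_failures_per_day(size), prob_at_least_one_failure(size))
```

Even at 1000 machines, a day with no failure at all is the less likely outcome under this model, which is why fault tolerance has to be built into the software.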
Google Technology Stack
[Figure: Google technology stack, with the focus of this talk highlighted]
GFS: Google File System*
•   Inexpensive commodity hardware
•   High Throughput > Low Latency
•   Large files (multi GB)
•   Multiple clients
•   Workload
    – Large streaming reads
    – Small random reads
    – Concurrent appends to the same file
* The Google File System. S. Ghemawat, H. Gobioff, S. Leung. SOSP, 2003
GFS: Architecture




•   User-level process running on commodity Linux machines
•   Consists of Master Server and Chunk Servers
•   Files broken into chunks (typically 64 MB), 3x redundancy (clusters, DCs)
•   Data transfers happen directly between clients and Chunk Servers
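Because chunks have a fixed size, a client can work out which chunk holds a given byte offset before it contacts the master. A minimal sketch of that translation; the function names are illustrative and not part of any GFS API:

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB, the typical chunk size

def chunk_index(offset: int) -> int:
    """Index of the chunk that holds the given byte offset of a file."""
    return offset // CHUNK_SIZE

def chunks_for_range(offset: int, length: int) -> list:
    """Split a byte range into (chunk index, offset in chunk, byte count) pieces."""
    pieces = []
    end = offset + length
    while offset < end:
        within = offset % CHUNK_SIZE
        count = min(CHUNK_SIZE - within, end - offset)
        pieces.append((chunk_index(offset), within, count))
        offset += count
    return pieces

# A 30-byte read that straddles the boundary between chunk 0 and chunk 1.
print(chunks_for_range(CHUNK_SIZE - 10, 30))
```

The client would then ask the master only for the locations of those chunk indexes and fetch the bytes directly from the Chunk Servers.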
GFS: Master Node
• Centralization for simplicity
• Namespace and metadata management
• Managing chunks
   –   Where they are (file -> chunks mapping, replica locations)
   –   Where to put new chunks
   –   When to re-replicate (failure, load-balancing)
   –   When and what to delete (garbage collection)
• Fault tolerance
   –   Shadow masters
   –   Monitoring infrastructure outside of GFS
   –   Periodic snapshots
   –   Mirrored operations log
GFS: Master Node
• Metadata is in memory – it’s fast!
  – A 64 MB chunk needs less than 64 B of metadata => 640 TB
    of data needs less than 640 MB
• Asks Chunk Servers which chunks they hold when
  – Master starts
  – Chunk Server joins the cluster
• Operation log
  – Is used for serialization of concurrent operations
  – Replicated
  – Responds to the client only after the log is flushed locally and
    remotely
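The in-memory metadata estimate can be reproduced directly. A sketch assuming decimal units and the 64 B per chunk upper bound quoted above; note that the bound is per chunk, so replication does not multiply it:

```python
CHUNK_BYTES = 64 * 10**6     # 64 MB chunk, decimal units
METADATA_PER_CHUNK = 64      # upper bound, bytes of master metadata per chunk

def master_metadata_bytes(stored_bytes: int) -> int:
    """Upper bound on master memory needed to track the given amount of file data."""
    chunks = -(-stored_bytes // CHUNK_BYTES)   # ceiling division
    return chunks * METADATA_PER_CHUNK

data = 640 * 10**12          # 640 TB of file data
print(master_metadata_bytes(data) / 10**6, "MB of metadata")
```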
GFS: Chunk Servers
• 64MB chunks as Linux files
  – Reduce size of the master's data structures
  – Reduce client-master interaction
  – Internal fragmentation => allocate space lazily
  – Possible hotspots => re-replicate
• Fault tolerance
  – Heartbeat to the master
  – Something wrong => master initiates re-replication
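A rough sketch of how heartbeats can drive re-replication. The class and method names are hypothetical, and the 60-second timeout is an assumed value; only the 3x replication target comes from the slides.

```python
REPLICATION_TARGET = 3      # 3x redundancy
HEARTBEAT_TIMEOUT = 60.0    # seconds; assumed value for illustration

class Master:
    def __init__(self):
        self.last_seen = {}   # chunk server -> time of its last heartbeat
        self.replicas = {}    # chunk id -> set of chunk servers holding it

    def heartbeat(self, server, chunk_ids, now):
        """Record a heartbeat together with the chunks the server reports."""
        self.last_seen[server] = now
        for chunk in chunk_ids:
            self.replicas.setdefault(chunk, set()).add(server)

    def under_replicated(self, now):
        """Map each chunk with too few live replicas to its missing replica count."""
        live = {s for s, t in self.last_seen.items()
                if now - t <= HEARTBEAT_TIMEOUT}
        missing = {}
        for chunk, servers in self.replicas.items():
            alive = len(servers & live)
            if alive < REPLICATION_TARGET:
                missing[chunk] = REPLICATION_TARGET - alive
        return missing
```

A server that stops sending heartbeats simply drops out of the live set, and every chunk it held shows up as needing another copy.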
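GFS: Mutation Order
[Figure: write path between client, master, primary replica, and secondary replicas]
1. Client -> Master: current lease holder?
2. Master -> Client: identity of primary, location of replicas (cached by client)
3. Client pushes data to all replicas (3a, 3b, 3c)
4. Client -> Primary: write request
   Primary assigns # to mutations, applies it
5. Primary -> Secondaries: forwards write request
6. Secondaries -> Primary: operation completed
7. Primary -> Client: operation completed or error report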
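The key point of the mutation order is that the primary alone picks a serial order and every replica applies mutations in that order. A simplified sketch with hypothetical class names; error reporting and lease handling are left out:

```python
class Replica:
    def __init__(self):
        self.buffered = {}   # data id -> bytes pushed earlier, not yet applied
        self.chunk = []      # applied mutations, in serial order

    def push_data(self, data_id, data):
        """Step 3: data arrives and is buffered, but not applied."""
        self.buffered[data_id] = data

    def apply(self, serial, data_id):
        """Apply a mutation; serial numbers must arrive without gaps."""
        if serial != len(self.chunk):
            raise ValueError("mutation applied out of order")
        self.chunk.append(self.buffered.pop(data_id))

class Primary(Replica):
    def __init__(self, secondaries):
        super().__init__()
        self.secondaries = secondaries

    def write(self, data_id):
        serial = len(self.chunk)        # primary assigns the serial number
        self.apply(serial, data_id)     # applies the mutation locally
        for replica in self.secondaries:
            replica.apply(serial, data_id)   # forwards the write request
        return "Operation completed"

secondaries = [Replica(), Replica()]
primary = Primary(secondaries)
for replica in [primary] + secondaries:
    replica.push_data("x", b"from client 1")
    replica.push_data("y", b"from client 2")
primary.write("y")   # write requests may arrive in any order
primary.write("x")
```

Whatever order the write requests reach the primary in, all three replicas end up with the same sequence of mutations.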
GFS: Control & Data Flow
• Decouple control flow and data flow
• Control flow
  – Master -> Primary -> Secondaries
• Data flow
  – Carefully picked chain of Chunk Servers
       • Forward to the closest first
       • Distance estimated based on IP addresses
  – Fully utilize outbound bandwidth
  – Pipelining to exploit full-duplex links
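The benefit of pipelining along a chain can be estimated with the ideal-case formula from the GFS paper, B/T + R*L, for B bytes, R replicas, throughput T and per-hop latency L. The fan-out function below is a deliberately simple comparison model of my own, and the link speed and latency in the example are assumed values.

```python
def pipelined_seconds(num_bytes, replicas, throughput_bps, hop_latency_s):
    """Ideal time to push data down a chain of replicas: B/T + R*L."""
    return num_bytes * 8 / throughput_bps + replicas * hop_latency_s

def fanout_seconds(num_bytes, replicas, throughput_bps, hop_latency_s):
    """Comparison model: the client sends a full copy to each replica
    over its single outbound link, one after another."""
    return replicas * num_bytes * 8 / throughput_bps + hop_latency_s

# 1 MB to 3 replicas over 100 Mbps links with 1 ms per hop (assumed values).
args = (1_000_000, 3, 100e6, 0.001)
print("pipelined:", pipelined_seconds(*args))
print("fan-out:  ", fanout_seconds(*args))
```

In the chain, each machine forwards data as soon as it starts receiving it, so the transfer cost is paid roughly once rather than once per replica.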
GFS: Other Important Things
• Snapshot operation – make a copy very fast
• Smart chunk creation policy
  – Below-average disk utilization, limited # of recent creations
• Smart re-replication policy
  – Most under-replicated chunks first
  – Chunks that are blocking a client
  – Live files first (rather than deleted)
• Rebalance and GC periodically
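The re-replication policy can be read as a sort order over chunks that need more copies. The sketch below is one possible reading: the slides list the criteria but not how they combine, so the tie-breaking order (blocking chunks, then fewest replicas, then live files) is an assumption, as are all names.

```python
from dataclasses import dataclass

@dataclass
class ChunkState:
    chunk_id: str
    live_replicas: int
    blocking_client: bool = False
    file_deleted: bool = False

def rereplication_order(chunks, target=3):
    """Chunks below the replication target, most urgent first."""
    needy = [c for c in chunks if c.live_replicas < target]
    return sorted(
        needy,
        key=lambda c: (
            not c.blocking_client,   # chunks blocking a client come first
            c.live_replicas,         # then the fewest remaining replicas
            c.file_deleted,          # then live files before deleted ones
        ),
    )

chunks = [
    ChunkState("a", live_replicas=2),
    ChunkState("b", live_replicas=1),
    ChunkState("c", live_replicas=2, blocking_client=True),
    ChunkState("d", live_replicas=1, file_deleted=True),
    ChunkState("e", live_replicas=3),
]
print([c.chunk_id for c in rereplication_order(chunks)])
```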

          How to process data stored in GFS?
MapReduce*
• A simple programming model applicable to many
  large-scale computing problems
• Divide and conquer strategy
• Hide messy details in MapReduce runtime library:
     –   Automatic parallelization
     –   Load balancing
     –   Network and disk transfer optimizations
     –   Fault tolerance
     –   Part of the stack: improvements to core library benefit
         all users of library
*MapReduce: Simplified Data Processing on Large Clusters.
J. Dean, S. Ghemawat. OSDI, 2004
MapReduce: Typical problem
• Read a lot of data. Break it into parts
• Map: extract something important from each part
• Shuffle and Sort
• Reduce: aggregate, summarize, filter, transform Map results
• Write the results
• Chain or cascade several jobs if needed

• Implement Map and Reduce to fit the problem
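The steps above fit in a few lines when everything runs in one process. This is a single-machine sketch of the flow only, not of the distributed runtime; `run_mapreduce` and the link-reversal functions are illustrative names, and the toy input is made up.

```python
from collections import defaultdict

def run_mapreduce(records, map_fn, reduce_fn):
    """Map every record, group values by key, then reduce each group."""
    intermediate = defaultdict(list)
    for key, value in records:
        for k2, v2 in map_fn(key, value):      # Map
            intermediate[k2].append(v2)        # Shuffle: group by key
    output = []
    for k2 in sorted(intermediate):            # Sort
        output.extend(reduce_fn(k2, intermediate[k2]))   # Reduce
    return output

# Toy problem: reverse a link graph (page -> outgoing links).
def map_links(source_url, target_urls):
    for target in target_urls:
        yield target, source_url

def reduce_links(target, sources):
    yield target, sorted(sources)

pages = [("a.html", ["b.html", "c.html"]), ("b.html", ["c.html"])]
reversed_graph = run_mapreduce(pages, map_links, reduce_links)
print(reversed_graph)
```

Only `map_links` and `reduce_links` are specific to the problem; the driver stays the same for any other job.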
MapReduce: Nice Example
MapReduce: Other suitable examples
•   Distributed grep
•   Distributed sort (Hadoop MapReduce won TeraSort)
•   Term-vector per host
•   Document clustering
•   Machine learning
•   Web access log stats
•   Web link-graph reversal
•   Inverted index construction
•   Statistical machine translation
MapReduce: Model
• Programmer specifies two primary methods
    – Map(k,v) -> <k’,v’>*
    – Reduce(k’, <v’>*) -> <k’,v’>*
• All v’ with same k’ are reduced together, in order
• Usually also specify how to partition k’
    – Partition(k’, total partitions) -> partition for k’
        • Often a simple hash of the key
        • Allows reduce operations for different k’ to be parallelized
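A hash-based partition function might look like the sketch below. It uses `zlib.crc32` instead of Python's built-in `hash()` because string hashes in Python differ between processes, and every mapper must send a given key to the same partition. The grouping helper is hypothetical and only there to show the effect.

```python
import zlib
from collections import defaultdict

def partition(key: str, total_partitions: int) -> int:
    """Partition number for a key: a stable hash modulo the partition count."""
    return zlib.crc32(key.encode("utf-8")) % total_partitions

def split_by_partition(pairs, total_partitions):
    """Group intermediate (key, value) pairs by the reducer that will get them."""
    buckets = defaultdict(list)
    for key, value in pairs:
        buckets[partition(key, total_partitions)].append((key, value))
    return buckets

pairs = [("to", 1), ("be", 1), ("or", 1), ("not", 1), ("to", 1), ("be", 1)]
buckets = split_by_partition(pairs, total_partitions=4)
```

All pairs with the same key land in the same bucket, so each reducer can work on its bucket independently of the others.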
MapReduce: Code
Map
    map(String key, String value):
      // key: document name
      // value: document contents
      for each word w in value:
          EmitIntermediate(w, "1");
Reduce
    reduce(String key, Iterator values):
      // key: a word
      // values: a list of counts
      int result = 0;
      for each v in values:
          result += ParseInt(v);
      Emit(AsString(result));
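For readers who want to run the word-count example, here is a Python rendering of the same pseudocode. The small `word_count` driver stands in for the MapReduce library and is an assumption for illustration; counts are kept as strings to mirror the original.

```python
from collections import defaultdict

def map_words(key, value):
    # key: document name, value: document contents
    for word in value.split():
        yield word, "1"

def reduce_counts(key, values):
    # key: a word, values: an iterable of counts (as strings)
    result = 0
    for v in values:
        result += int(v)
    return str(result)

def word_count(documents):
    """Stand-in for the runtime: run map, group by word, run reduce."""
    grouped = defaultdict(list)
    for name, contents in documents.items():
        for word, count in map_words(name, contents):
            grouped[word].append(count)
    return {word: reduce_counts(word, counts)
            for word, counts in sorted(grouped.items())}

counts = word_count({"d1": "to be or not to be", "d2": "to do"})
print(counts)
```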
MapReduce: Architecture




        • One master, many workers
        • Infrastructure manages scheduling and distribution
        • User implements Map and Reduce
MapReduce: Architecture (continued)
        Combiner = local Reduce
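A combiner pre-aggregates map output on the mapper's machine, which shrinks what has to be shuffled. A word-count sketch with illustrative function names; this works because addition is commutative and associative, so partial sums can be merged in any order.

```python
from collections import Counter

def map_only(document: str) -> list:
    """Plain map output: one (word, 1) pair per occurrence."""
    return [(word, 1) for word in document.split()]

def map_with_combiner(document: str) -> list:
    """Map output after a local reduce: one (word, count) pair per distinct word."""
    return sorted(Counter(document.split()).items())

def reduce_counts(pairs_from_all_mappers) -> dict:
    """Final reduce: merge the partial counts from every mapper."""
    totals = Counter()
    for pairs in pairs_from_all_mappers:
        for word, count in pairs:
            totals[word] += count
    return dict(totals)

docs = ["a b a a", "b c"]
without = [map_only(d) for d in docs]
with_combiner = [map_with_combiner(d) for d in docs]
print(sum(len(p) for p in without), "pairs shuffled without a combiner")
print(sum(len(p) for p in with_combiner), "pairs shuffled with a combiner")
```

The final result is the same either way; only the volume of intermediate data changes.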
MapReduce: Important things
•   Mappers scheduled close to data
•   Chunk replication improves locality
•   Reducers often run on the same machines as mappers
•   Fault tolerance
    –   Map crash – re-launch all map tasks of that machine (including completed ones)
    –   Reduce crash – repeat the crashed task only
    –   Master crash – repeat the whole job
    –   Skip bad records
• Fighting ‘stragglers’ by launching backup tasks
• Proven scalability
• In September 2009, Google ran 3,467,000 MR jobs, averaging
  488 machines per job
• Extensively used at Yahoo and Facebook via Hadoop
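The different treatment of map and reduce crashes follows from where the output lives: map output sits on the worker's local disk, while finished reduce output is already in the global file system. A sketch of the master's bookkeeping on a worker failure, with hypothetical names and states:

```python
from dataclasses import dataclass

@dataclass
class Task:
    kind: str     # "map" or "reduce"
    worker: str   # worker the task is assigned to, "" if none
    state: str    # "idle", "in_progress" or "completed"

def handle_worker_failure(tasks, failed_worker):
    """Reset every task that has to be re-executed after a worker dies."""
    rescheduled = []
    for task in tasks:
        if task.worker != failed_worker:
            continue
        # Completed map output is lost with the machine's local disk;
        # completed reduce output survives in the global file system.
        if task.kind == "map" or task.state == "in_progress":
            task.state, task.worker = "idle", ""
            rescheduled.append(task)
    return rescheduled

tasks = [
    Task("map", "w1", "completed"),
    Task("map", "w1", "in_progress"),
    Task("reduce", "w1", "completed"),
    Task("reduce", "w1", "in_progress"),
    Task("map", "w2", "completed"),
]
redo = handle_worker_failure(tasks, "w1")
```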
Benchmarks: GFS
Benchmarks: MapReduce
Conclusions
“We believe we get tremendous competitive advantage
by essentially building our own infrastructure”
-- Eric Schmidt

• GFS & MapReduce
   – Google achieved their goals
   – A fundamental part of their stack
• Open source implementations
   – GFS -> Hadoop Distributed FS (HDFS)
   – MapReduce -> Hadoop MapReduce
Thank you for your attention!
   Andrii.Vozniuk@epfl.ch

WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdf
 

Cloud infrastructure. Google File System and MapReduce - Andrii Vozniuk

  • 1. Cloud Infrastructure: GFS & MapReduce Andrii Vozniuk Used slides of: Jeff Dean, Ed Austin Data Management in the Cloud EPFL February 27, 2012
  • 2. Outline • Motivation • Problem Statement • Storage: Google File System (GFS) • Processing: MapReduce • Benchmarks • Conclusions
  • 3. Motivation • Huge amounts of data to store and process • Example @2004: – 20+ billion web pages x 20KB/page = 400+ TB – Reading from one disk at 30-35 MB/s • Four months just to read the web • 1000 hard drives just to store the web • Even more complicated if we want to process data • Exponential growth. The solution should be scalable.
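The figures on this slide can be checked with a few lines of arithmetic. This is only a back-of-the-envelope sketch: decimal units are assumed, the 30 MB/s rate is the low end of the slide's range, and the 400 GB drive capacity is an assumption inferred from "1000 hard drives" for 400 TB.

```python
# Back-of-the-envelope check of the slide's 2004 figures.
PAGES = 20e9        # 20+ billion web pages
PAGE_SIZE = 20e3    # 20 KB per page (decimal units assumed)
DISK_RATE = 30e6    # 30 MB/s, low end of the slide's 30-35 MB/s range

total_bytes = PAGES * PAGE_SIZE
read_days = total_bytes / DISK_RATE / 86400  # 86400 seconds per day

# Assumption: about 400 GB per drive, implied by 1000 drives for 400 TB.
DRIVE_CAPACITY = 400e9
drives_needed = total_bytes / DRIVE_CAPACITY

print(f"total data: {total_bytes / 1e12:.0f} TB")
print(f"single-disk read time: {read_days:.0f} days")
print(f"drives needed: {drives_needed:.0f}")
```

At 30 MB/s the read takes about 154 days and at 35 MB/s about 132 days, so "four months" corresponds to the optimistic end of the range.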
  • 4. Motivation • Buy super fast, ultra reliable hardware? – Ultra expensive – Controlled by third party – Internals can be hidden and proprietary – Hard to predict scalability – Fails less often, but still fails! – No suitable solution on the market
  • 5. Motivation • Use commodity hardware? Benefits: – Commodity machines offer much better perf/$ – Full control on and understanding of internals – Can be highly optimized for their workloads – Really smart people can do really smart things • Not that easy: – Fault tolerance: something breaks all the time – Applications development – Debugging, Optimization, Locality – Communication and coordination – Status reporting, monitoring • Handle all these issues for every problem you want to solve
  • 6. Problem Statement • Develop a scalable distributed file system for large data-intensive applications running on inexpensive commodity hardware • Develop a tool for processing large data sets in parallel on inexpensive commodity hardware • Develop both in coordination for optimal performance
  • 7. Google Cluster Environment Cloud Datacenters Servers Racks Clusters
  • 8. Google Cluster Environment • @2009: – 200+ clusters – 1000+ machines in many of them – 4+ PB File systems – 40GB/s read/write load – Frequent HW failures – 100s to 1000s active jobs (1 to 1000 tasks) • Cluster is 1000s of machines – Stuff breaks: for 1000 – 1 per day, for 10000 – 10 per day – How to store data reliably with high throughput? – How to make it easy to develop distributed applications?
  • 10. Google Technology Stack Focus of this talk
  • 11. Google File System (GFS)* • Inexpensive commodity hardware • High Throughput > Low Latency • Large files (multi GB) • Multiple clients • Workload – Large streaming reads – Small random reads – Concurrent append to the same file * The Google File System. S. Ghemawat, H. Gobioff, S. Leung. SOSP, 2003
  • 12. GFS: Architecture • User-level process running on commodity Linux machines • Consists of Master Server and Chunk Servers • Files broken into chunks (typically 64 MB), 3x redundancy (clusters, DCs) • Data transfers happen directly between clients and Chunk Servers
  • 13. GFS: Master Node • Centralization for simplicity • Namespace and metadata management • Managing chunks – Where they are (file<-chunks, replicas) – Where to put new – When to re-replicate (failure, load-balancing) – When and what to delete (garbage collection) • Fault tolerance – Shadow masters – Monitoring infrastructure outside of GFS – Periodic snapshots – Mirrored operations log
  • 14. GFS: Master Node • Metadata is in memory – it’s fast! – A 64 MB chunk needs less than 64 B of metadata => for 640 TB less than 640 MB • Asks Chunk Servers when – Master starts – Chunk Server joins the cluster • Operation log – Is used for serialization of concurrent operations – Replicated – Respond to client only when log is flushed locally and remotely
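The metadata estimate on this slide follows from dividing the stored data by the chunk size and multiplying by the per-chunk bound. A minimal sketch of that arithmetic, assuming decimal units and treating 64 B per chunk as an upper bound (the function name `master_metadata_bytes` is hypothetical):

```python
CHUNK_SIZE = 64 * 10**6      # 64 MB per chunk (decimal units assumed)
METADATA_PER_CHUNK = 64      # upper bound in bytes per chunk, from the slide
STORED = 640 * 10**12        # 640 TB of file data

def master_metadata_bytes(stored_bytes, chunk_size=CHUNK_SIZE,
                          per_chunk=METADATA_PER_CHUNK):
    """Upper bound on in-memory chunk metadata held by the master."""
    chunks = -(-stored_bytes // chunk_size)  # ceiling division
    return chunks * per_chunk

print(STORED // CHUNK_SIZE)                        # number of chunks
print(master_metadata_bytes(STORED) / 10**6, "MB")
```

With these inputs there are 10 million chunks and at most 640 MB of chunk metadata, which is why the master can keep all of it in memory.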
  • 15. GFS: Chunk Servers • 64MB chunks as Linux files – Reduce size of the master's data structures – Reduce client-master interaction – Internal fragmentation => allocate space lazily – Possible hotspots => re-replicate • Fault tolerance – Heartbeat to the master – Something wrong => master initiates re-replication
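  • 16. GFS: Mutation Order (diagram) • Client asks the master: current lease holder? • Master replies with the identity of the primary and the location of replicas (cached by client) • 3a, 3b, 3c. Client pushes data to the replicas • Client sends the write request to the primary • Primary assigns # to mutations, applies them, forwards the write request to the secondaries • Secondaries reply: operation completed • Primary replies to the client: operation completed or error report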
  • 17. GFS: Control & Data Flow • Decouple control flow and data flow • Control flow – Master -> Primary -> Secondaries • Data flow – Carefully picked chain of Chunk Servers • Forward to the closest first • Distance estimated based on IP – Fully utilize outbound bandwidth – Pipelining to exploit full-duplex links
  • 18. GFS: Other Important Things • Snapshot operation – make a copy very fast • Smart chunk creation policy – Below-average disk utilization, limited # of recent creations • Smart re-replication policy – Under-replicated first – Chunks that are blocking a client – Live files first (rather than deleted) • Rebalance and GC periodically • How to process data stored in GFS?
  • 19. MapReduce* • A simple programming model applicable to many large-scale computing problems • Divide and conquer strategy • Hide messy details in MapReduce runtime library: – Automatic parallelization – Load balancing – Network and disk transfer optimizations – Fault tolerance – Part of the stack: improvements to core library benefit all users of library *MapReduce: Simplified Data Processing on Large Clusters. J. Dean, S. Ghemawat. OSDI, 2004
  • 20. MapReduce: Typical problem • Read a lot of data. Break it into parts • Map: extract something important from each part • Shuffle and Sort • Reduce: aggregate, summarize, filter, transform Map results • Write the results • Chain, Cascade • Implement Map and Reduce to fit the problem
  • 21. MapReduce: Nice Example
  • 22. MapReduce: Other suitable examples • Distributed grep • Distributed sort (Hadoop MapReduce won TeraSort) • Term-vector per host • Document clustering • Machine learning • Web access log stats • Web link-graph reversal • Inverted index construction • Statistical machine translation
  • 23. MapReduce: Model • Programmer specifies two primary methods – Map(k,v) -> <k’,v’>* – Reduce(k’, <v’>*) -> <k’,v’>* • All v’ with same k’ are reduced together, in order • Usually also specify how to partition k’ – Partition(k’, total partitions) -> partition for k’ • Often a simple hash of the key • Allows reduce operations for different k’ to be parallelized
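The Partition function on this slide can be sketched as a hash of the key modulo the number of partitions. The version below is an illustration, not Google's implementation: the name `partition` and the choice of CRC32 are assumptions. A stable hash is used because Python's built-in `hash()` for strings is randomized per process, which would send the same key to different reducers on different workers.

```python
import zlib

def partition(key: str, total_partitions: int) -> int:
    """Map an intermediate key k' to a reduce partition in [0, total_partitions)."""
    # CRC32 is stable across processes and machines, unlike built-in hash().
    return zlib.crc32(key.encode("utf-8")) % total_partitions

words = ["gfs", "mapreduce", "chunk", "master", "gfs"]
assignments = [(w, partition(w, 4)) for w in words]
print(assignments)
```

Every occurrence of the same key lands in the same partition, so one reducer sees all values for that key, while different keys can be reduced in parallel.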
  • 24. MapReduce: Code
      Map:
        map(String key, String value):
          // key: document name
          // value: document contents
          for each word w in value:
            EmitIntermediate(w, "1");
      Reduce:
        reduce(String key, Iterator values):
          // key: a word
          // values: a list of counts
          int result = 0;
          for each v in values:
            result += ParseInt(v);
          Emit(AsString(result));
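The word-count pseudocode above can be run in a single process to see how the pieces fit together. The sketch below is not the MapReduce library: `run_job` is a hypothetical stand-in for the runtime that performs the shuffle and sort in memory, and `map_fn` and `reduce_fn` mirror the slide's Map and Reduce.

```python
from collections import defaultdict

def map_fn(key, value):
    # key: document name, value: document contents
    for word in value.split():
        yield word, "1"

def reduce_fn(key, values):
    # key: a word, values: a list of counts (as strings, like the slide)
    result = 0
    for v in values:
        result += int(v)
    yield key, str(result)

def run_job(documents):
    """Single-process stand-in for the runtime: map, shuffle/sort, reduce."""
    groups = defaultdict(list)
    for name, contents in documents.items():
        for k, v in map_fn(name, contents):
            groups[k].append(v)          # shuffle: group values by key
    output = {}
    for k in sorted(groups):             # sort: keys are reduced in order
        for out_k, out_v in reduce_fn(k, groups[k]):
            output[out_k] = out_v
    return output

counts = run_job({"a.txt": "the quick fox", "b.txt": "the lazy dog the end"})
print(counts)
```

In the real system the grouping step is distributed across machines, but the user-written Map and Reduce functions keep this same shape.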
  • 25. MapReduce: Architecture • One master, many workers • Infrastructure manages scheduling and distribution • User implements Map and Reduce
  • 26. MapReduce: Architecture
  • 27. MapReduce: Architecture
  • 28. MapReduce: Architecture • Combiner = local Reduce
  • 29. MapReduce: Important things • Mappers scheduled close to data • Chunk replication improves locality • Reducers often run on same machine as mappers • Fault tolerance – Map crash – re-launch all tasks of that machine – Reduce crash – repeat crashed task only – Master crash – repeat whole job – Skip bad records • Fighting ‘stragglers’ by launching backup tasks • Proven scalability • In September 2009 Google ran 3,467,000 MR jobs averaging 488 machines per job • Extensively used at Yahoo and Facebook with Hadoop
  • 30. Benchmarks: GFS
  • 31. Benchmarks: MapReduce
  • 32. Conclusions “We believe we get tremendous competitive advantage by essentially building our own infrastructure” -- Eric Schmidt • GFS & MapReduce – Google achieved their goals – A fundamental part of their stack • Open source implementations – GFS → Hadoop Distributed FS (HDFS) – MapReduce → Hadoop MapReduce
  • 33. Thank you for your attention! Andrii.Vozniuk@epfl.ch