Map/Reduce


Based on original presentation by Christophe Bisciglia, Aaron Kimball, and Sierra Michels-Slettvet

Except as otherwise noted, the content of this presentation is licensed under the Creative Commons Attribution 2.5 License.
Functional Programming Review

   Functional operations do not modify data structures: They
    always create new ones
   Original data still exists in unmodified form
   Data flows are implicit in program design
   Order of operations does not matter
Functional Programming Review

fun foo(l: int list) =
 sum(l) + mul(l) + length(l)

 The order in which sum(), mul(), etc. are evaluated does not matter:
 none of them modifies l
Functional Updates Do Not Modify Structures

fun append(x, lst) =
 let val lst' = rev lst in
   rev ( x :: lst' )
 end


  The append() function above reverses a list, adds a new
  element to the front, and returns all of that, reversed,
  which appends an item.


  But it never modifies lst!
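
The same idea can be sketched in Python, assuming plain Python lists stand in for the SML list. The function name mirrors the slide; the variable names are illustrative only. Slicing with [::-1] builds a new reversed list, so the input is never changed.

```python
def append(x, lst):
    """Return a new list with x at the end; lst itself is left untouched."""
    reversed_lst = lst[::-1]            # new list holding the input in reverse order
    return ([x] + reversed_lst)[::-1]   # put x in front, then reverse the result


original = [1, 2, 3]
extended = append(4, original)
print(extended)  # [1, 2, 3, 4]
print(original)  # [1, 2, 3]
```

Both lists exist after the call, which is the property the slide relies on.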
Functions Can Be Used As Arguments

fun DoDouble(f, x) = f (f x)

It does not matter what f does to its
argument; DoDouble() will do it twice.


What is the type of this function?
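
One way to approach the question: f is applied to its own result, so its argument type and result type must match. In SML the inferred type is ('a -> 'a) * 'a -> 'a. A rough Python rendering with type hints, where the name do_double and the type variable T are chosen for this sketch:

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def do_double(f: Callable[[T], T], x: T) -> T:
    """Apply f to x, then apply f again to the result."""
    return f(f(x))


print(do_double(lambda n: n + 3, 10))     # 16
print(do_double(lambda s: s + "!", "hi"))  # hi!!
```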
Map
fun map f []      = []
  | map f (x::xs) = (f x) :: (map f xs)
 Creates a new list by applying f to each element of the input list; returns
  output in order.

[Figure: each element of the input list passes through f to produce the matching element of the output list.]
Fold

fold f x0 lst: ('a*'b->'b)->'b->('a list)->'b
  Moves across a list, applying f to each element plus an accumulator. f
   returns the next accumulator value, which is combined with the next
   element of the list




[Figure: the initial accumulator passes through one application of f per list element; the final accumulator is returned.]
fold left vs. fold right

   Order of list elements can be significant
   Fold left moves left-to-right across the list
   Fold right moves from right-to-left


SML Implementation:

fun foldl f a []      = a
  | foldl f a (x::xs) = foldl f (f(x, a)) xs

fun foldr f a []      = a
  | foldr f a (x::xs) = f(x, (foldr f a xs))
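
To see why direction matters, here is a small Python sketch of both folds, assuming the same argument order as the SML code above, f(element, accumulator). The helper cons is an illustrative name, not part of the original slides.

```python
def foldl(f, acc, lst):
    """Fold left: combine elements into acc from left to right."""
    for x in lst:
        acc = f(x, acc)
    return acc


def foldr(f, acc, lst):
    """Fold right: combine elements into acc from right to left."""
    for x in reversed(lst):
        acc = f(x, acc)
    return acc


def cons(x, acc):
    """Put x in front of the accumulated list (returns a new list)."""
    return [x] + acc


print(foldl(cons, [], [1, 2, 3]))  # [3, 2, 1]
print(foldr(cons, [], [1, 2, 3]))  # [1, 2, 3]
```

With a commutative, associative f such as addition the two folds agree; with cons they do not.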
Example

fun foo(l: int list) =
 sum(l) + mul(l) + length(l)

How can we implement this?
Example (Solved)

fun foo(l: int list) =
 sum(l) + mul(l) + length(l)

fun sum(lst) = foldl (fn (x,a)=>x+a) 0 lst
fun mul(lst) = foldl (fn (x,a)=>x*a) 1 lst
fun length(lst) = foldl (fn (x,a)=>1+a) 0 lst
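
For comparison, the same three folds can be written with Python's functools.reduce. Note that reduce calls its function as f(accumulator, element), the opposite argument order from the SML code above. The names total and product are chosen here to avoid shadowing Python built-ins.

```python
from functools import reduce


def total(lst):
    return reduce(lambda a, x: x + a, lst, 0)


def product(lst):
    return reduce(lambda a, x: x * a, lst, 1)


def length(lst):
    # The element itself is ignored; each one just adds 1 to the count.
    return reduce(lambda a, _x: 1 + a, lst, 0)


def foo(lst):
    return total(lst) + product(lst) + length(lst)


print(foo([1, 2, 3, 4]))  # 10 + 24 + 4 = 38
```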
A More Complicated Fold Problem

   Given a list of numbers, how can we generate a list of partial
    sums?

e.g.: [1, 4, 8, 3, 7, 9] -> [0, 1, 5, 13, 16, 23, 32]
A More Complicated Fold Problem

   Given a list of numbers, how can we generate a list of partial
    sums?

e.g.: [1, 4, 8, 3, 7, 9] -> [0, 1, 5, 13, 16, 23, 32]
fun partialsum(lst) = foldl (fn (x, a) => a @ [List.last a + x]) [0] lst
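
A Python sketch of this fold, assuming the accumulator is the list of partial sums built so far, seeded with [0]. Each step appends the last partial sum plus the current element. The name partial_sums is illustrative.

```python
from functools import reduce


def partial_sums(lst):
    """Fold left; the accumulator is the list of partial sums so far."""
    return reduce(lambda acc, x: acc + [acc[-1] + x], lst, [0])


print(partial_sums([1, 4, 8, 3, 7, 9]))  # [0, 1, 5, 13, 16, 23, 32]
```

Building a fresh list at every step keeps the fold free of mutation, at the cost of repeated copying.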
A More Complicated Map Problem

   Given a list of words, can we: reverse the letters in each word,
    and reverse the whole list, so it all comes out backwards?

[“my”, “happy”, “cat”] -> [“tac”, “yppah”, “ym”]
A More Complicated Map Problem

   Given a list of words, can we: reverse the letters in each word,
    and reverse the whole list, so it all comes out backwards?

[“my”, “happy”, “cat”] -> [“tac”, “yppah”, “ym”]
fun reverseword(w) = implode (rev (explode w))
fun reverse2(lst) = foldr (fn (x, a) => a @ [reverseword x]) [] lst
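
The same right fold can be sketched in Python. The helper reverse_word is a hypothetical stand-in for the slide's reverseword. Walking the list from the right and appending each reversed word reverses the whole list as a side effect of the fold direction.

```python
from functools import reduce


def reverse_word(word):
    """Hypothetical helper: reverse the letters of a single word."""
    return word[::-1]


def reverse2(words):
    """Fold right: visit words last to first, appending each reversed word."""
    return reduce(lambda acc, w: acc + [reverse_word(w)], reversed(words), [])


print(reverse2(["my", "happy", "cat"]))  # ['tac', 'yppah', 'ym']
```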
map Implementation
fun map f []      = []
  | map f (x::xs) = (f x) :: (map f xs)


   This implementation moves left-to-right across the list,
    mapping elements one at a time

   … But does it need to?
Implicit Parallelism In Map

   In a purely functional setting, elements of a list being computed by map
    cannot see the effects of the computations on other elements
   If the order in which f is applied to the elements does not affect the
    result, we can reorder or parallelize execution
   This is the insight behind MapReduce
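
A minimal illustration, assuming a side-effect-free function and Python's standard concurrent.futures thread pool standing in for a cluster. The function square and the worker count are arbitrary choices for this sketch.

```python
from concurrent.futures import ThreadPoolExecutor


def square(n):
    """Pure function: the result depends only on the argument."""
    return n * n


numbers = list(range(10))

# Sequential map: elements are processed one at a time, left to right.
sequential = list(map(square, numbers))

# Parallel map: elements may be processed in any order on several workers;
# Executor.map still returns the results in input order.
with ThreadPoolExecutor(max_workers=4) as pool:
    parallel = list(pool.map(square, numbers))

print(sequential == parallel)  # True
```

Because square never looks at any other element, the scheduling order cannot change the output.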
Motivation: Large Scale Data Processing

   Want to process lots of data ( > 1 TB)
   Want to parallelize across hundreds/thousands of CPUs
   … Want to make this easy
    • Hide the details of parallelism, machine management, fault
      tolerance, etc.
Sample Applications
   Distributed Grep
   Count of URL Access Frequency
   Reverse Web-Link Graph
   Inverted Index
   Distributed Sort
MapReduce

   Automatic parallelization & distribution
   Fault-tolerant
   Provides status and monitoring tools
   Clean abstraction for programmers
Programming Model

   Borrows from functional programming
   Users implement interface of two functions:

    • map (in_key, in_value) ->
       (out_key, intermediate_value) list

    • reduce (out_key, intermediate_value list) ->
       out_value list
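
The two signatures can be exercised with a tiny single-process driver. This is only a sketch of the model, not the real library: run_mapreduce, the sample records, and the max-temperature job are all invented for illustration.

```python
from collections import defaultdict


def run_mapreduce(records, map_fn, reduce_fn):
    """Apply map_fn to every record, group values by key, then reduce each key."""
    intermediate = defaultdict(list)
    for in_key, in_value in records:
        for out_key, value in map_fn(in_key, in_value):
            intermediate[out_key].append(value)
    return {key: reduce_fn(key, values) for key, values in intermediate.items()}


def map_fn(_station, reading):
    """(in_key, in_value) -> list of (out_key, intermediate_value)."""
    city, temperature = reading.split(",")
    return [(city, int(temperature))]


def reduce_fn(_city, temperatures):
    """(out_key, intermediate_value list) -> out_value list."""
    return [max(temperatures)]


records = [("s1", "Oslo,3"), ("s2", "Rome,18"), ("s3", "Oslo,7")]
print(run_mapreduce(records, map_fn, reduce_fn))  # {'Oslo': [7], 'Rome': [18]}
```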
Map

   Records from the data source (lines out of files, rows of a
    database, etc) are fed into the map function as key*value pairs:
    e.g., (filename, line)
   map() produces one or more intermediate values along with an
    output key from the input
   Buffers intermediate values in memory before periodically
    writing to local disk
   Writes are split into R regions based on intermediate key value
    (e.g., hash(key) mod R)
     • Locations of regions communicated back to master who informs
       reduce tasks of all appropriate disk locations
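
The hash(key) mod R split can be sketched as follows. The sketch assumes zlib.crc32 as the hash function, chosen only because it gives the same value in every process (Python's built-in hash() is randomized per run for strings); the slides do not say which hash the real system uses.

```python
import zlib


def partition(key: str, r: int) -> int:
    """Pick one of r regions for a key; the same key always maps to the same region."""
    return zlib.crc32(key.encode("utf-8")) % r


R = 4  # number of regions, one per reduce task in this sketch
regions = [[] for _ in range(R)]
for key, value in [("apple", 1), ("pear", 1), ("apple", 1), ("fig", 1)]:
    regions[partition(key, R)].append((key, value))

for index, region in enumerate(regions):
    print(index, region)
```

Every pair with a given key lands in the same region on every mapper, so one reduce task can collect all values for that key.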
Reduce

   After the map phase is over, all the intermediate values for a
    given output key are combined together into a list
     • RPC over GFS to gather all the keys for a given region
     • Sort all keys since the same key can in general come from
       multiple map processes
   reduce() combines those intermediate values into one or more
    final values for that same output key
   Optional combine() phase as an optimization
[Figure: MapReduce data flow. Input key*value pairs from data stores 1 through n are fed to map tasks. Each map task emits (key 1, values ...), (key 2, values ...), (key 3, values ...). A barrier aggregates intermediate values by output key. One reduce task per key receives that key's intermediate values and produces the final values for key 1, key 2, and key 3.]
Parallelism

   map() functions run in parallel, creating different intermediate
    values from different input data sets
   reduce() functions also run in parallel, each working on a
    different output key
   All values are processed independently
   Bottleneck: reduce phase cannot start until map phase
    completes
Example: Count word occurrences
map(String input_key, String input_value):
  // input_key: document name
  // input_value: document contents
  for each word w in input_value:
    EmitIntermediate(w, "1");


reduce(String output_key, Iterator intermediate_values):
  // output_key: a word
  // intermediate_values: a list of counts
  int result = 0;
  for each v in intermediate_values:
    result += ParseInt(v);
  Emit(AsString(result));
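
A runnable Python approximation of the pseudocode, assuming a single process and whitespace tokenization. The grouping step in word_count is a hypothetical stand-in for the shuffle that the library performs between the two phases.

```python
from collections import defaultdict


def map_fn(input_key, input_value):
    """input_key: document name; input_value: document contents."""
    for word in input_value.split():
        yield (word, "1")


def reduce_fn(output_key, intermediate_values):
    """output_key: a word; intermediate_values: counts as strings."""
    result = 0
    for v in intermediate_values:
        result += int(v)
    return str(result)


def word_count(documents):
    """Stand-in for the framework: group mapped pairs by word, then reduce."""
    grouped = defaultdict(list)
    for name, contents in documents.items():
        for word, count in map_fn(name, contents):
            grouped[word].append(count)
    return {word: reduce_fn(word, counts) for word, counts in grouped.items()}


counts = word_count({"a.txt": "the cat sat", "b.txt": "the cat ran"})
print(counts["the"], counts["sat"])  # 2 1
```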
Example vs. Actual Source Code

   Example is written in pseudo-code
   Actual implementation is in C++, using a MapReduce library
   Bindings for Python and Java exist via interfaces
   True code is somewhat more involved (defines how the input
    key/values are divided up and accessed, etc.)
Implementation
Locality

   Master program divides up tasks based on location of data:
    tries to have map() tasks on same machine as physical file data
    • Failing that, on the same switch where bandwidth is relatively
      plentiful
    • Datacenter communications architecture?
   map() task inputs are divided into 64 MB blocks: same size as
    Google File System chunks
Fault Tolerance

   Master detects worker failures
    • Re-executes completed & in-progress map() tasks
    • Re-executes in-progress reduce() tasks
    • Importance of deterministic operations
   Data written to temporary files by both map() and reduce()
    • Upon successful completion, map() tells master of file names
        Master ignores if already heard from another map on same task
    • Upon successful completion, reduce() atomically renames file
   Master notices particular input key/values cause crashes in
    map(), and skips those values on re-execution.
    • Effect: Can work around bugs in third-party libraries
Optimizations

   No reduce can start until map is complete:
     • A single slow disk controller can rate-limit the whole process
   Master redundantly executes “slow-moving” map tasks; uses
    results of first copy to finish




    Why is it safe to redundantly execute map tasks? Wouldn’t this mess up
    the total computation?
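
One way to reason about the question: if map() is deterministic and has no side effects, every copy of a task produces the same output, so the master only needs to keep the first report and ignore the rest. The sketch below is a toy model under that assumption; the Master class, task ids, and worker names are invented for illustration.

```python
class Master:
    """Toy master: records the first completion of each task, ignores later ones."""

    def __init__(self):
        self.results = {}

    def report(self, task_id, worker, output):
        if task_id in self.results:
            return False  # already heard from another copy of this task
        self.results[task_id] = (worker, output)
        return True


def map_task(chunk):
    """Deterministic map: the same input always yields the same pairs."""
    return sorted((word, 1) for word in chunk.split())


master = Master()
chunk = "b a b"
first = master.report("map-7", "backup-worker", map_task(chunk))
second = master.report("map-7", "slow-worker", map_task(chunk))
print(first, second)  # True False
```

Whichever copy finishes first, the recorded output is identical, so the rest of the computation is unaffected.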
Optimizations

   “Combiner” functions can run on same machine as a mapper
   Causes a mini-reduce phase to occur before the real reduce
    phase, to save bandwidth
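
A sketch of the bandwidth saving, assuming a word-count style job where partial counts can be summed locally. The functions map_words and combine are illustrative names, not part of the MapReduce library.

```python
from collections import Counter


def map_words(text):
    """Mapper output: one (word, 1) pair per occurrence."""
    return [(word, 1) for word in text.split()]


def combine(pairs):
    """Mini-reduce on the mapper's machine: sum counts per word locally."""
    local = Counter()
    for word, count in pairs:
        local[word] += count
    return list(local.items())


pairs = map_words("to be or not to be")
combined = combine(pairs)
print(len(pairs), len(combined))  # 6 4
```

Only the combined pairs cross the network, and the reducer still sees the correct totals because addition is associative and commutative.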
Performance Evaluation
   1800 machines
    • Gigabit Ethernet switches
    • Two-level tree hierarchy, 100-200 Gbps at root
    • 4 GB RAM, dual 2 GHz Intel processors
    • Two 160 GB drives
   Grep: 10^10 100-byte records (1 TB)
    • Search for relatively rare 3-character sequence
   Sort: 10^10 100-byte records (1 TB)
Grep Data Transfer Rate
Sort: Normal Execution
Sort: No Backup Tasks
MapReduce Conclusions

   MapReduce has proven to be a useful abstraction
   Greatly simplifies large-scale computations at Google
   Functional programming paradigm can be applied to large-scale
    applications
   Focus on problem, let library deal w/ messy details

Map Reduce

  • 1. Map/Reduce Based on original presentation by Christophe Bisciglia, Aaron Kimball, and Sierra Michels-Slettvet Except as otherwise noted, the content of this presentation is licensed under the Creative Commons Attribution 2.5 License.
  • 2. Functional Programming Review  Functional operations do not modify data structures: They always create new ones  Original data still exists in unmodified form  Data flows are implicit in program design  Order of operations does not matter
  • 3. Functional Programming Review fun foo(l: int list) = sum(l) + mul(l) + length(l) Order of sum() and mul(), etc does not matter – they do not modify l
  • 4. Functional Updates Do Not Modify Structures fun append(x, lst) = let lst' = reverse lst in reverse ( x :: lst' ) The append() function above reverses a list, adds a new element to the front, and returns all of that, reversed, which appends an item. But it never modifies lst!
  • 5. Functions Can Be Used As Arguments fun DoDouble(f, x) = f (f x) It does not matter what f does to its argument; DoDouble() will do it twice. What is the type of this function?
  • 6. Map map f a [] = f(a) map f (a:as) = list(f(a), map(f, as)) Creates a new list by applying f to each element of the input list; returns output in order. f f f f f f
  • 7. Fold fold f x0 lst: ('a*'b->'b)->'b->('a list)->'b Moves across a list, applying f to each element plus an accumulator. f returns the next accumulator value, which is combined with the next element of the list f f f f f returned initial
  • 8. fold left vs. fold right
      • Order of list elements can be significant
      • Fold left moves left-to-right across the list
      • Fold right moves right-to-left across the list
      SML implementation:
      fun foldl f a [] = a
        | foldl f a (x::xs) = foldl f (f(x, a)) xs
      fun foldr f a [] = a
        | foldr f a (x::xs) = f(x, (foldr f a xs))
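The difference between the two folds is easiest to see with a combining function where order matters. The following is a minimal Python sketch, not part of the original slides; `foldl`, `foldr` and `cons` are names chosen here to mirror the SML definitions above, built on the standard `functools.reduce`.

```python
from functools import reduce

def foldl(f, acc, items):
    """Left fold: visit elements left-to-right, calling f(x, acc) as in the SML foldl."""
    return reduce(lambda a, x: f(x, a), items, acc)

def foldr(f, acc, items):
    """Right fold: visit elements right-to-left, calling f(x, acc) as in the SML foldr."""
    return reduce(lambda a, x: f(x, a), reversed(items), acc)

def cons(x, acc):
    """Put x at the front of the accumulated list (the SML x :: acc)."""
    return [x] + acc

left = foldl(cons, [], [1, 2, 3])   # the last element visited ends up in front
right = foldr(cons, [], [1, 2, 3])  # the original order is preserved
```

With `cons` as the combining function, the left fold reverses the list while the right fold rebuilds it unchanged, which is why the direction of the fold can be significant.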
  • 9. Example
      fun foo(l: int list) =
        sum(l) + mul(l) + length(l)
      How can we implement this?
  • 10. Example (Solved)
      fun foo(l: int list) =
        sum(l) + mul(l) + length(l)
      fun sum(lst) = foldl (fn (x,a)=>x+a) 0 lst
      fun mul(lst) = foldl (fn (x,a)=>x*a) 1 lst
      fun length(lst) = foldl (fn (x,a)=>1+a) 0 lst
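For readers less familiar with SML, the same three folds can be sketched in Python with `functools.reduce`. This is an illustrative translation rather than code from the presentation; the names `total` and `product` are used here only to avoid shadowing Python built-ins.

```python
from functools import reduce

def total(lst):
    """Sum of the elements: fold with + starting from 0."""
    return reduce(lambda acc, x: x + acc, lst, 0)

def product(lst):
    """Product of the elements: fold with * starting from 1."""
    return reduce(lambda acc, x: x * acc, lst, 1)

def length(lst):
    """Number of elements: ignore each element and add 1 to the accumulator."""
    return reduce(lambda acc, _x: 1 + acc, lst, 0)

def foo(lst):
    """The three calls are independent of each other, so their order does not matter."""
    return total(lst) + product(lst) + length(lst)

result = foo([1, 2, 3, 4])  # 10 + 24 + 4
```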
  • 11. A More Complicated Fold Problem
      • Given a list of numbers, how can we generate a list of partial sums?
        e.g.: [1, 4, 8, 3, 7, 9] -> [0, 1, 5, 13, 16, 23, 32]
  • 12. A More Complicated Fold Problem
      • Given a list of numbers, how can we generate a list of partial sums?
        e.g.: [1, 4, 8, 3, 7, 9] -> [0, 1, 5, 13, 16, 23, 32]
      fun partialsum(lst) =
        foldl (fn (x, a) => a @ [List.last a + x]) [0] lst
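The partial-sums fold can be checked quickly in Python. The sketch below is an assumption-labelled translation, not the presenters' code: the accumulator is the list built so far, seeded with `[0]`, and each step appends the previous last value plus the current element. The standard library's `itertools.accumulate` (with `initial=`, Python 3.8+) is shown alongside as a cross-check.

```python
from functools import reduce
from itertools import accumulate

def partial_sums(lst):
    """Fold whose accumulator is the list of partial sums built so far."""
    return reduce(lambda acc, x: acc + [acc[-1] + x], lst, [0])

numbers = [1, 4, 8, 3, 7, 9]
folded = partial_sums(numbers)
builtin = list(accumulate(numbers, initial=0))  # same result from the standard library
```

Each step copies the accumulator, so this version is quadratic in the list length; it is meant to show the shape of the fold, not to be efficient.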
  • 13. A More Complicated Map Problem
      • Given a list of words, can we: reverse the letters in each word, and reverse the whole list, so it all comes out backwards?
        [“my”, “happy”, “cat”] -> [“tac”, “yppah”, “ym”]
  • 14. A More Complicated Map Problem
      • Given a list of words, can we: reverse the letters in each word, and reverse the whole list, so it all comes out backwards?
        [“my”, “happy”, “cat”] -> [“tac”, “yppah”, “ym”]
      fun reverse2(lst) =
        foldr (fn (x, a) => a @ [reverseword x]) [] lst
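A Python sketch of the same right fold follows. It assumes a helper `reverse_word` standing in for the slide's `reverseword` (which the slides do not define), and emulates `foldr` by reducing over the reversed input.

```python
from functools import reduce

def reverse_word(word):
    """Hypothetical helper standing in for the slide's reverseword."""
    return word[::-1]

def reverse2(words):
    """Right fold: visit words from last to first, appending each reversed word."""
    return reduce(lambda acc, w: acc + [reverse_word(w)], reversed(words), [])

backwards = reverse2(["my", "happy", "cat"])
```

Because the fold starts at the right-hand end and appends to the accumulator, the last word comes out first, which reverses the list and the letters in one pass.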
  • 15. map Implementation
      fun map f [] = []
        | map f (x::xs) = (f x) :: (map f xs)
      • This implementation moves left-to-right across the list, mapping elements one at a time
      • … But does it need to?
  • 16. Implicit Parallelism In Map
      • In a purely functional setting, elements of a list being computed by map cannot see the effects of the computations on other elements
      • If order of application of f to elements in list is commutative, we can reorder or parallelize execution
      • This is the insight behind MapReduce
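The point about reordering can be illustrated with a small Python sketch. It uses a thread pool purely for illustration (an assumption made here; MapReduce itself distributes work across machines), and relies on the documented behaviour of `Executor.map`, which returns results in input order even when the calls run concurrently.

```python
from concurrent.futures import ThreadPoolExecutor

def square(x):
    """A pure function: the result depends only on x, with no shared state."""
    return x * x

def parallel_map(f, items, workers=4):
    """Apply f to every item using a pool of worker threads."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Calls may execute in any order; results are still returned in input order.
        return list(pool.map(f, items))

sequential = list(map(square, [1, 2, 3, 4]))
parallel = parallel_map(square, [1, 2, 3, 4])
```

Since `square` has no side effects, the parallel and sequential versions produce the same list.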
  • 17. Motivation: Large Scale Data Processing
      • Want to process lots of data (> 1 TB)
      • Want to parallelize across hundreds/thousands of CPUs
      • … Want to make this easy
          • Hide the details of parallelism, machine management, fault tolerance, etc.
  • 18. Sample Applications
      • Distributed Grep
      • Count of URL Access Frequency
      • Reverse Web-Link Graph
      • Inverted Index
      • Distributed Sort
  • 19. MapReduce
      • Automatic parallelization & distribution
      • Fault-tolerant
      • Provides status and monitoring tools
      • Clean abstraction for programmers
  • 20. Programming Model
      • Borrows from functional programming
      • Users implement interface of two functions:
          • map (in_key, in_value) -> (out_key, intermediate_value) list
          • reduce (out_key, intermediate_value list) -> out_value list
  • 21. Map
      • Records from the data source (lines out of files, rows of a database, etc.) are fed into the map function as key*value pairs: e.g., (filename, line)
      • map() produces one or more intermediate values along with an output key from the input
      • Buffers intermediate values in memory before periodically writing to local disk
      • Writes are split into R regions based on intermediate key value (e.g., hash(key) mod R)
          • Locations of regions are communicated back to the master, which informs reduce tasks of all appropriate disk locations
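The hash(key) mod R rule from the slide above can be sketched as follows. This is an illustrative assumption, not the library's actual partitioner: CRC32 from Python's `zlib` is used as the hash because Python's built-in `hash()` for strings varies between processes, and a partitioner must give the same answer on every machine. The names `partition` and `split_into_regions` are hypothetical.

```python
import zlib

def partition(key, num_regions):
    """Choose a region as hash(key) mod R, using CRC32 as a process-independent hash."""
    return zlib.crc32(key.encode("utf-8")) % num_regions

def split_into_regions(pairs, num_regions):
    """Distribute intermediate (key, value) pairs over R regions."""
    regions = [[] for _ in range(num_regions)]
    for key, value in pairs:
        regions[partition(key, num_regions)].append((key, value))
    return regions

pairs = [("the", "1"), ("cat", "1"), ("the", "1")]
regions = split_into_regions(pairs, 3)
```

Every pair with the same key lands in the same region, so a single reduce task sees all the values for that key regardless of which map task produced them.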
  • 22. Reduce
      • After the map phase is over, all the intermediate values for a given output key are combined together into a list
          • RPC over GFS to gather all the keys for a given region
          • Sort all keys since the same key can in general come from multiple map processes
      • reduce() combines those intermediate values into one or more final values for that same output key
      • Optional combine() phase as an optimization
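The sort-then-group step described above can be sketched in Python with `sorted` and `itertools.groupby`. The two input lists stand for the output of two hypothetical map tasks for one region; in the real system these would be fetched from the map workers' disks rather than held in memory.

```python
from itertools import groupby
from operator import itemgetter

def group_by_key(pairs):
    """Sort pairs by key, then collect the values for each key into one list."""
    ordered = sorted(pairs, key=itemgetter(0))
    return [(key, [value for _, value in group])
            for key, group in groupby(ordered, key=itemgetter(0))]

# Intermediate pairs for one region, produced by two different map tasks.
from_map_1 = [("cat", "1"), ("the", "1")]
from_map_2 = [("the", "1"), ("dog", "1")]
grouped = group_by_key(from_map_1 + from_map_2)
```

Sorting is what brings together occurrences of the same key that arrived from different map tasks; `groupby` only merges adjacent equal keys.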
  • 23. [Figure: MapReduce data flow. Input key*value pairs from data stores 1 to n feed the map tasks; each map emits (key 1, values ...), (key 2, values ...), (key 3, values ...). A barrier aggregates intermediate values by output key; one reduce per key then produces the final values for key 1, key 2 and key 3.]
  • 24. Parallelism
      • map() functions run in parallel, creating different intermediate values from different input data sets
      • reduce() functions also run in parallel, each working on a different output key
      • All values are processed independently
      • Bottleneck: reduce phase cannot start until map phase completes
  • 25. Example: Count word occurrences
      map(String input_key, String input_value):
        // input_key: document name
        // input_value: document contents
        for each word w in input_value:
          EmitIntermediate(w, "1");

      reduce(String output_key, Iterator intermediate_values):
        // output_key: a word
        // intermediate_values: a list of counts
        int result = 0;
        for each v in intermediate_values:
          result += ParseInt(v);
        Emit(AsString(result));
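To make the pseudo-code above concrete, here is a minimal single-process Python sketch of the word-count job. It is not the MapReduce library: `run_mapreduce` is a hypothetical in-memory driver written for this example, and splitting on whitespace is an assumption about what counts as a word. As in the pseudo-code, counts travel as strings.

```python
from collections import defaultdict

def map_fn(input_key, input_value):
    """input_key: document name; input_value: document contents."""
    for word in input_value.split():
        yield (word, "1")

def reduce_fn(output_key, intermediate_values):
    """output_key: a word; intermediate_values: a list of counts as strings."""
    return str(sum(int(v) for v in intermediate_values))

def run_mapreduce(documents):
    """Hypothetical in-memory driver: map everything, group by key, then reduce."""
    intermediate = defaultdict(list)
    for name, contents in documents.items():
        for key, value in map_fn(name, contents):
            intermediate[key].append(value)
    # All map output is collected before any reduce call runs (the barrier).
    return {key: reduce_fn(key, values) for key, values in sorted(intermediate.items())}

docs = {"a.txt": "the cat sat", "b.txt": "the dog sat down"}
counts = run_mapreduce(docs)
```

The user-supplied parts are only `map_fn` and `reduce_fn`; everything else is the kind of plumbing the library is meant to hide.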
  • 26. Example vs. Actual Source Code
      • Example is written in pseudo-code
      • Actual implementation is in C++, using a MapReduce library
      • Bindings for Python and Java exist via interfaces
      • True code is somewhat more involved (defines how the input key/values are divided up and accessed, etc.)
  • 28. Locality
      • Master program divides up tasks based on location of data: tries to have map() tasks on same machine as physical file data
          • Failing that, on the same switch where bandwidth is relatively plentiful
          • Datacenter communications architecture?
      • map() task inputs are divided into 64 MB blocks: same size as Google File System chunks
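The division of input into 64 MB blocks amounts to simple offset arithmetic, sketched below in Python. The function name `input_splits` is hypothetical, and the sketch assumes 64 MB means 64 × 1024 × 1024 bytes; it says nothing about how the master actually assigns blocks to machines.

```python
BLOCK_SIZE = 64 * 1024 * 1024  # assumption: 64 MB taken as 64 MiB

def input_splits(file_size, block_size=BLOCK_SIZE):
    """Return (offset, length) pairs covering a file of file_size bytes."""
    return [(offset, min(block_size, file_size - offset))
            for offset in range(0, file_size, block_size)]

# A 200 MB file yields three full blocks and one 8 MB remainder.
splits = input_splits(200 * 1024 * 1024)
```

Each split would become the input of one map task, ideally scheduled on a machine that already stores that block.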
  • 29. Fault Tolerance
      • Master detects worker failures
          • Re-executes completed & in-progress map() tasks
          • Re-executes in-progress reduce() tasks
          • Importance of deterministic operations
      • Data written to temporary files by both map() and reduce()
          • Upon successful completion, map() tells master of file names
              • Master ignores if already heard from another map on same task
          • Upon successful completion, reduce() atomically renames file
      • Master notices particular input key/values cause crashes in map(), and skips those values on re-execution.
          • Effect: Can work around bugs in third-party libraries
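The write-to-temporary-file-then-rename pattern mentioned above can be sketched on a local filesystem in Python. This is an analogy under stated assumptions, not the GFS mechanism: `write_output_atomically` is a hypothetical helper, and it relies on `os.replace`, which on POSIX systems renames atomically when source and destination are on the same filesystem.

```python
import os
import tempfile

def write_output_atomically(final_path, lines):
    """Write to a temporary file, then rename it into place in one step."""
    directory = os.path.dirname(os.path.abspath(final_path))
    # Create the temporary file next to the target so the rename stays on one filesystem.
    fd, temp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as handle:
            handle.writelines(line + "\n" for line in lines)
        os.replace(temp_path, final_path)
    except BaseException:
        # A failed task leaves no partial output behind.
        if os.path.exists(temp_path):
            os.remove(temp_path)
        raise
```

If two copies of the same deterministic task both finish, the second rename simply replaces the file with identical contents, so readers never observe a half-written result.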
  • 30. Optimizations
      • No reduce can start until map is complete:
          • A single slow disk controller can rate-limit the whole process
      • Master redundantly executes “slow-moving” map tasks; uses results of first copy to finish
      Why is it safe to redundantly execute map tasks? Wouldn’t this mess up the total computation?
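One way to think about the question above: if a map task is deterministic and has no side effects, every copy of it produces the same output, so it does not matter which copy finishes first. The Python sketch below illustrates that idea with threads and artificial delays; `map_task` and `run_with_backup` are hypothetical names, and the delays merely simulate a slow machine.

```python
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait

def map_task(split, delay):
    """Deterministic map task; delay simulates a slow or fast worker."""
    time.sleep(delay)
    return [(word, "1") for word in split.split()]

def run_with_backup(split, delays=(0.2, 0.01)):
    """Run redundant copies of one task and use whichever copy finishes first."""
    with ThreadPoolExecutor(max_workers=len(delays)) as pool:
        futures = [pool.submit(map_task, split, d) for d in delays]
        done, _still_running = wait(futures, return_when=FIRST_COMPLETED)
        return next(iter(done)).result()

output = run_with_backup("a b a")
```

The result is the same whichever copy wins, which is the property the slides' emphasis on deterministic operations points to.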
  • 31. Optimizations
      • “Combiner” functions can run on same machine as a mapper
      • Causes a mini-reduce phase to occur before the real reduce phase, to save bandwidth
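For word count, a combiner can pre-sum the counts on the mapper's machine before anything crosses the network. The Python sketch below folds the map and combine steps into one hypothetical function, `map_with_combiner`, using `collections.Counter`; it assumes the reduce operation is addition, which is associative and commutative, so partial sums can be merged later without changing the result.

```python
from collections import Counter

def map_with_combiner(contents):
    """Emit one (word, count) pair per distinct word instead of one pair per occurrence."""
    local_counts = Counter(contents.split())
    return [(word, str(n)) for word, n in sorted(local_counts.items())]

# Eight words in, five intermediate pairs out.
pairs = map_with_combiner("the cat and the hat and the bat")
```

The reduce function from the word-count example needs no change: it sums whatever counts it receives, whether they are "1"s or pre-combined totals.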
  • 32. Performance Evaluation
      • 1800 machines
          • Gigabit Ethernet switches
          • Two-level tree hierarchy, 100-200 Gbps at root
          • 4 GB RAM, dual 2 GHz Intel processors
          • Two 160 GB drives
      • Grep: 10^10 100-byte records (1 TB)
          • Search for relatively rare 3-character sequence
      • Sort: 10^10 100-byte records (1 TB)
  • 36. MapReduce Conclusions
      • MapReduce has proven to be a useful abstraction
      • Greatly simplifies large-scale computations at Google
      • Functional programming paradigm can be applied to large-scale applications
      • Focus on problem, let library deal with messy details