Map & Reduce. Christopher Schleiden, Christian Corsten, Michael Lottko, Jinhui Li. The slides are licensed under a Creative Commons Attribution 3.0 License.
Outline: Motivation; Concept; Parallel Map & Reduce; Google's MapReduce; Example: Word Count; Demo: Hadoop; Summary. Web Technologies
Today the web is all about data! Google: processing of 20 PB/day (2008). LHC: will generate about 15 PB/year. Facebook: 2.5 PB of data + 15 TB/day (4/2009). BUT: It takes ~2.5 hours to read one terabyte off a typical hard disk!
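The read-time figure above can be turned into a rough throughput estimate. The sketch below is only back-of-the-envelope arithmetic: it assumes a decimal terabyte (10^12 bytes) and perfect scaling when the data is spread over several disks, neither of which is stated on the slide. The function names are hypothetical.

```python
# Assumption: 1 TB = 10**12 bytes, read in ~2.5 hours as stated on the slide.
TERABYTE = 10**12
SECONDS_PER_TB = 2.5 * 3600


def throughput_mb_per_s() -> float:
    """Sequential read rate implied by 1 TB in 2.5 hours, in MB/s."""
    return TERABYTE / SECONDS_PER_TB / 10**6


def parallel_read_seconds(total_bytes: float, disks: int) -> float:
    """Idealized read time when the data is split evenly over `disks` disks."""
    per_disk_rate = TERABYTE / SECONDS_PER_TB  # bytes per second
    return total_bytes / (disks * per_disk_rate)


print(round(throughput_mb_per_s()))                 # about 111 MB/s
print(round(parallel_read_seconds(TERABYTE, 100)))  # about 90 seconds
```

Under these idealized assumptions, 100 disks read the same terabyte in about a minute and a half, which is the motivation for going parallel.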
Solution: Going parallel! However, parallel programming is hard: data distribution, synchronization, load balancing, …
Map & Reduce: A programming model and framework designed for processing large volumes of data in parallel. Based on the functional map and reduce concept, e.g., the output of a function depends only on its input; there are no side effects.
Functional Concept. Map: apply a function to each value of a sequence: map(k, v) → <k', v'>*. Reduce/Fold: combine all elements of a sequence using a binary operator: reduce(k', <v'>*) → <k', v'>*.
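As a minimal illustration of the functional concept, the sketch below uses Python's built-in `map` and `functools.reduce` on a plain list. It shows only the sequence-level idea (no keys, no distribution); the sample values are made up for illustration.

```python
from functools import reduce
from operator import add

values = [1, 2, 3, 4]

# Map: apply a function to each value of the sequence.
squares = list(map(lambda x: x * x, values))

# Reduce/Fold: combine all elements using a binary operator (here: addition).
total = reduce(add, squares, 0)

print(squares)  # [1, 4, 9, 16]
print(total)    # 30
```

Because neither function has side effects, each element can be mapped independently, which is what makes the map step easy to parallelize.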
Typical problem. Map: iterate over a large number of records; extract something interesting. Then shuffle & sort intermediate results. Reduce: aggregate intermediate results; write final output.
Parallel Map & Reduce
Parallel Map & Reduce. Published (2004) and patented (2010) by Google Inc. C++ runtime with bindings to Java/Python. Other implementations: Apache Hadoop/Hive project (Java), developed at Yahoo!, used by Facebook, Hulu, IBM, and many more; Microsoft COSMOS (Scope, based on SQL and C#); Starfish (Ruby); …
Parallel Map & Reduce /2. Parallel execution of map and reduce stages. Scheduling through the master/worker pattern. The runtime handles: assigning workers to map and reduce tasks; data distribution; detecting crashed workers.
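The structure of a parallel map stage followed by a reduce stage can be sketched on a single machine. The example below uses `concurrent.futures.ThreadPoolExecutor` from the Python standard library as a stand-in for a pool of workers. It is not how Google's runtime or Hadoop is implemented, and in CPython threads illustrate the structure rather than deliver real CPU parallelism. The names `map_task` and `run_parallel` are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import List


def map_task(chunk: str) -> Counter:
    """Map stage: count the words of one input chunk independently."""
    return Counter(chunk.lower().split())


def run_parallel(chunks: List[str], max_workers: int = 4) -> Counter:
    """Run all map tasks on a worker pool, then merge the partial results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partials = list(pool.map(map_task, chunks))

    # Reduce stage: combine the partial counts into one result.
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


result = run_parallel(["a b a", "b c"])
print(dict(result))
```

The caller never touches thread creation or result collection directly, which mirrors the point of the slide: the runtime, not the programmer, handles task assignment.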
Parallel Map & Reduce Execution. [Figure: Input (DATA) → Map → Shuffle & Sort → Reduce → Output (RESULT)]
Components in Google's MapReduce
Google File System (GFS). Stores input data, intermediate results, and final results in 64 MB chunks on at least three different machines. [Figure: a file split into chunks distributed across nodes]
Scheduling (Master/Worker). One master, many workers. Input data is split into M map tasks (~64 MB in size; GFS). The reduce phase is partitioned into R tasks. Tasks are assigned to workers dynamically: the master assigns each map task to a free worker, and the master assigns each reduce task to a free worker. Fault handling via redundancy: the master checks whether a worker is still alive via heartbeat and reschedules the work item if the worker has died.
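The scheduling rules above can be sketched as a small, sequential simulation. This is a simplified model under stated assumptions, not Google's actual scheduler: the "heartbeat" is reduced to a lookup in a set of dead workers, tasks complete instantly, and the names `split_into_map_tasks` and `run_master` are hypothetical.

```python
import math
from collections import deque
from typing import Dict, List, Set

CHUNK_BYTES = 64 * 1024**2  # ~64 MB per map task, as on the slide


def split_into_map_tasks(input_bytes: int) -> int:
    """Number of map tasks M for an input of the given size."""
    return math.ceil(input_bytes / CHUNK_BYTES)


def run_master(num_tasks: int, workers: List[str],
               dead_workers: Set[str]) -> Dict[int, str]:
    """Assign each task to the next free worker; reschedule on worker death."""
    pending = deque(range(num_tasks))
    free = deque(workers)
    done: Dict[int, str] = {}
    while pending:
        if not free:
            raise RuntimeError("no live workers left")
        task = pending.popleft()
        worker = free.popleft()
        if worker in dead_workers:
            # Simulated missed heartbeat: drop the worker, reschedule the task.
            pending.append(task)
            continue
        done[task] = worker
        free.append(worker)  # the worker is free again for the next task
    return done


print(split_into_map_tasks(1024**3))  # 1 GiB of input -> 16 map tasks
print(run_master(4, ["w1", "w2", "w3"], dead_workers={"w2"}))
```

In the example run, the task first handed to the dead worker `w2` is put back in the queue and later completed by a live worker, so all four tasks finish.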
Scheduling Example. [Figure: the master assigns map and reduce tasks to workers; Input (DATA) → map workers → Temp → reduce workers → Output (RESULT)]
Google's M&R vs. Hadoop. Google MapReduce: main language C++; Google File System (GFS); GFS master; GFS chunkserver. Hadoop MapReduce: main language Java; Hadoop File System (HDFS); Hadoop namenode; Hadoop datanode.
Word Count: The Map & Reduce "Hello World" example
Word Count - Input. Set of text files: bar.txt contains "This is the bar file"; foo.txt contains "Sweet, this is the foo file". Expected output: sweet (1), this (2), is (2), the (2), foo (1), bar (1), file (2)
Word Count - Map
mapper(filename, file-contents):
    for each word in file-contents:
        emit(word, 1)
Output: this (1), is (1), the (1), bar (1), file (1), sweet (1), this (1), is (1), the (1), foo (1), file (1)
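A runnable version of the mapper might look like the sketch below. The tokenization rule (lowercase, letters only) is an assumption chosen so that "Sweet," and "This" match the lowercase words shown in the slide output. `emit` is modeled with `yield`.

```python
import re
from typing import Iterator, Tuple


def mapper(filename: str, contents: str) -> Iterator[Tuple[str, int]]:
    """Emit (word, 1) for every word in the file contents."""
    # Assumption: words are runs of letters, compared case-insensitively.
    for word in re.findall(r"[a-z]+", contents.lower()):
        yield (word, 1)


files = {
    "bar.txt": "This is the bar file",
    "foo.txt": "Sweet, this is the foo file",
}
pairs = [pair for name, text in files.items() for pair in mapper(name, text)]
print(pairs)
```

The mapper ignores the filename here; it is kept in the signature only to match the pseudocode.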
Word Count - Shuffle & Sort. Before: this (1), is (1), the (1), bar (1), file (1), sweet (1), this (1), is (1), the (1), foo (1), file (1). After grouping by word: this (1) this (1); is (1) is (1); the (1) the (1); sweet (1); foo (1); bar (1); file (1) file (1)
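The shuffle and sort step groups all values that share a key so that each reducer call sees one word with all of its counts. A minimal in-memory sketch follows. A real framework does this across machines; the function name `shuffle_and_sort` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def shuffle_and_sort(pairs: Iterable[Tuple[str, int]]) -> List[Tuple[str, List[int]]]:
    """Group values by key, then order the groups by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


map_output = [
    ("this", 1), ("is", 1), ("the", 1), ("bar", 1), ("file", 1),
    ("sweet", 1), ("this", 1), ("is", 1), ("the", 1), ("foo", 1), ("file", 1),
]
grouped = shuffle_and_sort(map_output)
print(grouped)
```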
Word Count - Reduce
reducer(word, values):
    sum = 0
    for each value in values:
        sum = sum + value
    emit(word, sum)
Output: sweet (1), this (2), is (2), the (2), foo (1), bar (1), file (2)
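The reducer pseudocode translates almost line for line into Python. In this sketch `emit` is modeled as a return value, and the grouped input is written out by hand from the two example files (both of which contain the word "file", so its count is 2).

```python
from typing import Iterable, Tuple


def reducer(word: str, values: Iterable[int]) -> Tuple[str, int]:
    """Sum all counts collected for one word."""
    total = 0
    for value in values:
        total = total + value
    return (word, total)


# Grouped map output for bar.txt and foo.txt after shuffle and sort.
grouped = [
    ("bar", [1]), ("file", [1, 1]), ("foo", [1]), ("is", [1, 1]),
    ("sweet", [1]), ("the", [1, 1]), ("this", [1, 1]),
]
counts = dict(reducer(word, values) for word, values in grouped)
print(counts)
```

Since addition is associative and commutative, the same reducer could also run as a combiner on partial groups, although the slides do not cover that optimization.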
DEMO: Hadoop – Word Count
Summary. Lots of data is processed on the web (e.g., Google). Performance solution: go parallel. Input, Map, Shuffle & Sort, Reduce, Output. Google File System. Scheduling: master/worker. Word Count example. Hadoop. Questions?
References
Inspirations for presentation:
http://www4.informatik.uni-erlangen.de/Lehre/WS10/V_MW/Uebung/folien/05-Map-Reduce-Framework.pdf
http://www.scribd.com/doc/23844299/Map-Reduce-Hadoop-Pig
RWTH Map Reduce Talk: http://bit.ly/f5oM7p
Papers:
Dean et al., MapReduce: Simplified Data Processing on Large Clusters, OSDI'04: Sixth Symposium on Operating System Design and Implementation, San Francisco, CA, December 2004.
Ghemawat et al., The Google File System, 19th ACM Symposium on Operating Systems Principles, Lake George, NY, October 2003.


Editor's notes

  1. These days the web is all about data. All major websites rely on huge amounts of data in some form in order to provide services to users. For example Google … and Facebook …. Facilities like the LHC will also produce data measured in petabytes each year. However, it takes about 2.5 hours to read one terabyte off a typical hard drive. The solution that comes immediately to mind, of course, is going parallel. Concrete example [TODO], [context of cloud computing]
  2. Parallel programming is still hard. Programmers have to deal with a lot of boilerplate code and have to manually write code for things like scheduling and load balancing. People also want to use the company cluster in parallel, so something like a batch system is needed. As more and more companies use huge amounts of data, a kind of standard framework or platform has emerged in recent years: the Map/Reduce framework.
  3. Map and reduce have been known for years as functional programming concepts.
  4. Actual execution and scheduling
  5. http://www4.informatik.uni-erlangen.de/Lehre/WS10/V_MW/Uebung/folien/05-Map-Reduce-Framework.pdf