Map Reduce

•Télécharger en tant que PPTX, PDF•

0 j'aime•233 vues

msgroner

Technologie Économie & finance

Functional Programming Map Apply function to transform elements of a list Return results as a list Reduce Apply function to all elements of a list Collect results and return as a single value

Definition MapReduce: A software framework to support processing of massive data sets across distributed computers

SAMPLE USE CASE Back end credit card processor Nightly processing of millions of transactions Processing requires grouping, sorting, and merchant wide analysis Can’t just divide the over all list into equal parts as further analysis is necessary Tight processing window

Description Simple, powerful programming model Language independent Can run on a single machine, but shines for distributed computing and extreme datasets Break down the processing problem into embarrassingly parallel atomic operations

Algorithm Map Phase Raw data analyzed and converted to name/value pair Shuffle Phase All name/value pairs are sorted and grouped by their keys Reduce Phase All values associated with a key are processed for results

MapReduce Walk Through Goal: Construct a word frequency of all the words in Wikipedia

Step 0: Split Data Raw input data divided into N parts N > number of machines Split must be context specific

Step 1: Map Each machine takes/receives a single slice of the raw input for mapping The map function processes the input file and emits a name/value pair of the relevant data

STEP 2: Shuffle The results of the map phase are sorted and grouped by the key in each key value pair.

STEP 3: Reduce Results from shuffle phase divided into M parts M >number of machines Each machine runs a reduction method on a part of shuffle results.

MAPREDUCE BENEFITS Scale Processing speed increases with number of machines involved Reliable Loss of any one machine doesn’t stop processing Cost Often built from heterogeneous commodity grade computers

Use Case Results Processing time of 1 million records Originally ~3 hours Reduced to 40 minutes on 5 computers

Other MapReduce installations Google – Index building Visa – Transaction Processing Facebook – Facebook Lexicon Intelligence Community Yahoo/Google – Terabyte Sort 10 billion, 100 byte records Yahoo: 910 nodes, 206 seconds Google: ~1,000 nodes, 68 seconds

Contenu connexe

Tendances

Automating Engineering with FMESafe Software

FME in Support of DOT Data CultureSafe Software

MAXITHERMAL SOFTWARE FOR USB LOGGERS : WITH MARATHON PRODUCTS Marathon Products, Inc

Hadoop Mapreduce joinsUday Vakalapudi

LUMASS - a Spatial System Dynamics Modelling FrameworkAlexander Herzig

Web Based GIS LeadGen IntroductionInteractiveGIS

QGIS Tutorial 2niloyghosh1984

Spatial decision support and analytics on a campus scale: bringing GIS, CAD, ...Safe Software

ArcGIS Bivariate Mapping ToolsAileen Buckley

Repartition join in mapreduceUday Vakalapudi

Map reduce presentationAhmad El Tawil

ArcGIS ExtensionsEsri

Utilities Industry Success Stories with FME Safe Software

Main map reduceMasoumeh Rezaei Jam

From 2D Drawings to 3D Navigation networks built with FMESafe Software

MDT7aplitop

Fdi extreme features360cell

Hadoop MapReduce joinsShalish VJ

8 Ways Utility Networks Can Meet Data DemandsSafe Software

Coordinate Systems in FME 101 Safe Software

Tendances (20)

Automating Engineering with FME

FME in Support of DOT Data Culture

MAXITHERMAL SOFTWARE FOR USB LOGGERS : WITH MARATHON PRODUCTS

Hadoop Mapreduce joins

LUMASS - a Spatial System Dynamics Modelling Framework

Web Based GIS LeadGen Introduction

QGIS Tutorial 2

Spatial decision support and analytics on a campus scale: bringing GIS, CAD, ...

ArcGIS Bivariate Mapping Tools

Repartition join in mapreduce

Map reduce presentation

ArcGIS Extensions

Utilities Industry Success Stories with FME

Main map reduce

From 2D Drawings to 3D Navigation networks built with FME

MDT7

Fdi extreme features

Hadoop MapReduce joins

8 Ways Utility Networks Can Meet Data Demands

Coordinate Systems in FME 101

En vedette

Energy For One World- The Book- updated version (December 2012)Energy for One World

My favourite topicShavneet Sharma

Maintaining Body Balance Through Probiotics By Mr. Hussian SabuwalaHealth Education Library for People

Can Water Heal? By Ms. Anu MehtaHealth Education Library for People

Conservation of natural resources A Presentation By Mr Allah Dad Khan Former...Mr.Allah Dad Khan

EFOW Performance Platforms- Plan FrameEnergy for One World

PresentationMohamed Amine Jlassi

презентация пятница (1)yuyukul

An Ecological View on Process Improvement: Some Thoughts for Improving Proces...Luigi Buglione

The LEGO Maturity & Capability Model ApproachLuigi Buglione

Hadoop SecuritySuresh Mandava

Employee onboarding and employee engagement in it organizations human resourc...AKSHAY KHATRI

Kekurangan zat makananMuhammad Nasrullah

презентация лето 13 группаyuyukul

Banking Trends for 2016Capgemini

кожни систем човекаMaja Simic

On Boarding Ppttomnorthrop

Skeletni sistem čoveka Ena Horvat

THE SCIENCE BEHIND EFFECTIVE FACEBOOK AD CAMPAIGNSunfunnel

Measuring Success on Facebook, Twitter & LinkedInBrian Honigman

En vedette (20)

Energy For One World- The Book- updated version (December 2012)

My favourite topic

Maintaining Body Balance Through Probiotics By Mr. Hussian Sabuwala

Can Water Heal? By Ms. Anu Mehta

Conservation of natural resources A Presentation By Mr Allah Dad Khan Former...

EFOW Performance Platforms- Plan Frame

Presentation

презентация пятница (1)

An Ecological View on Process Improvement: Some Thoughts for Improving Proces...

The LEGO Maturity & Capability Model Approach

Hadoop Security

Employee onboarding and employee engagement in it organizations human resourc...

Kekurangan zat makanan

презентация лето 13 группа

Banking Trends for 2016

кожни систем човека

On Boarding Ppt

Skeletni sistem čoveka

THE SCIENCE BEHIND EFFECTIVE FACEBOOK AD CAMPAIGNS

Measuring Success on Facebook, Twitter & LinkedIn

Similaire à Map Reduce

Comparing Distributed Indexing To Mapreduce or Not?TerrierTeam

Map reduce presentationateeq ateeq

Map reduceShahbaz Sidhu

2004 map reduce simplied data processing on large clusters (mapreduce)anh tuan

Spatial Data Integrator - Software Presentation and Use Casesmathieuraj

Hadoop Map ReduceVNIT-ACM Student Chapter

Introduction To Map Reducerantav

Lecture 1 mapreduceShubham Bansal

Sawmill - Integrating R and Large Data CloudsRobert Grossman

Mapreduce2008 cacmlmphuong06

MapReduce: Ordering and Large-Scale Indexing on Large ClustersIRJET Journal

Map ReduceSri Prasanna

Download Itbutest

2 mapreduce-model-principlesGenoveva Vargas-Solar

High Throughput Data AnalysisJ Singh

MapReduce-Notes.pdfAnilVijayagiri

MapReduce.pptxssuserb8d5cb

High Performance Computing on NYC Yellow Taxi Data SetParag Ahire

Big data & HadoopAhmed Gamil

FME Around the WorldSafe Software

Similaire à Map Reduce (20)

Comparing Distributed Indexing To Mapreduce or Not?

Map reduce presentation

Map reduce

2004 map reduce simplied data processing on large clusters (mapreduce)

Spatial Data Integrator - Software Presentation and Use Cases

Hadoop Map Reduce

Introduction To Map Reduce

Lecture 1 mapreduce

Sawmill - Integrating R and Large Data Clouds

Mapreduce2008 cacm

MapReduce: Ordering and Large-Scale Indexing on Large Clusters

Map Reduce

Download It

2 mapreduce-model-principles

High Throughput Data Analysis

MapReduce-Notes.pdf

MapReduce.pptx

High Performance Computing on NYC Yellow Taxi Data Set

Big data & Hadoop

FME Around the World

Dernier

Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...gurkirankumar98700

The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge

My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar

Boost PC performance: How more available memory can improve productivityPrincipled Technologies

Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent

Salesforce Community Group Quito, Salesforce 101Paola De la Torre

A Domino Admins Adventures (Engage 2024)Gabriella Davis

Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC

Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK

Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik

Google AI Hackathon: LLM based Evaluator for RAGSujit Pal

Presentation on how to chat with PDF using ChatGPT code interpreternaman860154

A Call to Action for Generative AI in 2024Results

GenCyber Cyber Security Day PresentationMichael W. Hawkins

#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada

08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls

The Evolution of Money: Digital Transformation and CBDCs in Central BankingSelcen Ozturkcan

Scaling API-first – The story of a global engineering organizationRadu Cotescu

🐬 The future of MySQL is Postgres 🐘RTylerCroy

Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko

Dernier (20)

Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...

The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf

My Hashitalk Indonesia April 2024 Presentation

Boost PC performance: How more available memory can improve productivity

Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...

Salesforce Community Group Quito, Salesforce 101

A Domino Admins Adventures (Engage 2024)

Breaking the Kubernetes Kill Chain: Host Path Mount

Unblocking The Main Thread Solving ANRs and Frozen Frames

Injustice - Developers Among Us (SciFiDevCon 2024)

Google AI Hackathon: LLM based Evaluator for RAG

Presentation on how to chat with PDF using ChatGPT code interpreter

A Call to Action for Generative AI in 2024

GenCyber Cyber Security Day Presentation

#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024

08448380779 Call Girls In Greater Kailash - I Women Seeking Men

The Evolution of Money: Digital Transformation and CBDCs in Central Banking

Scaling API-first – The story of a global engineering organization

🐬 The future of MySQL is Postgres 🐘

Handwritten Text Recognition for manuscripts and early printed texts

Map Reduce

1. MapReduce

2. Functional Programming Map Apply function to transform elements of a list Return results as a list Reduce Apply function to all elements of a list Collect results and return as a single value

3. Definition MapReduce: A software framework to support processing of massive data sets across distributed computers

4. SAMPLE USE CASE Back end credit card processor Nightly processing of millions of transactions Processing requires grouping, sorting, and merchant wide analysis Can’t just divide the over all list into equal parts as further analysis is necessary Tight processing window

5. Description Simple, powerful programming model Language independent Can run on a single machine, but shines for distributed computing and extreme datasets Break down the processing problem into embarrassingly parallel atomic operations

6. Algorithm Map Phase Raw data analyzed and converted to name/value pair Shuffle Phase All name/value pairs are sorted and grouped by their keys Reduce Phase All values associated with a key are processed for results

7. MapReduce Walk Through Goal: Construct a word frequency of all the words in Wikipedia

8. Step 0: Split Data Raw input data divided into N parts N > number of machines Split must be context specific

9. Step 1: Map Each machine takes/receives a single slice of the raw input for mapping The map function processes the input file and emits a name/value pair of the relevant data

10. STEP 2: Shuffle The results of the map phase are sorted and grouped by the key in each key value pair.

11. STEP 3: Reduce Results from shuffle phase divided into M parts M >number of machines Each machine runs a reduction method on a part of shuffle results.

12. MAPREDUCE BENEFITS Scale Processing speed increases with number of machines involved Reliable Loss of any one machine doesn’t stop processing Cost Often built from heterogeneous commodity grade computers

13. Use Case Results Processing time of 1 million records Originally ~3 hours Reduced to 40 minutes on 5 computers

14. Other MapReduce installations Google – Index building Visa – Transaction Processing Facebook – Facebook Lexicon Intelligence Community Yahoo/Google – Terabyte Sort 10 billion, 100 byte records Yahoo: 910 nodes, 206 seconds Google: ~1,000 nodes, 68 seconds

15. Questions

Map Reduce

Recommandé

Recommandé

Contenu connexe

Tendances

Tendances (20)

En vedette

En vedette (20)

Similaire à Map Reduce

Similaire à Map Reduce (20)

Dernier

Dernier (20)

Map Reduce