MapReduce
MapReduce reduced
PetaMengurangi*
* Google Translate
the problem
lots of data
e.g. the entire interwebs
single computer
not going to work
lots of computers
we have that
cluster programming = suck
MapReduce makes the pain go away
2 main stages
map: process data on hosts
reduce: summarise the results
example: count words on lines
>>> reduce(operator.add, map(countWords, lines))
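A minimal runnable version of the one-liner above. The slides do not define `countWords`, so this sketch assumes it counts whitespace-separated words on a line; `count_words` and the sample `lines` are illustrative only. `reduce` is imported from `functools` because it is no longer a builtin in Python 3.

```python
import operator
from functools import reduce  # reduce is not a builtin in Python 3


def count_words(line):
    """Map step (assumed behaviour): number of whitespace-separated words."""
    return len(line.split())


# Illustrative input; any iterable of strings works.
lines = [
    "the quick brown fox",
    "jumps over",
    "the lazy dog",
]

# map yields one count per line; reduce adds them up.
# The initial value 0 keeps the call safe for empty input.
total = reduce(operator.add, map(count_words, lines), 0)
print(total)
```

On one machine this is just a sum. The point of the slide is that the same map-then-reduce shape still works when `lines` is spread over many hosts.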
except in this case: lots of machines
typical cluster
O(10^3) machines
each 2-8 GB RAM
local IDE disks
GFS distributes the data
split data into chunks
allocate machines
start processes
send data to mappers
process data on hosts
monitor hosts
redo failed and stragglers
send results to reducers
summarise results
output final results
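The data-flow steps in this list can be imitated in a single process. The sketch below is only a toy under stated assumptions: `run_mapreduce`, `split_into_chunks`, `word_mapper` and `sum_reducer` are hypothetical names, and machine allocation, process start-up, monitoring and re-execution are left out entirely.

```python
from collections import defaultdict


def split_into_chunks(items, chunk_size):
    """'split data into chunks': one chunk per pretend mapper."""
    return [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]


def run_mapreduce(items, mapper, reducer, chunk_size=2):
    chunks = split_into_chunks(items, chunk_size)

    # 'send data to mappers' and 'process data on hosts'
    intermediate = []
    for chunk in chunks:
        for item in chunk:
            intermediate.extend(mapper(item))

    # 'send results to reducers': group every value under its key
    grouped = defaultdict(list)
    for key, value in intermediate:
        grouped[key].append(value)

    # 'summarise results' and 'output final results'
    return {key: reducer(key, values) for key, values in grouped.items()}


def word_mapper(line):
    """Emit a (word, 1) pair for every word on the line."""
    return [(word.lower(), 1) for word in line.split()]


def sum_reducer(key, values):
    """Add up all the counts emitted for one key."""
    return sum(values)


lines = ["map reduce map", "reduce the pain", "map"]
counts = run_mapreduce(lines, word_mapper, sum_reducer)
print(counts)
```

Only `word_mapper` and `sum_reducer` are specific to the problem. Everything inside `run_mapreduce` is plumbing that stays the same from job to job.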
MapReduce does the yukky stuff
MapReduce: split data into chunks
MapReduce: allocate machines
MapReduce: start processes
MapReduce: send data to mappers
programmer: process data on hosts
MapReduce: monitor hosts
MapReduce: redo failed and stragglers
MapReduce: send results to reducers
programmer: summarise results
MapReduce: output final results
handles failures
handles stragglers
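Re-running a failed task is safe when the task is a pure function of its input. The sketch below illustrates only that idea: `run_with_retries` and `FlakyWordCounter` are hypothetical, the crash is simulated, and stragglers (slow tasks, which a real system might handle by starting a duplicate copy) are not modelled.

```python
def run_with_retries(task, task_input, max_attempts=3):
    """Re-run a task until it succeeds or the attempts run out.

    Returns (result, attempts_used).
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return task(task_input), attempt
        except RuntimeError as error:  # stand-in for a lost worker
            last_error = error
    raise RuntimeError(f"task failed after {max_attempts} attempts") from last_error


class FlakyWordCounter:
    """Hypothetical map task that crashes on its first call, then works."""

    def __init__(self):
        self.calls = 0

    def __call__(self, line):
        self.calls += 1
        if self.calls == 1:
            raise RuntimeError("simulated worker crash")
        return len(line.split())


result, attempts = run_with_retries(FlakyWordCounter(), "lots of machines")
print(result, attempts)
```

Because the word count depends only on the input line, the second attempt gives the same answer the first would have, so the caller never needs to know a retry happened.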
a vanity search
% of refs to Anthony Baxter
count('Anthony Baxter') / count('Anthony')
C++ library ... with Python bindings, yay!
class AnthonyMapper(mrpython.Mapper):
    def Map(self, map_input):
        meCount = otherCount = 0
        docId = map_input.key()    # ignored - doc id
        src = map_input.value()    # document source
        text = ExtractText(src).split()
        seenAnthony = False
        for word in text:
            if not seenAnthony:
                if word.lower() == 'anthony':
                    seenAnthony = True
            else:
                if word.lower() == 'baxter':
                    meCount += 1
                else:
                    otherCount += 1
                seenAnthony = False
        yield 'me', meCount
        yield 'other', otherCount
class AnthonyReducer(mrpython.Reducer):
    def Reducer(self, reduce_input):
        '''
        Passed a key (either 'me' or 'other') and a list
        of counts. Adds the counts and returns them.
        '''
        count = 0
        for val in reduce_input.values():
            count += int(val)    # accumulate into count, not the builtin sum
        yield count
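The `mrpython` bindings are not something you can install, so here is a plain-Python sketch of the same counting logic that runs locally. Everything in it is an assumption for illustration: `anthony_map` and `anthony_reduce` are stand-ins for the two classes above, `docs` is made-up input, and `ExtractText` is skipped because the inputs are already plain text (so punctuation next to a name is not handled).

```python
from collections import defaultdict


def anthony_map(text):
    """Same counting logic as the Map method above, on a plain string."""
    me_count = other_count = 0
    seen_anthony = False
    for word in text.split():
        if not seen_anthony:
            if word.lower() == 'anthony':
                seen_anthony = True
        else:
            # The word right after 'anthony' decides which counter moves.
            if word.lower() == 'baxter':
                me_count += 1
            else:
                other_count += 1
            seen_anthony = False
    yield 'me', me_count
    yield 'other', other_count


def anthony_reduce(values):
    """Add up the per-document counts for one key."""
    return sum(int(v) for v in values)


# Made-up documents, one string per document.
docs = [
    "Anthony Baxter gave a talk",
    "Anthony Hopkins and Anthony Baxter",
    "no names here",
]

# Group mapper output by key, as the framework would before reducing.
grouped = defaultdict(list)
for doc in docs:
    for key, value in anthony_map(doc):
        grouped[key].append(value)

totals = {key: anthony_reduce(values) for key, values in grouped.items()}
share = totals['me'] / (totals['me'] + totals['other'])
print(totals, share)
```

Here `share` plays the role of count('Anthony Baxter') / count('Anthony'), with the 'me' and 'other' totals together standing in for count('Anthony').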
the result: about 1 in 4000
other uses for MapReduce
web link graphs
access logs
text analysis
Google News clustering
local search
road traffic
take speed samples
group by road segment
take the average
once per minute
output to a map layer
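The road traffic steps map onto the same pattern: key each speed sample by road segment, then average per key. A minimal sketch, assuming samples arrive as (segment id, speed) pairs for one minute; the segment ids, speeds and function names are invented for illustration, and writing the result to a map layer is not shown.

```python
from collections import defaultdict

# (road_segment_id, speed_kmh) samples from one minute; illustrative values.
samples = [
    ("A1", 60.0),
    ("A1", 40.0),
    ("B7", 30.0),
    ("A1", 50.0),
    ("B7", 20.0),
]


def map_sample(sample):
    """Map step: key each speed sample by its road segment."""
    segment, speed = sample
    return segment, speed


def reduce_segment(speeds):
    """Reduce step: average all speeds seen for one segment."""
    return sum(speeds) / len(speeds)


# 'group by road segment'
by_segment = defaultdict(list)
for segment, speed in map(map_sample, samples):
    by_segment[segment].append(speed)

# 'take the average'
average_speed = {seg: reduce_segment(speeds) for seg, speeds in by_segment.items()}
print(average_speed)
```

Running this once per minute over the latest samples would give one average per segment, which is the value a map layer would then display.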
limitation: availability of data
MapReduce is pretty cool
for more information
"mapreduce paper"
"gfs paper"
"google papers"
if you'd like to play
hadoop.apache.org
open source Java implementation
HDFS
Map Reduce In 5 Minutes
