cs4414 Fall 2013
University of Virginia
David Evans
Jodhpur, India (Dec 2011)
Plan for Today
• Recap list map
• Google’s MapReduce
• Tasks in Rust
• Multi-threaded map
26 September 2013 University of Virginia cs4414 1
PS2 is due Monday (30 Sept) at 8:59pm.
Submission form will be posted later today, and
include signup for scheduling your demo/review.
All team members are expected to participate in
the review, except in extreme circumstances.
struct Node {
    head : int,
    tail : Option<~Node>
}
type List = Option<~Node> ;
trait Map {
    fn mapr(&self, &fn(int) -> int) -> List;
}
impl Map for List {
    fn mapr(&self, f: &fn(int) -> int) -> List {
        match(*self) {
            None => None,
            Some(ref node) => { Some(~Node{ head: f(node.head),
                                            tail: node.tail.mapr(f) }) },
        }
    }
}
You should understand everything in this code.
Ask questions now if there is anything unclear.
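The slide code uses 2013-era Rust (`~Node` owned boxes, `&fn` closure types, `int`), which current compilers reject. As a rough sketch only, the same recursive map might look like this in present-day Rust, assuming `Box<Node>` stands in for `~Node`, `i64` for `int`, and `&dyn Fn` for `&fn`. The helpers `from_slice` and `to_vec` are hypothetical additions for building and printing lists; they are not part of the slides.

```rust
// Modern-Rust rendering of the slide's list; Box<Node> stands in for ~Node.
struct Node {
    head: i64,
    tail: Option<Box<Node>>,
}

type List = Option<Box<Node>>;

trait Map {
    fn mapr(&self, f: &dyn Fn(i64) -> i64) -> List;
}

impl Map for List {
    fn mapr(&self, f: &dyn Fn(i64) -> i64) -> List {
        match self {
            None => None,
            // Apply f to the head, then recurse on the tail.
            Some(node) => Some(Box::new(Node {
                head: f(node.head),
                tail: node.tail.mapr(f),
            })),
        }
    }
}

// Hypothetical helper: build a List from a slice, last element first.
fn from_slice(xs: &[i64]) -> List {
    xs.iter()
        .rev()
        .fold(None, |tail, &head| Some(Box::new(Node { head, tail })))
}

// Hypothetical helper: collect the heads into a Vec for easy printing.
fn to_vec(list: &List) -> Vec<i64> {
    let mut out = Vec::new();
    let mut cur = list;
    while let Some(node) = cur {
        out.push(node.head);
        cur = &node.tail;
    }
    out
}

fn main() {
    let p = from_slice(&[1, 2, 3]);
    let q = p.mapr(&|x| x * 10);
    println!("{:?}", to_vec(&q)); // prints [10, 20, 30]
}
```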
Cost of Map
Core 1
What is the running time of p.map(f)
using one core where p is a list of N
elements and each evaluation of f(x)
takes 1ms?
Cost of Multi-Core Map
Core 1
Core 3
Core 2
Core 4
What is the running time of p.map(f)
using k cores where p is a list of N
elements and each evaluation of f(x)
takes 1ms?
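One way to reason about both cost questions is an idealized model, sketched below. It assumes the N calls to f are independent, that work is split evenly across cores, and that list traversal, task creation, and communication cost nothing. Real programs will be slower than this bound. The function name `ideal_map_time_ms` is made up for illustration.

```rust
// Idealized cost model (an assumption): n independent calls to f, each taking
// `per_call_ms`, spread evenly over `cores`, with zero scheduling overhead.
fn ideal_map_time_ms(n: u64, cores: u64, per_call_ms: u64) -> u64 {
    assert!(cores > 0, "need at least one core");
    // Each core handles at most ceil(n / cores) elements, one after another.
    let per_core = (n + cores - 1) / cores;
    per_core * per_call_ms
}

fn main() {
    // A 16-element list with 1 ms per call, on 1, 2, and 4 cores.
    for cores in [1, 2, 4] {
        println!("{} core(s): {} ms", cores, ideal_map_time_ms(16, cores, 1));
    }
}
```

Under these assumptions one core needs about N ms and k cores need about ceil(N/k) ms.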
How should we parallelize map?
fn mapr(&self, f: &fn(int) -> int) -> List {
    match(*self) {
        None => None,
        Some(ref node) => { Some(~Node{ head: f(node.head),
                                        tail: node.tail.mapr(f) }) },
    }
}
“MapReduce is a programming model and an associated
implementation for processing and generating large data sets. Users
specify a map function that processes a key/value pair to generate a
set of intermediate key/value pairs, and a reduce function that
merges all intermediate values associated with the same
intermediate key. Many real world tasks are expressible in this
model, as shown in the paper.
Programs written in this functional style are automatically
parallelized and executed on a large cluster of commodity machines.”
OSDI 2004
Did Google invent map?
John McCarthy
1927-2011
1955-1960: First “mass-produced” computer (sold 123 of them)
1 accumulator register (38 bits), 3 decrement registers (15 bit)
Instructions had 3 bit opcode, 15 bit decrement, 15 bit address
Magnetic Core Memory
32,000 36-bit words
40,000 instructions/second
John McCarthy
playing chess
with IBM 7090
(1967)
fn mapr(&self, f: &fn(int) -> int) -> List {
    match(*self) {
        None => None,
        Some(ref node) => {
            Some(~Node{ head: f(node.head), tail: node.tail.mapr(f) }) },
    }
}
@ pointers (in 1960)
MapReduce
Google’s map:
fn mapr(&self, f: &fn(int) -> int) -> List {
    match(*self) {
        None => None,
        Some(ref node) => {
            Some(~Node{ head: f(node.head),
                        tail: node.tail.mapr(f) }) },
    }
}
fn mapg<K1, V1, K2, V2>(List<Pair<K1, V1>>,
                        f: &fn(K1, V1) -> (K2, V2))
    -> List<Pair<K2, V2>>
fn reduceg<K, V, R>(K, List<V>) -> List<R>
fn mapg<K1, V1, K2, V2>(List<Pair<K1, V1>>,
                        f: &fn(K1, V1) -> (K2, V2))
    -> List<Pair<K2, V2>>
fn reduceg<K, V, R>(K, List<V>) -> List<R>
fn map_reduce<K1, V1, K2, V2, R>(
    List<Pair<K1, V1>>,
    mapf: &fn(K1, V1) -> (K2, V2),
    reducef: &fn(K2, List<V2>) -> R)
    -> List<R>
fn map_reduce<K1, V1, K2, V2, R>(
    data: List<Pair<K1, V1>>,
    mapf: &fn(K1, V1) -> (K2, V2),
    reducef: &fn(K2, List<V2>) -> R)
    -> List<R> {
}
fn map_reduce<K1, V1, K2, V2, R>(
    data: List<Pair<K1, V1>>,
    mapf: &fn(K1, V1) -> (K2, V2),
    reducef: &fn(K2, List<V2>) -> R)
    -> List<R> {
    let ivalues = data.map(mapf);
    let mvalues = // merge ivalues by k2
    mvalues.map(reducef)
}
Completing the code (with the parallel map we will finish
today) is left as a sticker-worthy exercise!
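For orientation only, here is a sequential sketch of the three phases (map, merge by intermediate key, reduce) in present-day Rust. It is not the parallel exercise solution. It assumes `Vec<(K, V)>` in place of `List<Pair<K, V>>`, uses a `BTreeMap` for the merge step, and follows the slide's simplification that `mapf` produces exactly one intermediate pair per input pair. The word-count input in `main` is hypothetical.

```rust
use std::collections::BTreeMap;

// Sequential sketch of the slide's map_reduce. Vec replaces List<Pair<..>>,
// and a BTreeMap does the "merge ivalues by k2" step (both are assumptions).
fn map_reduce<K1, V1, K2: Ord, V2, R>(
    data: Vec<(K1, V1)>,
    mapf: impl Fn(K1, V1) -> (K2, V2),
    reducef: impl Fn(K2, Vec<V2>) -> R,
) -> Vec<R> {
    // Map phase: one intermediate (k2, v2) pair per input pair.
    let ivalues = data.into_iter().map(|(k, v)| mapf(k, v));

    // Merge phase: group intermediate values that share a key.
    let mut mvalues: BTreeMap<K2, Vec<V2>> = BTreeMap::new();
    for (k2, v2) in ivalues {
        mvalues.entry(k2).or_default().push(v2);
    }

    // Reduce phase: one result per distinct intermediate key, in key order.
    mvalues.into_iter().map(|(k2, vs)| reducef(k2, vs)).collect()
}

fn main() {
    // Hypothetical input: (document id, word) pairs; count occurrences per word.
    let data = vec![(1, "map"), (1, "reduce"), (2, "map")];
    let counts = map_reduce(data, |_doc, word| (word, 1), |word, ones| (word, ones.len()));
    println!("{:?}", counts); // prints [("map", 2), ("reduce", 1)]
}
```

The map and reduce phases are the parts that could each run in parallel, since every call to `mapf` or `reducef` is independent of the others.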
Mapping in Parallel
Processes, Threads, Tasks
Process
Originally: abstraction
for owning the whole
machine
What do you need:
Thread
(Illusion of)
independent sequence
of instructions
What do you need:
Processes, Threads, Tasks
Process
Originally: abstraction for owning the whole machine
What do you need:
Own program counter
Own stack, registers
Own memory space
Thread
(Illusion of) independent sequence of instructions
What do you need:
Own program counter
Own stack, registers
Shares memory space
Tasks in Rust
Tasks
Own PC
Own stack, registers
Safely shared
immutable memory
Safely independent
own memory
fn spawn(f: ~fn())
spawn(|| {
    println("Get back to work!");
});
syntactic sugar:
do spawn {
    println("Get back to work!");
}
Task = Thread – unsafe memory sharing
or
Task = Process + safe memory sharing – cost of OS process
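The `spawn` and `do spawn` forms above belong to 2013-era Rust and no longer exist. As a loose modern analogue, and only as a sketch, `std::thread::spawn` with a `move` closure gives a similar ownership story: data handed to the new thread is no longer accessible from the parent. Note the assumption that a modern OS thread stands in for a 2013 task; the two are not the same runtime mechanism. The function name `run_task` is invented for this example.

```rust
use std::thread;

// std::thread::spawn is used here as the closest modern analogue of the
// 2013 `spawn`/`do spawn` task API (an assumption; these are OS threads).
fn run_task() -> String {
    // `move` transfers ownership of `msg` into the new thread, so the parent
    // can no longer touch it: no unsafe sharing of mutable memory.
    let msg = String::from("Get back to work!");
    let handle = thread::spawn(move || {
        println!("{}", msg);
        msg
    });
    // join() waits for the thread and hands back its return value.
    handle.join().expect("spawned thread panicked")
}

fn main() {
    let returned = run_task();
    println!("task returned: {}", returned);
}
```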
impl Map for List {
    fn mapr(&self, f: &fn(int) -> int) -> List {
        match(*self) {
            None => None,
            Some(ref node) => {
                Some(~Node{ head: f(node.head),
                            tail: node.tail.mapr(f) })
            },
        }
    }
}
Original single-threaded mapr
fn spawn(f: ~fn())
First attempt
impl Map for List {
    fn mapr(&self, f: extern fn(int) -> int) -> List {
        match(*self) {
            None => None,
            Some(ref node) => {
                do spawn {
                    f(node.head)  // Cannot use node here!
                }
                Some(~Node{ head: ?,
                            tail: node.tail.mapr(f) })
            },
        }
    }
}
impl Map for List {
    fn mapr(&self, f: extern fn(int) -> int) -> List {
        match(*self) {
            None => None,
            Some(ref node) => {
                let val = node.head;
                do spawn {
                    f(val)
                }
                Some(~Node{ head: ?,
                            tail: node.tail.mapr(f) })
            },
        }
    }
}
How can we get results back from a spawned task without shared memory?
Channels
let (port, chan) : (Port<int>, Chan<int>) = stream();
let val = node.head;
do spawn {
    chan.send(f(val));
}
let newval = port.recv();
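In current Rust the nearest counterpart to `stream()` is probably `std::sync::mpsc::channel()`, which returns the pair in the opposite order: `(Sender, Receiver)` rather than `(Port, Chan)`. The sketch below keeps the slide's variable names to make the mapping visible. The function `f` here is a hypothetical stand-in, and `apply_in_task` is a name chosen for this example.

```rust
use std::sync::mpsc;
use std::thread;

// Hypothetical stand-in for the slide's `f`.
fn f(x: i64) -> i64 {
    x * x
}

// Modern counterpart of the slide's stream(): mpsc::channel() returns
// (Sender, Receiver), playing the roles of (Chan, Port).
fn apply_in_task(val: i64) -> i64 {
    let (chan, port) = mpsc::channel::<i64>();
    thread::spawn(move || {
        // The sending end moves into the task; the result comes back by message.
        chan.send(f(val)).expect("receiver was dropped");
    });
    // recv() blocks until the task has sent its value.
    port.recv().expect("sender was dropped without sending")
}

fn main() {
    println!("{}", apply_in_task(7)); // prints 49
}
```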
Using streams to
spawn is dangerous
for salmon, but Rust
saves you from (data)
races with the bears!
First attempt
fn mapr(&self, f: extern fn(int) -> int) -> List {
    match(*self) {
        None => None,
        Some(ref node) => {
            let (port, chan) : (Port<int>, Chan<int>) = stream();
            let newtail = node.tail.mapr(f);
            let val = node.head;
            do spawn {
                chan.send(f(val));
            }
            Some(~Node{ head: port.recv(), tail: newtail })
        }
    }
}
Compiles and runs fine and produces correct output…
but has a major bug!
Now we’re spawning!
fn mapr(&self, f: extern fn(int) -> int) -> List {
    match(*self) {
        None => None,
        Some(ref node) => {
            let (port, chan) : (Port<int>, Chan<int>) = stream();
            let val = node.head;
            do spawn {
                chan.send(f(val));
            }
            let newtail = node.tail.mapr(f);
            Some(~Node{ head: port.recv(), tail: newtail })
        }
    }
}
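A possible present-day rendering of this spawn-then-recurse version is sketched below. The ordering is what matters: the task for the head is started before the recursive call, so the tasks for the tail are already running by the time the head's result is awaited. Assumptions: `std::thread` and `std::sync::mpsc` replace tasks and `stream()`, a plain `fn(i64) -> i64` pointer replaces `extern fn(int) -> int`, and `mapr` is a free function rather than a trait method. The helpers `from_slice`, `to_vec`, and `double` are hypothetical. This spawns one OS thread per element, which is tolerable for a short list but not a design for long ones.

```rust
use std::sync::mpsc;
use std::thread;

struct Node {
    head: i64,
    tail: Option<Box<Node>>,
}

type List = Option<Box<Node>>;

// Spawn-then-recurse: every element's task is started before any result
// is awaited, so the calls to f can overlap.
fn mapr(list: &List, f: fn(i64) -> i64) -> List {
    match list {
        None => None,
        Some(node) => {
            let (chan, port) = mpsc::channel();
            let val = node.head;
            thread::spawn(move || {
                chan.send(f(val)).expect("receiver was dropped");
            });
            // Recurse first, so tasks for the tail start while this one runs.
            let newtail = mapr(&node.tail, f);
            // Only now block on this element's result.
            let head = port.recv().expect("task ended without sending");
            Some(Box::new(Node { head, tail: newtail }))
        }
    }
}

// Hypothetical helper: build a List from a slice.
fn from_slice(xs: &[i64]) -> List {
    xs.iter()
        .rev()
        .fold(None, |tail, &head| Some(Box::new(Node { head, tail })))
}

// Hypothetical helper: collect the heads into a Vec.
fn to_vec(list: &List) -> Vec<i64> {
    let mut out = Vec::new();
    let mut cur = list;
    while let Some(node) = cur {
        out.push(node.head);
        cur = &node.tail;
    }
    out
}

// Hypothetical function to map over the list.
fn double(x: i64) -> i64 {
    2 * x
}

fn main() {
    let p = from_slice(&[1, 2, 3, 4]);
    println!("{:?}", to_vec(&mapr(&p, double))); // prints [2, 4, 6, 8]
}
```

Moving the recursive call above the `thread::spawn` would reproduce the earlier behavior, where each task starts only after the whole tail has finished, so nothing overlaps.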
fn collatz_steps(n: int) -> int {
    if n == 1 { 0 } else { 1 + collatz_steps(if n % 2 == 0 { n / 2 } else { 3*n + 1 }) }
}
fn find_collatz(k: int) -> int {
    // Returns the minimum value, n, with Collatz stopping time >= k.
    let mut n = 1;
    while collatz_steps(n) < k { n += 1; }
    n
}
fn main() {
    let lst0 : List = Some(~Node{head: 400, tail:
                      Some(~Node{head : 410, tail:
                      // … 16 total similar elements
                      } );
    println(lst0.to_str());
    let lst1 = lst0.mapr(find_collatz);
    println(lst1.to_str());
    let lst2 = lst1.mapr(find_collatz);
    println(lst2.to_str());
}
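The two Collatz helpers can be tried on their own in present-day Rust. The sketch below is an adaptation, not the slide's code: it replaces the recursion with a loop and uses `u64` for values and `u32` for step counts (both assumptions, chosen to make overflow and deep recursion less likely). Small inputs are used in `main` so the results are easy to check by hand.

```rust
// Number of Collatz steps needed to reach 1 from n (n must be >= 1).
fn collatz_steps(mut n: u64) -> u32 {
    let mut steps = 0;
    while n != 1 {
        n = if n % 2 == 0 { n / 2 } else { 3 * n + 1 };
        steps += 1;
    }
    steps
}

// Smallest n whose Collatz stopping time is at least k.
fn find_collatz(k: u32) -> u64 {
    let mut n = 1;
    while collatz_steps(n) < k {
        n += 1;
    }
    n
}

fn main() {
    // 6 -> 3 -> 10 -> 5 -> 16 -> 8 -> 4 -> 2 -> 1 takes 8 steps.
    println!("{}", collatz_steps(6)); // prints 8
    println!("{}", find_collatz(8)); // prints 6
}
```

Because `find_collatz` rescans from 1 on every call and its cost grows quickly with k, it makes a convenient CPU-bound function for comparing the sequential and multi-tasking maps.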
When 350+% of your CPU isn’t fast enough, it’s time to buy a new computer!
Intel i7 Quad-Core Processor
Intel i7 Quad-Core Processor
Core Core Core Core
Shared Memory Cache (L3 = 6MB)
~256KB L2 Cache (?)
Why so few?
Portuguese, was beavering away in the library when ‘smoke
suddenly started to come out’ of her computer. Fortunately, she
removed the fire hazard from the library, averting disaster at the
last moment. The student gave The Tab her version of the story:
“I was in the library working at my computer when smoke
suddenly started to come out of it. I freaked out for a second,
trying to save my work onto my hard disk, but then I realised it
was probably more important to take it out of the library.
The Tab (Oxford), “Laptop Fire Almost Destroys College Library”
Where the Cores Are
nVIDIA GeForce GTX 650M
384 cores
(but even harder for
typical programs to
use well than Intel’s
cores)
How much faster will my Rust mapping
program be on my new machine?
2013 MacBook Pro
Intel i7-3740QM
2.7 GHz, 4 cores (8 threads)
6MB shared L3 cache
2011 MacBook Air
Intel i5-2557M
1.7 GHz, 2 cores (4 threads)
3 MB shared L3 cache
both support “hyperthreading” (two threads per core)
60 seconds
(normalized time, running on 16-element list)
?
Submit your “guesses” and reasoning
in the course forum… hopefully I will
know the actual answer by Tuesday!
PS2 is due Monday (30 Sept) at 8:59pm.
Submission form will be posted later today, and
include signup for scheduling your demo/review.
All team members are expected to participate in
the review, except in extreme circumstances.

Multi-Tasking Map (MapReduce, Tasks in Rust)

  • 1. cs4414 Fall 2013 University of Virginia David EvansJodhpur, India (Dec 2011)
  • 2. Plan for Today • Recap list map • Google’s MapReduce • Tasks in Rust • Multi-threaded map 26 September 2013 University of Virginia cs4414 1 PS2 is due Monday (30 Sept) at 8:59pm. Submission form will be posted later today, and include signup for scheduling your demo/review. All team members are expected to participate in the review, except in extreme circumstances.
  • 3. 26 September 2013 University of Virginia cs4414 2 struct Node { head : int, tail : Option<~Node> } type List = Option<~Node> ; trait Map { fn mapr(&self, &fn(int) -> int) -> List; } impl Map for List { fn mapr(&self, f: &fn(int) -> int) -> List { match(*self) { None => None, Some(ref node) => { Some(~Node{ head: f(node.head), tail: node.tail.mapr(f) }) }, } } } You should understand everything in this code. Ask questions now if there is anything unclear.
  • 4. Cost of Map 26 September 2013 University of Virginia cs4414 3 Core 1 What is the running time of p.map(f) using one core where p is a list of N elements and each evaluation of f(x) takes 1ms?
  • 5. Cost of Multi-Core Map 26 September 2013 University of Virginia cs4414 4 Core 1 Core 3 Core 2 Core 4 What is the running time of p.map(f) using k cores where p is a list of N elements and each evaluation of f(x) takes 1ms?
  • 6. How should we parallelize map? 26 September 2013 University of Virginia cs4414 5 fn mapr(&self, f: &fn(int) -> int) -> List { match(*self) { None => None, Some(ref node) => { Some(~Node{ head: f(node.head), tail: node.tail.mapr(f) }) }, } }
  • 7. 26 September 2013 University of Virginia cs4414 6 “MapReduce is a programming model and an associated implementation for processing and generating large data sets. Users specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs, and a reduce function that merges all intermediate values associated with the same intermediate key. Many real world tasks are expressible in this model, as shown in the paper. Programs written in this functional style are automatically parallelized and executed on a large cluster of commodity machines.” OSDI 2004
  • 8. Did Google invent map? 26 September 2013 University of Virginia cs4414 7
  • 10. 26 September 2013 University of Virginia cs4414 9
  • 11. 10 1955-1960: First “mass-produced” computer (sold 123 of them) 1 accumulator register (38 bits), 3 decrement registers (15 bit) Instructions had 3 bit opcode, 15 bit decrement, 15 bit address Magentic Core Memory 32,000 36-bit words 40,000 instructions/second
  • 13. 26 September 2013 University of Virginia cs4414 12 fn mapr(&self, f: &fn(int) -> int) -> List { match(*self) { None => None, Some(ref node) => { Some(~Node{ head: f(node.head), tail: node.tail.mapr(f) }) }, } }
  • 14. 26 September 2013 University of Virginia cs4414 13
  • 15. 26 September 2013 University of Virginia cs4414 14 @ pointers (in 1960)
  • 16. 26 September 2013 University of Virginia cs4414 15
  • 17. MapReduce 26 September 2013 University of Virginia cs4414 16 Google’s map: fn mapr(&self, f: &fn(int) -> int) -> List { match(*self) { None => None, Some(ref node) => { Some(~Node{ head: f(node.head), tail: node.tail.mapr(f) }) }, } }
  • 18. 26 September 2013 University of Virginia cs4414 17
  • 19. 26 September 2013 University of Virginia cs4414 18 fn mapg<K1, V1, K2, V2>(List<Pair<K1, V1>>, f: &fn(K1, V1) -> (K2, V2)) -> List<Pair<K2, V2>> fn reduceg<K, V, R>(K, List<V>) -> List<R>
  • 20. 26 September 2013 University of Virginia cs4414 19 fn mapg<K1, V1, K2, V2>(List<Pair<K1, V1>>, f: &fn(K1, V1) -> (K2, V2)) -> List<Pair<K2, V2>> fn reduceg<K, V, R>(K, List<V>) -> List<R> fn map_reduce<K1, V1, K2, V2, R>( List<Pair<K1, V2>>, mapf: &fn(K1, V1) -> (K2, V2)), reducef: &fn(K2, List<V2>) -> R)) -> List<R>
  • 21. 26 September 2013 University of Virginia cs4414 20 fn map_reduce<K1, V1, K2, V2, R>( data: List<Pair<K1, V2>>, mapf: &fn(K1, V1) -> (K2, V2)), reducef: &fn(K2, List<V2>) -> R)) -> List<R> { }
  • 22. 26 September 2013 University of Virginia cs4414 21 fn map_reduce<K1, V1, K2, V2, R>( data: List<Pair<K1, V2>>, mapf: &fn(K1, V1) -> (K2, V2)), reducef: &fn(K2, List<V2>) -> R)) -> List<R> { let ivalues = data.map(mapf) let mvalues = // merge ivalues by k2 mvalues.map(reducef) } Completing the code (with parallel map will finish today) is left as sticker-worthy exercise!
  • 23. Mapping in Parallel 26 September 2013 University of Virginia cs4414 22
  • 24. Processes, Threads, Tasks Process Originally: abstraction for owning the whole machine What do you need: 26 September 2013 University of Virginia cs4414 23 Thread (Illusion of) independent sequence of instructions What do you need:
  • 25. Processes, Threads, Tasks Process Originally: abstraction for owning the whole machine What do you need: 26 September 2013 University of Virginia cs4414 24 Own program counter Own stack, registers Own memory space Own program counter Own stack, registers Shares memory space Thread (Illusion of) independent sequence of instructions What do you need:
  • 26. Tasks in Rust 26 September 2013 University of Virginia cs4414 25
  • 27. Tasks Own PC Own stack, registers Safely shared immutable memory Safely independent own memory 26 September 2013 University of Virginia cs4414 26 fn spawn(f: ~fn()) spawn( | | { println(“Get back to work!”); }); do spawn { println(“Get back to work!”); } syntactic sugar: Task = Thread – unsafe memory sharing or Task = Process + safe memory sharing – cost of OS process
  • 28. 26 September 2013 University of Virginia cs4414 27 impl Map for List { fn mapr(&self, f: &fn(int) -> int) -> List { match(*self) { None => None, Some(ref node) => { Some(~Node{ head: f(node.head), tail: node.tail.mapr(f) }) }, } } } Original single-threaded mapr fn spawn(f: ~fn())
  • 29. 26 September 2013 University of Virginia cs4414 28 impl Map for List { fn mapr(&self, f: extern fn(int) -> int) -> List { match(*self) { None => None, Some(ref node) => { do spawn { f(node.head) } Some(~Node{ head: ?, tail: node.tail.mapr(f) }) }, } } } First attempt Cannot use node here!
  • 30. 26 September 2013 University of Virginia cs4414 29 impl Map for List { fn mapr(&self, f: extern fn(int) -> int) -> List { match(*self) { None => None, Some(ref node) => { let val = node.head; do spawn { f(val) } Some(~Node{ head: ?, tail: node.tail.mapr(f) }) }, } } } How can we get results back from a spawned task without shared memory?
  • 31. Channels 26 September 2013 University of Virginia cs4414 30 let (port, chan) : (Port<int>, Chan<int>) = stream(); let val = node.head; do spawn { chan.send(f(val)); } let newval = port.recv();
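The `stream()`, `Port` and `Chan` names on this slide come from 2013 Rust. A minimal sketch of the same send-and-receive pattern in present-day Rust, assuming `std::sync::mpsc::channel` as the counterpart of `stream()`, might look like this. The helper name `apply_in_thread` is hypothetical, and the variable names `chan` and `port` are kept only to mirror the slide.

```rust
use std::sync::mpsc;
use std::thread;

// Hypothetical helper: evaluates f(val) on a new thread and passes the result
// back over a channel instead of through shared memory.
fn apply_in_thread(f: fn(i64) -> i64, val: i64) -> i64 {
    // mpsc::channel returns (sender, receiver); the slide's stream() returned
    // (port, chan), so the order is reversed here.
    let (chan, port) = mpsc::channel::<i64>();
    thread::spawn(move || {
        chan.send(f(val)).expect("receiver was dropped");
    });
    // Blocks until the spawned thread has sent its result.
    port.recv().expect("sender was dropped without sending")
}
```

A non-capturing closure such as `|x| x * 2` can be passed for `f`, since it coerces to a plain function pointer.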
  • 32. 26 September 2013 University of Virginia cs4414 31 Using streams to spawn is dangerous for salmon, but Rust saves you from (data) races with the bears!
  • 33. 26 September 2013 University of Virginia cs4414 32 First attempt fn mapr(&self, f: extern fn(int) -> int) -> List { match(*self) { None => None, Some(ref node) => { let (port, chan) : (Port<int>, Chan<int>) = stream(); let newtail = node.tail.mapr(f); let val = node.head; do spawn { chan.send(f(val)); } Some(~Node{ head: port.recv(), tail: newtail }) } } } Compiles and runs fine and produces correct output… but has a major bug!
  • 34. 26 September 2013 University of Virginia cs4414 33 Now we’re spawning! fn mapr(&self, f: extern fn(int) -> int) -> List { match(*self) { None => None, Some(ref node) => { let (port, chan) : (Port<int>, Chan<int>) = stream(); let val = node.head; do spawn { chan.send(f(val)); } let newtail = node.tail.mapr(f); Some(~Node{ head: port.recv(), tail: newtail }) } } }
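The ordering on this slide (spawn first, then recurse, then receive) carries over to present-day Rust. The sketch below is a translation under assumptions rather than the lecture's code: `Box` replaces `~`, `i64` replaces `int`, `std::thread` and `std::sync::mpsc` replace tasks and streams, `mapr` is a free function instead of a trait method, and the helpers `from_slice` and `to_vec` are hypothetical additions for building and inspecting lists.

```rust
use std::sync::mpsc;
use std::thread;

struct Node {
    head: i64,
    tail: List,
}
type List = Option<Box<Node>>;

fn mapr(list: &List, f: fn(i64) -> i64) -> List {
    match list {
        None => None,
        Some(node) => {
            let (chan, port) = mpsc::channel();
            // Copy the value so the thread does not borrow `node`.
            let val = node.head;
            thread::spawn(move || {
                chan.send(f(val)).expect("receiver was dropped");
            });
            // Recurse before receiving, so the rest of the list is started
            // while this element is still being computed.
            let newtail = mapr(&node.tail, f);
            let newhead = port.recv().expect("worker thread failed");
            Some(Box::new(Node { head: newhead, tail: newtail }))
        }
    }
}

// Hypothetical helper: builds a List from a slice, preserving order.
fn from_slice(xs: &[i64]) -> List {
    let mut list: List = None;
    for &x in xs.iter().rev() {
        list = Some(Box::new(Node { head: x, tail: list }));
    }
    list
}

// Hypothetical helper: collects the elements of a List into a Vec.
fn to_vec(list: &List) -> Vec<i64> {
    let mut out = Vec::new();
    let mut cur = list;
    while let Some(node) = cur {
        out.push(node.head);
        cur = &node.tail;
    }
    out
}
```

Each `thread::spawn` here starts a separate OS thread per element, so this version is only reasonable for short lists such as the 16-element example used later in the lecture. Moving the `mapr(&node.tail, f)` call above the `thread::spawn` call would reproduce the ordering of the previous slide, where each element is computed only after the rest of the list has finished.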
  • 35. 26 September 2013 University of Virginia cs4414 34 fn collatz_steps(n: int) -> int { if n == 1 { 0 } else { 1 + collatz_steps(if n % 2 == 0 { n / 2 } else { 3*n + 1 }) } } fn find_collatz(k: int) -> int { // Returns the minimum value, n, with Collatz stopping time >= k. let mut n = 1; while collatz_steps(n) < k { n += 1; } n } fn main() { let lst0 : List = Some(~Node{head: 400, tail: Some(~Node{head: 410, tail: // … 16 total similar elements } ); println(lst0.to_str()); let lst1 = lst0.mapr(find_collatz); println(lst1.to_str()); let lst2 = lst1.mapr(find_collatz); println(lst2.to_str()); }
  • 36. 26 September 2013 University of Virginia cs4414 35 When 350+% of your CPU isn’t fast enough, it’s time to buy a new computer!
  • 37. 26 September 2013 University of Virginia cs4414 36
  • 38. 26 September 2013 University of Virginia cs4414 37 Intel i7 Quad-Core Processor
  • 39. 26 September 2013 University of Virginia cs4414 38 Intel i7 Quad-Core Processor Core Core Core Core Shared Memory Cache (L3 = 6MB) ~256KB L2 Cache (?)
  • 40. Why so few? 26 September 2013 University of Virginia cs4414 39
  • 41. 26 September 2013 University of Virginia cs4414 40 Portuguese, was beavering away in the library when ‘smoke suddenly started to come out’ of her computer. Fortunately, she removed the fire hazard from the library, averting disaster at the last moment. The student gave The Tab her version of the story: “I was in the library working at my computer when smoke suddenly started to come out of it. I freaked out for a second, trying to save my work onto my hard disk, but then I realised it was probably more important to take it out of the library. The Tab (Oxford), “Laptop Fire Almost Destroys College Library”
  • 42. Where the Cores Are 26 September 2013 University of Virginia cs4414 41 nVIDIA GeForce GTX 650M 384 cores (but even harder for typical programs to use well than Intel’s cores)
  • 43. How much faster will my Rust mapping program be on my new machine? 26 September 2013 University of Virginia cs4414 42 2013 MacBook Pro Intel i7-3740QM 2.7 GHz, 4 cores (8 threads) 6 MB shared L3 cache 2011 MacBook Air Intel i5-2557M 1.7 GHz, 2 cores (4 threads) 3 MB shared L3 cache both support “hyperthreading” (two threads per core) 60 seconds (normalized time, running on 16-element list) ?
  • 44. 26 September 2013 University of Virginia cs4414 43
  • 45. 26 September 2013 University of Virginia cs4414 44 Submit your “guesses” and reasoning in the course forum. Hopefully I will know the actual answer by Tuesday! PS2 is due Monday (30 Sept) at 8:59pm. Submission form will be posted later today, and include signup for scheduling your demo/review. All team members are expected to participate in the review, except in extreme circumstances.
  • 46. 26 September 2013 University of Virginia cs4414 45