Example Parallel Overview snow fork Summary
Parallel Computing with R
Péter Sólymos
Edmonton R User Group meeting, April 26, 2013
Ovenbird example from 'detect' package
> str(oven)
'data.frame': 891 obs. of 11 variables:
$ count : int 1 0 0 1 0 0 0 0 0 0 ...
$ route : int 2 2 2 2 2 2 2 2 2 2 ...
$ stop : int 2 4 6 8 10 12 14 16 18 20 ...
$ pforest: num 0.947 0.903 0.814 0.89 0.542 ...
$ pdecid : num 0.575 0.562 0.549 0.679 0.344 ...
$ pagri : num 0 0 0 0 0.414 ...
$ long : num 609343 608556 607738 607680 607944 ...
$ lat : num 5949071 5947735 5946301 5944720 5943088 ...
$ observ : Factor w/ 4 levels "ARS","DW","RDW",..: 4 4 4 4 4 4 4 4 4 4 ...
$ julian : int 181 181 181 181 181 181 181 181 181 181 ...
$ timeday: int 2 4 6 8 10 12 14 16 18 20 ...
NegBin GLM with bootstrap
> library(MASS)
> m <- glm.nb(count ~ pforest, oven)
> fun1 <- function(i) {
+ id <- sample.int(nrow(oven), nrow(oven), replace = TRUE)
+ coef(glm.nb(count ~ pforest, oven[id, ]))
+ }
> B <- 199
> system.time(bm <- sapply(1:B, fun1))
user system elapsed
26.79 0.02 27.11
> bm <- cbind(coef(m), bm)
> cbind(coef(summary(m))[, 1:2], `Boot. SE` = apply(bm, 1, sd))
Estimate Std. Error Boot. SE
(Intercept) -2.177 0.1277 0.1229
pforest 2.674 0.1709 0.1553
Parallel bootstrap
> library(parallel)
> (cl <- makePSOCKcluster(3))
socket cluster with 3 nodes on host 'localhost'
> clusterExport(cl, "oven")
> tmp <- clusterEvalQ(cl, library(MASS))
> t0 <- proc.time()
> bm2 <- parSapply(cl, 1:B, fun1)
> proc.time() - t0
user system elapsed
0.00 0.00 11.06
> stopCluster(cl)
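The same pattern can be tried without the 'detect' package. What follows is a minimal sketch, assuming simulated count data and a Poisson GLM from base R standing in for glm.nb on oven; the names dat and boot_fun are hypothetical. Because boot_fun refers to dat as a free variable, the data have to be exported to the workers first, just as oven is exported above.

```r
library(parallel)

# Simulated stand-in for the 'oven' data (an assumption, not the real data set)
set.seed(1)
n <- 200
dat <- data.frame(x = runif(n))
dat$y <- rpois(n, lambda = exp(-1 + 2 * dat$x))

# One bootstrap replicate: resample rows with replacement and refit the model
boot_fun <- function(i) {
    id <- sample.int(nrow(dat), nrow(dat), replace = TRUE)
    coef(glm(y ~ x, family = poisson, data = dat[id, ]))
}

B <- 50
cl <- makePSOCKcluster(2)
clusterExport(cl, "dat", envir = environment())  # workers start with an empty workspace
clusterSetRNGStream(cl, 123)                     # independent, reproducible streams
bm_par <- parSapply(cl, 1:B, boot_fun)
stopCluster(cl)

# bm_par is a 2 x B matrix of coefficients; row-wise sd gives bootstrap SEs
boot_se <- apply(bm_par, 1, sd)
```

The speed-up is rarely proportional to the number of workers, since starting the cluster and moving data between processes take time as well.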
High performance computing (HPC)
• Parallel computing,
• large memory and out-of-memory data,
• interfaces for compiled code,
• profiling tools,
• batch scheduling.
CRAN Task View: High-Performance and Parallel Computing with R
Parallel computing
Embarrassingly parallel problems:
• bootstrap,
• MCMC,
• simulations.
Can be broken down into independent pieces (Schmidberger et al. 2009, JSS: State of the Art in Parallel Computing with R).
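To illustrate what "independent pieces" means, here is a small sketch (an assumed toy example, not taken from the talk): a Monte Carlo estimate of pi split into chunks. No chunk needs anything from another chunk, so the lapply call could be swapped for a parallel equivalent without changing the logic.

```r
# Each chunk counts random points that fall inside the unit quarter circle.
# The function name and chunk sizes are illustrative assumptions.
count_hits <- function(n) {
    x <- runif(n)
    y <- runif(n)
    sum(x^2 + y^2 <= 1)
}

set.seed(42)
chunks <- rep(25000, 4)               # four independent pieces of work
hits <- lapply(chunks, count_hits)    # sequential here; the order does not matter
pi_hat <- 4 * sum(unlist(hits)) / sum(chunks)
```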
Parallel computing
• explicit (distributed memory),
• implicit (shared memory),
• grid,
• Hadoop,
• GPUs.
Starting a cluster
> library(snow)
> cl <- makeCluster(3, type = "SOCK")
Cluster types:
• SOCK, multicore
• PVM, Parallel Virtual Machine
• MPI, Message Passing Interface
• NWS, NetWorkSpaces (multicore & grid)
Error: invalid connection
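The 'parallel' package used in the bootstrap example ships with R and offers socket clusters without a separate install. A minimal sketch, assuming a local machine; the cap of two workers is an arbitrary choice for illustration:

```r
library(parallel)

# Leave one core free for the main session, but never ask for fewer than one worker.
# detectCores() can return NA on some platforms, hence the fallback.
n_cores <- detectCores()
n_workers <- if (is.na(n_cores)) 1L else max(1L, min(2L, n_cores - 1L))

cl <- makeCluster(n_workers, type = "PSOCK")
# Ask each worker for its process ID to confirm they are separate R processes
pids <- unlist(clusterCall(cl, Sys.getpid))
stopCluster(cl)                       # always release the workers when done
```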
Distribute stuff, evaluate expressions
> clusterExport(cl, "oven")
> clusterEvalQ(cl, library(MASS))
[[1]]
[1] "MASS" "methods" "stats" "graphics"
[5] "grDevices" "utils" "datasets" "base"
[[2]]
[1] "MASS" "methods" "stats" "graphics"
[5] "grDevices" "utils" "datasets" "base"
[[3]]
[1] "MASS" "methods" "stats" "graphics"
[5] "grDevices" "utils" "datasets" "base"
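Worker processes in a socket cluster start with an empty workspace, which is why objects and packages have to be sent explicitly. A short sketch using the 'parallel' versions of these calls; the object my_threshold is a hypothetical example, and the base package 'splines' stands in for MASS:

```r
library(parallel)

cl <- makeCluster(2)
my_threshold <- 0.5                   # so far this exists only in the main session

# Before exporting, the workers do not know about 'my_threshold'
before <- unlist(clusterEvalQ(cl, exists("my_threshold")))

# Copy the object to the global environment of every worker
clusterExport(cl, "my_threshold", envir = environment())
after <- unlist(clusterEvalQ(cl, exists("my_threshold")))

# clusterEvalQ evaluates an expression on each worker, e.g. attaching a package
loaded <- unlist(clusterEvalQ(cl, {
    library(splines)
    "package:splines" %in% search()
}))
stopCluster(cl)
```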
Random Number Generation (RNG)
> library(rlecuyer)
> tmp <- clusterEvalQ(cl, set.seed(1234))
> clusterEvalQ(cl, rnorm(5))
[[1]]
[1] -1.2071 0.2774 1.0844 -2.3457 0.4291
[[2]]
[1] -1.2071 0.2774 1.0844 -2.3457 0.4291
> snow:::clusterSetupRNG(cl)
[1] "RNGstream"
> clusterEvalQ(cl, rnorm(5))
[[1]]
[1] -1.14063 -0.49816 -0.76670 -0.04821 -1.09852
[[2]]
[1] 0.7050 0.4821 -1.2848 0.7198 0.7386
Important when calculating indices or doing simulations.
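With the 'parallel' package, clusterSetRNGStream() plays a similar role to clusterSetupRNG(): it gives each worker its own L'Ecuyer-CMRG stream. A minimal sketch; the seed values are arbitrary:

```r
library(parallel)

cl <- makeCluster(2)

# Same seed on every worker: the workers produce identical draws
invisible(clusterEvalQ(cl, set.seed(1234)))
same <- clusterEvalQ(cl, rnorm(3))

# One independent stream per worker, reproducible from a single seed
clusterSetRNGStream(cl, iseed = 2013)
run1 <- clusterEvalQ(cl, rnorm(3))
clusterSetRNGStream(cl, iseed = 2013)
run2 <- clusterEvalQ(cl, rnorm(3))

stopCluster(cl)
```

Here same holds two identical vectors, while run1 differs between workers and is reproduced exactly by run2.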
Apply operations: split
> parallel:::parLapply
function (cl = NULL, X, fun, ...)
{
    cl <- defaultCluster(cl)
    do.call(c, clusterApply(cl, x = splitList(X, length(cl)),
        fun = lapply, fun, ...), quote = TRUE)
}
<bytecode: 0x04c1eba8>
<environment: namespace:parallel>
> snow:::splitList(1:10, length(cl))
[[1]]
[1] 1 2 3 4 5
[[2]]
[1] 6 7 8 9 10
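The 'parallel' package also exports splitIndices(), which does the same kind of chunking on index vectors without reaching into a package namespace. A small sketch showing that the chunks cover the input exactly once; the choice of three chunks is an assumption:

```r
library(parallel)

# Split the indices 1..10 into 3 contiguous chunks, one per worker
idx <- splitIndices(10, 3)
chunk_sizes <- lengths(idx)

# The chunks are disjoint and together reproduce the original indices
all_covered <- all(sort(unlist(idx)) == 1:10)
```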
Apply operations: evaluate and combine
> f <- function(i) i * 2
> (res <- clusterApply(cl, snow:::splitList(1:10, length(cl)),
+ f))
[[1]]
[1] 2 4 6
[[2]]
[1] 8 10 12 14
[[3]]
[1] 16 18 20
> do.call(c, res)
[1] 2 4 6 8 10 12 14 16 18 20
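The split, evaluate, and combine steps can also be written out by hand using only exported 'parallel' functions. A sketch under the assumption of a two-worker local cluster:

```r
library(parallel)

f <- function(i) i * 2
x <- 1:10
cl <- makeCluster(2)

# Split: one piece of x per worker
pieces <- lapply(splitIndices(length(x), length(cl)), function(ix) x[ix])
# Evaluate: one vectorized call of f per piece, each on a different worker
res <- clusterApply(cl, pieces, f)
# Combine: results come back in the order of the pieces
combined <- do.call(c, res)

stopCluster(cl)
```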
Apply operations: load balancing
> f <- function(i) i * 2
> unlist(parallel:::parLapplyLB(cl, 1:10, f))
[1] 2 4 6 8 10 12 14 16 18 20
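Load balancing pays off when tasks take unequal time. A sketch with artificial delays (the Sys.sleep durations are made up for illustration): clusterApply assigns tasks to workers in a fixed round-robin order, whereas clusterApplyLB hands the next task to whichever worker becomes free first.

```r
library(parallel)

# Hypothetical task: sleep for 'd' seconds, then report which process did the work
slow_task <- function(d) {
    Sys.sleep(d)
    Sys.getpid()
}
delays <- c(0.4, 0.05, 0.05, 0.05, 0.05, 0.05)   # one long task, several short ones

cl <- makeCluster(2)
t_fixed <- system.time(pid_fixed <- clusterApply(cl, delays, slow_task))["elapsed"]
t_lb <- system.time(pid_lb <- clusterApplyLB(cl, delays, slow_task))["elapsed"]
stopCluster(cl)

# With load balancing, the worker busy with the long task receives no further
# work until it finishes, so t_lb is typically smaller than t_fixed
```

The price of load balancing is more communication, and the assignment of tasks to workers is no longer deterministic, which matters for reproducible random numbers.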
Implicit parallelism
No need to distribute stuff, only evaluate on child processes.
> mclapply(X, FUN, mc.cores)
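A minimal sketch of the forking approach. Forking is not available on Windows, where mclapply only works with mc.cores = 1, so the sketch falls back to a single core there. The variable offset is a hypothetical example of an object that child processes see without any export step.

```r
library(parallel)

offset <- 100                         # defined in the parent process
add_offset <- function(i) i + offset

# Forked children inherit the parent's workspace, so no clusterExport is needed
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L
res <- mclapply(1:6, add_offset, mc.cores = n_cores)
out <- unlist(res)
```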
Summary
Parallel computing is not hard on a single computer.
Difficulty comes in when using large, shared, and heterogeneous
resources.
> stopCluster(cl)
