1. Taking R to the limit
Kutergin Alex
Perm State University, MiFIT
16 October 2012
Kutergin A. High performance computing with R
2. Outline
1 General words about R
2 Motivation and scope
3 The basic ways of speeding up the R-code
4 The special way of speeding up the R-code: package pnmath
5 Problem of data splitting: package iterators
6 Parallel computation with R: high-level parallelism (packages:
parallel, snow and additional packages)
7 Parallel computation with R: low-level parallelism (package: Rmpi)
8 Parallel computation with R: parallel execution of for-loops
(package: foreach)
9 Parallel computation with R: parallel computation with graphical
processing unit (package: gputools)
10 Working with very large datasets: package filehash and package
bigmemory
11 Final words, some useful references and contacts
25. General words about R
R software
R is free, powerful software for data analysis and statistical computing. R
is a console application with its own programming language, which runs in
interpreter mode. The lack of a sophisticated GUI has a number of
advantages:
there is no need to learn which algorithm is behind each button
you can just learn the basic principles of R programming and
effectively solve complex problems using the R programming language
Download R
R can be downloaded from the following link:
http://cran.r-project.org/
Project page: www.r-project.org
36. General words about R
View of an R work session
37. General words about R
Packages and information sources
There are two sources of happiness for an R programmer
Source of information Source of packages
38. Motivation and scope
Motivation
Computers are becoming more powerful. Progress in computer
hardware and software is amazing, and this computing power is now
available even in a laptop
The volume of data and the complexity of the problems associated
with data processing are growing constantly
The emergence of multi-core PCs and CUDA technology
Scope
We are ordinary students without large resources, so we do not have
a supercomputer
We have a Core i5, a Core i7, or another multi-core laptop or PC with
support for CUDA technology
We have some computational tasks and we want to solve them more
efficiently
53. The basic ways of speeding up the R-code
How to measure code execution time?
First way to measure execution time
# system.time(expr) returns the CPU (and other) times that expr used
system.time(sum(runif(10000000)))
Second way to measure execution time
# proc.time() reports how much real and CPU time (in seconds) the
# currently running R process has already taken
start_time <- proc.time()
sum(runif(10000000))
# Despite its name, end_time holds the time elapsed since start_time
end_time <- proc.time() - start_time
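A single timing can be noisy, so it may help to repeat the measurement and summarise it. The following is a minimal sketch and not part of the original slides; the helper name median_elapsed and its argument times are assumptions made for illustration.

```r
# Hypothetical helper: call a function several times and return the
# median elapsed (wall-clock) time in seconds.
median_elapsed <- function(f, times = 5) {
  elapsed <- vapply(
    seq_len(times),
    function(i) system.time(f())[["elapsed"]],
    numeric(1)
  )
  median(elapsed)
}

# Usage: the same kind of expression as above, on a smaller input
t_med <- median_elapsed(function() sum(runif(100000)), times = 5)
print(t_med)
```

The median is used here because it is less sensitive than the mean to one unusually slow run.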
60. The basic ways of speeding up the R-code
Analysis of the effectiveness of programs
Function profile
Let us compare the general-purpose function lm() with the more
specialised function lm.fit()
# Load a data set
data(longley)
# Record the profile to the file lm.out
Rprof("lm.out")
# Run lm() 1000 times
invisible(replicate(1000, lm(Employed ~ . - 1, data = longley)))
# Switch off profiling
Rprof(NULL)
61. The basic ways of speeding up the R-code
Analysis of the effectiveness of programs
# Prepare the data for lm.fit()
longleydm <- data.matrix(data.frame(longley))
# Record the profile to the file lm.fit.out
Rprof("lm.fit.out")
# Run lm.fit() 1000 times
invisible(replicate(1000,
    lm.fit(longleydm[, -7], longleydm[, 7])))
# Switch off profiling
Rprof(NULL)
# Profiling results
summaryRprof("lm.out")$sampling.time
[1] 3.12
summaryRprof("lm.fit.out")$sampling.time
[1] 0.18
# What a difference!
62. The basic ways of speeding up the R-code
Analysis of the effectiveness of programs
Package profr
This package lets you visualize profiling results
library("profr")
plot(parse_rprof("lm.out"), main = "Profile of lm()")
plot(parse_rprof("lm.fit.out"), main = "Profile of lm.fit()")
Package proftools
This package lets you visualize the call graph of a function
library("Rgraphviz"); library("proftools")
lmfitprod <- readProfileData("lm.fit.out")
plotProfileCallGraph(lmfitprod)
64. The basic ways of speeding up the R-code
Analysis of the effectiveness of programs
[Figure: call graph]
65. The basic ways of speeding up the R-code
Analysis of the effectiveness of programs
Another example of profiling:
its <- 2500; dim <- 1750
X <- matrix(rnorm(its * dim), its, dim)
# Cross product computed as a sum of outer products of the rows of X
my.cross.prod <- function(X)
{
  C <- matrix(0, ncol(X), ncol(X))
  for (i in 1:nrow(X))
  {
    C <- C + X[i, ] %o% X[i, ]
  }
  return(C)
}
library(proftools)
# Start profiling; the results are read back from this file later
Rprof("matrix-mult.out")
C <- my.cross.prod(X)
C1 <- t(X) %*% X
C2 <- crossprod(X)
Rprof(NULL)
# all.equal() compares two objects at a time
print(all.equal(C, C1))
print(all.equal(C, C2))
66. The basic ways of speeding up the R-code
Analysis of the effectiveness of programs
Result:
library(proftools)
profile.data <- readProfileData("matrix-mult.out")
flatProfile(profile.data)
               total.pct total.time self.pct self.time
my.cross.prod      87.31      88.36     0.04      0.04
+                  49.84      50.44    49.84     50.44
%o%                37.37      37.82     0.00      0.00
outer              37.37      37.82    37.27     37.72
%*%                 7.75       7.84     7.75      7.84
crossprod           4.86       4.92     4.86      4.92
t                   0.16       0.16     0.06      0.06
t.default           0.10       0.10     0.10      0.10
matrix              0.06       0.06     0.06      0.06
as.vector           0.02       0.02     0.02      0.02
67. The basic ways of speeding up the R-code
Vectorization of code
Note!
Loops in R are slow! You can speed up your code by operating on whole
vectors and matrices at once. It is a different style of programming,
but it pays off
# Simple example of vectorization:
# component-wise addition of two vectors
# Generate some random data
# First vector
a <- rnorm(n = 10000000)
# Second vector
b <- rnorm(n = 10000000)
# Vector for the result
x <- rep(0, length(a))
68. The basic ways of speeding up the R-code
Vectorization of code
So, what about the results?
# Slow way
time_1 <- system.time(
  for (i in 1:length(a))
  {
    x[i] <- a[i] + b[i]
  }
); time_1[3]
36.97
# Fast way
time_2 <- system.time(x <- a + b); time_2[3]
0.04
Acceleration <- time_1[3] / time_2[3]
Acceleration
924.25
# That's hot!
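The same idea applies when the loop body contains a condition, not only plain arithmetic. The sketch below is an illustration with made-up data rather than an example from the slides: it clips each element-wise sum at zero, first with a loop and then with the vectorized base R function pmax(), and checks that both give the same result.

```r
# Illustrative data (values chosen for this example only)
a <- c(1.5, -2.0, 3.25, 0.0)
b <- c(0.5,  1.0, -4.25, 2.0)

# Loop version: clip each sum at zero, one element at a time
clipped_loop <- numeric(length(a))
for (i in seq_along(a)) {
  s <- a[i] + b[i]
  clipped_loop[i] <- if (s > 0) s else 0
}

# Vectorized version: pmax() works on the whole vector at once
clipped_vec <- pmax(a + b, 0)

print(clipped_vec)
stopifnot(identical(clipped_loop, clipped_vec))
```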
69. The basic ways of speeding up the R-code
Using the magic of linear algebra
Using linear algebra operations
# Scalar product
# Slow way
start <- proc.time()
res <- 0
for (i in 1:length(a))
{
  res <- res + a[i] * b[i]
}
end <- proc.time() - start; end[3]
16.71
# Fast
system.time(a %*% b)[3]
0.09
# Even faster...
system.time(sum(a * b))[3]
0.08
70. The basic ways of speeding up the R-code
Using the magic of linear algebra
Using linear algebra operations
# Matrix multiplication, slow version
its <- 2500; dim <- 1750
X <- matrix(rnorm(its * dim), its, dim)
X_transp <- t(X)
res <- array(NA, dim = c(1750, 1750))
start <- proc.time()
for (i in 1:nrow(X_transp))
{
  for (j in 1:ncol(X))
  {
    res[i, j] <- sum(X_transp[i, ] * X[, j])
  }
}
end <- proc.time() - start; end[3]
221.67
71. The basic ways of speeding up the R-code
Using the magic of linear algebra
BLAS library
BLAS stands for Basic Linear Algebra Subprograms. It is a low-level
library rather than an R package: R calls it for operations such as
%*% and crossprod(). Optimized implementations (for example OpenBLAS
or Intel MKL) can use all cores of a multi-core machine automatically,
while the reference BLAS that ships with R is single-threaded.
# Matrix multiplication, fast version
# BLAS matrix multiplication
system.time(X_transp %*% X)[3]
7.77
# Even faster...
system.time(crossprod(X))[3]
4.98
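Before trusting a faster variant it is worth checking that it computes the same thing. The sketch below is not from the slides: on a small random matrix (the size and seed are arbitrary choices) it compares an explicit double loop, t(X) %*% X and crossprod(X).

```r
set.seed(1)   # arbitrary seed, for reproducibility
X <- matrix(rnorm(20 * 4), nrow = 20, ncol = 4)

# Explicit double loop over pairs of columns, as in the slow version
res_loop <- matrix(NA_real_, ncol(X), ncol(X))
for (i in seq_len(ncol(X))) {
  for (j in seq_len(ncol(X))) {
    res_loop[i, j] <- sum(X[, i] * X[, j])
  }
}

res_mult  <- t(X) %*% X     # matrix product
res_cross <- crossprod(X)   # documented as equivalent to t(X) %*% X

# all.equal() allows for small floating-point differences
stopifnot(isTRUE(all.equal(res_loop, res_mult)),
          isTRUE(all.equal(res_mult, res_cross)))
```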
72. The basic ways of speeding up the R-code
Using built-in R functions
Package base
You can find the full list of built-in R functions in the documentation
for this package
# Let us define a function
mySum <- function(N)
{
  sumVal <- 0
  for (i in 1:N) { sumVal <- sumVal + i }
  return(sumVal)
}
system.time(mySum(1000000))[3]
0.62
system.time(sum(as.numeric(seq(1, 1000000))))[3]
0.05
73. The basic ways of speeding up the R-code
Using built-in R functions
Why are built-in R functions faster?
R is an interpreted language, and interpreted code is generally slower
than compiled code. When you call a built-in R function, you usually
call optimized, compiled code: many built-in functions are written in a
lower-level language (C/C++ or Fortran), which gives more direct
access to the capabilities of the hardware
Note!
You can select data from a vector, matrix, data.frame or array using a
condition on a row or column of the data object. It is fast and
convenient
# Extract only the positive values from the first column of X
its <- 2500; dim <- 1750
X <- matrix(rnorm(its * dim), its, dim)
X[X[, 1] > 0, 1]
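Selection by condition works the same way on a data.frame. The following sketch uses a small made-up data frame (the column names id and score are assumptions for illustration) to show a logical mask and a combined condition.

```r
# Illustrative data frame (made-up values)
df <- data.frame(id = 1:6,
                 score = c(4.2, -1.0, 3.3, 0.0, -2.5, 7.1))

# Logical mask: TRUE where the condition holds
positive <- df$score > 0

# Keep the matching rows, all columns
df_pos <- df[positive, ]

# Combine conditions with & (and) or | (or)
df_mid <- df[df$score > 0 & df$score < 5, ]

print(df_pos$id)   # ids of rows with a positive score
print(df_mid$id)   # ids of rows with a score between 0 and 5
```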
74. The special way of speeding up the R-code
Package pnmath
Another easy way to get a speed-up is to use the pnmath package in R.
This package takes many of the standard math functions in R and
replaces them with multi-threaded versions, using OpenMP. Some
functions get more of a speed-up than others with pnmath.
# Generate random data
v1 <- runif(1000)
v2 <- runif(100000000)
# Execution time without pnmath
system.time(qtukey(v1, 2, 3))
system.time(exp(v2))
system.time(sqrt(v2))
# Execution time with pnmath
library(pnmath)
system.time(qtukey(v1, 2, 3))
system.time(exp(v2))
system.time(sqrt(v2))
75. Problem of data splitting
Our problem:
Before starting a calculation you need to split your data set according
to the number of threads. Splitting also makes data processing in loops
more efficient
Package iterators
The iterators package provides tools for iterating over various R data
structures. Iterators are available for vectors, lists, matrices, arrays, data
frames and files. By following very simple conventions, new iterators can
be written to support any type of data source, such as database queries
or dynamically generated data
Download
You can download this useful package from CRAN (available for
Windows!):
http://cran.r-project.org/web/packages/iterators/index.html
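To make the splitting problem concrete, here is a minimal sketch that divides the rows of a data set among a fixed number of workers. It uses splitIndices() from the parallel package that ships with R, not the iterators package described above; the data set and the worker count are assumptions for illustration.

```r
library(parallel)   # ships with R

# Illustrative data: 10 rows, to be shared among 3 workers
# (the worker count is hard-coded here; detectCores() could supply it)
data_set <- data.frame(id = 1:10, value = (1:10)^2)
n_workers <- 3

# splitIndices() returns a list of index vectors, one per worker
idx <- splitIndices(nrow(data_set), n_workers)

# Turn the index vectors into data chunks
chunks <- lapply(idx, function(i) data_set[i, , drop = FALSE])

print(sapply(chunks, nrow))   # chunk sizes
```

Each element of chunks can then be handed to one thread or process.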
79. Problem of data splitting: package iterators
Capabilities
icount(count)
This function returns an iterator that counts starting from one. count is
the number of times the iterator will fire. If not specified, it will count
forever
nextElem()
This function returns the next value of a previously defined iterator.
When the iterator has no more values, it calls stop() with the message
"StopIteration"
Example:
library(iterators)
# Create an iterator that counts from 1 to 2
it <- icount(2)
nextElem(it)
[1] 1
nextElem(it)
[1] 2
try(nextElem(it)) # expect a StopIteration exception
Error : StopIteration
80. Problem of data splitting: package iterators
Capabilities
You can create an iterator over the rows of your data structure using
the iter() function:
library(iterators)
# Create an iterator over the rows of the data set
# (output abridged: some columns of state.x77 are omitted)
irState <- iter(state.x77, by = "row")
nextElem(irState)
        Population Income Illiteracy  Life Murder  Area
Alabama       3615   3624        2.1 69.05   15.1 50708
nextElem(irState)
       Population Income Illiteracy  Life Murder   Area
Alaska        365   6315        1.5 69.31   11.3 566432
nextElem(irState)
        Population Income Illiteracy  Life Murder   Area
Arizona       2212   4530        1.8 70.55    7.8 113417
81. Problem of data splitting: package iterators
Capabilities
You can create an iterator over the columns of your data structure using
the iter() function:
# Create an iterator over the columns of the data set
# (output abridged to the first three rows)
icState <- iter(state.x77, by = "col")
nextElem(icState)
        Population
Alabama       3615
Alaska         365
Arizona       2212
nextElem(icState)
        Income
Alabama   3624
Alaska    6315
Arizona   4530
nextElem(icState)
        Illiteracy
Alabama        2.1
Alaska         1.5
Arizona        1.8
82. Problem of data splitting: package iterators
Capabilities
You can also create an iterator with iter() from a data object returned
by another function:
library(iterators)
# Define a function which generates random data
GetDataStructure <- function(meanVal1, meanVal2,
                             sdVal1, sdVal2)
{
  a <- rnorm(4, mean = meanVal1, sd = sdVal1)
  b <- rnorm(4, mean = meanVal2, sd = sdVal2)
  data <- a %o% b
  return(data)
}
ifun <- iter(GetDataStructure(25, 27, 2.5, 3.5), by = "row")
nextElem(ifun); nextElem(ifun)
         [,1]     [,2]     [,3]     [,4]
[1,] 701.7055 939.6574 764.7724 799.6965
         [,1]     [,2]     [,3]     [,4]
[1,] 647.6349 867.2512 705.8422 738.0752
83. Problem of data splitting: package iterators
Capabilities
idiv(n, ..., chunks, chunkSize)
This is a more interesting iterator. It divides a numeric value into
pieces
n - the number to be divided
chunks - the number of pieces that n should be divided into. It is
useful when you know the number of pieces that you want. If
specified, chunkSize should not be
chunkSize - the maximum size of the pieces that n should be
divided into. It is useful when you know the size of the pieces that
you want. If specified, chunks should not be
Some thoughts...
The practical application of this iterator is not obvious. Perhaps it can
be used to index a vector or the rows/columns of an array
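One way the indexing idea might look in practice is sketched below. This is an illustration under stated assumptions, not a recipe from the package authors: the vector x is made up, and the loop relies on nextElem() signalling "StopIteration" when the iterator is exhausted.

```r
library(iterators)

x <- seq(10, 100, by = 10)          # illustrative vector of length 10
it <- idiv(length(x), chunks = 3)   # piece sizes that add up to length(x)

pieces <- list()
first <- 1
repeat {
  # nextElem() signals "StopIteration" when the iterator is exhausted
  size <- tryCatch(nextElem(it), error = function(e) {
    if (identical(conditionMessage(e), "StopIteration")) NULL else stop(e)
  })
  if (is.null(size)) break
  # Use the piece size to cut the next consecutive block out of x
  pieces[[length(pieces) + 1]] <- x[first:(first + size - 1)]
  first <- first + size
}

print(pieces)
```

Each element of pieces is a consecutive block of x, so the blocks can be processed independently and recombined in order.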
93. Problem of data splitting: package iterators
Capabilities
Example:
library(iterators)
# Divide the value 10 into 3 pieces
it <- idiv(10, chunks = 3)
nextElem(it)
[1] 4
nextElem(it)
[1] 3
nextElem(it)
[1] 3
try(nextElem(it)) # expect a StopIteration exception
Error : StopIteration
94. Problem of data splitting: package iterators
Capabilities
Example:
library(iterators)
# Divide the value 10 into pieces no larger than 3
it <- idiv(10, chunkSize = 3)
nextElem(it)
[1] 3
nextElem(it)
[1] 3
nextElem(it)
[1] 2
nextElem(it)
[1] 2
try(nextElem(it)) # expect a StopIteration exception
Error : StopIteration
95. Problem of data splitting: package iterators
Capabilities
iread.table(file, ..., verbose = FALSE)
This is a very important iterator. It returns an iterator over the rows
of a data frame stored in a file in table format
file - the name of the file to read data from
... - all additional arguments are passed on to the read.table
function. See the documentation for read.table for more information
verbose - logical flag indicating whether or not to print the calls to
read.table
Note!
In this version of iread.table, both of the read.table arguments header
and row.names must be specified. This is because the default values of
these arguments depend on the contents of the beginning of the file. In
order to make the subsequent calls to read.table work consistently, the
user must specify those arguments explicitly
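The idea behind reading a file piece by piece can be sketched in base R alone. The code below does not use iread.table; it is a hand-written analogue that keeps a connection open and calls read.table repeatedly with nrows. The file contents, the chunk size of 2 and all variable names are assumptions for illustration.

```r
# Write a small CSV file to a temporary location (illustrative data)
path <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:5, value = c(2.5, 3.0, 1.5, 4.0, 3.5)),
          path, row.names = FALSE)

con <- file(path, open = "r")
# Read the header line once and strip the quotes written by write.csv
header <- gsub('"', "", strsplit(readLines(con, n = 1), ",")[[1]])

chunks <- list()
repeat {
  # Each call continues from the current position of the open connection
  chunk <- tryCatch(
    read.table(con, nrows = 2, sep = ",", col.names = header),
    error = function(e) NULL
  )
  # Stop on an error or on an empty result, both of which mean end of file
  if (is.null(chunk) || nrow(chunk) == 0) break
  chunks[[length(chunks) + 1]] <- chunk
}
close(con)
unlink(path)

# Process the pieces one at a time, here by summing a column
total <- sum(sapply(chunks, function(d) sum(d$value)))
print(total)
```

Because only one chunk is held at a time inside the loop, the same pattern can be applied to files that are too large to read in one call.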
Kutergin A. High performance computing with R
96. Problem of data splitting: package iterators
Capabilities
iread.table(file,...., verbose = FALSE)
This is very important iterator. It returns an iterator object over the rows
of the data frame stored in a file in table format
file - the name of the file to read data from
... - all additional arguments are passed on to the read.table
function. See the documentation for read.table for more information
verbose - logical flag indicating whether or not to print the calls to
read.table
Note!
In this version of iread.table, both the read.table arguments header and
row.names must be specified. This is because the default values of this
arguments depend on the contents of the beginning of the file. In order
to make the subsequent calls to read.table work consistently, the user
must specified those arguments explicitly
Kutergin A. High performance computing with R
97. Problem of data splitting: package iterators
Capabilities
iread.table(file,...., verbose = FALSE)
This is very important iterator. It returns an iterator object over the rows
of the data frame stored in a file in table format
file - the name of the file to read data from
... - all additional arguments are passed on to the read.table
function. See the documentation for read.table for more information
verbose - logical flag indicating whether or not to print the calls to
read.table
Note!
In this version of iread.table, both the read.table arguments header and
row.names must be specified. This is because the default values of this
arguments depend on the contents of the beginning of the file. In order
to make the subsequent calls to read.table work consistently, the user
must specified those arguments explicitly
105. Problem of data splitting: package iterators
Capabilities
Example:
library(iterators)
# Generate random data
its <- 2000000; dim <- 3
data <- matrix(rnorm(its * dim), its, dim)
# Write the data to disk
DATA_PATH <- "E:/R_works/data.txt"
# The resulting file is about 123 MB
write.table(data, file = DATA_PATH,
            append = FALSE, sep = "\t", dec = ".")
106. Problem of data splitting: package iterators
Capabilities
# Create an iterator over this file
ifile <- iread.table(DATA_PATH, header = TRUE,
                     row.names = NULL, verbose = FALSE)
> nextElem(ifile)
  row.names        V1        V2       V3
1         1 -1.042623 -1.386382 0.399798
> nextElem(ifile)
  row.names        V1        V2        V3
1         2 0.8841238 -1.296501 0.1580505
> nextElem(ifile)
  row.names         V1         V2        V3
1         3 -0.3195784 -0.6830442 0.3647958
# It works very fast!
# Remove the file
file.remove(DATA_PATH)
107. Problem of data splitting: package iterators
Capabilities
isplit(x, f, drop = FALSE)
Another important type of iterator. It returns an iterator that divides
the data in the vector x into the groups defined by f
x - vector or data frame of values to be split into groups
f - a factor or list of factors used to categorize x
drop - logical indicating if levels that do not occur should be
dropped
More detailed information can be found in the documentation
Note!
This is very useful! For example, suppose you have a data vector and a
vector holding the factor values that correspond to these data, and the
factor has pre-defined levels. You can then extract the data for each
level of the factor in a loop without additional operations. In the
loop's body you can also define conditions for each level of the factor
and use them as conditions in if() control structures
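The loop described in the note can be sketched as follows. This is only an illustration: the data, the variable names (group_means, chunk, key) and the rule that skips level "c" are assumptions made for the example, not part of the package. The loop stops when nextElem() signals StopIteration.

```r
library(iterators)

# Hypothetical data: six measurements that belong to three groups
x <- c(1, 2, 3, 4, 5, 6)
f <- factor(c("a", "b", "a", "c", "b", "c"))

it <- isplit(x, f)
group_means <- numeric(0)

repeat {
  # nextElem() raises a "StopIteration" error when no groups are left
  chunk <- tryCatch(nextElem(it), error = function(e) {
    if (identical(conditionMessage(e), "StopIteration")) NULL else stop(e)
  })
  if (is.null(chunk)) break

  key <- chunk$key[[1]]    # level of the factor for this group
  if (key == "c") next     # hypothetical per-level condition inside if()
  group_means[key] <- mean(chunk$value)
}

print(group_means)
```

Each element returned by the iterator carries both the data ($value) and the level it belongs to ($key), so no extra subsetting is needed inside the loop.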
117. Problem of data splitting: package iterators
Capabilities
x <- rnorm(200)
f <- factor(sample(1:10, length(x),
                   replace = TRUE))
it <- isplit(x, f)
nextElem(it)
$value
 [1]  0.14087878 -0.94439161  0.13593045
 [4] -0.25732860  0.09422130 -0.55166303
 [7] -0.18325419 -0.00871019  0.38344388
[10] -1.05761926  1.16126462 -0.02280205
[13] -0.67338941  1.68724264  0.92112983
[16]  1.39782337 -0.51060989
$key
$key[[1]]
[1] "1"
118. Problem of data splitting: package iterators
Capabilities
Special types of iterators
There are also special types of iterators, such as irnorm(..., count) or
irunif(..., count). These functions return an iterator that returns random
numbers from various distributions. Each one is a wrapper around a standard
R function
count - number of times that the iterator will fire. If not specified, it
will fire values forever
... - arguments to pass to the underlying function (for example, rnorm)
Example:
# create an iterator that returns one random number per call, at most twice
it <- irnorm(1, count = 2)
nextElem(it); nextElem(it)
[1] 0.1592311
[1] -1.387449
try(nextElem(it)) # expect a StopIteration exception
Error : StopIteration
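The same pattern works for the other wrappers. The sketch below uses irunif and passes the arguments min and max through ... to runif. The values 3, 0, 10 and the variable names are chosen only for illustration.

```r
library(iterators)

# Each call returns runif(3, min = 0, max = 10);
# count = 2 limits the iterator to two calls
it <- irunif(3, min = 0, max = 10, count = 2)

first  <- nextElem(it)
second <- nextElem(it)

# A third call ends the iteration with a "StopIteration" error;
# here the error message is captured instead of stopping the script
third <- tryCatch(nextElem(it), error = function(e) conditionMessage(e))

print(first)
print(second)
print(third)
```

Capturing the error with tryCatch() is one way to detect the end of a finite iterator in your own loops.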
119. Parallel computation with R: high-level parallelism
packages: parallel, snow
Scope
High-level parallelism means that you do not need to define how the
processes communicate with each other: which process is the master and
which processes are the slaves. You only initialize the parallel
environment and work inside it. The package's functions take care of all
the details
Package: snow
The package contains the basic functions that allow you to create different
types of clusters on a multicore machine
Package: parallel
This package builds on the packages multicore and snow and provides
drop-in replacements for most of the functionality of those packages
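A minimal sketch of this idea with the package parallel, assuming a machine that can start two local worker processes. The number of workers (2), the input 1:4 and the squaring function are chosen only for illustration.

```r
library(parallel)

# Initialize the parallel environment: two local worker processes
cl <- makeCluster(2)

# Work inside it: the package distributes the calls among the workers
# and collects the results, no explicit communication code is needed
squares <- parLapply(cl, 1:4, function(i) i^2)

# Shut the workers down when the work is finished
stopCluster(cl)

print(unlist(squares))
```

The master/worker communication is hidden entirely behind makeCluster(), parLapply() and stopCluster().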