Taking R to the limit
       Kutergin Alex

Perm State University, MiFIT

      16 October 2012



        Kutergin A.   High performance computing with R
Outline

    1   General words about R
    2   Motivation and scope
    3   The basic ways of speeding up the R-code
    4   The special way of speeding up the R-code: package pnmath
    5   Problem of data splitting: package iterators
    6   Parallel computation with R: high-level parallelism (packages:
        parallel, snow and additional packages)
    7   Parallel computation with R: low-level parallelism (package: Rmpi)
    8   Parallel computation with R: parallel execution of for-loops
        (package: foreach)
    9   Parallel computation with R: parallel computation with graphical
        processing unit (package: gputools)
   10   Working with very large datasets: package filehash and package
        bigmemory
   11   Final words, some useful references and contacts

General words about R


  R software
  R is free, powerful software for data analysis and statistical computing.
  It is a console application with its own programming language, which runs
  in interpreter mode. The lack of a sophisticated GUI has a number of
  advantages:
      there is no need to learn which algorithm is behind each button
      you only need to learn the basic principles of R programming to
      solve complex problems effectively in the R language

  Download R
      R can be downloaded from the following link:
      http://cran.r-project.org/
      Project page: www.r-project.org
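
  Example (illustrative)
  A minimal sketch of what interpreter mode means in practice: each
  expression is evaluated as soon as it is entered at the R prompt. The
  variable names x, m and s are chosen here for illustration only and do
  not come from the original slides.

```r
# Each statement is evaluated immediately by the interpreter.
x <- c(2, 4, 6, 8)   # create a numeric vector (illustrative data)
m <- mean(x)         # arithmetic mean of the vector
s <- sum(x^2)        # sum of squares, computed without an explicit loop
print(m)
print(s)
```

  The same lines can be typed one by one in the console or saved in a
  script file and run with source().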



General words about R
View of an R work session




General words about R
Packages and information sources




    There are two sources of happiness for an R programmer
            Source of information                         Source of packages




Motivation and scope

  Motivation
      Computers are becoming more powerful. Progress in computer
      hardware and software is amazing. This computing power is now
      available even in a laptop
      The volume of data and the complexity of the problems associated
      with data processing are constantly growing
      The emergence of multi-core PCs and CUDA technology

  Scope
      We are ordinary students without access to powerful resources,
      so we do not have a supercomputer
      We have a Core i5, Core i7 or other multi-core laptop or PC with
      support for CUDA technology
      We have some computational tasks and we want to solve them more
      efficiently


                           Kutergin A.   High performance computing with R
Motivation and scope

  Motivation
      Computers become more productive. Progress in computer’s
      hardware and software is amazing. These computing power became
      available even in a laptop
      Constantly increasing growth of data’s volume and the complexity of
      problems associated with data processing
      The emergence of multi-core PCs and CUDA technology

  Scope
      We: simple students or not powerful guys. So we don’t have
      supercomputer
      We have Core i5 or Core i7 or another multi-core laptop or PC with
      support of CUDA technology
      We have some computational tasks and we want to solve them more
      effectively


                           Kutergin A.   High performance computing with R
Motivation and scope

  Motivation
      Computers become more productive. Progress in computer’s
      hardware and software is amazing. These computing power became
      available even in a laptop
      Constantly increasing growth of data’s volume and the complexity of
      problems associated with data processing
      The emergence of multi-core PCs and CUDA technology

  Scope
      We: simple students or not powerful guys. So we don’t have
      supercomputer
      We have Core i5 or Core i7 or another multi-core laptop or PC with
      support of CUDA technology
      We have some computational tasks and we want to solve them more
      effectively


                           Kutergin A.   High performance computing with R
Motivation and scope

  Motivation
      Computers become more productive. Progress in computer’s
      hardware and software is amazing. These computing power became
      available even in a laptop
      Constantly increasing growth of data’s volume and the complexity of
      problems associated with data processing
      The emergence of multi-core PCs and CUDA technology

  Scope
      We: simple students or not powerful guys. So we don’t have
      supercomputer
      We have Core i5 or Core i7 or another multi-core laptop or PC with
      support of CUDA technology
      We have some computational tasks and we want to solve them more
      effectively


The basic ways of speeding up the R-code
How to check the execution time of code?


    First way to check the execution time of code

    #system.time(expr) returns the CPU (and other) times that expr used
    system.time(sum(runif(10000000)))


    Second way to check the execution time of code

    #proc.time() determines how much real and CPU time (in seconds) the
    #currently running R process has already taken
    start_time <- proc.time()
    sum(runif(10000000))
    #The difference of two readings is the time spent on the code in between
    end_time <- proc.time() - start_time
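
Both approaches return a timing object with named components, and the later slides read its third element with [3]. A minimal sketch, assuming a smaller input than on the slide so that it runs quickly, of reading the same value by name:

```r
# Time a small computation; system.time() returns a "proc_time" object.
timing <- system.time(total <- sum(runif(1e6)))

# The third component is the wall-clock ("elapsed") time in seconds,
# so timing[3] and timing[["elapsed"]] refer to the same value.
print(names(timing)[1:3])
print(timing[["elapsed"]])

# The same measurement with proc.time(): subtract two readings.
start_time <- proc.time()
total <- sum(runif(1e6))
elapsed <- (proc.time() - start_time)[["elapsed"]]
print(elapsed)
```

Reading the component by name makes it harder to confuse the elapsed time with the user or system CPU time.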


The basic ways of speeding up the R-code
Analysis of the effectiveness of programs


    Function’s profile
    Let us compare the general-purpose function lm() with the more
    specific function lm.fit()

    #Loading some dataset
    data(longley)
    #Recording profile to file lm.out
    Rprof("lm.out")
    #Running lm() 1000 times
    invisible(replicate(1000, lm(Employed ~ . -1, data = longley)))
    #Switch off profiling
    Rprof(NULL)



The basic ways of speeding up the R-code
Analysis of the effectiveness of programs

    #Preparing data for lm.fit()
    longleydm <- data.matrix(data.frame(longley))
    #Recording profile to file lm.fit.out
    Rprof("lm.fit.out")
    #Running lm.fit() 1000 times
    invisible(replicate(1000,
          lm.fit(longleydm[, -7], longleydm[, 7])))
    #Switch off profiling
    Rprof(NULL)

    #Results of profiling
    summaryRprof("lm.out")$sampling.time
    [1] 3.12
    summaryRprof("lm.fit.out")$sampling.time
    [1] 0.18
    #What a difference!
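
The timing comparison only makes sense if both calls fit the same model. A minimal sketch of that check, assuming the same longley data set and the same column layout as on the slides (column 7 is Employed); the names fit_formula and fit_matrix are introduced here for illustration:

```r
# Fit the same model both ways on the longley data set (datasets package).
fit_formula <- lm(Employed ~ . - 1, data = longley)

# lm.fit() takes a ready-made design matrix and a response vector instead
# of a formula; column 7 of longley is Employed.
longleydm <- data.matrix(longley)
fit_matrix <- lm.fit(longleydm[, -7], longleydm[, 7])

# Both calls should report the same coefficients (up to rounding).
print(all.equal(coef(fit_formula), fit_matrix$coefficients))
```

lm() spends its extra time on the formula interface (building the model frame and design matrix), which lm.fit() skips because it receives the matrix directly.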


The basic ways of speeding up the R-code
Analysis of the effectiveness of programs


    Package profr
    This package allows you to visualize the results of profiling

    library("profr")
    plot(parse_rprof("lm.out"), main = "Profile of lm()")
    plot(parse_rprof("lm.fit.out"), main = "Profile of lm.fit()")


    Package proftools
    This package allows you to visualize the call graph of a function

    library("Rgraphviz"); library("proftools")
    lmfitprod <- readProfileData("lm.fit.out")
    plotProfileCallGraph(lmfitprod)

    [Figure: call graph]

The basic ways of speeding up the R-code
Analysis of the effectiveness of programs

    Another example of profiling:
    its = 2500; dim = 1750
    X = matrix(rnorm(its * dim), its, dim)
    my.cross.prod <- function(X)
    {
          C = matrix(0, ncol(X), ncol(X))
          for (i in 1:nrow(X))
          {
                  C = C + X[i,] %o% X[i,]
          }
          return(C)
    }
    #Recording profile to file matrix-mult.out
    Rprof("matrix-mult.out")
    C = my.cross.prod(X)
    C1 = t(X) %*% X
    C2 = crossprod(X)
    Rprof(NULL)
    #all.equal() compares two objects at a time
    print(all.equal(C, C1)); print(all.equal(C1, C2))

The basic ways of speeding up the R-code
Analysis of the effectiveness of programs


    Result:
    library(proftools)
    profile.data <- readProfileData("matrix-mult.out")
    flatProfile(profile.data)
                  total.pct total.time self.pct self.time
    my.cross.prod     87.31      88.36     0.04      0.04
    +                 49.84      50.44    49.84     50.44
    %o%               37.37      37.82     0.00      0.00
    outer             37.37      37.82    37.27     37.72
    %*%                7.75       7.84     7.75      7.84
    crossprod          4.86       4.92     4.86      4.92
    t                  0.16       0.16     0.06      0.06
    t.default          0.10       0.10     0.10      0.10
    matrix             0.06       0.06     0.06      0.06
    as.vector          0.02       0.02     0.02      0.02


The basic ways of speeding up the R-code
Vectorization of code


    Note!
    Loops in R are slow! You can speed up your code by using operations on
    whole vectors and matrices. It is a different style of programming, but
    you have to use it!

    #Simple example of vectorization:
    #component-wise addition of two vectors
    #Generating some random data
    #First vector
    a <- rnorm(n = 10000000)
    #Second vector
    b <- rnorm(n = 10000000)
    #Vector for result
    x <- rep(0, length(a))



The basic ways of speeding up the R-code
Vectorization of code

    So, what about results?
    #Slow way
    time_1 <- system.time(
         for (i in 1:length(a))
         {
              x[i] <- a[i] + b[i]
         }
    ); time_1[3]
    36.97
    #Fast way
    time_2 <- system.time(x <- a + b); time_2[3]
    0.04
    Acceleration <- time_1[3] / time_2[3]
    Acceleration
    924.25
    #That’s hot!!!!
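
The speed-up is only useful if both versions compute the same thing. A minimal sketch, assuming much shorter vectors than on the slide so that the loop finishes quickly, that checks the loop and the vectorized addition against each other (x_loop and x_vec are names introduced here):

```r
set.seed(1)
a <- rnorm(100000)
b <- rnorm(100000)

# Loop version: pre-allocate the result, then fill it element by element.
x_loop <- numeric(length(a))
for (i in seq_along(a))
{
     x_loop[i] <- a[i] + b[i]
}

# Vectorized version: one call that adds the whole vectors.
x_vec <- a + b

# Both versions should hold exactly the same numbers.
print(identical(x_loop, x_vec))
```

In the vectorized form the element-wise loop still happens, but inside compiled code rather than in the interpreter.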

The basic ways of speeding up the R-code
Using magic of linear algebra

    Using linear algebra operations
    #Scalar product
    #Slow way
    start <- proc.time()
    res <- 0
    for (i in 1:length(a))
    {
         res <- res + a[i] * b[i]
    }
    end <- proc.time() - start; end[3]
    16.71
    #Fast
    system.time(a %*% b)[3]
    0.09
    #Even faster...
    system.time(sum(a * b))[3]
    0.08
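
The two fast variants differ in what they return, not only in speed. A small sketch, assuming short random vectors, that shows the difference (dot_matrix and dot_sum are names introduced here):

```r
set.seed(42)
a <- rnorm(1000)
b <- rnorm(1000)

# %*% treats the vectors as matrices and returns a 1 x 1 matrix.
dot_matrix <- a %*% b
print(dim(dot_matrix))

# sum(a * b) returns a plain number; drop() removes the matrix dimensions
# so that the two results can be compared directly.
dot_sum <- sum(a * b)
print(all.equal(dot_sum, drop(dot_matrix)))
```

The comparison uses all.equal() rather than identical() because the two routes may round differently in the last digits.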

The basic ways of speeding up the R-code
Using magic of linear algebra


    Using linear algebra operations
    #Matrix multiplication slow version
    its <- 2500; dim <- 1750;
    X <- matrix(rnorm(its * dim), its, dim)
    X_transp <- t(X)
    res <- array(NA, dim = c(1750, 1750))
    start <- proc.time()
    for (i in 1:nrow(X_transp))
    {
         for (j in 1:ncol(X))
         {
               res[i, j] <- sum(X_transp[i,] * X[, j])
         }
    }
    end <- proc.time() - start; end[3]
    221.67


The basic ways of speeding up the R-code
Using magic of linear algebra


    BLAS library
    BLAS stands for Basic Linear Algebra Subprograms. It is a library rather
    than an R package: R hands matrix operations such as %*% and
    crossprod() to the BLAS it is linked against. An optimized BLAS (for
    example OpenBLAS, ATLAS or Intel MKL) contains tuned algorithms for
    linear algebra operations and can use all cores of a multi-core machine
    automatically; the reference BLAS shipped with R is single-threaded.

    #Matrix multiplication fast version
    #BLAS matrix mult
    system.time(X_transp %*% X)[3]
    7.77
    #Even faster...
    system.time(crossprod(X))[3]
    4.98




The basic ways of speeding up the R-code
Using built-in R-functions


    Package base
    You can find the full list of built-in R-functions in the documentation
    for this package

    #Let us define a function
    mySum <- function(N)
    {
         sumVal <- 0
         for (i in 1:N) { sumVal <- sumVal + i }
         return(sumVal)
    }
    system.time(mySum(1000000))[3]
    0.62
    system.time(sum(as.numeric(seq(1, 1000000))))[3]
    0.05
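
A short check that the loop and the built-in sum() agree, and a likely reason for the as.numeric() call on the slide: the total is larger than the largest integer R can store, so the sequence is summed as doubles. This is a sketch; builtin_sum and closed_form are names introduced here, and the closed form N(N + 1)/2 is used only as an independent reference value.

```r
# The loop from the slide: adds 1, 2, ..., N one interpreted step at a time.
mySum <- function(N)
{
     sumVal <- 0
     for (i in 1:N) { sumVal <- sumVal + i }
     return(sumVal)
}

N <- 1000000

# as.numeric() converts the integer sequence to doubles before summing.
builtin_sum <- sum(as.numeric(seq_len(N)))

# Independent reference value: 1 + 2 + ... + N = N (N + 1) / 2.
closed_form <- N * (N + 1) / 2

print(mySum(N) == builtin_sum)
print(builtin_sum == closed_form)

# The total does not fit into R's integer type.
print(builtin_sum > .Machine$integer.max)
```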


The basic ways of speeding up the R-code
Using built-in R-functions


    Why are built-in R-functions faster?
    R is an interpreted language, and interpreted code is generally slower
    than compiled code. When you call a built-in R-function, you call
    optimized, compiled code. Built-in functions are also written in
    lower-level programming languages (such as C/C++ or FORTRAN), which
    gives better access to the capabilities of the hardware

    Note!
    You can select data from a vector, matrix, data.frame or array using a
    condition that applies to a row or column of the data object. It’s fast
    and convenient
    #Extracting only positive values from first column of X
    its <- 2500; dim <- 1750;
    X <- matrix(rnorm(its * dim), its, dim)
    X[X[,1] > 0, 1]
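
The same kind of selection on a matrix small enough to check by eye. This is a sketch with made-up values; keep is a name introduced here for the logical vector that drives the selection.

```r
# Small matrix: first column is -1, 2, -3; second column is 10, 20, 30.
X <- matrix(c(-1, 2, -3,
              10, 20, 30), nrow = 3, ncol = 2)

# Logical vector: TRUE for rows whose first column is positive.
keep <- X[, 1] > 0

# Positive values of the first column only.
print(X[keep, 1])

# Whole rows that satisfy the condition; drop = FALSE keeps the result
# a matrix even when only a single row is selected.
print(X[keep, , drop = FALSE])
```

The condition is evaluated once for the whole column, so no explicit loop over rows is needed.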


The special way of speeding up the R-code
  Package pnmath
  Another easy way to get a speed-up is to use the pnmath package in R.
  This package replaces many of the standard math functions in R with
  multi-threaded versions built on OpenMP. Some functions gain more
  from pnmath than others.

  #Generating random data
  v1 <- runif(1000)
  v2 <- runif(100000000)
  #Time of execution without pnmath
  system.time(qtukey(v1, 2, 3))
  system.time(exp(v2))
  system.time(sqrt(v2))
  #Time of execution with pnmath
  library(pnmath)
  system.time(qtukey(v1, 2, 3))
  system.time(exp(v2))
  system.time(sqrt(v2))

Problem of data splitting

  Our problem:
  Before you start the calculation you need to split your data set according
  to the number of threads. Splitting also makes data processing in loops
  more efficient

  Package iterators
  The iterators package provides tools for iterating over various R data
  structures. Iterators are available for vectors, lists, matrices, arrays, data
  frames and files. By following very simple conventions, new iterators can
  be written to support any type of data source, such as database queries
  or dynamically generated data

  Download
  You can download this useful package from CRAN (available for
  Windows!):
  http://cran.r-project.org/web/packages/iterators/index.html
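
To make the splitting problem concrete before turning to iterators, here is a minimal sketch that divides the rows of a data set among a fixed number of threads. It uses splitIndices() from the parallel package that ships with R rather than the iterators package; the toy matrix and the thread count of 3 are assumptions made for the example.

```r
library(parallel)  # ships with R; provides splitIndices()

# Toy data set: 10 rows, 2 columns.
data_set <- matrix(seq_len(20), nrow = 10, ncol = 2)

# Assumed number of threads for this sketch; on a real machine
# detectCores() can be used to query the number of cores.
n_threads <- 3

# Split the row indices into n_threads groups of nearly equal size.
row_groups <- splitIndices(nrow(data_set), n_threads)

# One sub-matrix per thread.
pieces <- lapply(row_groups, function(rows) data_set[rows, , drop = FALSE])
print(sapply(pieces, nrow))
```

Each element of pieces can then be handed to a separate worker.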

Problem of data splitting: package iterators
Capabilities


    icount(count)
    This function returns an iterator that counts starting from one. count
    is the number of times that the iterator will fire. If it is not
    specified, the iterator will count forever

    nextElem()
    This function returns the next value of a previously defined iterator.
    When the iterator has no more values, it calls stop with the message
    "StopIteration"

    Example:
    library(iterators)
    #create an iterator that counts from 1 to 2
    it <- icount(2)
    nextElem(it)
    [1] 1
    nextElem(it)
    [1] 2
    try(nextElem(it)) # expect a StopIteration exception
    Error : StopIteration
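
The convention described above (return the next value, or stop with the message "StopIteration") can be sketched in base R. The helper make_counter below is hypothetical and is not the package's implementation; it only imitates the behaviour of icount() so that the loop which drains an iterator can be shown without extra packages.

```r
# Hypothetical stand-in for icount(): a closure that counts from 1 to count
# and signals exhaustion by calling stop("StopIteration").
make_counter <- function(count)
{
     i <- 0L
     list(nextElem = function()
     {
          if (i >= count) stop("StopIteration", call. = FALSE)
          i <<- i + 1L
          i
     })
}

it <- make_counter(2)
values <- integer(0)

# Drain the iterator: keep asking for values until StopIteration is raised.
repeat
{
     value <- tryCatch(it$nextElem(), error = function(e)
     {
          # Only StopIteration ends the loop; other errors are passed on.
          if (identical(conditionMessage(e), "StopIteration")) NULL else stop(e)
     })
     if (is.null(value)) break
     values <- c(values, value)
}
print(values)
```

Catching the error by its message is what lets calling code tell "no more values" apart from a genuine failure.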
Problem of data splitting: package iterators
Capabilities

    You can create an iterator over the rows of your data structure using
    the iter() function:
    library(iterators)
    #Creating iterator by rows of data set
    irState <- iter(state.x77, by = "row")
    nextElem(irState)
            Population Income Illiteracy  Life Murder   Area
    Alabama       3615   3624        2.1 69.05   15.1  50708
    nextElem(irState)
            Population Income Illiteracy  Life Murder   Area
    Alaska         365   6315        1.5 69.31   11.3 566432
    nextElem(irState)
            Population Income Illiteracy  Life Murder   Area
    Arizona       2212   4530        1.8 70.55    7.8 113417



Problem of data splitting: package iterators
Capabilities

    You can create an iterator over the columns of your data structure using
    the iter() function:
    #Creating iterator by columns of data set
    icState <- iter(state.x77, by = "col")
    nextElem(icState)
            Population
    Alabama       3615
    Alaska         365
    Arizona       2212
    nextElem(icState)
            Income
    Alabama   3624
    Alaska    6315
    Arizona   4530
    nextElem(icState)
            Illiteracy
    Alabama        2.1
    Alaska         1.5
    Arizona        1.8
Problem of data splitting: package iterators
Capabilities

    You can create an iterator with the iter() function from a data object
    returned by some other function:
      library(iterators)
      #Define a function which generates random data
      GetDataStructure <- function(meanVal1, meanVal2,
                                   sdVal1, sdVal2)
      {
               a <- rnorm(4, mean = meanVal1, sd = sdVal1)
               b <- rnorm(4, mean = meanVal2, sd = sdVal2)
               data <- a %o% b
               return(data)
      }
      ifun <- iter(GetDataStructure(25, 27, 2.5, 3.5), by = "row")
      nextElem(ifun); nextElem(ifun)
               [,1]     [,2]     [,3]     [,4]
      [1,] 701.7055 939.6574 764.7724 799.6965
               [,1]     [,2]     [,3]     [,4]
      [1,] 647.6349 867.2512 705.8422 738.0752
Problem of data splitting: package iterators
Capabilities


    idiv(n, chunks, chunkSize)
    This is a more interesting iterator. It provides the ability to divide a
    numeric value into pieces
          n - the total number that is to be divided into pieces
          chunks - the number of pieces that n should be divided into. It is
          useful when you know the number of pieces that you want. If it is
          specified, chunkSize should not be
          chunkSize - the maximum size of the pieces that n should be
          divided into. It is useful when you know the size of the pieces that
          you want. If it is specified, chunks should not be

    Some thoughts...
    However, the practical application of this iterator is unclear. Perhaps
    it can be used to index vectors or the rows/columns of arrays
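
Following the thought above, piece sizes can be turned into index ranges for the rows of a matrix. The sketch below does not call idiv() itself: split_sizes is a hypothetical base-R helper that divides n into pieces of nearly equal size, given either the number of pieces or a maximum piece size. It illustrates the idea under that assumption and is not the package's code.

```r
# Hypothetical helper (not part of iterators): divide n into pieces of
# nearly equal size, given either the number of pieces or a maximum size.
split_sizes <- function(n, chunks = NULL, chunkSize = NULL)
{
     if (is.null(chunks)) chunks <- ceiling(n / chunkSize)
     sizes <- integer(0)
     while (chunks > 0 && n > 0)
     {
          piece <- ceiling(n / chunks)   # size of the next piece
          sizes <- c(sizes, piece)
          n <- n - piece
          chunks <- chunks - 1
     }
     sizes
}

X <- matrix(seq_len(30), nrow = 10, ncol = 3)

# Divide the 10 rows of X into 3 pieces.
sizes <- split_sizes(nrow(X), chunks = 3)
print(sizes)

# Convert the piece sizes into row ranges and extract each block.
last_row <- cumsum(sizes)
first_row <- last_row - sizes + 1
blocks <- lapply(seq_along(sizes), function(k)
     X[first_row[k]:last_row[k], , drop = FALSE])
print(sapply(blocks, nrow))
```

Stacking the blocks back together with rbind() reproduces X, so every row is assigned to exactly one piece.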

                                 Kutergin A.   High performance computing with R
Problem of data splitting: package iterators
Capabilities


    idiv(n, chunk, chunksize)
    This is more interesting iterator. It provides the ability to divide a
    numeric value into pieces
          n - number of times that iterator will fire. If not specified, it will
          count forever
          chunks - the number of pieces that n should be divided into. It
          useful when you know the number of pieces that you want. If
          specified, the chunkSize should not be
          chunkSize - the maximum size of the pieces, that n should be
          divided into. It is useful when you know the size of the pieces that
          you want. If specified, the chunk should not be

    Some thoughts...
    However, practical application of this iterator is unclear. Perhaps it can
    be used to index vector or rows/columns of arrays

    Example:
    library(iterators)
    # divide the value 10 into 3 pieces
    it <- idiv(10, chunks = 3)
    nextElem(it)
    [1] 4
    nextElem(it)
    [1] 3
    nextElem(it)
    [1] 3
    try(nextElem(it)) # expect a StopIteration exception
    Error : StopIteration

    Example:
    library(iterators)
    # divide the value 10 into pieces no larger than 3
    it <- idiv(10, chunkSize = 3)
    nextElem(it)
    [1] 3
    nextElem(it)
    [1] 3
    nextElem(it)
    [1] 2
    nextElem(it)
    [1] 2
    try(nextElem(it)) # expect a StopIteration exception
    Error : StopIteration
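
    One possible use of idiv, following the thought above about indexing, is
    to turn the piece sizes into index ranges over a vector. The sketch below
    is only an illustration: the vector x, the helper next_or_null and the
    other variable names are assumptions and do not come from the slides.

```r
library(iterators)

# Hypothetical data: ten values to be processed in 3 chunks
x <- seq(10, 100, by = 10)
it <- idiv(length(x), chunks = 3)   # piece sizes 4, 3, 3

# Helper (an assumption of this sketch): return NULL once the iterator
# is exhausted instead of raising the StopIteration error
next_or_null <- function(it) {
  tryCatch(nextElem(it), error = function(e) {
    if (identical(conditionMessage(e), "StopIteration")) NULL else stop(e)
  })
}

start <- 1
chunk_sums <- numeric(0)
repeat {
  size <- next_or_null(it)
  if (is.null(size)) break
  idx <- seq(start, length.out = size)      # indices of the current chunk
  chunk_sums <- c(chunk_sums, sum(x[idx]))  # process only this chunk
  start <- start + size
}
chunk_sums   # 100 180 270
```

    The iterator itself only yields sizes, so the running start index has to
    be kept by the caller.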



    iread.table(file, ..., verbose = FALSE)
    This is a very important iterator. It returns an iterator over the rows
    of a data frame stored in a file in table format
          file - the name of the file to read data from
          ... - all additional arguments are passed on to the read.table
          function. See the documentation for read.table for more information
          verbose - logical flag indicating whether or not to print the calls to
          read.table

    Note!
    In this version of iread.table, both the read.table arguments header and
    row.names must be specified. This is because the default values of these
    arguments depend on the contents of the beginning of the file. In order
    to make the subsequent calls to read.table work consistently, the user
    must specify those arguments explicitly
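
    The reason for this requirement can be illustrated with plain read.table
    on an open connection. This is a base-R sketch of the general idea, not
    the actual implementation of iread.table; the temporary file and the demo
    data frame are made up for the illustration. Only the first read sees the
    header line, so later reads must be told explicitly that there is no
    header and which column names to use.

```r
# Hypothetical demo data written to a temporary file
path <- tempfile(fileext = ".txt")
demo <- data.frame(a = 1:5, b = (1:5) * 10)
write.table(demo, file = path, sep = "\t", row.names = FALSE, quote = FALSE)

con <- file(path, open = "r")

# First chunk: the header line is still in the stream
first <- read.table(con, header = TRUE, sep = "\t", nrows = 2)

# Later chunks: the header has already been consumed, so header = FALSE
# and the column names are passed explicitly
second <- read.table(con, header = FALSE, sep = "\t", nrows = 2,
                     col.names = names(first))

close(con)
unlink(path)

second$a   # 3 4
```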

    Example:
    library(iterators)

    #Generating random data
    its <- 2000000; dim <- 3
    data <- matrix(rnorm(its * dim), its, dim)

    #Writing it to the HDD
    DATA_PATH <- "E:/R_works/data.txt"
    #Size of this file - 123 Mb
    write.table(data, file = DATA_PATH,
                append = FALSE, sep = "\t", dec = ".")

    #Creating an iterator from this file
    ifile <- iread.table(DATA_PATH, header = TRUE,
                         row.names = NULL, verbose = FALSE)
    > nextElem(ifile)
      row.names        V1        V2       V3
    1         1 -1.042623 -1.386382 0.399798
    > nextElem(ifile)
      row.names        V1        V2        V3
    1         2 0.8841238 -1.296501 0.1580505
    > nextElem(ifile)
      row.names         V1         V2        V3
    1         3 -0.3195784 -0.6830442 0.3647958

    #It works very fast: each call returns a single row
    #Remove the file
    file.remove(DATA_PATH)



    isplit(x, f, drop = FALSE, ...)
    Another important type of iterator. It returns an iterator that divides
    the data in the vector x into the groups defined by f
          x - vector or data frame of values to be split into groups
          f - a factor or list of factors used to categorize x
          drop - logical indicating if levels that do not occur should be
          dropped
    More detailed information can be found in the documentation

    Note!
    This is very useful! For example, suppose you have a data vector and a
    vector containing the values of a factor corresponding to these data,
    and the factor has pre-defined levels. Then you can extract the data for
    each level of the factor in a loop without additional operations. In the
    loop's body you can also test the key of the current group in an if()
    control structure and handle each level differently

    x <- rnorm(200)
    f <- factor(sample(1:10, length(x),
                       replace = TRUE))
    it <- isplit(x, f)

    nextElem(it)
    $value
     [1]  0.14087878 -0.94439161  0.13593045
     [4] -0.25732860  0.09422130 -0.55166303
     [7] -0.18325419 -0.00871019  0.38344388
    [10] -1.05761926  1.16126462 -0.02280205
    [13] -0.67338941  1.68724264  0.92112983
    [16]  1.39782337 -0.51060989
    $key
    $key[[1]]
    [1] "1"
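
    The loop described in the note can be sketched as follows. This is a
    minimal illustration under stated assumptions: the data, the level names
    "a" and "b" and the per-level rule (mean for one level, maximum for the
    other) are invented for the example.

```r
library(iterators)

# Hypothetical data: six values in two groups "a" and "b"
x <- c(1, 2, 3, 4, 5, 6)
f <- factor(c("a", "b", "a", "b", "a", "b"))
it <- isplit(x, f)

group_stats <- numeric(0)
repeat {
  # nextElem() signals the end of the data with a StopIteration error
  grp <- tryCatch(nextElem(it), error = function(e) {
    if (identical(conditionMessage(e), "StopIteration")) NULL else stop(e)
  })
  if (is.null(grp)) break
  level <- grp$key[[1]]   # level of the factor for the current group
  # A different rule for each level, chosen with if()
  if (level == "a") {
    group_stats[level] <- mean(grp$value)
  } else {
    group_stats[level] <- max(grp$value)
  }
}
group_stats   # a = 3, b = 6
```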


    Special types of iterators
    There are also special types of iterators, such as irnorm(..., count) or
    irunif(..., count). These functions return an iterator that returns random
    numbers from various distributions. Each one is a wrapper around a standard
    R function
          count - number of times that the iterator will fire. If not specified, it
          will fire values forever
          ... - arguments to pass to the underlying function (rnorm for irnorm)

    Example:
    # create an iterator that returns two random numbers, one per call
    it <- irnorm(1, count = 2)
    nextElem(it); nextElem(it)
    [1] 0.1592311
    [1] -1.387449
    try(nextElem(it)) # expect a StopIteration exception
    Error : StopIteration

Parallel computation with R: high-level parallelism
packages: parallel, snow




    Scope
    High-level parallelism means that you do not need to define how the
    processes communicate with each other, for example which process is the
    master and which processes are the slaves. You only initialize the
    parallel environment and work inside it. All the details are handled by
    the package's functions

    Package: snow
    The package contains the basic functions that allow you to create
    different types of clusters on a multicore machine

    Package: parallel
    This package builds on the packages multicore and snow and provides
    drop-in replacements for most of the functionality of those packages
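
    As a minimal sketch of what "initialize the parallel environment and work
    inside it" can look like with the parallel package: the cluster size of 2
    and the squaring function are arbitrary choices made for this
    illustration.

```r
library(parallel)

# Start a small cluster of 2 worker processes on the local machine
cl <- makeCluster(2)

# Distribute the calls over the workers; no explicit message passing
# between master and workers has to be written by the user
squares <- parLapply(cl, 1:4, function(i) i^2)

# Release the workers when done
stopCluster(cl)

unlist(squares)   # 1 4 9 16
```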



HPC in R

  • 1. Taking R on limit Kutergin Alex Perm State University, MiFIT 16 october 2012 Kutergin A. High performance computing with R
  • 2. Outline 1 General words about R 2 Motivation and scope 3 The basic ways of speeding up the R-code 4 The special way of speeding up the R-code: package pnmath 5 Problem of data splitting: package iterator 6 Parallel computation with R: high-level parallelism (packages: parallel, snow and additional packages) 7 Parallel computation with R: low-level parallelism (package: Rmpi) 8 Parallel computation with R: parallel execution of for-loops (package: foreach) 9 Parallel computation with R: parallel computation with graphical processing unit (package: gputools) 10 Working with vary large datasets: package filehash and package bigmemory 11 Final words, some useful references and contacts Kutergin A. High performance computing with R
  • 25. General words about R
      R software: R is free, powerful software for data analysis and statistical computing. R is a console application with its own programming language, which runs in interpreter mode. The lack of a sophisticated GUI provides a number of advantages:
        • there is no need to learn which algorithm is behind each button
        • you can just learn the basic principles of R programming and effectively solve complex problems using the R programming language
      Download R: R can be downloaded from http://cran.r-project.org/
      Project page: www.r-project.org
  • 36. General words about R: view of an R work session [figure]
  • 37. General words about R packages and information sources: there are two sources of happiness for an R programmer, a source of information and a source of packages.
  • 38. Motivation and scope
      Motivation:
        • Computers keep becoming more productive, and progress in hardware and software is amazing. This computing power is now available even in a laptop.
        • The volume of data and the complexity of data-processing problems are constantly growing.
        • Multi-core PCs and CUDA technology have emerged.
      Scope:
        • We are ordinary students without big resources, so we don't have a supercomputer.
        • We have a Core i5, a Core i7 or another multi-core laptop or PC with CUDA support.
        • We have some computational tasks and we want to solve them more efficiently.
  • 53. The basic ways of speeding up R code: how to check the execution time of code?
      First way to check the execution time: system.time()
        # returns the CPU (and other) times that expr used
        system.time(sum(runif(10000000)))
      Second way to check the execution time: proc.time()
        # reports how much real and CPU time (in seconds) the currently running R process has already taken
        start_time <- proc.time()
        sum(runif(10000000))
        end_time <- proc.time() - start_time
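The two timing approaches can be tied together in a small wrapper. This is only a sketch: `elapsed_seconds` is a hypothetical helper name, not a base R function. It brackets an expression with `proc.time()` and returns the elapsed (wall-clock) component, which is the same quantity that `system.time(expr)[["elapsed"]]` reports.

```r
# Hypothetical helper (not part of base R): time an expression with
# proc.time() and return only the elapsed wall-clock seconds.
elapsed_seconds <- function(expr) {
  start_time <- proc.time()
  force(expr)  # the lazily passed expression is evaluated here
  (proc.time() - start_time)[["elapsed"]]
}

# Both calls measure the same kind of workload
t_manual  <- elapsed_seconds(sum(runif(1e6)))
t_builtin <- system.time(sum(runif(1e6)))[["elapsed"]]
```

The two numbers will differ slightly from run to run, so timings are best compared by order of magnitude.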
  • 60. The basic ways of speeding up R code: analysis of the effectiveness of programs
      Function profile: let us compare the general-purpose function lm() with the more specific function lm.fit().
        # Load a dataset
        data(longley)
        # Record the profile to the file lm.out
        Rprof("lm.out")
        # Run lm() 1000 times
        invisible(replicate(1000, lm(Employed ~ . - 1, data = longley)))
        # Switch off profiling
        Rprof(NULL)
  • 61. The basic ways of speeding up R code: analysis of the effectiveness of programs
        # Prepare data for lm.fit()
        longleydm <- data.matrix(data.frame(longley))
        # Record the profile to the file lm.fit.out
        Rprof("lm.fit.out")
        # Run lm.fit() 1000 times
        invisible(replicate(1000, lm.fit(longleydm[, -7], longleydm[, 7])))
        # Switch off profiling
        Rprof(NULL)
        # Results of profiling
        summaryRprof("lm.out")$sampling.time
        [1] 3.12
        summaryRprof("lm.fit.out")$sampling.time
        [1] 0.18
        # What a difference!
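The timing comparison is only meaningful if both calls fit the same model. The sketch below checks that on the same `longley` data. The variable names `fit_formula` and `fit_matrix` are illustrative, not from the slides.

```r
# Check that lm() and lm.fit() estimate the same coefficients on longley.
data(longley)

# Formula interface: regress Employed on all other columns, no intercept
fit_formula <- lm(Employed ~ . - 1, data = longley)

# Matrix interface: column 7 (Employed) is the response
longleydm  <- data.matrix(longley)
fit_matrix <- lm.fit(longleydm[, -7], longleydm[, 7])

same_coefficients <- isTRUE(all.equal(unname(coef(fit_formula)),
                                      unname(fit_matrix$coefficients)))
```

`lm()` spends its extra time on building the model frame and design matrix from the formula. `lm.fit()` skips that work because it receives the matrices directly.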
  • 62. The basic ways of speeding up R code: analysis of the effectiveness of programs
      Package profr: this package allows you to visualize the results of profiling.
        library("profr")
        plot(parse_rprof("lm.out"), main = "Profile of lm()")
        plot(parse_rprof("lm.fit.out"), main = "Profile of lm.fit()")
      Package proftools: this package allows you to visualize the call graph of a function.
        library("Rgraphviz"); library("proftools")
        lmfitprod <- readProfileData("lm.fit.out")
        plotProfileCallGraph(lmfitprod)
  • 63. The basic ways of speeding up R code: analysis of the effectiveness of programs [figure]
  • 64. The basic ways of speeding up R code: analysis of the effectiveness of programs. Call graph [figure]
  • 65. The basic ways of speeding up R code: analysis of the effectiveness of programs
      Another example of profiling:
        its <- 2500; dim <- 1750
        X <- matrix(rnorm(its * dim), its, dim)
        # Cross product computed as a sum of outer products of the rows
        my.cross.prod <- function(X) {
          C <- matrix(0, ncol(X), ncol(X))
          for (i in 1:nrow(X)) {
            C <- C + X[i, ] %o% X[i, ]
          }
          return(C)
        }
        library(proftools)
        # Start profiling; the file name is the one read back on the next slide
        Rprof("matrix-mult.out")
        C <- my.cross.prod(X)
        C1 <- t(X) %*% X
        C2 <- crossprod(X)
        Rprof(NULL)
        # all.equal() compares two objects at a time
        print(isTRUE(all.equal(C, C1)) && isTRUE(all.equal(C, C2)))
  • 66. The basic ways of speeding up R code: analysis of the effectiveness of programs
      Result:
        library(proftools)
        profile.data <- readProfileData("matrix-mult.out")
        flatProfile(profile.data)
                      total.pct total.time self.pct self.time
        my.cross.prod     87.31      88.36     0.04      0.04
        +                 49.84      50.44    49.84     50.44
        %o%               37.37      37.82     0.00      0.00
        outer             37.37      37.82    37.27     37.72
        %*%                7.75       7.84     7.75      7.84
        crossprod          4.86       4.92     4.86      4.92
        t                  0.16       0.16     0.06      0.06
        t.default          0.10       0.10     0.10      0.10
        matrix             0.06       0.06     0.06      0.06
        as.vector          0.02       0.02     0.02      0.02
  • 67. The basic ways of speeding up R code: vectorization of code
      Note: loops in R are slow! You can speed up your code by operating on whole vectors and matrices. It is a different style of programming, but you have to use it!
        # Simple example of vectorization: component-wise addition of two vectors
        # Generate some random data
        a <- rnorm(n = 10000000)   # first vector
        b <- rnorm(n = 10000000)   # second vector
        x <- rep(0, length(a))     # vector for the result
  • 68. The basic ways of speeding up R code: vectorization of code
      So, what about the results?
        # Slow way
        time_1 <- system.time(for (i in 1:length(a)) { x[i] <- a[i] + b[i] }); time_1[3]
        36.97
        # Fast way
        time_2 <- system.time(x <- a + b); time_2[3]
        0.04
        Acceleration <- time_1[3] / time_2[3]
        Acceleration
        924.25
        # That's hot!
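Before trusting a speed-up, it helps to confirm that the loop and the vectorized expression compute the same thing. A minimal sketch on a smaller input follows. The vector length and the names `x_loop` and `x_vec` are chosen here for illustration.

```r
# Compare the element-wise loop with the vectorized sum on a small input.
set.seed(1)
a <- rnorm(1e5)
b <- rnorm(1e5)

# Loop version: one addition per iteration
x_loop <- numeric(length(a))
for (i in seq_along(a)) {
  x_loop[i] <- a[i] + b[i]
}

# Vectorized version: a single call that loops in compiled code
x_vec <- a + b

results_match <- identical(x_loop, x_vec)
```

Both versions perform exactly the same floating-point additions, so the results are bit-for-bit identical. Only the per-iteration interpreter overhead differs.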
  • 69. The basic ways of speeding up R code: using the magic of linear algebra
      Using linear algebra operations:
        # Scalar product
        # Slow way
        start <- proc.time()
        res <- 0
        for (i in 1:length(a)) {
          res <- res + a[i] * b[i]
        }
        end <- proc.time() - start; end[3]
        16.71
        # Fast
        system.time(a %*% b)[3]
        0.09
        # Even faster...
        system.time(sum(a * b))[3]
        0.08
  • 70. The basic ways of speeding up R code: using the magic of linear algebra
      Using linear algebra operations:
        # Matrix multiplication, slow version
        its <- 2500; dim <- 1750
        X <- matrix(rnorm(its * dim), its, dim)
        X_transp <- t(X)
        res <- array(NA, dim = c(1750, 1750))
        start <- proc.time()
        for (i in 1:nrow(X_transp)) {
          for (j in 1:ncol(X)) {
            res[i, j] <- sum(X_transp[i, ] * X[, j])
          }
        }
        end <- proc.time() - start; end[3]
        221.67
  • 71. The basic ways of speeding up R code: using the magic of linear algebra
      BLAS: BLAS stands for Basic Linear Algebra Subprograms. It is a library of optimized routines for linear algebra operations that R calls for matrix arithmetic. An optimized, multi-threaded BLAS build can use all cores of a multi-core machine automatically.
        # Matrix multiplication, fast version
        # BLAS matrix multiplication
        system.time(X_transp %*% X)[3]
        7.77
        # Even faster...
        system.time(crossprod(X))[3]
        4.98
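The three ways of forming the cross product can be checked against each other on a small matrix. This is a sketch with an arbitrary 20 x 5 input. The names `X_small`, `C_loop`, `C_mult` and `C_cross` are illustrative.

```r
# Verify that the loop of outer products, t(X) %*% X and crossprod(X) agree.
set.seed(42)
X_small <- matrix(rnorm(20 * 5), nrow = 20, ncol = 5)

# Slow way: accumulate the outer product of every row with itself
C_loop <- matrix(0, ncol(X_small), ncol(X_small))
for (i in seq_len(nrow(X_small))) {
  C_loop <- C_loop + X_small[i, ] %o% X_small[i, ]
}

# Fast ways: explicit transpose and multiply, or crossprod()
C_mult  <- t(X_small) %*% X_small
C_cross <- crossprod(X_small)

# all.equal() tolerates tiny floating-point differences in summation order
all_agree <- isTRUE(all.equal(C_loop, C_mult)) &&
  isTRUE(all.equal(C_mult, C_cross))
```

`all.equal()` is used instead of `identical()` because the three methods add the terms in different orders, which can change the last few bits of the result.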
  • 72. The basic ways of speeding up R code: using built-in R functions
      Package base: you can find the full list of built-in R functions in the documentation for this package.
        # Let us define a function
        mySum <- function(N) {
          sumVal <- 0
          for (i in 1:N) {
            sumVal <- sumVal + i
          }
          return(sumVal)
        }
        system.time(mySum(1000000))[3]
        0.62
        system.time(sum(as.numeric(seq(1, 1000000))))[3]
        0.05
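Both versions can be checked against the closed form 1 + 2 + ... + N = N(N + 1)/2. The sketch below reuses the `mySum` definition from the slide with a smaller `N`, which is chosen here only to keep the check quick.

```r
# Loop-based sum from the slide
mySum <- function(N) {
  sumVal <- 0
  for (i in 1:N) {
    sumVal <- sumVal + i
  }
  return(sumVal)
}

N <- 1000
closed_form <- N * (N + 1) / 2          # arithmetic series formula
loop_result <- mySum(N)                 # interpreted loop
builtin_result <- sum(as.numeric(seq_len(N)))  # compiled built-in
```

All three values are whole numbers far below 2^53, so they can be compared exactly with `==`.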
  • 73. The basic ways of speeding up R code: using built-in R functions
      Why are built-in R functions faster? The R language runs in interpreter mode, which is generally slower than running compiled code. When you call a built-in R function, you call optimized, compiled code. Built-in functions are also written in lower-level languages (such as C/C++ or Fortran), which gives greater access to the capabilities of the hardware.
      Note: you can select data from a vector, matrix, data.frame or array using a condition applied to a row or column of the data object. It is fast and convenient.
        # Extract only the positive values from the first column of X
        its <- 2500; dim <- 1750
        X <- matrix(rnorm(its * dim), its, dim)
        X[X[, 1] > 0, 1]
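The same conditional selection works on data frames and on whole rows of a matrix. This is a sketch using the built-in `longley` data. The cut-off year and the names `recent`, `M` and `M_pos` are illustrative choices.

```r
# Logical indexing on a data frame: keep rows where a column meets a condition
data(longley)
recent <- longley[longley$Year > 1955, c("Year", "Employed")]

# Logical indexing on a matrix: keep whole rows whose first column is positive.
# drop = FALSE keeps the result a matrix even if only one row is selected.
set.seed(1)
M <- matrix(rnorm(12), nrow = 4, ncol = 3)
M_pos <- M[M[, 1] > 0, , drop = FALSE]
```

The condition is evaluated once for the whole column, and the resulting logical vector picks the rows without an explicit loop.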
  • 74. A special way of speeding up R code: package pnmath
      Another easy way to get a speed-up is to use the pnmath package. This package takes many of the standard math functions in R and replaces them with multi-threaded versions, using OpenMP. Some functions get more of a speed-up than others with pnmath.
        # Generate random data
        v1 <- runif(1000)
        v2 <- runif(100000000)
        # Execution time without pnmath
        system.time(qtukey(v1, 2, 3))
        system.time(exp(v2))
        system.time(sqrt(v2))
        # Execution time with pnmath
        library(pnmath)
        system.time(qtukey(v1, 2, 3))
        system.time(exp(v2))
        system.time(sqrt(v2))
  • 75. The problem of data splitting
      Our problem: before you start the calculation, you need to split your data set according to the number of threads. Another reason is more efficient data processing in loops.
      Package iterators: the iterators package provides tools for iterating over various R data structures. Iterators are available for vectors, lists, matrices, arrays, data frames and files. By following very simple conventions, new iterators can be written to support any type of data source, such as database queries or dynamically generated data.
      Download: you can download this useful package from CRAN (available for Windows!): http://cran.r-project.org/web/packages/iterators/index.html
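Splitting a data set by the number of threads can be sketched in base R alone. The helper name `split_into_chunks` and the chunk count of 4 are assumptions made for illustration. This is not how the iterators package does it.

```r
# Hypothetical helper: split a vector into n_chunks contiguous pieces
# of nearly equal size, one piece per worker thread.
split_into_chunks <- function(x, n_chunks) {
  # Map position i to a chunk id in 1..n_chunks
  chunk_id <- ceiling(seq_along(x) * n_chunks / length(x))
  split(x, chunk_id)
}

x <- 1:10
chunks <- split_into_chunks(x, n_chunks = 4)

# Each chunk can be processed independently and the partial results combined
partial_sums <- vapply(chunks, sum, numeric(1))
total <- sum(partial_sums)
```

Because the chunks are contiguous and cover every element exactly once, combining the partial results gives the same answer as processing the whole vector at once.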
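The "very simple conventions" mentioned above can be sketched as follows. This is a minimal illustration, assuming the pattern in which an iterator is a list with a nextElem function and the classes "abstractiter" and "iter". The name isquares is hypothetical and not part of the package.

```r
library(iterators)

# Hypothetical custom iterator: yields the squares 1, 4, 9, ..., n^2.
isquares <- function(n) {
  i <- 0
  nextEl <- function() {
    # Signal the end of the sequence the same way the built-in iterators do
    if (i >= n) stop("StopIteration", call. = FALSE)
    i <<- i + 1
    i^2
  }
  obj <- list(nextElem = nextEl)
  class(obj) <- c("isquares", "abstractiter", "iter")
  obj
}

it <- isquares(3)
first  <- nextElem(it)
second <- nextElem(it)
third  <- nextElem(it)
```

The state (the counter i) lives in the closure, so each call to isquares() produces an independent iterator.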
  • 79. Problem of data splitting: package iterators. Capabilities. icount(count): returns an iterator that counts starting from one. count is the number of times the iterator will fire; if not specified, it counts forever. nextElem(): returns the next value of a previously defined iterator. When the iterator has no more values, it calls stop with the message "StopIteration". Example:
      library(iterators)
      # create an iterator that counts from 1 to 2
      it <- icount(2)
      nextElem(it)
      [1] 1
      nextElem(it)
      [1] 2
      try(nextElem(it))  # expect a StopIteration exception
      Error : StopIteration
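Because the end of an iterator is signalled with an error, a loop over an iterator of unknown length has to catch that error. The following is a minimal sketch of such a loop, not code from the slides. It treats only the "StopIteration" message as the end of the data and re-raises anything else.

```r
library(iterators)

it <- icount(5)
total <- 0
repeat {
  value <- tryCatch(nextElem(it), error = function(e) {
    # Only "StopIteration" means the iterator is exhausted
    if (identical(conditionMessage(e), "StopIteration")) NULL else stop(e)
  })
  if (is.null(value)) break
  total <- total + value
}
```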
  • 80. Problem of data splitting: package iterators. Capabilities. You can create an iterator over the rows of your data structure using the iter() function:
      library(iterators)
      # create an iterator over the rows of a data set
      irState <- iter(state.x77, by = "row")
      nextElem(irState)
              Population Income Illiteracy  Life Murder   Area
      Alabama       3615   3624        2.1 69.05   15.1  50708
      nextElem(irState)
              Population Income Illiteracy  Life Murder   Area
      Alaska         365   6315        1.5 69.31   11.3 566432
      nextElem(irState)
              Population Income Illiteracy  Life Murder   Area
      Arizona       2212   4530        1.8 70.55    7.8 113417
  • 81. Problem of data splitting: package iterators. Capabilities. You can create an iterator over the columns of your data structure using the iter() function:
      # create an iterator over the columns of a data set
      icState <- iter(state.x77, by = "col")
      nextElem(icState)
              Population
      Alabama       3615
      Alaska         365
      Arizona       2212
      nextElem(icState)
              Income
      Alabama   3624
      Alaska    6315
      Arizona   4530
      nextElem(icState)
              Illiteracy
      Alabama        2.1
      Alaska         1.5
      Arizona        1.8
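As a small self-contained illustration of column-wise iteration, the sketch below walks over the columns of a toy matrix (an assumption made for the example, not data from the slides) and computes one mean per column.

```r
library(iterators)

m <- matrix(1:6, nrow = 2)       # columns are (1,2), (3,4), (5,6)
it <- iter(m, by = "column")

col_means <- numeric(ncol(m))
for (j in seq_len(ncol(m))) {
  # Each call returns the next column as a one-column matrix
  col_means[j] <- mean(nextElem(it))
}
```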
  • 82. Problem of data splitting: package iterators. Capabilities. You can also use the iter() function to create an iterator from a data object returned by some other function:
      library(iterators)
      # define a function which generates random data
      GetDataStructure <- function(meanVal1, meanVal2, sdVal1, sdVal2) {
        a <- rnorm(4, mean = meanVal1, sd = sdVal1)
        b <- rnorm(4, mean = meanVal2, sd = sdVal2)
        data <- a %o% b
        return(data)
      }
      ifun <- iter(GetDataStructure(25, 27, 2.5, 3.5), by = "row")
      nextElem(ifun); nextElem(ifun)
               [,1]     [,2]     [,3]     [,4]
      [1,] 701.7055 939.6574 764.7724 799.6965
               [,1]     [,2]     [,3]     [,4]
      [1,] 647.6349 867.2512 705.8422 738.0752
  • 83. Problem of data splitting: package iterators. Capabilities. idiv(n, ..., chunks, chunkSize): a more interesting iterator. It divides a numeric value into pieces. n: the value to divide. chunks: the number of pieces that n should be divided into; useful when you know how many pieces you want. If specified, chunkSize should not be. chunkSize: the maximum size of the pieces that n should be divided into; useful when you know the size of the pieces you want. If specified, chunks should not be. Some thoughts... The practical application of this iterator is not obvious at first. Perhaps it can be used to index a vector or the rows/columns of an array.
  • 93. Problem of data splitting: package iterators. Capabilities. Example:
      library(iterators)
      # divide the value 10 into 3 pieces
      it <- idiv(10, chunks = 3)
      nextElem(it)
      [1] 4
      nextElem(it)
      [1] 3
      nextElem(it)
      [1] 3
      try(nextElem(it))  # expect a StopIteration exception
      Error : StopIteration
  • 94. Problem of data splitting: package iterators. Capabilities. Example:
      library(iterators)
      # divide the value 10 into pieces no larger than 3
      it <- idiv(10, chunkSize = 3)
      nextElem(it)
      [1] 3
      nextElem(it)
      [1] 3
      nextElem(it)
      [1] 2
      nextElem(it)
      [1] 2
      try(nextElem(it))  # expect a StopIteration exception
      Error : StopIteration
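One possible use of idiv, in the spirit of the "index a vector" idea above, is to cut a vector into one chunk per worker. The sketch below is only an illustration. The variable n_workers and the toy vector x are assumptions, not part of the slides.

```r
library(iterators)

x <- (1:10)^2
n_workers <- 3                   # hypothetical number of threads

# idiv yields the chunk sizes; we turn them into index ranges
it <- idiv(length(x), chunks = n_workers)
chunks <- vector("list", n_workers)
start <- 1
for (k in seq_len(n_workers)) {
  size <- nextElem(it)
  chunks[[k]] <- x[start:(start + size - 1)]
  start <- start + size
}
chunk_sizes <- sapply(chunks, length)
```

The chunks differ in length by at most one element, which keeps the load on the workers roughly balanced.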
  • 95. Problem of data splitting: package iterators. Capabilities. iread.table(file, ..., verbose = FALSE): a very important iterator. It returns an iterator over the rows of a data frame stored in a file in table format. file: the name of the file to read data from. ...: all additional arguments are passed on to the read.table function; see the documentation for read.table for more information. verbose: logical flag indicating whether or not to print the calls to read.table. Note! In this version of iread.table, both the read.table arguments header and row.names must be specified. This is because the default values of these arguments depend on the contents of the beginning of the file. In order to make the subsequent calls to read.table work consistently, the user must specify those arguments explicitly.
  • 105. Problem of data splitting: package iterators. Capabilities. Example:
      library(iterators)
      # generate random data
      its <- 2000000; dim <- 3
      data <- matrix(rnorm(its * dim), its, dim)
      # write them to the HDD; the size of this file is 123 Mb
      DATA_PATH <- "E:/R_works/data.txt"
      write.table(data, file = DATA_PATH, append = FALSE, sep = "\t", dec = ".")
  • 106. Problem of data splitting: package iterators. Capabilities.
      # create an iterator from this file
      ifile <- iread.table(DATA_PATH, header = TRUE, row.names = NULL, verbose = FALSE)
      > nextElem(ifile)
        row.names        V1        V2       V3
      1         1 -1.042623 -1.386382 0.399798
      > nextElem(ifile)
        row.names        V1        V2        V3
      1         2 0.8841238 -1.296501 0.1580505
      > nextElem(ifile)
        row.names         V1         V2        V3
      1         3 -0.3195784 -0.6830442 0.3647958
      # It works very fast!
      # remove the file
      file.remove(DATA_PATH)
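The point of iread.table is that a file can be processed without loading it whole. The sketch below shows one way this might look: it writes a tiny temporary file (the columns id and value are made up for the example) and accumulates a sum row by row until the iterator signals "StopIteration".

```r
library(iterators)

# Small stand-in for a large file on disk
path <- tempfile(fileext = ".txt")
write.table(data.frame(id = 1:5, value = c(10, 20, 30, 40, 50)),
            file = path, row.names = FALSE)

# header and row.names must be given explicitly
it <- iread.table(path, header = TRUE, row.names = NULL)

total <- 0
n_rows <- 0
repeat {
  row <- tryCatch(nextElem(it), error = function(e) {
    if (identical(conditionMessage(e), "StopIteration")) NULL else stop(e)
  })
  if (is.null(row)) break
  total <- total + row$value     # each element is a one-row data frame
  n_rows <- n_rows + 1
}
unlink(path)
```

Only one row is held in memory at a time, so the same loop would apply to a file far larger than this toy one.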
  • 107. Problem of data splitting: package iterators. Capabilities. isplit(x, f, drop = FALSE): another important type of iterator. It returns an iterator that divides the data in the vector x into the groups defined by f. x: vector or data frame of values to be split into groups. f: a factor or list of factors used to categorize x. drop: logical indicating whether levels that do not occur should be dropped. More detailed information can be found in the documentation. Note! This is very useful! For example, suppose you have a data vector and a vector containing the values of a factor corresponding to these data. The factor has predefined levels. You can then extract the data for each level of the factor in a loop without additional operations. In the loop body you can also define conditions for each level of the factor and use them in if() control structures.
  • 117. Problem of data splitting: package iterators. Capabilities. Example:
      x <- rnorm(200)
      f <- factor(sample(1:10, length(x), replace = TRUE))
      it <- isplit(x, f)
      nextElem(it)
      $value
       [1]  0.14087878 -0.94439161  0.13593045
       [4] -0.25732860  0.09422130 -0.55166303
       [7] -0.18325419 -0.00871019  0.38344388
      [10] -1.05761926  1.16126462 -0.02280205
      [13] -0.67338941  1.68724264  0.92112983
      [16]  1.39782337 -0.51060989
      $key
      $key[[1]]
      [1] "1"
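The per-level processing described for isplit can be sketched as a loop that computes one statistic per factor level. The toy vector and factor below are assumptions chosen so the result is easy to check by hand.

```r
library(iterators)

x <- c(1, 2, 3, 4, 5, 6)
f <- factor(c("a", "b", "a", "b", "a", "b"))

it <- isplit(x, f)
group_means <- numeric(0)
repeat {
  piece <- tryCatch(nextElem(it), error = function(e) {
    if (identical(conditionMessage(e), "StopIteration")) NULL else stop(e)
  })
  if (is.null(piece)) break
  # piece$key[[1]] is the factor level, piece$value the matching data
  level <- as.character(piece$key[[1]])
  group_means[level] <- mean(piece$value)
}
```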
  • 118. Problem of data splitting: package iterators. Capabilities. Special types of iterators: there are also special types of iterators, such as irnorm(..., count) or irunif(..., count). These functions return an iterator that returns random numbers from various distributions. Each one is a wrapper around a standard R function. count: number of times that the iterator will fire; if not specified, it will fire values forever. ...: arguments to pass to the underlying rnorm function. Example:
      # create an iterator that returns two random numbers
      it <- irnorm(1, count = 2)
      nextElem(it); nextElem(it)
      [1] 0.1592311
      [1] -1.387449
      try(nextElem(it))  # expect a StopIteration exception
      Error : StopIteration
  • 119. Parallel computation with R: high-level parallelism (packages: parallel, snow). Scope: high-level parallelism means that you do not need to define the communication scheme between processes yourself (which process is the master, which are the workers). You only initialize the parallel environment and work inside it; the package's functions take care of all the details.
    Package snow: contains the basic functions that allow you to create different types of clusters on a multicore machine.
    Package parallel: builds on the packages multicore and snow and provides drop-in replacements for most of their functionality.
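A minimal sketch of what "initialize the parallel environment and work inside it" can look like with package parallel. The toy function square and the choice of two local workers are assumptions made for illustration, not something taken from the slides.

```r
library(parallel)

# Hypothetical toy task: square a number.
square <- function(v) v^2

cl <- makeCluster(2)              # two local worker processes (assumed choice)
res <- tryCatch(
  parLapply(cl, 1:8, square),     # the package distributes the work itself
  finally = stopCluster(cl)       # always shut the workers down
)

unlist(res)
```

No master/worker communication is written by hand here: makeCluster() starts the workers, parLapply() splits the input and collects the results, and stopCluster() releases the processes.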