R programming language

  1. Keerti Verma AP,OCT
  2. Introduction: R is • a programming language • a statistical package • an interpreter • open source • an object-oriented language
  3. Continue... • R is a programming language and software environment for statistical computing and graphics, supported by the R Foundation for Statistical Computing. • The R language is widely used among statisticians and data miners for developing statistical software and for data analysis. • Polls, surveys of data miners, and studies of scholarly literature databases show that R's popularity has increased substantially in recent years.
  4. Continue...
  5. Continue...
  6. Evolution of the R Language • R is an implementation of the S programming language. • S was created by John Chambers while at Bell Labs. • R was created by Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand, and is currently developed by the R Development Core Team, of which Chambers is a member. • R is named partly after the first names of its first two authors and partly as a play on the name of S.
  7. [Figure: lineage of the S statistical programming language (S version 1, S version 2, S version 3, S version 4), developed 30 years ago for research applied to the high-tech industry, leading to R]
  8. Features of the R Language • As stated earlier, R is a programming language and software environment for statistical analysis, graphical representation, and reporting. The important features of R are: • R is a well-developed, simple, and effective programming language that includes conditionals, loops, user-defined recursive functions, and input and output facilities. • R has an effective data handling and storage facility. • R provides a suite of operators for calculations on arrays, lists, vectors, and matrices.
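The language features listed above can be illustrated with a minimal sketch. The names `factorial_rec`, `squares`, `doubled`, and `prod_m` are made up for this example and do not come from the slides.

```r
# Conditional plus a user-defined recursive function
factorial_rec <- function(n) {
  if (n <= 1) {
    return(1)
  }
  n * factorial_rec(n - 1)
}

# Loop that collects the squares of 1..5
squares <- numeric(0)
for (i in 1:5) {
  squares <- c(squares, i^2)
}

# Arithmetic operators work element-wise on vectors;
# %*% performs matrix multiplication
v <- c(1, 2, 3)
doubled <- v * 2
m <- matrix(1:4, nrow = 2)  # filled column by column
prod_m <- m %*% m

print(factorial_rec(5))
print(squares)
print(doubled)
print(prod_m)
```

The vector and matrix operators are what usually replace explicit loops in everyday R code; the `for` loop is shown only to cover the control-flow feature.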
  9. Continue... • R provides a large, coherent, and integrated collection of tools for data analysis. • R provides graphical facilities for data analysis and display, either directly on screen or printed on paper. • In conclusion, R is one of the world's most widely used statistical programming languages. It is a popular choice among data scientists and is supported by a vibrant and talented community of contributors. R is taught in universities and deployed in mission-critical business applications.
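As a rough sketch of the graphical facilities mentioned above, the same base-R `plot()` call can draw to the screen or to a file suitable for printing. The output path here is a temporary file chosen only for illustration.

```r
x <- seq(0, 2 * pi, length.out = 100)

# Open a PDF graphics device so the plot goes to a file
# instead of the screen
out_file <- tempfile(fileext = ".pdf")
pdf(out_file)
plot(x, sin(x), type = "l", main = "sin(x)", xlab = "x", ylab = "sin(x)")
dev.off()  # close the device so the file is written out

print(file.exists(out_file))
```

In an interactive session, calling `plot()` without first opening `pdf()` draws on the default screen device instead.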
  10. Continue... • The project was conceived in 1992, with an initial version released in 1995 and a stable beta version in 2000. • At the time of writing, the current stable version of R was 3.3.2, released on October 31, 2016.
  11. Data types in R • In any programming language, you use variables to store information. Variables are reserved memory locations that hold values, so creating a variable reserves some space in memory. • In contrast to languages such as C and Java, variables in R are not declared with a data type. A variable is assigned an R-object, and the data type of that R-object becomes the data type of the variable.
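A small sketch of this behaviour: the same variable name takes on whatever type the assigned object has. The variable names are illustrative only.

```r
x <- 10L                 # x now refers to an integer object
first_class <- class(x)

x <- "ten"               # reassigning changes the type of x
second_class <- class(x)

x <- c(TRUE, FALSE)      # and again, now a logical vector
third_class <- class(x)

print(c(first_class, second_class, third_class))
```

No declaration is needed at any point; `class()` simply reports the type of the object currently bound to `x`.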
  12. Continue... • There are many types of R-objects. The frequently used ones are: • Vectors • Lists • Matrices • Arrays • Data Frames
  13. Continue... • A vector is a sequence of data elements of the same basic type. • The simplest of these objects is the vector, and there are six data types of these atomic vectors, also called the six classes of vectors. The other R-objects are built upon the atomic vectors.
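Each of the object types listed above can be created with a base-R constructor. The following is a minimal sketch with made-up values.

```r
vec <- c(1.5, 2.5, 3.5)                                   # vector
lst <- list(name = "R", year = 1995L, flags = c(TRUE, FALSE))  # list
mat <- matrix(1:6, nrow = 2, ncol = 3)                    # matrix (2 x 3)
arr <- array(1:24, dim = c(2, 3, 4))                      # array (2 x 3 x 4)
df  <- data.frame(id = 1:3, label = c("a", "b", "c"))     # data frame

print(length(vec))
print(names(lst))
print(dim(mat))
print(dim(arr))
print(dim(df))
```

A list may mix element types, while a vector, matrix, or array holds a single basic type; a data frame is a table whose columns may each have a different type.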
  14. Continue...

      | Data type | Example | Verify | Result |
      |---|---|---|---|
      | Logical | TRUE, FALSE | v <- TRUE; print(class(v)) | [1] "logical" |
      | Numeric | 12.3, 5, 999 | v <- 23.5; print(class(v)) | [1] "numeric" |
      | Integer | 2L, 34L, 0L | v <- 2L; print(class(v)) | [1] "integer" |
  15. Continue...

      | Data type | Example | Verify | Result |
      |---|---|---|---|
      | Complex | 2+5i | v <- 2+5i; print(class(v)) | [1] "complex" |
      | Character | 'a', "good", "TRUE", '23.4' | v <- "TRUE"; print(class(v)) | [1] "character" |
      | Raw | "Hello" is stored as 48 65 6c 6c 6f | v <- charToRaw("Hello"); print(class(v)) | [1] "raw" |
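The six checks can also be run in one pass. This is a small sketch; the list `values` and the vector `classes` are names introduced here for illustration.

```r
# One example value per atomic vector class
values <- list(
  logical   = TRUE,
  numeric   = 23.5,
  integer   = 2L,
  complex   = 2+5i,
  character = "TRUE",
  raw       = charToRaw("Hello")
)

# class() applied to each value; the result is a named character vector
classes <- sapply(values, class)
print(classes)

# The raw bytes behind "Hello"
print(charToRaw("Hello"))
```

Note that `2L` needs the `L` suffix to be an integer; a plain `2` is stored as numeric (double).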
  16. Continue...
  17. Interacting with R • RStudio is a free and open-source integrated development environment (IDE) for R, a programming language for statistical computing and graphics. • RStudio was founded by JJ Allaire, creator of the programming language ColdFusion.
  18. Continue... • RStudio is available in two editions: RStudio Desktop, where the program runs locally as a regular desktop application, and RStudio Server, which allows accessing RStudio through a web browser while it runs on a remote Linux server. • Prepackaged distributions of RStudio Desktop are available for Windows, OS X, and Linux. • RStudio is written in the C++ programming language and uses the Qt framework for its graphical user interface.
  19. RStudio IDE
  20. Comparison with other statistics software
  21. Continue... • SAS: SAS (Statistical Analysis System) is a software suite developed by SAS Institute for advanced analytics, multivariate analyses, business intelligence, data management, and predictive analytics. • SAS was developed at North Carolina State University from 1966 until 1976, when SAS Institute was incorporated. SAS was further developed in the 1980s and 1990s with the addition of new statistical procedures and additional components.
  22. Continue... • SAS is an expensive tool, whereas R is free. • The algorithms used in SAS procedures are not open to the public, so you cannot study them, whereas R's source is fully transparent. • R has advanced graphical capabilities and supports various professional graphics templates. • New statistical and machine learning techniques are often implemented in R much sooner than in SAS. • 500 lines of SAS code can be equivalent to 100 lines of R code.
  23. Continue... • Time series forecasting: SAS requires purchasing the SAS ETS module; it is free in R. • Text mining: SAS requires purchasing SAS Enterprise / Text Miner; it is free in R. • Machine learning: SAS requires purchasing SAS Enterprise Miner; it is free in R. • Online reporting: SAS requires purchasing SAS Visual Analytics; it is free in R with the shiny package.
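As one illustration of the first point, a seasonal ARIMA forecast can be produced with functions that ship with R. This is only a sketch: the `AirPassengers` dataset and the model orders are choices made for this example, not something taken from the slides.

```r
# AirPassengers is a monthly time series bundled with R (datasets package)
# Fit a seasonal ARIMA model on the log scale
fit <- arima(log(AirPassengers),
             order = c(0, 1, 1),
             seasonal = list(order = c(0, 1, 1), period = 12))

# Forecast the next 12 months and transform back to passenger counts
fc <- predict(fit, n.ahead = 12)
forecast_passengers <- exp(fc$pred)

print(round(forecast_passengers))
```

`arima()` and `predict()` come from the `stats` package that is part of every R installation, so no add-on purchase or extra package is involved.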
  24. Advantages of R • Free, open-source philosophy. • R has over 4800 packages available from multiple repositories, specializing in topics like econometrics, data mining, spatial analysis, and bioinformatics. • Online help and discussion. • Strong visualization capabilities. • Interfaces with other languages, plus scripting capabilities.
  25. Continue... • Real data have missing values. Missing values are an integral part of the R language, and many functions have arguments that control how missing values are handled. • Solutions for big data.
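A brief sketch of how missing values behave in practice: `NA` marks a missing value, and many functions accept an `na.rm` argument to decide whether to skip it. The vector `x` is made up for illustration.

```r
x <- c(4, NA, 6, 8)

mean_with_na <- mean(x)                # NA propagates by default
mean_without_na <- mean(x, na.rm = TRUE)  # drop missing values first
missing_count <- sum(is.na(x))         # count the missing entries

print(mean_with_na)
print(mean_without_na)
print(missing_count)
```

Because `NA` propagates by default, a forgotten missing value shows up in the result instead of being silently ignored.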
  26. Disadvantages of R • R has a steep learning curve. It takes a while to get used to the power of R, although the curve is no steeper than for other statistical languages. • R is not easy for the novice to use. • No default parallel execution. • High-performance computing requires advanced skills.
  27. Continue... • Memory management, speed, and efficiency are probably the biggest challenges R faces. • Poor handling of large datasets. • Complicated package structure. • Capabilities such as security were not built into the R language. Also, R cannot be embedded in a web browser. • A high-level programming language.
  28. So why learn R?
  29. Some other points • Hadoop and R are a natural match and are quite complementary in terms of visualization and analytics of big data. • Rhipe is an R library that allows running MapReduce jobs.
  31. Thank you