Introduction
R is:
• A Programming Language
• A Statistical Package
• An Interpreter
• Open Source
• Object Oriented Language
R is a programming language and software
environment for statistical computing and
graphics supported by the R Foundation for
Statistical Computing.
The R language is widely used
among statisticians and data miners for
developing statistical software and data analysis.
Polls, surveys of data miners, and studies of
scholarly literature databases show that R's
popularity has increased substantially in recent
years.
Evolution Of R Language
R is an implementation of the S programming
language.
S was created by John Chambers while at Bell Labs.
R was created by Ross Ihaka and Robert
Gentleman at the University of Auckland, New
Zealand, and is currently developed by the R
Development Core Team, of which Chambers is a
member. R is named partly after the first names of
the first two R authors and partly as a play on the
name of S.
[Figure: evolution of the S statistical programming language, developed 30 years ago for research applied to the high-tech industry, from S version 1 through S version 2, S version 3 and S version 4, leading to R]
Features of R Language
As stated earlier, R is a programming language and
software environment for statistical analysis, graphics
representation and reporting. The following are the
important features of R:
R is a well-developed, simple and effective
programming language which includes conditionals,
loops, user defined recursive functions and input and
output facilities.
R has an effective data handling and storage facility.
R provides a suite of operators for calculations on
arrays, lists, vectors and matrices.
R provides a large, coherent and integrated
collection of tools for data analysis.
R provides graphical facilities for data analysis and
display, either directly on screen or printed
on paper.
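A few of the features listed above can be sketched in a short script. This is only an illustration: the names evens, factorial_rec, v and m are made up for the example and do not come from any package.

```r
# Conditional inside a loop: collect the even numbers from 1 to 10
evens <- c()
for (i in 1:10) {
  if (i %% 2 == 0) {
    evens <- c(evens, i)
  }
}
print(evens)

# User-defined recursive function (hypothetical name)
factorial_rec <- function(n) {
  if (n <= 1) {
    return(1)
  }
  n * factorial_rec(n - 1)
}
print(factorial_rec(5))

# Operators apply element-wise to vectors; %*% is the matrix product
v <- c(1, 2, 3)
print(v * 2)
m <- matrix(1:4, nrow = 2)
print(m %*% m)
```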
In conclusion, R is among the world's most widely used
statistical programming languages. It is a leading choice
of data scientists and is supported by a vibrant and
talented community of contributors. R is taught in
universities and deployed in mission-critical
business applications.
The project was conceived in 1992, with an initial
version released in 1995 and a stable beta version
in 2000.
The current stable version of R is 3.3.2, released on
October 31, 2016.
Data types in R
When programming in any language, you need
variables to store information. Variables
are names for reserved memory locations that
store values. This means that when you create a
variable, you reserve some space in memory.
In contrast to other programming languages like C
and Java, in R variables are not declared with
a data type. Variables are assigned R objects,
and the data type of the R object becomes
the data type of the variable.
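A minimal sketch of this behaviour, using an arbitrary variable name x: the same name can be bound to objects of different types, and class() reports the type of whatever object it currently holds.

```r
x <- 42L           # x refers to an integer object
print(class(x))

x <- "forty-two"   # the same name now refers to a character object
print(class(x))

x <- TRUE          # and now to a logical object
print(class(x))
```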
There are many types of R-objects. The frequently
used ones are:
Vectors
Lists
Matrices
Arrays
Data Frames
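As a rough illustration, each of these objects can be created with a base R constructor. The variable names and values below are invented for the example.

```r
vec <- c(10, 20, 30)                                  # vector
lst <- list(name = "R", year = 1995, stable = TRUE)   # list (mixed types allowed)
mat <- matrix(1:6, nrow = 2, ncol = 3)                # matrix (2 dimensions)
arr <- array(1:24, dim = c(2, 3, 4))                  # array (here 3 dimensions)
df  <- data.frame(id = 1:3,
                  score = c(90.5, 82.0, 77.25))       # data frame

print(is.vector(vec))
print(is.list(lst))
print(dim(mat))
print(dim(arr))
print(is.data.frame(df))
```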
A vector is a sequence of data elements of the same
basic type.
The simplest of these objects is the vector object and
there are six data types of these atomic vectors, also
termed as six classes of vectors. The other R-Objects
are built upon the atomic vectors.
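A small sketch of both points, with made-up variable names: mixing types in c() coerces all elements to one common type, and a matrix or a data frame column is still an atomic vector underneath.

```r
nums <- c(1.5, 2, 3)
print(class(nums))            # all elements share one type

mixed <- c(1, "a", TRUE)      # mixed input is coerced to a single type
print(class(mixed))
print(mixed)

m <- matrix(nums, nrow = 1)   # a matrix is an atomic vector with dimensions
print(is.atomic(m))

df <- data.frame(x = nums)    # each data frame column is a vector
print(is.atomic(df$x))
```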
Data type   Example                                Verify                                      Result
Logical     TRUE, FALSE                            v <- TRUE; print(class(v))                  [1] "logical"
Numeric     12.3, 5, 999                           v <- 23.5; print(class(v))                  [1] "numeric"
Integer     2L, 34L, 0L                            v <- 2L; print(class(v))                    [1] "integer"
Complex     2+5i                                   v <- 2+5i; print(class(v))                  [1] "complex"
Character   'a', "good", "TRUE", '23.4'            v <- "TRUE"; print(class(v))                [1] "character"
Raw         "Hello" is stored as 48 65 6c 6c 6f    v <- charToRaw("Hello"); print(class(v))    [1] "raw"
Interacting with R
RStudio is a free and open-source integrated
development environment (IDE) for R,
a programming language for statistical
computing and graphics.
RStudio was founded by JJ Allaire, creator of the
programming language ColdFusion.
RStudio is available in two editions: RStudio
Desktop, where the program is run locally as a
regular desktop application, and RStudio Server,
which is accessed through a web browser while
running on a remote server.
Prepackaged distributions of
RStudio Desktop are available for Windows, OS X,
and Linux.
RStudio is written in the C++ programming
language and uses the Qt framework for
its graphical user interface.
SAS (Statistical Analysis System) is a software
suite developed by SAS Institute for advanced
analytics, multivariate analyses, business
intelligence, data management, and predictive
analytics.
SAS was developed at North Carolina State
University from 1966 until 1976, when SAS Institute
was incorporated. SAS was further developed in the
1980s and 1990s with the addition of new statistical
procedures and additional components.
SAS is an expensive tool, whereas R is free.
The algorithms used in SAS procedures are not open to
the public, so you cannot study them, whereas
R is fully transparent.
R has advanced graphical capabilities and supports
various professional graphics templates.
New statistical and machine learning techniques
are implemented in R much more quickly than in SAS.
500 lines of SAS code can be equivalent to 100 lines
of R code.
Time Series Forecasting - Requires purchasing the SAS/ETS
module. It is free in R.
Text Mining - Requires purchasing SAS Enterprise /
Text Miner. It is free in R.
Machine Learning - Requires purchasing SAS
Enterprise Miner. It is free in R.
Online Reporting - Requires purchasing SAS Visual
Analytics. It is free in R with the shiny package.
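As one illustration of the time series point, a forecast can be produced with functions that ship with base R. This is only a sketch: the ARIMA order (1, 1, 1) is an arbitrary assumption rather than a tuned model, and AirPassengers is an example dataset bundled with R.

```r
# Fit a simple ARIMA model with the stats package (loaded by default)
fit <- arima(AirPassengers, order = c(1, 1, 1))

# Forecast the next 12 months; predict() returns point forecasts
# in $pred and standard errors in $se
fc <- predict(fit, n.ahead = 12)
print(fc$pred)
```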
Advantages of R
Free open source philosophy.
R has over 4800 packages available from multiple
repositories specializing in topics like
econometrics, data mining, spatial analysis, and
bio-informatics.
Online help and discussion.
R's visualization capabilities.
Interface with other languages and scripting
capabilities
Real data have missing values. Missing values are
an integral part of the R language. Many functions
have arguments that control how missing values
are to be handled.
Solutions for big data.
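The handling of missing values mentioned above can be sketched as follows. The data are invented for the example. NA marks a missing value, and arguments such as na.rm control how a function treats it.

```r
x <- c(4, NA, 10)

print(mean(x))                 # the missing value propagates to the result
print(mean(x, na.rm = TRUE))   # drop missing values before averaging
print(is.na(x))                # locate the missing values

df <- data.frame(a = c(1, NA, 3), b = c("x", "y", NA))
print(complete.cases(df))      # rows with no missing values
print(na.omit(df))             # keep only the complete rows
```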
Disadvantages of R
R has a steep learning curve: it takes a while to
get used to the power of R, although the curve is no steeper than for
other statistical languages. R is not easy for
the novice to use.
No default parallel execution.
High-performance computing requires advanced
skills.
Memory management, speed, and efficiency are
probably the biggest challenges R faces.
Poor management of large datasets.
Complicated structure of packages in R.
Capabilities such as security were not built into the
R language. Also, R cannot be embedded in a web
browser.
A high-level programming language
Some other points
Hadoop and R are a natural match and are quite
complementary in terms of visualization and analytics
of big data.
RHIPE is an R library that allows running
MapReduce jobs.