3. Objectives
To learn about R
To ascertain the characteristics of R
To compare R with proprietary alternatives
To evaluate R with the help of an example
4. The predecessor of R is S.
S was developed by John Chambers (earlier versions) along
with Rick Becker and Allan Wilks of Bell Laboratories.
The project was started in May 1976.
In 1979, S was ported to UNIX.
S-Plus and R are offshoots of S. 1
S was available for academic and commercial purposes from
AT&T Laboratories.
1 Ironically, R ranks among the top 26 programming languages, whereas S and
S-Plus appear only within the top 100.
5. R began as a research project by Ross Ihaka and
Robert Gentleman at the University of Auckland in the 1990s.
R is a programming language meant for statistical computing.
R is open source software, supported by volunteers all around
the world, but central control is in the hands of a group
called R-core.
The base system provides:
interactive language for numerical computing
data management
graphics
a variety of related calculations
8. Introduction to R
R is an integrated suite of software facilities for data manipulation,
calculation and graphical display. Among other things it has
an effective data handling and storage facility,
a suite of operators for calculations on arrays, in particular
matrices,
a large, coherent, integrated collection of intermediate tools
for data analysis,
graphical facilities for data analysis and display either directly
at the computer or on hardcopy, and
a well developed, simple and effective programming language
(called ’S’) which includes conditionals, loops, user defined
recursive functions and input and output facilities.
(Indeed most of the system-supplied functions are themselves
written in the S language.)
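The language features listed above (matrix operators, conditionals, loops and user-defined recursive functions) can be sketched in a few lines of base R. This is only an illustrative sketch; the names m, fact and squares are hypothetical and do not come from the slides.

```r
# A 2 x 2 matrix and two of the array operators
m    <- matrix(c(1, 2, 3, 4), nrow = 2)  # filled column by column
m_t  <- t(m)                             # transpose
m_sq <- m %*% m                          # matrix product

# A user-defined recursive function built around a conditional
fact <- function(n) {
  if (n <= 1) {
    1
  } else {
    n * fact(n - 1)
  }
}

# A loop that stores the squares of 1..5
squares <- numeric(5)
for (i in 1:5) {
  squares[i] <- i^2
}

print(m_sq)
print(fact(5))    # 120
print(squares)    # 1 4 9 16 25
```

Typing the same lines at the interactive R prompt gives the results one statement at a time, which is how the base system is usually explored.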
9. Introduction - Continued
R is a GNU project similar to the S language and
environment, which was developed at Bell Laboratories
(formerly AT&T, now Lucent Technologies) by John Chambers
and colleagues. (Please visit http://www.r-project.org/)
R provides a wide variety of statistical and graphical
techniques and is highly extensible. Some of them are:
Linear Modelling
Non-Linear Modelling
Classical Statistical Tests
Time-Series Analysis
Classification, clustering
Neural Networks
Social Network Analysis
Linear Programming, Integer Programming
and many more
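Two of the techniques listed above, linear modelling and a classical statistical test, can be tried with base R alone. The sketch below is an assumption-laden illustration rather than part of the original slides: it uses the cars dataset that ships with R, and the comparison value of 15 in the t-test is a hypothetical choice.

```r
# 'cars' ships with R (package datasets): speed (mph) and stopping distance (ft)
fit <- lm(dist ~ speed, data = cars)   # linear model: dist = a + b * speed
print(coef(fit))                       # intercept and slope
print(summary(fit)$r.squared)          # proportion of variance explained

# Classical test: one-sample t-test of mean speed against a
# hypothetical reference value of 15 (illustrative assumption)
tt <- t.test(cars$speed, mu = 15)
print(tt$p.value)
```

The other techniques in the list are generally provided by add-on packages from CRAN rather than by the base system.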
10. Introduction - More...
R is the lingua franca of statistical research
Overall, SAS is 11 years behind R (William Revelle)
Most importantly, R is not only free but also open source, which means much more
R is available under the GNU General Public License (copyleft)
The most recent R version, 2.15.3 ("Security Blanket"), was
released on 2013-03-01
11. Speciality of R
Tal Galili (from http://www.kdnuggets.com/) asserts
that:
R has the largest number of email discussions
The number of R packages published on CRAN continues to
grow (faster than for Stata and SAS)
R has the most blogs (approx. 170); SAS is second with only
31 blogs
Even in terms of job opportunities R does not fare badly:
41 percent SAS
15 percent SPSS
14 percent R
13. Introduction - Speciality of R
R. A. Muenchen (from
http://r4stats.com/articles/popularity/) observes that:
R accounts for the largest number of downloads (though
downloads are difficult to count)
TIOBE (http://www.tiobe.com, programming community
index) ranked R as the 24th most popular programming
language (SPSS did not make the list)
The Transparent Language Popularity Index (TLPI) ranked R
12th and SAS 26th
R is the most discussed package in online forums:
Mean monthly email discussions for R are more than 3000
Mean monthly email discussions for Stata are more than 1000
Mean monthly email discussions for SAS are less than 1000
Mean monthly email discussions for SPSS are less than 500
The assumption is that people talk about what they want to
use
16. Introduction - What Muenchen Said?
His book, "R for SAS and SPSS Users", is a great resource for data miners and
analysts.
He studied the popularity of data analysis software with respect to certain
factors (https://sites.google.com/site/r4statistics/popularity):
Sales and downloads
Language popularity measures
Internet discussions
Competition
Usage
Literature and books
Impact on scholarly activity
Website popularity
Growth in popularity
IT Research firms
Job markets
17. Muenchen Survey - Number of Users
Fig. 3: Number of Users and Analytics
20. Muenchen Survey - Job Market
Fig. 6: Jobs for analytics software on Indeed.com
21. Introduction - Comparison
According to Brendan O'Connor (artificial intelligence expert
and social science researcher):
Solutions fall into two broad groups:
programming-oriented solutions like R, Matlab, Python
analytic solutions like Excel, Stata, and SPSS
Python is "immature"
Matlab is certainly "weak", but might be better for
mathematical algorithms
SPSS and Stata are equal in capabilities; Stata may
be much cheaper than SPSS
These two are for those who want easy ways and
short-cuts
SAS is favoured by an older crowd
SAS users complain that its graphical output is poor
Matlab's visualization is also somewhat contested compared with R's
So, why not try R?
22. Introduction - O'Connor's Comparison
Name                     Advantages                               Disadvantages
R                        Library support; visualization           Steep learning curve
Matlab                   Elegant matrix support; visualization    Expensive
SciPy/NumPy/Matplotlib   Python                                   Immature
Excel                    Easy; visual; flexible                   Large datasets
SAS                      Large datasets                           Expensive; outdated programming language
Stata                    Easy statistical analysis
SPSS                     Like Stata but more expensive and worse
23. Introduction - O'Connor's Comparison - Continued
Name                     Open Source   Typical Users
R                        Yes           Finance and Statistics
Matlab                   No            Engineering
SciPy/NumPy/Matplotlib   Yes           Engineering
Excel                    No            Business
SAS                      No            Business; Government
Stata                    No            Science
SPSS                     No            Business; Academics
Illustration-3: comparison 2
24. Last but not least......
Ista Zahn 9 says:
"I am the only person in my department who uses LaTeX and
R. Because Sweave simply provides a way to integrate these
two programs, it follows that I am the only Sweave user as
well. Why have I taken the time and effort to learn these
programs instead of following the crowd and sticking with
Word and SPSS? Quite simply, I made the switch because
using LaTeX and R is actually easier. It took me some time to
become familiar with these programs, but after using them for
a couple of months I am firmly convinced that I am more
productive with these programs than I ever was with Word
and SPSS."
9 Zahn, I. (2008). Learning to Sweave in APA Style. The PracTeX Journal,
No. 1.