R and Rcmdr Statistical Software
1. Introducing
R and Rcmdr
Statistical Software
FutureVideo
HealthVideo
November 24, 2013 (Sunday)
12:20 PM
Jabria-2 Auditorium
By: Dr. Kang Mun Arturo Tan
Management Sciences Department
Yanbu University College
2. R is the 18th letter of the alphabet.
R is data analysis software.
R is a programming language.
R is an environment for
statistical analysis.
3. A Bit of History (and Credits)
The R Project
The Department
of Statistics of
The University of Auckland, New Zealand
is well known for being the birthplace of the R Project.
4. The founders of the R Project are Robert Gentleman and Ross Ihaka,
senior lecturers at the time and now Associate Professors.
Work began in 1991, and the R code was first released in 1996. The R
Project is a language and environment for statistical computing and
graphics.
5. Johns Hopkins University
University of Washington
Princeton University
Stanford University
Google
Pfizer
Merck
Bank of America
Intercontinental Hotels
Shell
…
It is widely taught around the world and is being used by
Ivy League Universities, Google,
second-year Statistics students, and even by school children.
6. “R is the
most powerful
statistical computing language on the planet.”
24. R Commander Default Menu Tree [current as of version 2.0-0]
File - Change working directory
|- Open script file
|- Save script
|- Save script as
|- Open R Markdown file
|- Save R Markdown file
|- Save R Markdown file as
|- Save output
|- Save output as
|- Save R workspace
|- Save R workspace as
|- Exit - from Commander
   |- from Commander and R
Edit - Cut
|- Copy
|- Paste
|- Delete
|- Find
|- Select all
|- Undo
|- Redo
|- Clear Window
25. Data - New data set
|- Load data set
|- Merge data sets
|- Import data - from text file, clipboard, or URL
| |- from SPSS data set
| |- from SAS xport file
| |- from Minitab data set
| |- from STATA data set
| |- from Excel, Access, or dBase data set [32-bit Windows only]
| |- from Excel file [currently 64-bit Windows only]
|- Data in packages - List data sets in packages
| |- Read data set from attached package
|- Active data set - Select active data set
| |- Refresh active data set
| |- Help on active data set (if available)
| |- Variables in active data set
| |- Set case names
| |- Subset active data set
| |- Aggregate variables in active data set
| |- Remove row(s) from active data set
| |- Stack variables in active data set
| |- Remove cases with missing data
| |- Save active data set
| |- Export active data set
|- Manage variables in active data set - Recode variable
   |- Compute new variable
   |- Add observation numbers to data set
   |- Standardize variables
   |- Convert numeric variables to factors
   |- Bin numeric variable
   |- Reorder factor levels
   |- Define contrasts for a factor
   |- Rename variables
   |- Delete variables from data set
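Each Data menu item corresponds to ordinary R commands. As a rough sketch, several of the items above could be written in base R as follows. These are approximations for illustration, not the exact commands R Commander generates; the data frame df and its column names are made up.

```r
# Hypothetical data set, standing in for an imported file
df <- data.frame(
  id     = 1:6,
  height = c(150, 162, NA, 171, 180, 168),
  group  = c(1, 2, 1, 2, 1, 2)
)

# Remove cases with missing data
complete <- na.omit(df)

# Subset active data set: keep rows where height exceeds 160
tall <- subset(complete, height > 160)

# Compute new variable: height in metres
complete$height_m <- complete$height / 100

# Standardize variables: centre on the mean, scale by the standard deviation
complete$height_z <- as.numeric(scale(complete$height))

# Convert numeric variables to factors
complete$group <- factor(complete$group, labels = c("control", "treatment"))

# Bin numeric variable: three equal-width intervals
complete$height_bin <- cut(complete$height, breaks = 3)

str(complete)
```

Working from the menus or typing commands like these should lead to the same kind of result; the menus are mainly a way to avoid memorizing function names.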
30. Why use R?
There's lots of software available for data analysis today: spreadsheets like
Excel; batch-oriented, procedure-based systems like SAS; point-and-click,
GUI-based systems like SPSS; data mining systems; and so on.
31. What makes R different?
R is free.
Because R is an open-source project, you can use it free of charge: no worries about subscription
fees, license managers, or user limits. But just as importantly, R is open: you can inspect
the code and tinker with it as much as you like (provided you respect the terms of the GNU
General Public License version 2 under which it is distributed). Thousands of experts
around the world have done just that, and their contributions benefit the millions of
people who use R today.
R is a language.
In R, you do data analysis by writing functions and scripts, not by pointing and clicking.
That may sound daunting, but it's an easy language to learn, and a very natural and
expressive one for data analysis. Once you learn the language, there are many benefits.
As an interactive language (as opposed to data-in-data-out black-box procedures), R
promotes experimentation and exploration, which improves data analysis and often leads
to discoveries that wouldn't be made otherwise. A script documents all your work, from
data access to reporting, and can instantly be re-run at any time. (This makes it much
easier to update results when the data change.) Scripts also make it easy to automate a
sequence of tasks that can be integrated into other processes. Many R users who have
used other software report that they can do their data analyses in a fraction of the time.
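As a small illustration of the script-based workflow described above, a reusable function and a re-runnable analysis might look like the following. This is only a sketch: it uses the mtcars data set that ships with R, and the function name summarize_by_group is made up for this example.

```r
# A reusable function: mean and standard deviation of a numeric
# variable within each level of a grouping variable
summarize_by_group <- function(data, value, group) {
  means <- tapply(data[[value]], data[[group]], mean)
  sds   <- tapply(data[[value]], data[[group]], sd)
  data.frame(group = names(means),
             mean  = as.numeric(means),
             sd    = as.numeric(sds))
}

# The analysis itself: fuel economy by number of cylinders
result <- summarize_by_group(mtcars, "mpg", "cyl")
print(result)
```

If the data change, re-running the script repeats every step; nothing has to be redone by hand.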
32. Graphics and data visualization.
One of the design principles of R was that visualization of data through charts and graphs is an essential
part of the data analysis process. As a result, it has excellent tools for creating graphics, from staples like
bar charts and scatterplots to multi-panel Lattice charts to brand new graphics of your own devising. R's
graphical system is heavily influenced by thought leaders in data visualization like Bill Cleveland and
Edward Tufte, and as a result graphics based on R appear regularly in venues like the New York Times,
the Economist, and the FlowingData blog.
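The staples mentioned above take only a few lines each. The sketch below uses base R graphics and the built-in mtcars data set; the choice of variables and the output file are assumptions made for the example.

```r
# Draw to a temporary PDF file so the example also runs without a display
plot_file <- tempfile(fileext = ".pdf")
pdf(plot_file)

# Two panels side by side
par(mfrow = c(1, 2))

# Bar chart: number of cars by cylinder count
barplot(table(mtcars$cyl),
        xlab = "Cylinders", ylab = "Number of cars",
        main = "Cars by cylinder count")

# Scatterplot with a fitted regression line
plot(mtcars$wt, mtcars$mpg,
     xlab = "Weight (1000 lb)", ylab = "Miles per gallon",
     main = "Fuel economy vs. weight")
abline(lm(mpg ~ wt, data = mtcars), lty = 2)

# Close the device so the file is written
invisible(dev.off())
```

In an interactive session the pdf() and dev.off() calls can be dropped, and the charts appear in a graphics window instead.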
A flexible statistical analysis toolkit.
All of the standard data analysis tools are built right into the R language: from accessing
data in various formats, to data manipulation (transforms, merges, aggregations, etc.), to
traditional and modern statistical models (regression, ANOVA, GLM, tree models, etc). All
are included in an object-oriented framework that makes it easy to programmatically extract
and combine just the information you need from the results, rather than having to cut-and-paste from a static report.
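To make the point about extracting results concrete, here is a minimal sketch using a linear regression on the built-in mtcars data set. The model and the variable names are chosen only for illustration.

```r
# Fit a linear regression of fuel economy on weight and horsepower
fit <- lm(mpg ~ wt + hp, data = mtcars)

# The result is an object; pull out individual pieces instead of
# reading them off a printed report
coefs     <- coef(fit)                                  # named vector of estimates
r_squared <- summary(fit)$r.squared                     # proportion of variance explained
p_values  <- summary(fit)$coefficients[, "Pr(>|t|)"]    # one p-value per term

# Combine just the pieces of interest into one small table
report <- data.frame(term     = names(coefs),
                     estimate = unname(coefs),
                     p_value  = unname(p_values))
print(report)
```

Because report is an ordinary data frame, it can be filtered, merged with other results, or written to a file like any other data.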
33. Access to powerful, cutting-edge analytics.
Leading academics and researchers from around the world use R to develop the latest
methods in statistics, machine learning, and predictive modeling. There are expansive,
cutting-edge extensions to R in finance, genomics, and dozens of other fields. To date,
more than 2000 packages extending the R language in every domain are available for free
download, with more added every day.
A robust, vibrant community.
With thousands of contributors and more than two million users around the world, if
you've got a question about R, chances are someone's answered it (or can). There's a
wealth of community resources for R available on the Web, for help in just about every
domain.
34. Unlimited possibilities.
With R, you're not restricted to choosing a pre-defined set of routines. You can use code
contributed by others in the open-source community, or extend R with your own functions.
And R is excellent for "mash-ups" with other applications: combine R with a MySQL
database, an Apache web-server, and the Google Maps API and you've got yourself a real-time GIS analysis toolkit. That's just one big idea -- what's yours?
“The great beauty of R is that you can modify it to do all
sorts of things,” said Hal Varian, chief economist at
Google. “And you have a lot of prepackaged stuff that’s
already available, so you’re standing on the shoulders of
giants.”
35. Here are our suggestions for the best on-line resources for information about R.
The R Project homepage. Look here for official news from the R Project,
plus links to documentation, mailing lists, the official R FAQs, and more.
StackOverflow. Got a question about R? Search for questions tagged with "r"
and you'll probably find your question already answered. If not, ask away.
R bloggers. For a steady stream of news, tips and articles related to R, follow
this blog aggregator for posts from dozens of R bloggers, including the team
from Revolution Analytics.
36. The Video Rchive. Watch recordings of speakers at R user group meetings and
conferences talk about various aspects of using R.
#rstats on Twitter. To listen in on (or contribute to) an information-rich
conversation about R, 140 characters at a time, search for the #rstats hashtag.
CRAN Task Views. The list of 2000+ add-on packages for R can be daunting, but
these Task Views list the most important ones in domain-specific areas as diverse as
Finance, Clinical Trials, and Machine Learning.
Crantastic.org. On the other hand, if you're looking for a specific package, you can
search by keyword at this interactive directory of all R packages. You can also log in
and rate and comment on packages.