R is a language and environment for statistical computing and
graphics. It is a GNU project similar to the S language, which
was developed at Bell Laboratories (formerly AT&T, now Lucent
Technologies) by John Chambers and his colleagues.
R provides a wide variety of statistical and graphical techniques
and is highly extensible.
R is available as Free Software under the terms of the Free
Software Foundation's GNU General Public License in source
code form.
What is R-language?
Rupak Roy
It includes an effective data handling and storage facility.
It is one of the most comprehensive statistical analysis packages available:
it incorporates the standard statistical tests, models and analyses, and
provides a full language for managing and manipulating data.
Everyone is welcome to contribute code enhancements, fix bugs and add new
packages. The wealth of quality packages available for R is a testament to
this approach to software development and sharing.
R has over 4800 packages available from multiple repositories, specializing in
topics like econometrics, data mining, spatial analysis and bioinformatics.
R can handle many types of data, including CSV, SAS, SPSS, Excel, MySQL, SQL
Server and Oracle, and can even be integrated with Hadoop for big data analysis.
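As a minimal sketch of reading one of these formats, the example below writes a small CSV file and loads it with base R's read.csv(). The file contents and the names csv_path and scores are made up for illustration. Other formats usually need add-on packages (for example haven for SAS and SPSS files, readxl for Excel, or DBI for databases).

```r
# Write a tiny CSV to a temporary file so the example is self-contained.
# The column names and values are invented for illustration.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("name,score", "asha,82", "ben,75", "chen,91"), csv_path)

# read.csv() is part of base R and returns a data frame.
scores <- read.csv(csv_path)

str(scores)          # inspect the column types
mean(scores$score)   # average of the score column
```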
Introduction to R-language
R was listed among the top analytical tools of 2016, second
to SAS, which is licensed (commercial) software. By 2019, R
had taken the lead among analytical tools thanks to its
robustness and versatility.
R Studio is a free and open source integrated development
environment (IDE) for R, the programming language for
statistical computing and graphics. R Studio was founded
by JJ Allaire.
R Studio is available in 2 editions: R Studio Desktop, where the
program runs locally as a regular desktop application, and
R Studio Server, which allows accessing R Studio remotely
using a web browser.
Introduction to R-Studio
Difference between R and R Studio.
R and R Studio are not two versions of the same thing.
R is a programming language for statistical computing, and R
Studio is an integrated development environment (IDE) that
provides a graphical interface to make analytics easier.
We can use R without R Studio, but we can't use R Studio
without R.
In other words, R Studio is a front-end IDE for R.
CRAN (the Comprehensive R Archive Network) is a
network of FTP and web servers around the world
that store identical, up-to-date versions of the code and
documentation for R.
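A minimal sketch of how CRAN is used from inside R: the repos option tells R which CRAN server to download from, and install.packages() fetches a package from it. The package name ggplot2 is only an illustrative choice, and the install line is left commented out because it needs a network connection.

```r
# Point R at a CRAN server (here the main CRAN address).
options(repos = c(CRAN = "https://cran.r-project.org"))
getOption("repos")             # show which repository R will use

# Uncomment to download and install a package from CRAN,
# then load it into the current session.
# install.packages("ggplot2")
# library(ggplot2)
```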
What is CRAN?
What is Big Data?
Big data refers to extremely large data sets that are analyzed computationally to reveal
patterns, trends and associations, especially relating to human behavior or machines.
They can range from terabytes to petabytes, consisting of millions to trillions of
rows and columns.
R was not designed for big data analytics, but it has the advantage that it can
integrate with big data technologies such as Hadoop.
One big advantage of using R with Hadoop is that Hadoop on its own is designed for
programmers. With R in front of it, data scientists, analysts and anyone without a
programming background can spend their time analyzing data rather than programming.
In this setup, R sends instructions to Hadoop; Hadoop processes the instructions
and returns the results to R.
R can also extract multiple samples from Hadoop, which is often required for
statistical modeling.
On its own, R can handle only as much data as fits in the system's available memory, i.e. RAM.
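To make the memory limit concrete, here is a small sketch that measures how much RAM an object occupies. The vector x and its length are arbitrary choices for illustration; object.size() reports the approximate size of an object held in memory.

```r
x <- numeric(1e6)                          # one million double-precision numbers
size_bytes <- as.numeric(object.size(x))   # approximate size in bytes
size_bytes / 1e6                           # size in megabytes (about 8 bytes per value)

rm(x)             # remove the object so its memory can be reclaimed
invisible(gc())   # ask R to run garbage collection
```

Every object created in a session counts against the same pool of RAM, which is why very large data sets are usually sampled or processed outside R first.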
Source editor: a text editor where multiple lines of code can be entered.
Users can also save the code to disk as a script file.
Console: where all the interactive work in R is performed, such as creating
objects, running analyses and filtering data.
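The kind of interactive work done in the console can be sketched as follows. The object names heights and tall and the values are made up for illustration; each line can be typed directly at the console prompt or run from the source editor.

```r
heights <- c(150, 162, 171, 158, 180)   # create an object (a numeric vector)
mean(heights)                            # run a simple analysis
tall <- heights[heights > 160]           # filter: keep values above 160
tall                                     # print the filtered values
```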
R Studio Environment
Packages: where a user can view the list of installed packages.
A package is a self-contained set of code that performs a
specific task, similar to an add-in in Excel.
Help: where we can browse the built-in help system for any
R-related topic.
Files: where users can browse the files on their computer.
Plots: where R displays its visual output, such as histograms,
bar charts and boxplots.
Workspace/History: the workspace is our current R working
environment and includes any user-defined objects (vectors,
matrices, data frames, lists, functions). At the end of an R session,
the user can save an image of the current workspace, which is
automatically reloaded the next time R is started.
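Each of these panes has an equivalent command, sketched below under a few assumptions: the object names values and image_path are made up, and save() with ls() is used here to store all current objects, which is what save.image() does for the whole workspace.

```r
# Packages: list some installed packages and load one that ships with R.
head(rownames(installed.packages()))
library(stats)

# Help: open the documentation for a function (only when used interactively).
if (interactive()) help("mean")   # same as typing ?mean

# Plots: draw a histogram of 100 random values.
if (!interactive()) pdf(NULL)     # when run as a script, draw to a null device
values <- rnorm(100)
hist(values)

# Workspace: list the current objects, save them, remove one, and load it back.
ls()
image_path <- tempfile(fileext = ".RData")
save(list = ls(), file = image_path)
rm(values)
load(image_path)
exists("values")                  # the object is available again after load()
```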
To install R, follow these steps:
Visit https://cran.r-project.org/
Select the download link for your operating system; in this
case we choose ‘Download R for Windows’.
On the next page, click ‘install R for the first time’ in the base
category.
Now download R 3.3.3 for Windows.
Run the R setup file and choose the options that suit your
needs (we will keep the default settings for this course),
then finish the installation.
Open the R GUI (RGui); it should look something like this.
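Once R is open, a quick way to confirm that the installation works is to type a few commands at the prompt. This is only a sketch; the name installed_version is made up, and the version reported depends on what was installed.

```r
installed_version <- R.version.string   # text describing the installed R version
installed_version
sessionInfo()                            # platform, locale and loaded packages
1 + 1                                    # a first calculation at the prompt
```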
Installing R and R Studio
Now let's install R Studio.
Go to https://www.rstudio.com/products/rstudio/
Download R Studio Desktop, select the installation file for
your system and run it.
Later we can change the settings by choosing
Tools -> Options.
Next: Data types and their structure in R.