Introduction to R

Statistics Lab
Rodolfo Metulini
IMT Institute for Advanced Studies, Lucca, Italy

Introduction to R - 09.01.2014

Getting help with functions
To get more information on any specific named function, for
example solve, the command is
> help(solve)
An alternative is:
> ?solve
Running:
> help.start()
launches a Web browser that opens the help home page.
The ?? command (short for help.search()) searches the help in a
different way: it looks for a topic across all installed packages,
so it is useful for finding help in packages that are not loaded.

objects and saving data
The entities that R creates and manipulates are known as
objects.
During an R session, objects are created and stored by name.
> objects() can be used to display the names of the objects which
are currently stored within R.
> rm() can be used to remove objects.
At the end of each R session, you are given the opportunity to save
all the currently available objects. You can save the objects (the
workspace) in .RData format in the current directory. You can also
save the command history in .Rhistory format.
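
As a minimal sketch of the commands above (the object names a and b and the temporary file are made up for illustration; `<-` is the usual R assignment operator and behaves like `=` here):

```r
# Create two objects and list what is stored in the workspace.
a <- 10
b <- c("x", "y")
before <- objects()   # objects() is equivalent to ls()

# Remove one object and list again.
rm(a)
after <- objects()

# Save a single object to a temporary .RData file and load it back.
# (save.image() would instead save the whole workspace.)
f <- tempfile(fileext = ".RData")
save(b, file = f)
rm(b)
load(f)               # restores b
print(b)
```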

scalars and vectors: manipulation
To set up a vector named x, consisting of the numbers 1, 2, 3, 4
and 5, use the R command:
> x = c(1,2,3,4,5)
or, equivalently, the assign function could be used.
> assign("x", c(1,2,3,4,5))
x is a vector of length 5. To check this we can use the following
function:
> length(x)
> 1/x gives the reciprocals of the elements of x.
> y = c(x,0,x) creates a vector with 11 entries consisting of
two copies of x with a 0 in the middle.

scalars and vectors: manipulation
Vectors can be used in arithmetic expressions.
Vectors in the same expression need not all be of the same
length. If they are not, the result has the length of the longest
vector in the expression, and shorter vectors are recycled until
they match it.
For example:
> v = 2*x + y + 1
generates a new vector of length 11 constructed by adding together,
element by element, 2*x repeated 2.2 times, y repeated just once,
and 1 repeated 11 times.
So, WARNING: R evaluates this kind of expression even when the
lengths do not match. It only issues a warning when the longer
length is not a multiple of the shorter one, as here (11 and 5).
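
The recycling behaviour can be checked directly. This is a small sketch that reuses x and y as defined earlier; the variable msg is only an illustrative name for the captured warning text:

```r
x <- c(1, 2, 3, 4, 5)
y <- c(x, 0, x)                       # length 11

# R still computes the result; the warning is silenced here.
v <- suppressWarnings(2 * x + y + 1)
print(length(v))                      # 11
print(v)                              # 4 7 10 13 16 3 6 9 12 15 8

# Capture the warning text instead of the result.
msg <- tryCatch(2 * x + y, warning = function(w) conditionMessage(w))
print(msg)
```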

scalars and vectors: manipulation - 2
In addition, the functions log, exp, sin, cos, tan and sqrt are
available, as are, of course, the classical arithmetic operators.
min(x) and max(x) select the smallest and the largest element of
the vector.
sum(x) and prod(x) return the sum and the product, respectively,
of the numbers within the vector.
mean(x) calculates the sample (arithmetic) mean, which is the same
as sum(x)/length(x); and var(x) gives the sample variance:
sum((x - mean(x))^2)/(length(x) - 1)
sort(x) returns a vector of the same size as x, with the elements in
increasing order.
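
A short check that the built-in functions agree with the formulas above (the names manual_mean and manual_var are illustrative only):

```r
x <- c(1, 2, 3, 4, 5)

# Mean and sample variance computed by hand.
manual_mean <- sum(x) / length(x)
manual_var  <- sum((x - mean(x))^2) / (length(x) - 1)

print(c(mean(x), manual_mean))   # 3 3
print(c(var(x), manual_var))     # 2.5 2.5
print(c(min(x), max(x), sum(x), prod(x)))   # 1 5 15 120
print(sort(c(3, 1, 2)))          # 1 2 3
```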

seq and rep
There are facilities to generate commonly used sequences of
numbers.
> 1:30 is the vector c(1,2, ..., 29,30)
> 2*1:15 is the vector c(2,4, ..., 28,30) of length 15.
In addition, seq() can be used. seq(2, 10) is the same as the vector
2:10
by=, from= and to= are useful arguments:
> seq(from = 30, to = 1)
> seq(-10, 10, by = 0.5)
rep() can be used for replicating an object.
> rep(x, times=5)
> rep(x, each=5)
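
The following sketch shows what these calls return. Note that seq() called with a single vector argument, as in seq(2:10), returns the positions 1 to 9 rather than the values 2 to 10:

```r
a <- seq(2, 10)             # same values as 2:10
b <- seq(2:10)              # positions of the 9 elements: 1, 2, ..., 9
d <- seq(-10, 10, by = 0.5) # 41 values from -10 to 10
e <- seq(from = 30, to = 1) # decreasing sequence 30, 29, ..., 1

x <- c(1, 2)
r1 <- rep(x, times = 2)     # 1 2 1 2  (whole vector repeated)
r2 <- rep(x, each = 2)      # 1 1 2 2  (each element repeated)
print(r1)
print(r2)
```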

logical vectors

As well as numerical vectors, R allows manipulation of logical
quantities.
The elements of a logical vector can have the values TRUE, FALSE
and NA ("not available").
Logical vectors are generated by conditions. Example:
> temp = x > 3
The logical operators are: <, <=, >, >=, == for exact equality and
!= for inequality. In addition, if c1 and c2 are logical
expressions, then c1 & c2 is their intersection ("and"), c1 | c2 is
their union ("or"), and !c1 is the negation of c1.
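
A small illustration of these operators, assuming x holds the numbers 1 to 5 as in the earlier slides:

```r
x <- 1:5

temp <- x > 3                 # FALSE FALSE FALSE TRUE TRUE
both <- (x > 1) & (x < 5)     # "and": FALSE TRUE TRUE TRUE FALSE
either <- (x < 2) | (x > 4)   # "or":  TRUE FALSE FALSE FALSE TRUE
neg <- !temp                  # negation of temp

# TRUE counts as 1 in arithmetic, so sum() counts the matches.
n_above <- sum(temp)          # 2
print(temp)
print(n_above)
```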

missing Values

In some cases the components of a vector may not be completely
known: in this case we assign the special value NA.
The function is.na(x) gives a logical vector of the same size as x
with value TRUE if the corresponding element in x is NA.
> z = c(1:3, NA); ind = is.na(z)
There is a second kind of "missing" value, produced by
numerical computation: the so-called Not a Number, NaN, values.
Example:
> 0/0
> Inf/Inf
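
The difference between NA and NaN can be sketched as follows. is.nan() and the na.rm argument of mean() are standard R but are not mentioned on the slide:

```r
z <- c(1:3, NA)
ind <- is.na(z)               # FALSE FALSE FALSE TRUE

nan <- 0 / 0                  # NaN
print(is.na(nan))             # TRUE: is.na() is TRUE for both NA and NaN
print(is.nan(nan))            # TRUE
print(is.nan(NA))             # FALSE: NA is missing, not "not a number"

# Most summaries return NA unless missing values are dropped.
m_all  <- mean(z)               # NA
m_drop <- mean(z, na.rm = TRUE) # 2
print(m_drop)
```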

index vectors: subsets of a vector
Subsets of the elements of a vector may be selected by appending to
the name of the vector an index vector in square brackets.
1. A logical vector: values corresponding to TRUE in the index
vector are selected:
> y = x[!is.na(x)]
2. A vector of positive (negative) integer quantities: in this
case the values in the index vector must lie in the set
{1, 2, ..., length(x)}. With negative values, the selected values
are excluded.
> x[2:3]; x[-(2:3)]
3. A vector of character strings: this is possible only after
assigning names to the elements of the object.
> cars = c(1,2,3)
> names(cars) = c("ferrari","lamborghini","bugatti")
> pref = cars[c("ferrari","bugatti")]

Objects and attribute
All the elements of a vector must be of the same mode (this is
the reason why such vectors are called "atomic").
The mode can be: numeric, logical, complex, character or
raw.
Useful commands: mode(), as.numeric(), is.numeric()
For example, create a numeric vector:
> z = 0:9
change it to character:
> digits = as.character(z)
and coerce it back to numeric:
> d = as.integer(digits)
d and z are the same!
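
The round trip above, plus what happens when values of different modes are combined, can be sketched like this (mixed, nums and bad are illustrative names):

```r
z <- 0:9
digits <- as.character(z)     # "0" "1" ... "9"
d <- as.integer(digits)
print(identical(d, z))        # TRUE

# c() coerces everything to a single common mode.
mixed <- c(1, "a", TRUE)
print(mode(mixed))            # "character"
nums <- c(1, TRUE)
print(mode(nums))             # "numeric" (TRUE becomes 1)

# A string that is not a number becomes NA (R also warns about it).
bad <- suppressWarnings(as.numeric("abc"))
print(bad)                    # NA
```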

arrays, matrices and data.frame

Vectors are the most important type of object in R, but there are
several others. Among them:
matrix (more generally, array): multidimensional generalizations
of vectors
data.frame: matrix-like structures in which the columns can be of
different types. This is used when we deal with both
numerical and categorical data.
How to transform a vector into a matrix?
> v = 1:50
> dim(v) = c(10,5)

arrays, matrices and data.frame (2)

How to create a matrix from scratch?
> m = array(1:20, dim = c(4,5))
How to subset a matrix, or replace a subset of a matrix with zeros?
Let's have a look at the examples in the code.
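
The code accompanying the slides is not reproduced here, so the following is only a possible sketch of subsetting and replacement, using the matrix m defined above (m2 is an illustrative copy):

```r
m <- array(1:20, dim = c(4, 5))   # filled column by column

print(m[2, 3])        # single element, row 2 and column 3: 10
print(m[, 1])         # first column: 1 2 3 4
print(m[1:2, 4:5])    # 2 x 2 block from rows 1-2, columns 4-5

# Replace the top-left 2 x 2 block with zeros, working on a copy.
m2 <- m
m2[1:2, 1:2] <- 0
print(m2)
```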

matrix manipulation
The operator %*% is used for matrix multiplication.
An n x 1 or 1 x n matrix is also a valid matrix.
If, for example, A and B are square matrices of the same size,
then:
> A*B
is the matrix of element-by-element products (it does not work for
matrices with different dimensions), and
> A %*% t(B)
is the matrix product of A and the transpose of B.
diag(A) returns the elements on the main diagonal of A. ginv(A)
and t(A) return the generalized inverse and the transpose of A.
ginv() requires the MASS package.
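
A small worked sketch of these operations on two made-up 2 x 2 matrices. It uses base R's solve() for the inverse, which is not mentioned on the slide, and only calls ginv() if the MASS package happens to be installed:

```r
A <- matrix(c(2, 0, 1, 3), nrow = 2)   # rows: (2 1), (0 3)
B <- matrix(c(1, 2, 3, 4), nrow = 2)   # rows: (1 3), (2 4)

elementwise <- A * B        # rows: (2 3), (0 12)
product     <- A %*% B      # rows: (4 10), (6 12)
product_t   <- A %*% t(B)   # rows: (5 8), (9 12)

print(diag(A))              # 2 3
print(t(A))                 # transpose of A

# For an invertible square matrix, solve(A) gives the ordinary inverse.
A_inv <- solve(A)
print(A %*% A_inv)          # identity matrix, up to rounding

# ginv() gives the generalized inverse; it coincides with solve(A) here.
if (requireNamespace("MASS", quietly = TRUE)) {
  print(MASS::ginv(A))
}
```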

lists and data frames
An R list is an object consisting of an ordered collection of objects
known as its components.
Here is a simple example of how to make a list:
> Lst = list(name="Rodolfo", surname="Metulini", age = "30")
It is possible to concatenate two or more lists:
> list.ABC = c(list.A, list.B, list.C)
A data.frame is a list with the specific class "data.frame".
We can convert a matrix object into a data.frame object with the
command as.data.frame(matrix)
The easiest way to create a data.frame object is by means of the
read.table() function.
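
As a minimal sketch of working with lists and data frames (all names and values below are invented for illustration):

```r
# A list can hold components of different types.
person <- list(name = "Anna", age = 30)
print(person$name)        # access by name with $
print(person[["age"]])    # or with double square brackets

# Concatenate lists with c() (lower case).
extra <- list(city = "Lucca")
both <- c(person, extra)
print(names(both))        # "name" "age" "city"

# Convert a matrix to a data.frame; columns are named V1, V2.
df <- as.data.frame(matrix(1:6, nrow = 3))
print(class(df))          # "data.frame"
print(is.list(df))        # TRUE: a data.frame is a list of columns
```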

reading data
Large data objects will usually be read as values from external files
rather than entered during an R session at the keyboard.
There are basically two similar commands to load data.
1. read.table(): the general command for delimited text files,
including .csv files.
2. read.delim(): a variant of read.table() preset for
tab-delimited .txt files
Useful arguments:
sep = " ": to specify the field separator, i.e. whether the data in
the dataset are separated by ;, ., , or a tab.
header = TRUE: to specify that the first row in the dataset
contains the variable names
Moreover, read.dta() (in the foreign package) is used to load data
from STATA :)
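
To try these commands without an external file, the sketch below first writes two tiny temporary files (their contents are invented) and then reads them back:

```r
# A semicolon-separated file with a header row.
f <- tempfile(fileext = ".csv")
writeLines(c("name;height", "anna;1.70", "marco;1.82"), f)
d1 <- read.table(f, header = TRUE, sep = ";")
print(d1)

# The same data, tab-delimited; read.delim() assumes a header and tabs.
g <- tempfile(fileext = ".txt")
writeLines(c("name\theight", "anna\t1.70", "marco\t1.82"), g)
d2 <- read.delim(g)
print(d2)
```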

distributions and co.

One convenient use of R is to provide a comprehensive set of
statistical tables. Functions are provided to evaluate the
cumulative distribution function P(X <= x), the probability density
function and the quantile function (given q, the smallest x such
that P(X <= x) > q), and to simulate from the distribution.
The function name is the distribution name prefixed by "d" for the
density, "p" (pnorm, punif, pexp etc.) for the CDF, "q" for the
quantile function and "r" for simulation.
Let's empirically examine the distribution of a variable
(code).
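
The four prefixes can be illustrated with the standard normal distribution. The sample size, the seed and the variable names are arbitrary choices for this sketch:

```r
dens <- dnorm(0)       # density of N(0,1) at 0, about 0.399
p <- pnorm(1.96)       # P(X <= 1.96), about 0.975
q <- qnorm(0.975)      # quantile, about 1.96 (inverse of pnorm)

# Simulate and compare an empirical proportion with the CDF.
set.seed(1)
s <- rnorm(10000)
emp  <- mean(s <= 1)   # share of simulated values not above 1
theo <- pnorm(1)       # theoretical probability, about 0.841
print(c(emp, theo))

# Other distributions follow the same naming scheme.
print(punif(0.3))      # 0.3 for the uniform on [0, 1]
print(pexp(1))         # 1 - exp(-1) for the exponential with rate 1
```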

covar and concentration indices
The covariance and the correlation measure the degree to which
two variables change together.
The correlation is an index in [-1,1] that does not depend on the
units of measurement; the covariance depends on the scale of the
values assumed by the variables.
> Cov = cov(A,B)
> Cor = cor(A,B)
We can also calculate the correlation between A and B as
follows:
> CorAB = Cov / sqrt(var(A)*var(B))
Gini index: the most popular concentration index; we need to
install the ineq package.
Mode: the most frequent value within the distribution; we need to
install the modeest package and use the mfv command.
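
The relation between covariance and correlation can be verified on made-up data. For readers without the ineq and modeest packages, the sketch also shows base R stand-ins: gini_sketch is a hypothetical helper using a common textbook formula for the Gini index, and the mode is read off a frequency table. Neither is claimed to reproduce the packages' exact output.

```r
A <- c(1, 2, 3, 4, 5)
B <- c(2, 4, 5, 4, 5)

Cov <- cov(A, B)
Cor <- cor(A, B)
CorAB <- Cov / sqrt(var(A) * var(B))   # same value as Cor
print(c(Cor, CorAB))

# Hypothetical helper: Gini index from the sorted values.
gini_sketch <- function(x) {
  x <- sort(x)
  n <- length(x)
  sum((2 * seq_len(n) - n - 1) * x) / (n * sum(x))
}
print(gini_sketch(c(1, 1, 1, 1)))   # 0: perfectly equal
print(gini_sketch(c(0, 0, 0, 1)))   # 0.75: everything held by one unit

# Mode via a frequency table (first value in case of ties).
vals <- c(1, 2, 2, 3, 2, 4)
tab <- table(vals)
mode_val <- as.numeric(names(tab)[which.max(tab)])
print(mode_val)                     # 2
```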

homeworks

For those of us who are familiar with STATA, let's try to load a
.dta file with the read.dta() function.
Study the agreement of the eruption data with other distributions
(exponential? uniform? it is up to you).
