Linear regression is used to predict a continuous target variable from one or more continuous or categorical predictor variables. It finds the best-fitting straight line through the data points by minimizing the sum of squared residuals. The lm() function in R allows users to quickly train linear regression models on data frames by specifying a formula with the target variable on the left of ~ and predictor variables on the right. For example, lm(mpg ~ wt, data = mtcars) predicts miles per gallon (mpg) from weight (wt) using the built-in mtcars data frame.
4. Linear Regression
Simplicity
One of the simplest algorithms to develop and interpret. With Linear Regression we want to find the best line that fits a set of points.
Continuous Variables
In Linear Regression, you want to predict a continuous variable – a variable that may have a theoretically infinite number of values. Some examples:
○ House prices;
○ Someone's weight;
○ Someone's height;
○ A stock portfolio's return.
5. Example Exercise – House Prices
[Scatter plot: House Price vs. House Area (in Square Meters)]
6. Example Exercise – House Prices
[Scatter plot: House Price vs. House Area (in Square Meters)]
How much would this new house cost, if we only know that it has an area of 122 square meters?
7. Example Exercise – House Prices
[Scatter plot: House Price vs. House Area (in Square Meters)]
One idea is to build a line that represents the relationship between the Area and the Price of the house, and then estimate the new point's price according to that line.
That idea is Linear Regression!
8. Example Exercise – House Prices
[Scatter plot: House Price vs. House Area (in Square Meters)]
But... how exactly do we find this best line? There are infinitely many lines we could draw.
9. Algebra Recap: y = b + mx
A line can be characterized by the equation above, where b is the intercept, m is the slope and x is the value of the variable.
10. Linear Regression
Definition
Linear regression models the relationship between a dependent variable (commonly called the y variable) and one or more independent variables (commonly called the x's). This relationship is modeled by a simple linear equation – notice how it can be multivariate:
y = b0 + b1x1 + b2x2 + … + bnxn
Here y is the target variable, b0 is the bias/intercept, each bi is the coefficient for the i-th variable (if it exists), and each xi is a variable's value.
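To make the equation concrete, here is a minimal R sketch – with made-up coefficient values – of how a prediction is computed from it:
# Hypothetical coefficients for y = b0 + b1*x1 + b2*x2 (made-up values).
b0 <- 10000          # bias / intercept
b  <- c(2000, 5000)  # coefficients b1 and b2
x  <- c(122, 3)      # variable values, e.g. area in m2 and number of rooms
y  <- b0 + sum(b * x)
y                    # predicted target value: 10000 + 2000*122 + 5000*3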
11. How do we model the relationship for our “House Pricing” example?
12. Linear Regression
House Pricing Example
House Price = Bias + b1 ∗ Square Meters
For our example, where we want to predict the price of a house with 122 square meters, we know the following:
House Price = ❓ -> This is our objective, finding the house price.
Bias = ❓
B1 = ❓
Square Meters = 122
13. How exactly do we learn bias and
b1 to get to our house price?
14. Linear Regression
[Scatter plot: House Price vs. House Area (in Square Meters)]
First Idea: Try Random Values
House Price = 10000 + 1000 ∗ Square Meters
Bias = 10000, B1 = 1000
132,000 € = 10000 + 1000 ∗ 122
These values don't seem like a good fit at all – let's try more. It doesn't make sense for a 122-square-meter house to cost approximately the same as a 70-square-meter one (assuming no other variables have influence).
15. Linear Regression
[Scatter plot: House Price vs. House Area (in Square Meters)]
Second Idea: Try More Random Values
House Price = 10000 + 1500 ∗ Square Meters
Bias = 10000, B1 = 1500
193,000 € = 10000 + 1500 ∗ 122
We are getting closer! Let's raise B1 a bit more.
16. Linear Regression
[Scatter plot: House Price vs. House Area (in Square Meters)]
Third Idea: Try More Random Values
House Price = 10000 + 2000 ∗ Square Meters
Bias = 10000, B1 = 2000
254,000 € = 10000 + 2000 ∗ 122
That seems like a good fit! During this trial-and-error approach, we also developed an equation!
17. Linear Regression
Equation Generated:
House Price = 10000 + 2000 ∗ Square Meters
[Scatter plot: House Price vs. House Area (in Square Meters)]
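As a quick check, here is a minimal R sketch that evaluates this hand-picked equation for the 122 square meter house from the example:
bias <- 10000        # hand-picked bias from the slides
b1   <- 2000         # hand-picked coefficient from the slides
house_area  <- 122   # square meters of the new house
house_price <- bias + b1 * house_area
house_price          # 254000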
18. Linear Regression – Least Squares
[Scatter plot: House Price vs. House Area (in Square Meters)]
Visually, we want to construct a line that passes through most of the points.
Mathematically, we want a line that minimizes the difference between each point and the line. This is generally called least squares regression!
This line is awful, as the difference between each point and the line is huge.
19. Linear Regression – Least Squares
[Scatter plot: House Price vs. House Area (in Square Meters)]
Visually, we want to construct a line that passes through most of the points.
Mathematically, we want a line that minimizes the difference between each point and the line. This is generally called least squares regression!
This line is better, as the difference between each point and the line is lower.
20. Linear Regression – Least Squares
[Scatter plot: House Price vs. House Area (in Square Meters)]
Visually, we want to construct a line that passes through most of the points.
Mathematically, we want a line that minimizes the difference between each point and the line. This is generally called least squares regression!
This line seems really good, as the error between each point and the line is really low.
21. Cost Function
Our cost function (sometimes called loss or minimization function) measures the difference between the value predicted by our line and each point. In regression, the most common cost function is the mean squared error:
MSE = (1/n) ∗ Σ (yi − y~i)²
The larger this cost function, the worse our line is at predicting the house prices!
Each yi is the real house price.
Each y~i is the value predicted by our line.
n is the number of houses in our sample.
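A minimal R sketch of the mean squared error, assuming a small made-up sample of house areas and prices (the numbers are illustrative, not the slides' actual data):
house_area  <- c(50, 70, 100, 122, 150)                    # made-up areas in m2
house_price <- c(110000, 150000, 210000, 255000, 310000)   # made-up prices
predicted <- 10000 + 2000 * house_area                     # predictions from the hand-picked line
mse <- mean((house_price - predicted)^2)                   # average of the squared differences
mse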
22. Linear Regression – Least Squares
[Scatter plot: House Price vs. House Area (in Square Meters)]
Why square the difference? So that positive and negative errors don't cancel each other out.
For a point y1 with predicted value y~1, the contribution to the cost function is (y1 − y~1)².
For a point y2 with predicted value y~2, the contribution to the cost function is (y2 − y~2)².
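A tiny R sketch (with made-up residuals) of why we square the differences – raw errors of opposite sign cancel out, while squared errors do not:
errors <- c(-50000, 50000)  # one over-prediction and one under-prediction
sum(errors)                 # 0 -> would wrongly suggest a perfect line
sum(errors^2)               # 5e+09 -> squaring exposes the real error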
23. Cost Function
Each of our different lines will produce a different cost function value.
[Scatter plot: House Price vs. House Area (in Square Meters)]
This line produces a cost function value (mean squared error) of around 22,500,000,000.
24. Cost Function
Each of our different lines will produce a different cost function value.
This line produces a cost function value (mean squared error) of around 4,900,000,000.
[Scatter plot: House Price vs. House Area (in Square Meters)]
25. Cost Function
Each of our different lines will produce a different cost function value.
This line produces a cost function value (mean squared error) of around 100,000,000.
[Scatter plot: House Price vs. House Area (in Square Meters)]
26. Cost Function Plot
Imagine we have B0, the bias, fixed – each B1 (also called the coefficient) will produce a different cost function value.
[Plot: Cost Function per Value of B1 – cost function value against B1, the coefficient that multiplies House Area]
The lowest point of this curve minimizes the error between the line and the points!
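A minimal R sketch of this curve, assuming the bias is fixed at 10000 and reusing the made-up house sample from the MSE example (not the slides' actual data):
house_area  <- c(50, 70, 100, 122, 150)
house_price <- c(110000, 150000, 210000, 255000, 310000)
bias <- 10000
b1_values <- seq(0, 5000, by = 200)
# Mean squared error for each candidate B1.
cost <- sapply(b1_values, function(b1) mean((house_price - (bias + b1 * house_area))^2))
b1_values[which.min(cost)]   # the B1 that minimizes the cost for this fixed bias
plot(b1_values, cost, type = "l", xlab = "Value of B1", ylab = "Cost Function Value")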
27. Gradient Descent
Most algorithm implementations perform this search by doing gradient descent:
[Plot: Cost Function per Value of B1 – cost function value against B1, the coefficient that multiplies House Area]
Randomly initialized weights may start far from the minimum, high up on this curve!
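A minimal R sketch of gradient descent for this one-variable regression, again assuming the made-up house sample (the feature is rescaled so a single learning rate works for both weights):
house_area  <- c(50, 70, 100, 122, 150)
house_price <- c(110000, 150000, 210000, 255000, 310000)
x <- house_area / max(house_area)   # rescaled feature
y <- house_price
b0 <- 0; b1 <- 0                    # initial weights (could be random)
learning_rate <- 0.1
for (i in 1:10000) {
  error <- (b0 + b1 * x) - y
  # Step both weights against the gradient of the mean squared error.
  b0 <- b0 - learning_rate * mean(error)
  b1 <- b1 - learning_rate * mean(error * x)
}
c(bias = b0, b1 = b1 / max(house_area))  # b1 mapped back to the original scale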
28. Closed Form Solution
Particularly for Linear Regression, there is a formula named the Closed Form Solution that outputs the best coefficients – this formula is only valid for Linear Regression:
b = (X'X)⁻¹ X'y
(X'X)⁻¹ is read as the inverse of the transpose of the features (independent variables), X', multiplied by the same matrix, X.
X'y is read as the transpose of the features, X', multiplied by the target variable (dependent variable), y.
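A minimal R sketch of the closed form solution with matrix algebra, reusing the made-up house sample (the column of ones produces the bias term):
house_area  <- c(50, 70, 100, 122, 150)
house_price <- c(110000, 150000, 210000, 255000, 310000)
X <- cbind(1, house_area)                # features with an intercept column
y <- house_price
b <- solve(t(X) %*% X) %*% t(X) %*% y    # b = (X'X)^-1 X'y
b                                        # bias and coefficient for house_area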
29. To retain
◉ Regression problems are problems where we want to
predict a continuous variable.
◉ Linear Regression is one of the algorithms to solve that
problem.
◉ A Linear Regression finds the equation that best fits our points – minimizing the error between the line and the points.
◉ We can find the best coefficients for our line using
either Gradient Descent or Closed Form Solution.
31. lm function
◉ The lm function lets us train linear regressions really quickly in R:
◉ lm(y ~ x1 + x2 + ... + xn, data = dataframe)
◉ For our house prices example: lm(house_price ~ house_area, data = houses)
◉ You can read this as y as a function of x1 and x2 ... and xn.
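A runnable sketch of that call with a small made-up houses data frame (the column names follow the slide's example; the numbers are illustrative):
houses <- data.frame(
  house_area  = c(50, 70, 100, 122, 150),
  house_price = c(110000, 150000, 210000, 255000, 310000)
)
model <- lm(house_price ~ house_area, data = houses)   # price as a function of area
coef(model)                                            # fitted bias and coefficient
predict(model, data.frame(house_area = 122))           # predicted price for a 122 m2 house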
33. A quick example using another dataset
The mtcars data frame is a built-in R data set that contains information about different car models.
Let's see if we can predict consumption (mpg) as a function of cyl, disp, hp, drat and wt. These are all columns that characterize the car – number of cylinders, horsepower, etc. You can check their description by running ?mtcars in the R console.
34. Miles Per Gallon Prediction
Here is the function we will pass to R:
lm(mpg ~ cyl + disp + hp + drat + wt, data = mtcars)
The output is really cool – it gives us the coefficients (weights) for each variable.
Can you guess the equation that was generated with these coefficients❓
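To see those coefficients yourself, a minimal sketch (the variable name mpg_prediction matches the one used later in the deck):
mpg_prediction <- lm(mpg ~ cyl + disp + hp + drat + wt, data = mtcars)
coef(mpg_prediction)   # the fitted weights for the intercept and each variable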
36. Miles Per Gallon Prediction
The equation generated uses the fitted coefficients:
mpg = b0 + b1 ∗ cyl + b2 ∗ disp + b3 ∗ hp + b4 ∗ drat + b5 ∗ wt
The cool thing is that the sign of these coefficients points towards the influence of the variable:
- The more cylinders the motor has, the fewer miles per gallon the car will make (negative weight);
- The more horsepower the motor has, the fewer miles per gallon the car will make (negative weight);
- The more displacement the car has, the more miles per gallon the car will make (positive weight).
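Once fitted, the model can be used on a new car – a minimal sketch with made-up characteristics:
new_car <- data.frame(cyl = 6, disp = 200, hp = 120, drat = 3.9, wt = 2.8)  # hypothetical car
predict(mpg_prediction, new_car)   # predicted miles per gallon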
38. Miles Per Gallon Prediction
We can call summary to check more information about the regression – we'll explore more of the output of the summary command in the practical lectures:
mpg_prediction <- lm(mpg ~ cyl + disp + hp + drat + wt, data = mtcars)
summary(mpg_prediction)
These signs point us towards the significance of each variable – they refer to the significance level of a hypothesis test.
We want these residuals to be near-normally distributed:
- Median near 0;
- Similar absolute values of the max and min.
39. Assumptions – Linear Regression
◉ The target and features must have a linear relationship;
◉ No correlated features (extremely hard in real-world scenarios);
◉ Independence of observations;
◉ Somewhat normal residuals;
◉ Homoscedasticity – constant variance across the errors of the predictions.
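A minimal sketch of how one might eyeball some of these assumptions in R for the mtcars model (base R diagnostic plots plus a simple correlation check):
mpg_prediction <- lm(mpg ~ cyl + disp + hp + drat + wt, data = mtcars)
par(mfrow = c(2, 2))
plot(mpg_prediction)   # residuals vs. fitted, QQ plot, scale-location, leverage
cor(mtcars[, c("cyl", "disp", "hp", "drat", "wt")])   # pairwise correlations between features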
40. You can find me at:
LinkedIn Udemy Medium
Want to discover more?
Join my R courses on Udemy risk-free (30 day refund policy), where you will
have the chance to learn with practical exercises:
- R for Absolute Beginners
- R for Data Science
41. Credits
Special thanks to all the people who made and
released these awesome resources for free:
◉ Presentation template by SlidesCarnival