Linear regression is used to predict a continuous target variable from one or more continuous or categorical predictor variables. It finds the best-fitting straight line through the data points by minimizing the sum of squared residuals. The lm() function in R allows users to quickly train linear regression models on data frames by specifying a formula with the target variable on the left of ~ and predictor variables on the right. For example, lm(mpg ~ wt, data = mtcars) predicts miles per gallon (mpg) from weight (wt) using the built-in mtcars data frame.
4. Linear Regression
Simplicity
One of the simplest algorithms to develop and interpret. With Linear Regression we want to find the best line that fits a set of points.
Continuous Variables
In Linear Regression, you want to predict a continuous variable – a variable that may have a theoretically infinite number of values. Some examples:
○ House prices;
○ Someone's weight;
○ Someone's height;
○ A stock portfolio's return.
5. Example Exercise – House Prices
[Scatter plot: House Price vs. House Area (in Square Meters)]
6. Example Exercise – House Prices
[Scatter plot: House Price vs. House Area (in Square Meters)]
How much would this new house cost, if we only know that it has an area of 122 square meters?
7. Example Exercise – House Prices
[Scatter plot: House Price vs. House Area (in Square Meters)]
One idea is to build a line that represents the relationship between the Area and the Price of the house, and then estimate the new point's price according to that line.
That idea is Linear Regression!
8. Example Exercise – House Prices
[Scatter plot: House Price vs. House Area (in Square Meters)]
But... how exactly do we find this best line? There are infinitely many lines we could draw.
9. Algebra Recap: y = b + mx
A line can be characterized by the equation above, where b is the intercept, m is the slope and x is the value of the variable.
10. Linear Regression
Definition
Linear regression models the relationship between a dependent variable (commonly called the y variable) and one or more independent variables (commonly called the x's). This relationship is modeled by a simple linear equation – notice how it can be multivariate:
y = b0 + b1x1 + b2x2 + … + bnxn
Here y is the target variable, b0 is the bias/intercept, each bi is the coefficient for the i-th variable (if it exists), and each xi is a variable's value.
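To make the equation concrete, here is a minimal R sketch – with made-up coefficient values – of how a prediction is computed from it:
# Hypothetical coefficients for y = b0 + b1*x1 + b2*x2 (made-up values).
b0 <- 10000          # bias / intercept
b  <- c(2000, 5000)  # coefficients b1 and b2
x  <- c(122, 3)      # variable values, e.g. area in m2 and number of rooms
y  <- b0 + sum(b * x)
y                    # predicted target value: 10000 + 2000*122 + 5000*3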
11. How do we model the relationship for our “House Pricing” example?
12. Linear Regression
House Pricing Example
House Price = Bias + b1 ∗ Square Meters
For our example, where we want to predict the price of a house with 122 square meters, we know the following:
House Price = ❓ -> This is our objective, finding the house price.
Bias = ❓
B1 = ❓
Square Meters = 122
13. How exactly do we learn bias and
b1 to get to our house price?
14. Linear Regression
[Scatter plot: House Price vs. House Area (in Square Meters)]
First Idea: Try Random Values
House Price = 10000 + 1000 ∗ Square Meters
Bias = 10000, B1 = 1000
132,000 € = 10000 + 1000 ∗ 122
These values don't seem like a good fit at all – let's try more. It doesn't make sense for a 122-square-meter house to cost approximately the same as a 70-square-meter one (assuming no other variables have influence).
15. Linear Regression
[Scatter plot: House Price vs. House Area (in Square Meters)]
Second Idea: Try More Random Values
House Price = 10000 + 1500 ∗ Square Meters
Bias = 10000, B1 = 1500
193,000 € = 10000 + 1500 ∗ 122
We are getting closer! Let's raise B1 a bit more.
16. Linear Regression
[Scatter plot: House Price vs. House Area (in Square Meters)]
Third Idea: Try More Random Values
House Price = 10000 + 2000 ∗ Square Meters
Bias = 10000, B1 = 2000
254,000 € = 10000 + 2000 ∗ 122
That seems like a good fit! During this trial-and-error approach, we also developed an equation!
17. Linear Regression
Equation Generated:
House Price = 10000 + 2000 ∗ Square Meters
[Scatter plot: House Price vs. House Area (in Square Meters)]
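As a quick check, here is a minimal R sketch that evaluates this hand-picked equation for the 122 square meter house from the example:
bias <- 10000        # hand-picked bias from the slides
b1   <- 2000         # hand-picked coefficient from the slides
house_area  <- 122   # square meters of the new house
house_price <- bias + b1 * house_area
house_price          # 254000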
18. Linear Regression – Least Squares
[Scatter plot: House Price vs. House Area (in Square Meters)]
Visually, we want to construct a line that passes through most of the points.
Mathematically, we want a line that minimizes the difference between each point and the line. This is generally called least squares regression!
This line is awful, as the difference between each point and the line is huge.
19. Linear Regression – Least Squares
[Scatter plot: House Price vs. House Area (in Square Meters)]
Visually, we want to construct a line that passes through most of the points.
Mathematically, we want a line that minimizes the difference between each point and the line. This is generally called least squares regression!
This line is better, as the difference between each point and the line is lower.
20. Linear Regression – Least Squares
[Scatter plot: House Price vs. House Area (in Square Meters)]
Visually, we want to construct a line that passes through most of the points.
Mathematically, we want a line that minimizes the difference between each point and the line. This is generally called least squares regression!
This line seems really good, as the error between each point and the line is really low.
21. Cost Function
Our cost function (sometimes called loss or minimization function) measures the difference between the value predicted by our line and each point. In regression, the most common cost function is the mean squared error:
MSE = (1/n) ∗ Σ (yi − y~i)²
The larger this cost function, the worse our line is at predicting the house prices!
Each yi is the real house price.
Each y~i is the value predicted by our line.
n is the number of houses in our sample.
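A minimal R sketch of the mean squared error, assuming a small made-up sample of house areas and prices (the numbers are illustrative, not the slides' actual data):
house_area  <- c(50, 70, 100, 122, 150)                    # made-up areas in m2
house_price <- c(110000, 150000, 210000, 255000, 310000)   # made-up prices
predicted <- 10000 + 2000 * house_area                     # predictions from the hand-picked line
mse <- mean((house_price - predicted)^2)                   # average of the squared differences
mse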
22. Linear Regression – Least Squares
[Scatter plot: House Price vs. House Area (in Square Meters)]
Why square the difference? So that positive and negative errors don't cancel each other out.
For a point y1 with predicted value y~1, the contribution to the cost function is (y1 − y~1)².
For a point y2 with predicted value y~2, the contribution to the cost function is (y2 − y~2)².
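A tiny R sketch (with made-up residuals) of why we square the differences – raw errors of opposite sign cancel out, while squared errors do not:
errors <- c(-50000, 50000)  # one over-prediction and one under-prediction
sum(errors)                 # 0 -> would wrongly suggest a perfect line
sum(errors^2)               # 5e+09 -> squaring exposes the real error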
23. Cost Function
Each of our different lines will produce a different cost function value.
[Scatter plot: House Price vs. House Area (in Square Meters)]
This line produces a cost function value (mean squared error) of around 22,500,000,000.
24. Cost Function
Each of our different lines will produce a different cost function value.
This line produces a cost function value (mean squared error) of around 4,900,000,000.
[Scatter plot: House Price vs. House Area (in Square Meters)]
25. Cost Function
Each of our different lines will produce a different cost function value.
This line produces a cost function value (mean squared error) of around 100,000,000.
[Scatter plot: House Price vs. House Area (in Square Meters)]
26. Cost Function Plot
Imagine we have B0, the bias, fixed – each B1 (also called the coefficient) will produce a different cost function value.
[Plot: Cost Function per Value of B1 – cost function value against B1, the coefficient that multiplies House Area]
The lowest point of this curve minimizes the error between the line and the points!
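A minimal R sketch of this curve, assuming the bias is fixed at 10000 and reusing the made-up house sample from the MSE example (not the slides' actual data):
house_area  <- c(50, 70, 100, 122, 150)
house_price <- c(110000, 150000, 210000, 255000, 310000)
bias <- 10000
b1_values <- seq(0, 5000, by = 200)
# Mean squared error for each candidate B1.
cost <- sapply(b1_values, function(b1) mean((house_price - (bias + b1 * house_area))^2))
b1_values[which.min(cost)]   # the B1 that minimizes the cost for this fixed bias
plot(b1_values, cost, type = "l", xlab = "Value of B1", ylab = "Cost Function Value")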
27. Gradient Descent
Most algorithm implementations perform this search by doing gradient descent:
[Plot: Cost Function per Value of B1 – cost function value against B1, the coefficient that multiplies House Area]
Randomly initialized weights may start far from the minimum, high up on this curve!
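A minimal R sketch of gradient descent for this one-variable regression, again assuming the made-up house sample (the feature is rescaled so a single learning rate works for both weights):
house_area  <- c(50, 70, 100, 122, 150)
house_price <- c(110000, 150000, 210000, 255000, 310000)
x <- house_area / max(house_area)   # rescaled feature
y <- house_price
b0 <- 0; b1 <- 0                    # initial weights (could be random)
learning_rate <- 0.1
for (i in 1:10000) {
  error <- (b0 + b1 * x) - y
  # Step both weights against the gradient of the mean squared error.
  b0 <- b0 - learning_rate * mean(error)
  b1 <- b1 - learning_rate * mean(error * x)
}
c(bias = b0, b1 = b1 / max(house_area))  # b1 mapped back to the original scale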
28. Closed Form Solution
Particularly for Linear Regression, there is a formula named the Closed Form Solution that outputs the best coefficients – this formula is only valid for Linear Regression:
b = (X'X)⁻¹ X'y
(X'X)⁻¹ is read as the inverse of the transpose of the features (independent variables), X', multiplied by the same matrix, X.
X'y is read as the transpose of the features, X', multiplied by the target variable (dependent variable), y.
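A minimal R sketch of the closed form solution with matrix algebra, reusing the made-up house sample (the column of ones produces the bias term):
house_area  <- c(50, 70, 100, 122, 150)
house_price <- c(110000, 150000, 210000, 255000, 310000)
X <- cbind(1, house_area)                # features with an intercept column
y <- house_price
b <- solve(t(X) %*% X) %*% t(X) %*% y    # b = (X'X)^-1 X'y
b                                        # bias and coefficient for house_area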
29. To retain
◉ Regression problems are problems where we want to
predict a continuous variable.
◉ Linear Regression is one of the algorithms to solve that
problem.
◉ A Linear Regression finds the equation that best fits our points – minimizing the error between the line and the points.
◉ We can find the best coefficients for our line using
either Gradient Descent or Closed Form Solution.
31. lm function
◉ The lm function lets us train linear regressions really quickly in R:
◉ lm(y ~ x1 + x2 + ... + xn, data = dataframe)
◉ For our house prices example: lm(house_price ~ house_area, data = houses)
◉ You can read this as y as a function of x1 and x2 ... and xn.
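A runnable sketch of that call with a small made-up houses data frame (the column names follow the slide's example; the numbers are illustrative):
houses <- data.frame(
  house_area  = c(50, 70, 100, 122, 150),
  house_price = c(110000, 150000, 210000, 255000, 310000)
)
model <- lm(house_price ~ house_area, data = houses)   # price as a function of area
coef(model)                                            # fitted bias and coefficient
predict(model, data.frame(house_area = 122))           # predicted price for a 122 m2 house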
33. A quick example using another dataset
The mtcars data frame is a built-in R data set that contains information about different car models.
Let's see if we can predict consumption (mpg) as a function of cyl, disp, hp, drat and wt. These are all columns that characterize the car – number of cylinders, horsepower, etc. You can check their description by running ?mtcars in the R console.
34. Miles Per Gallon Prediction
Here is the function we will pass to R:
lm(mpg ~ cyl + disp + hp + drat + wt, data = mtcars)
The output is really cool – it gives us the coefficients (weights) for each variable.
Can you guess the equation that was generated with these coefficients❓
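To see those coefficients yourself, a minimal sketch (the variable name mpg_prediction matches the one used later in the deck):
mpg_prediction <- lm(mpg ~ cyl + disp + hp + drat + wt, data = mtcars)
coef(mpg_prediction)   # the fitted weights for the intercept and each variable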
36. Miles Per Gallon Prediction
The equation generated uses the fitted coefficients:
mpg = b0 + b1 ∗ cyl + b2 ∗ disp + b3 ∗ hp + b4 ∗ drat + b5 ∗ wt
The cool thing is that the sign of these coefficients points towards the influence of the variable:
- The more cylinders the motor has, the fewer miles per gallon the car will make (negative weight);
- The more horsepower the motor has, the fewer miles per gallon the car will make (negative weight);
- The more displacement the car has, the more miles per gallon the car will make (positive weight).
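Once fitted, the model can be used on a new car – a minimal sketch with made-up characteristics:
new_car <- data.frame(cyl = 6, disp = 200, hp = 120, drat = 3.9, wt = 2.8)  # hypothetical car
predict(mpg_prediction, new_car)   # predicted miles per gallon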
38. Miles Per Gallon Prediction
We can call summary to check more information about the regression – we'll explore more of the output of the summary command in the practical lectures:
mpg_prediction <- lm(mpg ~ cyl + disp + hp + drat + wt, data = mtcars)
summary(mpg_prediction)
These signs point us towards the significance of each variable – they refer to the significance level of a hypothesis test.
We want these residuals to be near-normally distributed:
- Median near 0;
- Similar absolute values of the max and min.
39. Assumptions – Linear Regression
◉ The target and features must have a linear relationship;
◉ No correlated features (extremely hard in real-world scenarios);
◉ Independence of observations;
◉ Somewhat normal residuals;
◉ Homoscedasticity – constant variance across the errors of the predictions.
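A minimal sketch of how one might eyeball some of these assumptions in R for the mtcars model (base R diagnostic plots plus a simple correlation check):
mpg_prediction <- lm(mpg ~ cyl + disp + hp + drat + wt, data = mtcars)
par(mfrow = c(2, 2))
plot(mpg_prediction)   # residuals vs. fitted, QQ plot, scale-location, leverage
cor(mtcars[, c("cyl", "disp", "hp", "drat", "wt")])   # pairwise correlations between features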
40. You can find me at:
LinkedIn Udemy Medium
Want to discover more?
Join my R courses on Udemy risk-free (30 day refund policy), where you will
have the chance to learn with practical exercises:
- R for Absolute Beginners
- R for Data Science
41. Credits
Special thanks to all the people who made and
released these awesome resources for free:
◉ Presentation template by SlidesCarnival