Air Passenger Prediction Using ARIMA
Table of contents
01 Introduction
02 Objectives & Dataset
03 Analysis
04 Results
05 Limitations
06 Future Scope
07 References
08 Acknowledgements
Introduction
Time series and forecasting
TIME SERIES DATA
Time-series data is a set of observations on a quantitative variable collected over time, for example historical data on sales, inventory, customer counts, interest rates or costs.
Businesses are often very interested in forecasting time-series variables. Often, no independent variables are available to build a regression model of a time-series variable, which is why time series analysis is used.
TIME SERIES ANALYSIS
• Time series analysis is a statistical technique that uses time-series data to explain the past or forecast future events.
• In time series analysis, we analyze the past behavior of a variable in order to predict its future behavior.
• The prediction is a function of time (days, months, years, etc.).
• No causal (explanatory) variable is used.
TIME SERIES FORECASTING
Time series forecasting means predicting the future values of a variable over a period. It involves developing models based on previous data and applying them to make observations and guide future strategic decisions.
The future is forecast or estimated based on what has already happened. Time series data add a time-order dependence between observations: each observation may depend on the ones before it.
Types of time series methods used for forecasting:
• Autoregression (AR)
• Moving Average (MA)
• Autoregressive Moving Average (ARMA)
• Autoregressive Integrated Moving Average (ARIMA)
• Seasonal Autoregressive Integrated Moving-Average (SARIMA).
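Stationarity
Stationarity means that the statistical properties of the process generating a time series do not change over time. The ARMA part of an ARIMA model assumes a stationary series; the "integrated" (differencing) step is used to reach stationarity first.
Requirements for stationarity:
1. The mean should be constant.
2. The variance should be constant.
3. There is no seasonality (more generally, the correlation between two observations depends only on the lag between them, not on when they occur).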
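As an informal illustration of the first two requirements, the sketch below compares the mean and variance of the two halves of a series. It is a rough heuristic under our own assumptions (synthetic white-noise and trend series, a hypothetical helper `split_half_stats`), not a formal test; the ADF test used later in this project is the formal check.

```python
import random
from statistics import mean, pvariance


def split_half_stats(series):
    """Return ((mean, variance) of first half, (mean, variance) of second half)."""
    half = len(series) // 2
    first, second = series[:half], series[half:]
    return (mean(first), pvariance(first)), (mean(second), pvariance(second))


random.seed(0)
# Synthetic data (not the project's data): white noise is stationary,
# while adding a linear trend makes the mean change over time.
noise = [random.gauss(0, 1) for _ in range(200)]
trend = [0.5 * t + e for t, e in enumerate(noise)]

(m1, v1), (m2, v2) = split_half_stats(noise)
(t1, _), (t2, _) = split_half_stats(trend)
print(f"white noise halves: mean {m1:.2f} vs {m2:.2f}")  # similar means
print(f"trended halves:     mean {t1:.2f} vs {t2:.2f}")  # second mean about 50 higher
```

A large shift in the half-sample mean (as for the trended series) is a hint of non-stationarity; similar halves do not prove stationarity.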
Differencing
Differencing (taking the change between consecutive observations, y_t - y_{t-1}) can help stabilize the mean of a time series by removing changes in its level, thereby reducing trend; seasonal differencing (y_t - y_{t-m}) does the same for seasonality.
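A minimal sketch of first differencing (y_t - y_{t-1}) and seasonal differencing (y_t - y_{t-12}) on a synthetic monthly series with a linear trend and a repeating 12-month pattern. The series and the helper name `difference` are illustrative assumptions, not the project's data or code.

```python
def difference(series, lag=1):
    """Return [y_t - y_{t-lag}] for t = lag..n-1 (the result is `lag` values shorter)."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


# Synthetic monthly series: linear trend plus a repeating 12-month pattern (not project data)
pattern = [0, 2, 4, 6, 4, 2, 0, -2, -4, -6, -4, -2]
series = [10 + 0.5 * t + pattern[t % 12] for t in range(48)]

first_diff = difference(series)             # trend becomes a constant; seasonality remains
seasonal_diff = difference(series, lag=12)  # removes the 12-month pattern
print(seasonal_diff[:3])                    # [6.0, 6.0, 6.0]: only 12 * 0.5 of trend is left
print(difference(seasonal_diff)[:3])        # [0.0, 0.0, 0.0]: a further first difference removes it
```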
AR MODEL
In an autoregression model, we forecast the variable of interest using a linear combination of past values of the variable. The term autoregression indicates that it is a regression of the variable against itself. Thus, an autoregressive model of order p can be written as:
$$y_t = c + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \cdots + \phi_p y_{t-p} + \varepsilon_t$$
where:
• t = index denoting the time period
• y_t = value of the series at time t
• ϕ_1, …, ϕ_p = coefficients of the lagged values
• c = constant term
• ε_t = error term (white noise) at time t
• p = order of the AR model (number of lagged values used)
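A minimal sketch of a one-step AR(p) forecast from the equation above, assuming the coefficients c and ϕ have already been estimated (the values here are made up). Because ε_t has mean zero, it drops out of the point forecast.

```python
def ar_forecast(history, c, phi):
    """One-step AR(p) forecast: c + phi_1*y_{t-1} + ... + phi_p*y_{t-p}.

    phi[0] multiplies the most recent value, phi[1] the one before, and so on.
    The error term has mean zero, so it does not appear in the point forecast.
    """
    p = len(phi)
    if len(history) < p:
        raise ValueError("need at least p past values")
    recent = history[::-1][:p]  # y_{t-1}, y_{t-2}, ..., y_{t-p}
    return c + sum(coef * y for coef, y in zip(phi, recent))


# Hypothetical AR(2): c = 2, phi_1 = 0.6, phi_2 = 0.2; last observations 10, 12, 11
print(ar_forecast([10.0, 12.0, 11.0], c=2.0, phi=[0.6, 0.2]))  # approximately 11.0
```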
MA MODEL
Rather than using past values of the forecast variable in a regression, a moving average model uses past forecast errors in a regression-like model:
$$y_t = c + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \theta_2 \varepsilon_{t-2} + \cdots + \theta_q \varepsilon_{t-q}$$
where ε_t is white noise and θ_1, …, θ_q are the MA coefficients. We refer to this as an MA(q) model, a moving average model of order q. Of course, we do not observe the values of ε_t, so it is not really a regression in the usual sense.
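A minimal sketch that simulates an MA(q) process directly from the equation above, assuming Gaussian white noise and made-up θ values. In a fitted model the ε values are not observed; they are estimated from the residuals. The check uses the known MA(1) result that the lag-1 autocorrelation is θ/(1 + θ²).

```python
import random


def simulate_ma(c, theta, n, seed=0):
    """Generate n values of y_t = c + eps_t + theta_1*eps_{t-1} + ... + theta_q*eps_{t-q}."""
    rng = random.Random(seed)
    q = len(theta)
    eps = [rng.gauss(0, 1) for _ in range(n + q)]  # q extra values start the recursion
    return [c + eps[t] + sum(theta[j] * eps[t - 1 - j] for j in range(q))
            for t in range(q, n + q)]


def lag1_autocorr(y):
    """Sample autocorrelation at lag 1."""
    mu = sum(y) / len(y)
    d = [v - mu for v in y]
    return sum(d[t] * d[t - 1] for t in range(1, len(d))) / sum(v * v for v in d)


y = simulate_ma(c=5.0, theta=[0.8], n=5000)
# Theoretical lag-1 autocorrelation for MA(1): 0.8 / (1 + 0.8**2), about 0.49
print(round(lag1_autocorr(y), 2))
```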
ARIMA MODEL
AutoRegressive Integrated Moving Average (ARIMA) models
If we combine differencing with autoregression and a moving average model, we obtain a non-seasonal ARIMA model. ARIMA is an acronym for AutoRegressive Integrated Moving Average (in this context, "integration" is the reverse of differencing). The full model can be written as:
$$y'_t = c + \phi_1 y'_{t-1} + \phi_2 y'_{t-2} + \cdots + \phi_p y'_{t-p} + \theta_1 \varepsilon_{t-1} + \cdots + \theta_q \varepsilon_{t-q} + \varepsilon_t$$
where y'_t is the differenced series (it may have been differenced more than once). The "predictors" on the right-hand side include both lagged values of y'_t and lagged errors. We call this an ARIMA(p, d, q) model.
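A minimal sketch of a one-step forecast from an ARIMA(p, 1, q) model, assuming the coefficients and past errors are already known (all values below are made up): difference once, apply the ARMA equation to y'_t, then "integrate" by adding the forecast change back to the last observed level. Real software such as R's forecast package estimates these quantities; this only shows the mechanics.

```python
def arima_one_step(history, c, phi, theta, past_errors):
    """One-step forecast from an ARIMA(p, 1, q) model with known coefficients.

    past_errors holds eps values aligned with the differenced series, most recent last.
    """
    diffs = [history[i] - history[i - 1] for i in range(1, len(history))]  # y'_t
    recent_d = diffs[::-1][:len(phi)]           # y'_{t-1}, ..., y'_{t-p}
    recent_e = past_errors[::-1][:len(theta)]   # eps_{t-1}, ..., eps_{t-q}
    d_forecast = (c
                  + sum(f * d for f, d in zip(phi, recent_d))
                  + sum(th * e for th, e in zip(theta, recent_e)))
    return history[-1] + d_forecast  # undo the differencing


# Hypothetical ARIMA(1,1,1): c = 0.5, phi_1 = 0.4, theta_1 = 0.3
forecast = arima_one_step([100.0, 103.0, 105.0, 108.0], c=0.5,
                          phi=[0.4], theta=[0.3], past_errors=[0.2, -0.1, 0.5])
print(round(forecast, 2))  # 109.85 = 108 + (0.5 + 0.4*3 + 0.3*0.5)
```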
Components of ARIMA
• p: the order of the AutoRegressive (AR) term, i.e. the number of lags of Y used as predictors.
• q: the order of the Moving Average (MA) term, i.e. the number of lagged forecast errors included in the ARIMA model.
• d: the minimum number of differencing operations needed to make the series stationary; if the time series is already stationary, d = 0.
• Lags: lagging a time series means shifting its values forward one or more time steps or, equivalently, shifting the times in its index backward one or more steps.
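A minimal sketch of lagging a series by k steps, as described in the last bullet above. The helper name `lag` and the use of `None` to pad the start of the series are illustrative assumptions.

```python
def lag(series, k=1, fill=None):
    """Shift values forward by k steps: output[t] = series[t - k], padded with `fill`."""
    if k < 0:
        raise ValueError("k must be non-negative")
    k = min(k, len(series))
    return [fill] * k + series[:len(series) - k]


y = [5, 7, 9, 11]
print(lag(y, 1))  # [None, 5, 7, 9]
print(lag(y, 2))  # [None, None, 5, 7]
```

Lagged copies like these are the predictors in the AR part (lags of y) and the MA part (lags of the errors).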
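Seasonal ARIMA models (SARIMA)
ARIMA models are also capable of modelling a wide range of seasonal data. A seasonal ARIMA model is formed by including additional seasonal terms in the ARIMA models we have seen so far. It is written as follows:
$$\text{ARIMA}\,(p, d, q)\,(P, D, Q)_m$$
where (p, d, q) is the non-seasonal part of the model, (P, D, Q) is the seasonal part, and m = number of observations per year (m = 12 for monthly data).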
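One way to see what the seasonal terms add: in the multiplicative form used by SARIMA, an ARIMA(1,0,0)(1,0,0)_12 model has the AR operator (1 - ϕ_1 B)(1 - Φ_1 B^12), where B is the backshift operator (B y_t = y_{t-1}). The sketch below expands that product with made-up coefficients and shows that y_t then depends on lags 1, 12 and 13. The helper `poly_mul` is our own illustration.

```python
def poly_mul(a, b):
    """Multiply two polynomials in the backshift operator B (list index = power of B)."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


phi1, Phi1, m = 0.5, 0.3, 12                   # hypothetical coefficients, monthly data
nonseasonal = [1.0, -phi1]                     # (1 - phi_1 B)
seasonal = [1.0] + [0.0] * (m - 1) + [-Phi1]   # (1 - Phi_1 B^12)
product = poly_mul(nonseasonal, seasonal)
lags = [k for k, coef in enumerate(product) if k > 0 and coef != 0]
print(lags)  # [1, 12, 13]
```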
Auto Correlation Function (ACF)
The ACF gives the autocorrelation of a series with its own lagged values. In simple terms, it describes how strongly the present value of the series is related to its past values. A time series can have components such as trend, seasonality, cycles and residual noise. The ACF includes the effect of all these components when computing correlations, which is why it is called a "complete" autocorrelation plot.
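A minimal sketch of the sample ACF, using the common estimator that divides each lagged cross-product sum by the total sum of squares. The synthetic sine series (a 12-step cycle) is an assumption chosen so the seasonal peaks are easy to see.

```python
import math


def acf(series, max_lag):
    """Sample autocorrelations r_0..r_max_lag (deviations from the overall mean)."""
    n = len(series)
    mu = sum(series) / n
    dev = [x - mu for x in series]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
            for k in range(max_lag + 1)]


# Synthetic series with a 12-step seasonal cycle (not project data)
seasonal = [math.sin(2 * math.pi * t / 12) for t in range(120)]
r = acf(seasonal, 12)
print(round(r[0], 2), round(r[6], 2), round(r[12], 2))  # approximately 1.0 -0.95 0.9
```

A strong positive spike at lag 12 with a negative one at lag 6 is the typical ACF signature of monthly seasonality.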
Partial Autocorrelation Function (PACF)
Instead of correlating the present value directly with each lag, as the ACF does, the PACF measures the correlation between the series and a given lag after removing the effects already explained by the earlier (shorter) lags. It is "partial" rather than "complete" because the variation already accounted for is removed before the next correlation is computed.
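A minimal sketch of computing partial autocorrelations from autocorrelations with the Durbin-Levinson recursion, one standard way to do it (we do not claim it is what any particular R or Python library uses internally). As input it takes the theoretical ACF of an AR(1) process with ϕ = 0.7, r_k = 0.7^k, an assumption chosen because its PACF is known exactly.

```python
def pacf_from_acf(r):
    """Partial autocorrelations phi_kk for k = 1..len(r)-1 via Durbin-Levinson.

    r[0] must be 1.0; r[k] is the autocorrelation at lag k.
    """
    phi_prev = []  # AR coefficients of the order-(k-1) fit: phi_{k-1,1..k-1}
    pacf = []
    for k in range(1, len(r)):
        num = r[k] - sum(phi_prev[j] * r[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi_prev[j] * r[j + 1] for j in range(k - 1))
        phi_kk = num / den
        # phi_{k,i} = phi_{k-1,i} - phi_kk * phi_{k-1,k-i}, then append phi_kk
        phi_prev = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j] for j in range(k - 1)] + [phi_kk]
        pacf.append(phi_kk)
    return pacf


r_ar1 = [0.7 ** k for k in range(6)]
print([round(v, 3) for v in pacf_from_acf(r_ar1)])  # lag 1 is 0.7, later lags are ~0
```

The PACF "cutting off" after lag 1 while the ACF decays gradually is the usual sign of an AR(1) process, which is how these plots help suggest p.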
AIC values and their interpretation
• The Akaike information criterion (AIC) is used to compare candidate models fitted to the same data: it rewards goodness of fit (through the likelihood) and penalizes complexity (through the number of estimated parameters). The model with the lowest AIC is considered the best fit.
• A common rule of thumb for comparing AIC values: "If a model is more than 2 AIC units lower than another, then it is considered significantly better than that model."
• Formula: $\mathrm{AIC} = 2k - 2\ln(L)$, where
  ◦ AIC: Akaike information criterion
  ◦ k: number of estimated parameters in the model
  ◦ L: maximum value of the likelihood function for the model
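A minimal sketch of the formula and the 2-unit rule of thumb, with hypothetical log-likelihoods and parameter counts (not values from this project). In practice the fitting software reports AIC directly. AIC values are only comparable between models fitted to the same data, which for ARIMA also means the same order of differencing.

```python
def aic(log_likelihood, k):
    """AIC = 2k - 2 ln(L), with ln(L) the maximised log-likelihood."""
    return 2 * k - 2 * log_likelihood


# Hypothetical candidates: (name, maximised log-likelihood, number of estimated parameters)
candidates = [("model A", -590.0, 3), ("model B", -588.5, 5), ("model C", -600.0, 1)]
scores = {name: aic(ll, k) for name, ll, k in candidates}
best = min(scores, key=scores.get)
for name, score in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{name}: AIC = {score:.1f}, delta = {score - scores[best]:.1f}")
# model A: AIC = 1186.0, delta = 0.0
# model B: AIC = 1187.0, delta = 1.0
# model C: AIC = 1202.0, delta = 16.0
```

Model B has the higher likelihood but pays for two extra parameters; by the 2-unit rule it is not clearly worse than model A, whereas model C clearly is.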
Indian aviation sector
The civil aviation industry in India has emerged as one of the fastest-growing industries in the country during the last three years. However, it was severely affected by the COVID-19 pandemic in 2020-2021, owing to the near-complete restriction of air travel during 2020.
The losses incurred during the pandemic were approximately Rs. 19,564 crore, a big hit to the aviation industry.
The graph here shows that the number of passengers fell by almost 50% in 2020 compared with 2019.
Since then, however, the industry has been booming, growing by 238 percent in 2021. The Indian aviation industry is expected to overtake that of the UK, and observers say that the sky is the limit for the aviation sector in India.
WHY SARIMA?
• Unlike regression, ARIMA also takes past errors into account; SARIMA additionally accounts for the seasonality of the dataset.
• ARIMA can also improve on moving-average methods, because it includes an autoregressive part that uses past values of the series to predict future values, which moving-average methods do not.
• Our data has a seasonal component, so it is not stationary, and time is an influencing factor in our prediction; we therefore use SARIMA.
Requirements for ARIMA & SARIMA
1. The data are univariate: the dataset contains only one variable.
2. The series being modelled is stationary (after any differencing).
3. The error terms are white noise: uncorrelated, with zero mean and constant variance (often further assumed to be independent and identically distributed).
OBJECTIVES
• Objective 1: Predict the passengers per plane for the 4 years from Jan 2020, had COVID not happened.
• Objective 2: Predict the passengers per plane for the next 12 months.
• Objective 3: Estimate the losses suffered due to COVID from 2020 through 2023.
OUR RESOURCES
• RStudio: libraries used: readxl, tseries, forecast
• Microsoft Excel: loading the data and exporting it into R and Python
• Python: libraries used: matplotlib, pandas
• Internet: dataset, YouTube, research papers, articles
• Faculty: expertise of the professors
Data overview
• Source: secondary data from the Airports Authority of India
• Time frame: Jan 2010 – Jul 2022
• Variables used:
  ◦ International flights: international departures, number of people carried
  ◦ Domestic flights: domestic departures, number of people carried
Domestic Data
• Seasonality: there is high demand during holiday seasons.
• Crisis: the COVID-19 crisis hit in 2020 and the number of people travelling decreased.
• Trend: increasing until 2020.
International Data
• Seasonality: there is high demand during holiday seasons.
• Crisis: the COVID-19 crisis hit in 2020 and the number of people travelling decreased.
• Trend: increasing steadily from 2010 until 2020.
DATA MODELLING
Process of ARIMA modelling
1. Checking stationarity: using the Augmented Dickey-Fuller test
2. Model identification: using the auto.arima function from the forecast package in RStudio and selecting the model with the lowest AIC value
3. Predicting values: using the forecast package in RStudio
ADF Test
• The Augmented Dickey-Fuller (ADF) test is a common statistical test of whether a given time series is stationary. It is a type of unit root test.
• It tests the null hypothesis that α = 1 in the model equation below, where α is the coefficient of the first lag of y.
• The presence of a unit root (α = 1) means that the time series is non-stationary.
• Hypotheses: H0: α = 1 vs H1: α < 1
• Formula: $y_t = c + \beta t + \alpha y_{t-1} + \phi\,\Delta y_{t-1} + e_t$, where Δy_{t-1} = y_{t-1} - y_{t-2} is the lagged difference that "augments" the basic Dickey-Fuller regression (more lagged differences can be added).
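A simplified sketch of the idea behind the test, assuming no trend term and no lagged differences (the plain Dickey-Fuller regression Δy_t = c + γ y_{t-1} + e_t, where γ = α - 1). The statistic is the t-ratio of γ, compared with the approximate 5% Dickey-Fuller critical value of -2.86 for a regression with a constant. The full ADF test with trend, lag terms and p-values is available in R's tseries package, which this project lists among its libraries; the series here are synthetic.

```python
import random
from itertools import accumulate


def dickey_fuller_stat(y):
    """t-statistic of gamma in  dy_t = c + gamma * y_{t-1} + e_t  (simplified DF test)."""
    x = y[:-1]                                        # y_{t-1}
    dy = [y[t] - y[t - 1] for t in range(1, len(y))]  # dy_t
    n = len(dy)
    mx, md = sum(x) / n, sum(dy) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    gamma = sum((xi - mx) * (di - md) for xi, di in zip(x, dy)) / sxx  # OLS slope
    c = md - gamma * mx                                                 # OLS intercept
    s2 = sum((di - c - gamma * xi) ** 2 for xi, di in zip(x, dy)) / (n - 2)
    return gamma / (s2 / sxx) ** 0.5


CRITICAL_5PCT = -2.86  # approximate 5% critical value (constant, no trend); left-tailed test

rng = random.Random(1)
shocks = [rng.gauss(0, 1) for _ in range(500)]  # stationary white noise
random_walk = list(accumulate(shocks))          # has a unit root
for name, series in [("white noise", shocks), ("random walk", random_walk)]:
    stat = dickey_fuller_stat(series)
    print(f"{name}: DF = {stat:.2f}, reject unit root at 5%: {stat < CRITICAL_5PCT}")
```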
Checking stationarity using the ADF test
H0: the time series is not stationary
H1: the time series is stationary
Data | P-value | Null hypothesis | Decision
Domestic data | 0.2622 | Fail to reject | Not stationary
International data | 0.3371 | Fail to reject | Not stationary
ARIMA & AIC
Domestic
Model | (p,d,q)(P,D,Q)[m] | AIC
Model 1 | (0,1,0) | 1201.687
Model 2 | (1,1,1)(0,0,1)[12] | 1183.11
Model 3 | (0,1,0)(0,0,1)[12] | 1199.731

International
Model | (p,d,q)(P,D,Q)[m] | AIC
Model 1 | (1,1,3) | 1232.58
Model 2 | (0,1,2)(1,0,0)[12] | 1240.293
Model 3 | (2,1,2) | 1235.78

The models in red, Domestic Model 2 and International Model 1, are selected as the best-fit models for the respective aviation types, because they have the lowest AIC values of all the candidate models.
Domestic ACF & PACF plots
International ACF & PACF plots
Ljung-Box test
• The test checks whether the residuals (errors) behave like white noise, i.e. whether the autocorrelations of the residuals, taken together up to a chosen lag, are zero, or whether some structure remains in them.
• Essentially, it is a test of lack of fit: if the autocorrelations of the residuals are very small, we say that the model does not show "significant lack of fit".
• The hypotheses for the test are:
  ◦ H0: the model does not show lack of fit
  ◦ H1: the model does show lack of fit
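A minimal sketch of the Ljung-Box statistic Q = n(n+2) Σ_{k=1..h} r_k²/(n - k), where r_k is the lag-k autocorrelation of the residuals. Under H0, Q approximately follows a chi-square distribution with h degrees of freedom (commonly reduced by the number of estimated ARMA parameters when testing model residuals); 18.307 is the 95th percentile for 10 degrees of freedom. The "residual" series are synthetic assumptions; R's Box.test with type = "Ljung-Box" is one ready-made implementation.

```python
import random


def ljung_box_q(residuals, h):
    """Ljung-Box statistic Q = n(n+2) * sum_{k=1..h} r_k^2 / (n - k)."""
    n = len(residuals)
    mu = sum(residuals) / n
    dev = [e - mu for e in residuals]
    denom = sum(d * d for d in dev)
    total = 0.0
    for k in range(1, h + 1):
        r_k = sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
        total += r_k ** 2 / (n - k)
    return n * (n + 2) * total


rng = random.Random(42)
white = [rng.gauss(0, 1) for _ in range(300)]  # residuals of a well-fitting model
ar = [0.0]
for e in white:                                # strongly autocorrelated "residuals"
    ar.append(0.8 * ar[-1] + e)
ar = ar[1:]

CHI2_10_95 = 18.307  # 95th percentile of chi-square with 10 degrees of freedom
for name, res in [("white noise", white), ("AR(1) residuals", ar)]:
    q = ljung_box_q(res, 10)
    print(f"{name}: Q = {q:.1f}, lack of fit at 5%: {q > CHI2_10_95}")
```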
Checking fit of the model using the Ljung-Box test: Domestic data
Lags used | P-value | Null hypothesis | Decision
5 | 0.8784 | Fail to reject | Good fit
10 | 0.9678 | Fail to reject | Good fit
15 | 0.6549 | Fail to reject | Good fit
25 | 0.7494 | Fail to reject | Good fit

Checking fit of the model using the Ljung-Box test: International data
Lags used | P-value | Null hypothesis | Decision
5 | 0.9993 | Fail to reject | Good fit
10 | 0.9972 | Fail to reject | Good fit
15 | 0.8501 | Fail to reject | Good fit
25 | 0.9514 | Fail to reject | Good fit
Forecasting domestic data using the best model
Month (DD-MM-YYYY) | Forecast
01-08-2022 | 116.46
01-09-2022 | 115.88
01-10-2022 | 116.70
01-11-2022 | 117.72
01-12-2022 | 117.64
01-01-2023 | 112.85
01-02-2023 | 117.97
01-03-2023 | 118.10
01-04-2023 | 117.28
01-05-2023 | 119.28
01-06-2023 | 117.66
01-07-2023 | 116.85
Forecasting international data using the best model
Month (DD-MM-YYYY) | Forecast
01-08-2022 | 147.91
01-09-2022 | 145.86
01-10-2022 | 145.41
01-11-2022 | 144.99
01-12-2022 | 144.60
01-01-2023 | 144.24
01-02-2023 | 143.90
01-03-2023 | 143.58
01-04-2023 | 143.28
01-05-2023 | 143.003
01-06-2023 | 142.74
01-07-2023 | 142.50
IF COVID HAD NOT OCCURRED
Slicing the dataset
Domestic data without COVID
International data without COVID
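The slides do not show how the dataset was sliced. Given Objective 1 (forecasts from Jan 2020), a natural assumption is that the "without COVID" series runs from Jan 2010 to Dec 2019. The sketch below shows that cut on a hypothetical monthly index with placeholder values; the helper `slice_before` is our own.

```python
from datetime import date


def slice_before(observations, cutoff):
    """Keep the (month, value) pairs strictly before the cutoff date."""
    return [(month, value) for month, value in observations if month < cutoff]


# Hypothetical monthly index from Jan 2010 to Jul 2022 (151 months); values are placeholders
months = [date(2010 + i // 12, i % 12 + 1, 1) for i in range(151)]
observations = [(m, float(i)) for i, m in enumerate(months)]

pre_covid = slice_before(observations, date(2020, 1, 1))
print(len(pre_covid), pre_covid[0][0], pre_covid[-1][0])  # 120 2010-01-01 2019-12-01
```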
Checking stationarity using the ADF test
H0: the time series is not stationary
H1: the time series is stationary
Data | P-value | Null hypothesis (5% level) | Decision
Domestic data | 0.09633 | Fail to reject | Not stationary
International data | 0.0698 | Fail to reject | Not stationary
ARIMA & AIC
Domestic
Model | (p,d,q)(P,D,Q)[m] | AIC
Model 1 | (4,0,0)(0,1,1)[12] | 633.6514
Model 2 | (2,0,0)(1,1,2)[12] | 640.62
Model 3 | (3,0,0)(0,1,1)[12] | 635.2183

International
Model | (p,d,q)(P,D,Q)[m] | AIC
Model 1 | (1,1,2)(0,1,1)[12] | 647.2197
Model 2 | (3,1,2)(0,1,1)[12] | 640.4121
Model 3 | (2,1,3)(0,1,1)[12] | 638.3682

The models in red, Domestic Model 1 and International Model 3, are selected as the best-fit models for the respective aviation types, because they have the lowest AIC values of all the candidate models.
Domestic ACF & PACF plots
International ACF & PACF plots
Checking fit of the model using the Ljung-Box test: Domestic data
Lags used | P-value | Null hypothesis | Decision
5 | 0.7941 | Fail to reject | Good fit
10 | 0.7305 | Fail to reject | Good fit
15 | 0.4957 | Fail to reject | Good fit
25 | 0.2979 | Fail to reject | Good fit

Checking fit of the model using the Ljung-Box test: International data
Lags used | P-value | Null hypothesis | Decision
5 | 0.8032 | Fail to reject | Good fit
10 | 0.7952 | Fail to reject | Good fit
15 | 0.749 | Fail to reject | Good fit
25 | 0.07807 | Fail to reject | Good fit
Forecasting domestic data using the best model
• Had COVID not occurred, the domestic aviation industry would have continued to thrive and expand, as the forecast shows.
• The seasonal pattern is expected to continue as before.
Forecasting international data using the best model
• The international aviation industry is forecast to stagnate, with little to no growth.
• Some seasonality is still expected.
LOSSES INCURRED DUE TO COVID
Domestic Losses
• COVID obstructed a blooming domestic aviation sector.
• The loss due to COVID is still visible in 2022, as shown by the gap between the blue and red lines.
International Losses
• Huge losses were borne due to COVID.
• International travel is almost back on track, as the gap has nearly closed.
• The biggest takeaway is that the industry took 2 years and 4 months to recover from a crisis such as a travel ban.
₹82,24,772
Losses incurred by a single domestic plane from customers lost due to COVID
Monthly domestic losses
• The spikes in losses are due to the first and second waves of COVID.
• The losses have finally plateaued, and the industry is likely to see some stability now.
Domestic loss calculation: a few assumptions
• Bombay to Delhi is the busiest domestic air route.
• The average ticket price for this route is Rs. 5000.
• Tickets are booked one month in advance.
• There is no inflation in ticket prices.
• There is no fluctuation in ticket prices.
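The slides do not spell out the loss formula, so the following is only a plausible reconstruction under the assumptions listed above: the monthly loss per plane is taken as (no-COVID forecast minus actual passengers per plane) multiplied by a constant ticket price, summed over months. The one-month booking lead is not modelled, and the helper name `covid_loss` and the passenger figures are hypothetical, not the project's data.

```python
def covid_loss(counterfactual, actual, ticket_price):
    """Revenue lost per plane each month and in total, at a constant ticket price.

    counterfactual: forecast passengers per plane had COVID not occurred
    actual:         observed passengers per plane
    """
    if len(counterfactual) != len(actual):
        raise ValueError("series must be the same length")
    monthly = [(f - a) * ticket_price for f, a in zip(counterfactual, actual)]
    return monthly, sum(monthly)


# Hypothetical three months on the domestic route at Rs. 5000 per ticket
monthly, total = covid_loss([120.0, 122.0, 121.0], [60.0, 90.0, 115.0], ticket_price=5000)
print(monthly, total)  # [300000.0, 160000.0, 30000.0] 490000.0
```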
₹5,33,39,335
Losses borne by a single international plane from customers lost due to COVID
Monthly international losses
• The international aviation industry is very unstable.
• International aviation has almost caught up with our simulated (no-COVID) forecast.
• The industry is currently running at negligible losses.
International loss calculation: a few assumptions
• Bombay to Dubai is the busiest international air route.
• The average ticket price for this route is Rs. 35000.
• Tickets are booked one month in advance.
• There is no inflation in ticket prices.
• There is no fluctuation in ticket prices.
LIMITATIONS
• ARIMA performs poorly for long-term forecasts.
• There is considerable subjectivity in choosing the p, d, q values.
• Other models may predict this kind of data better.
• Well-organized data is hard to find.
FUTURE SCOPE
• Predictions for different airline brands using their market share
• Using better models to predict the data
• Checking the accuracy of the ARIMA and SARIMA predictions by comparing actual and predicted values
References
[1] International Journal of Applied Engineering Research, ISSN 0973-4562, Volume 14, Number 3 (2019), pp. 646-650, 10.37622/000000
[2] International Journal of Innovative Technology and Exploring Engineering (IJITEE), ISSN 2278-3075, Volume 8, Issue 11S, September 2019, doi.org/10.35940/ijitee.K1216.09811S19
[3] https://doi.org/10.1109/icmlc.2009.5212785
[4] https://www.aai.aero/en/business-opportunities/aai-traffic-news
[5] https://www.dgca.gov.in/digigov-portal/?page=4267/4210/servicename
Acknowledgements
We would like to thank Professor Anwesha for her patient instruction, passionate support, and constructive criticism of this effort.
We would like to thank Dr. Santosha C.D. and Professor Kavya S for giving us this opportunity and for their continued support in the development of this project.
We would like to thank our fellow classmates and peers for their valuable input, for their help with the project, and for patiently listening to us.
Thank You
Akarsh Avinash
Palak Bansal
Sai Teja Dharlapudi

Contenu connexe

Tendances

Conjoint analysis
Conjoint analysisConjoint analysis
Conjoint analysis
Karthik Ram
 
Improving customer satisfaction
Improving customer satisfactionImproving customer satisfaction
Improving customer satisfaction
Martin Trịnh
 

Tendances (20)

Customer Churn Analytics using Microsoft R Open
Customer Churn Analytics using Microsoft R OpenCustomer Churn Analytics using Microsoft R Open
Customer Churn Analytics using Microsoft R Open
 
Airline pricing & demand
Airline pricing & demandAirline pricing & demand
Airline pricing & demand
 
Airline revenue management
Airline revenue managementAirline revenue management
Airline revenue management
 
Management of IndiGo Airlines
Management of IndiGo AirlinesManagement of IndiGo Airlines
Management of IndiGo Airlines
 
R and Visualization: A match made in Heaven
R and Visualization: A match made in HeavenR and Visualization: A match made in Heaven
R and Visualization: A match made in Heaven
 
Conjoint analysis
Conjoint analysisConjoint analysis
Conjoint analysis
 
Airline flights delay prediction- 2014 Spring Data Mining Project
Airline flights delay prediction- 2014 Spring Data Mining ProjectAirline flights delay prediction- 2014 Spring Data Mining Project
Airline flights delay prediction- 2014 Spring Data Mining Project
 
Seasonal ARIMA
Seasonal ARIMASeasonal ARIMA
Seasonal ARIMA
 
Time Series - Auto Regressive Models
Time Series - Auto Regressive ModelsTime Series - Auto Regressive Models
Time Series - Auto Regressive Models
 
Timeseries forecasting
Timeseries forecastingTimeseries forecasting
Timeseries forecasting
 
Scheduling and Revenue Management
Scheduling and Revenue ManagementScheduling and Revenue Management
Scheduling and Revenue Management
 
Assignment on swot analysis of airport authority of india
Assignment on swot analysis of airport authority of indiaAssignment on swot analysis of airport authority of india
Assignment on swot analysis of airport authority of india
 
Time Series Analysis Ravi
Time Series Analysis RaviTime Series Analysis Ravi
Time Series Analysis Ravi
 
airside operation 3
airside operation 3airside operation 3
airside operation 3
 
Time series forecasting with ARIMA
Time series forecasting with ARIMATime series forecasting with ARIMA
Time series forecasting with ARIMA
 
Revenue passenger mile & Yield
Revenue passenger mile & YieldRevenue passenger mile & Yield
Revenue passenger mile & Yield
 
Ryanair - Accounting, finance & control project
Ryanair - Accounting, finance & control projectRyanair - Accounting, finance & control project
Ryanair - Accounting, finance & control project
 
Forecasting and methods of forecasting
Forecasting and methods of forecastingForecasting and methods of forecasting
Forecasting and methods of forecasting
 
Improving customer satisfaction
Improving customer satisfactionImproving customer satisfaction
Improving customer satisfaction
 
Forecasting Methods
Forecasting MethodsForecasting Methods
Forecasting Methods
 

Similaire à Air Passenger Prediction Using ARIMA Model

Similaire à Air Passenger Prediction Using ARIMA Model (20)

What is ARIMAX Forecasting and How is it Used for Enterprise Analysis?
What is ARIMAX Forecasting and How is it Used for Enterprise Analysis?What is ARIMAX Forecasting and How is it Used for Enterprise Analysis?
What is ARIMAX Forecasting and How is it Used for Enterprise Analysis?
 
What is ARIMA Forecasting and How Can it Be Used for Enterprise Analysis?
What is ARIMA Forecasting and How Can it Be Used for Enterprise Analysis?What is ARIMA Forecasting and How Can it Be Used for Enterprise Analysis?
What is ARIMA Forecasting and How Can it Be Used for Enterprise Analysis?
 
Time series project
Time series projectTime series project
Time series project
 
Time series modelling arima-arch
Time series modelling  arima-archTime series modelling  arima-arch
Time series modelling arima-arch
 
Different Models Used In Time Series - InsideAIML
Different Models Used In Time Series - InsideAIMLDifferent Models Used In Time Series - InsideAIML
Different Models Used In Time Series - InsideAIML
 
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
 
Ecm time series forecast
Ecm time series forecastEcm time series forecast
Ecm time series forecast
 
Modelling and forecasting of tur production in India usingARIMA model
Modelling and forecasting of tur production in India usingARIMA modelModelling and forecasting of tur production in India usingARIMA model
Modelling and forecasting of tur production in India usingARIMA model
 
IRJET- Analysis of Crucial Oil Gas and Liquid Sensor Statistics and Productio...
IRJET- Analysis of Crucial Oil Gas and Liquid Sensor Statistics and Productio...IRJET- Analysis of Crucial Oil Gas and Liquid Sensor Statistics and Productio...
IRJET- Analysis of Crucial Oil Gas and Liquid Sensor Statistics and Productio...
 
Non-Temporal ARIMA Models in Statistical Research
Non-Temporal ARIMA Models in Statistical ResearchNon-Temporal ARIMA Models in Statistical Research
Non-Temporal ARIMA Models in Statistical Research
 
Study of effectiveness of time series modeling (arima) in forecasting stock p...
Study of effectiveness of time series modeling (arima) in forecasting stock p...Study of effectiveness of time series modeling (arima) in forecasting stock p...
Study of effectiveness of time series modeling (arima) in forecasting stock p...
 
Social_Distancing_DIS_Time_Series
Social_Distancing_DIS_Time_SeriesSocial_Distancing_DIS_Time_Series
Social_Distancing_DIS_Time_Series
 
Reading in the future icelandair1
Reading in the future    icelandair1Reading in the future    icelandair1
Reading in the future icelandair1
 
ARIMA.pptx
ARIMA.pptxARIMA.pptx
ARIMA.pptx
 
On Modeling Murder Crimes in Nigeria
On Modeling Murder Crimes in NigeriaOn Modeling Murder Crimes in Nigeria
On Modeling Murder Crimes in Nigeria
 
Forecasting Techniques - Data Science SG
Forecasting Techniques - Data Science SG Forecasting Techniques - Data Science SG
Forecasting Techniques - Data Science SG
 
Stock Price Prediction using Machine Learning Algorithms: ARIMA, LSTM & Linea...
Stock Price Prediction using Machine Learning Algorithms: ARIMA, LSTM & Linea...Stock Price Prediction using Machine Learning Algorithms: ARIMA, LSTM & Linea...
Stock Price Prediction using Machine Learning Algorithms: ARIMA, LSTM & Linea...
 
Time series analysis
Time series analysisTime series analysis
Time series analysis
 
Quality service management
Quality service managementQuality service management
Quality service management
 
Forecasting Models & Their Applications
Forecasting Models & Their ApplicationsForecasting Models & Their Applications
Forecasting Models & Their Applications
 

Dernier

Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
gajnagarg
 
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Klinik kandungan
 
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...
HyderabadDolls
 
Computer science Sql cheat sheet.pdf.pdf
Computer science Sql cheat sheet.pdf.pdfComputer science Sql cheat sheet.pdf.pdf
Computer science Sql cheat sheet.pdf.pdf
SayantanBiswas37
 
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get CytotecAbortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Riyadh +966572737505 get cytotec
 
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
gajnagarg
 
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
nirzagarg
 
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
nirzagarg
 

Dernier (20)

Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
 
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
 
Kings of Saudi Arabia, information about them
Kings of Saudi Arabia, information about themKings of Saudi Arabia, information about them
Kings of Saudi Arabia, information about them
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
Ranking and Scoring Exercises for Research
Ranking and Scoring Exercises for ResearchRanking and Scoring Exercises for Research
Ranking and Scoring Exercises for Research
 
7. Epi of Chronic respiratory diseases.ppt
7. Epi of Chronic respiratory diseases.ppt7. Epi of Chronic respiratory diseases.ppt
7. Epi of Chronic respiratory diseases.ppt
 
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24  Building Real-Time Pipelines With FLaNKDATA SUMMIT 24  Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
 
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
 
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...
 
Computer science Sql cheat sheet.pdf.pdf
Computer science Sql cheat sheet.pdf.pdfComputer science Sql cheat sheet.pdf.pdf
Computer science Sql cheat sheet.pdf.pdf
 
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get CytotecAbortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get Cytotec
 
RESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptx
RESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptxRESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptx
RESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptx
 
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
 
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
 
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
 
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...TrafficWave Generator Will Instantly drive targeted and engaging traffic back...
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...
 
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
 
Fun all Day Call Girls in Jaipur 9332606886 High Profile Call Girls You Ca...
Fun all Day Call Girls in Jaipur   9332606886  High Profile Call Girls You Ca...Fun all Day Call Girls in Jaipur   9332606886  High Profile Call Girls You Ca...
Fun all Day Call Girls in Jaipur 9332606886 High Profile Call Girls You Ca...
 
Digital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham WareDigital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham Ware
 
Dubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls DubaiDubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls Dubai
 

Air Passenger Prediction Using ARIMA Model

  • 2. Table of contents Introduction 01 Objectives & Dataset 02 Analysis 03 Results 04 05 Limitations Future Scope 06 References 07 Acknowledgements 08
  • 4. TIME SERIES DATA Time-series data is a set of observations on a quantitative variable collected over time, for example historical data on sales, inventory, customer counts, interest rates or costs. Businesses are often very interested in forecasting time-series variables. Often, independent variables are not available to build a regression model of a time-series variable, which is why time series analysis is used.
  • 5. TIME SERIES ANALYSIS • Time series analysis is a statistical technique that uses time-series data to explain the past or forecast future events. • In time series analysis, we analyze the past behavior of a variable in order to predict its future behavior. • The prediction is a function of time (days, months, years, etc.). • No causal (explanatory) variables are used.
  • 6. TIME SERIES FORECASTING Time series forecasting means predicting the future values of a variable over a period. It involves developing models from historical data and applying them to make predictions and guide future strategic decisions. The future is forecast, or estimated, from what has already happened. A time series adds a time-order dependence between observations, so the order of the data matters.
  • 7. Types of time series methods used for forecasting: • Autoregression (AR) • Moving Average (MA) • Autoregressive Moving Average (ARMA) • Autoregressive Integrated Moving Average (ARIMA) • Seasonal Autoregressive Integrated Moving-Average (SARIMA).
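  • 8. Stationarity Stationarity means that the statistical properties of the process generating a time series do not change over time. ARIMA requires stationarity. Requirements for stationarity: 1. The mean should be constant over time. 2. The variance should be constant over time. 3. There should be no seasonality (more generally, the correlation between two observations should depend only on how far apart they are, not on when they occur).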
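A quick, informal way to eyeball the first two requirements is to compare the mean and variance of consecutive blocks of the series. The sketch below is illustrative Python using only the standard library (the project itself worked in R); `block_stats` is a hypothetical helper, and it is a visual aid, not a formal test like the ADF test used later.

```python
from statistics import mean, pvariance


def block_stats(series, window):
    """Mean and variance of consecutive, non-overlapping blocks of `window` observations.

    Block means or variances that drift over time suggest the series is not stationary.
    A trailing block shorter than `window` is ignored.
    """
    return [
        (mean(series[i:i + window]), pvariance(series[i:i + window]))
        for i in range(0, len(series) - window + 1, window)
    ]


# A steadily trending series: the block means keep rising, so the mean is not constant.
trending = list(range(1, 13))
for block_mean, block_var in block_stats(trending, 4):
    print(f"mean={block_mean:.2f}  variance={block_var:.2f}")
```

Here the variances stay equal while the means rise, which points to a trend: exactly what differencing is meant to remove.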
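  • 9. Differencing Differencing (taking the change between consecutive observations, $y'_t = y_t - y_{t-1}$) can help stabilize the mean of a time series by removing changes in its level, thereby reducing trend and, with seasonal differencing ($y_t - y_{t-m}$), seasonality.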
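The sketch below shows ordinary differencing (lag 1), seasonal differencing (lag m, e.g. 12 for monthly data) and the reverse "integration" step that turns differenced values back into the original scale. It is illustrative Python with hypothetical helper names, not code from the project.

```python
def difference(series, lag=1):
    """Return y_t - y_{t-lag}. lag=1 removes a trend; lag=m (e.g. 12 for monthly data) removes seasonality."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


def undifference(initial, diffs, lag=1):
    """Invert difference(): rebuild the series from its first `lag` values and the differences."""
    if len(initial) != lag:
        raise ValueError("need exactly `lag` starting values")
    rebuilt = list(initial)
    for d in diffs:
        rebuilt.append(rebuilt[-lag] + d)  # y_t = y_{t-lag} + (y_t - y_{t-lag})
    return rebuilt


trend = [10, 12, 14, 16, 18]
print(difference(trend))                      # [2, 2, 2, 2]: the trend becomes a constant mean

seasonal = [1, 5, 3, 7, 2, 6, 4, 8]           # period-4 pattern plus a slow upward drift
print(difference(seasonal, lag=4))            # [1, 1, 1, 1]: the seasonal pattern is removed

print(undifference([10], difference(trend)))  # [10, 12, 14, 16, 18]
```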
  • 10. AR MODEL In an autoregression model, we forecast the variable of interest using a linear combination of past values of the variable. The term autoregression indicates that it is a regression of the variable against itself. Thus, an autoregressive model of order p can be written as:
    $$y_t = c + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \dots + \phi_p y_{t-p} + \varepsilon_t$$
    where t is the index of the time period, $y_t$ is the value of the series at time t, $\phi_1, \dots, \phi_p$ are the coefficients of the lagged values, c is a constant term, $\varepsilon_t$ is the forecast error (white noise) at time t, and p is the order of the AR term.
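As a minimal sketch of the AR(p) equation, the following Python computes forecasts from given coefficients. The values of c and φ here are hypothetical; in practice they are estimated from the data (the project did this in R). Multi-step forecasts feed each prediction back in as a lagged value, and the unknown future errors ε are replaced by their expected value of zero.

```python
def ar_forecast(history, c, phi):
    """One-step AR(p) forecast: c + phi_1*y_{t-1} + ... + phi_p*y_{t-p}.

    `history` is ordered oldest -> newest; phi[0] multiplies the most recent value.
    The error term is omitted because its expected value is zero.
    """
    p = len(phi)
    if len(history) < p:
        raise ValueError("need at least p past values")
    recent = history[::-1][:p]  # y_{t-1}, y_{t-2}, ..., y_{t-p}
    return c + sum(coef * y for coef, y in zip(phi, recent))


def ar_forecast_steps(history, c, phi, steps):
    """Recursive multi-step forecast: each prediction becomes a lag for the next one."""
    values = list(history)
    for _ in range(steps):
        values.append(ar_forecast(values, c, phi))
    return values[len(history):]


# Hypothetical AR(1) with c = 0 and phi_1 = 0.5: forecasts decay towards the mean (0).
print(ar_forecast_steps([8.0], c=0.0, phi=[0.5], steps=3))  # [4.0, 2.0, 1.0]
```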
  • 11. MA MODEL Rather than using past values of the forecast variable in a regression, a moving average model uses past forecast errors in a regression-like model:
    $$y_t = c + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \theta_2 \varepsilon_{t-2} + \dots + \theta_q \varepsilon_{t-q}$$
    where $\varepsilon_t$ is white noise and $\theta_1, \dots, \theta_q$ are the MA coefficients. We refer to this as an MA(q) model, a moving average model of order q. Of course, we do not observe the values of $\varepsilon_t$, so it is not really a regression in the usual sense.
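Because the errors ε are not observed, software has to reconstruct them from the data. A minimal sketch, assuming the coefficients θ are already known and that errors before the start of the series are zero (a common simplification): generate an MA(q) series from given errors, then recover those errors recursively. The helper names are hypothetical.

```python
def ma_series(c, theta, eps):
    """y_t = c + eps_t + theta_1*eps_{t-1} + ... + theta_q*eps_{t-q}; errors before t = 0 are taken as 0."""
    ys = []
    for t in range(len(eps)):
        y = c + eps[t]
        for i, th in enumerate(theta, start=1):
            if t - i >= 0:
                y += th * eps[t - i]
        ys.append(y)
    return ys


def recover_errors(c, theta, ys):
    """Invert ma_series recursively: eps_t = y_t - c - theta_1*eps_{t-1} - ... - theta_q*eps_{t-q}."""
    eps = []
    for t, y in enumerate(ys):
        e = y - c
        for i, th in enumerate(theta, start=1):
            if t - i >= 0:
                e -= th * eps[t - i]
        eps.append(e)
    return eps


# Hypothetical MA(1) with c = 10 and theta_1 = 0.5.
errors = [1.0, -2.0, 0.5, 0.0]
observed = ma_series(10.0, [0.5], errors)
print(observed)                               # [11.0, 8.5, 9.5, 10.25]
print(recover_errors(10.0, [0.5], observed))  # [1.0, -2.0, 0.5, 0.0]
```

With real data the start-up assumption is only approximately right; for an invertible MA(1) (|θ_1| < 1) its effect dies out over time.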
  • 12. ARIMA MODEL Auto Regressive Integrated Moving Average
  • 13. ARIMA models If we combine differencing with autoregression and a moving average model, we obtain a non-seasonal ARIMA model. ARIMA is an acronym for Autoregressive Integrated Moving Average (in this context, "integration" is the reverse of differencing). The full model can be written as:
    $$y'_t = c + \phi_1 y'_{t-1} + \phi_2 y'_{t-2} + \dots + \phi_p y'_{t-p} + \theta_1 \varepsilon_{t-1} + \dots + \theta_q \varepsilon_{t-q} + \varepsilon_t$$
    where $y'_t$ is the differenced series (it may have been differenced more than once). The "predictors" on the right-hand side include both lagged values of $y'_t$ and lagged errors. We call this an ARIMA(p, d, q) model.
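To make the "integration" step concrete, here is a minimal sketch of forecasting an ARIMA(p, 1, 0) model, i.e. the ARIMA equation with d = 1 and no MA terms, using hypothetical, already-known coefficients: difference the series once, forecast the differences with the AR part, then add each forecast difference back to the last level. This is an illustration, not the R forecasting routine used in the project.

```python
def arima_p10_forecast(series, c, phi, steps):
    """Forecast an ARIMA(p, 1, 0) model with known constant c and AR coefficients phi.

    1. Difference once: y'_t = y_t - y_{t-1}.
    2. Forecast y'_t with the AR(p) equation (future errors set to their mean, 0).
    3. Integrate: add each forecast difference to the previous level.
    """
    diffs = [series[t] - series[t - 1] for t in range(1, len(series))]
    p = len(phi)
    if len(diffs) < p:
        raise ValueError("need at least p + 1 observations")
    level = series[-1]
    forecasts = []
    for _ in range(steps):
        recent = diffs[::-1][:p]  # y'_{t-1}, ..., y'_{t-p}
        next_diff = c + sum(coef * d for coef, d in zip(phi, recent))
        diffs.append(next_diff)
        level += next_diff        # undo the differencing
        forecasts.append(level)
    return forecasts


# Hypothetical ARIMA(1, 1, 0): c = 0, phi_1 = 0.5.
print(arima_p10_forecast([100, 104, 106], c=0.0, phi=[0.5], steps=3))  # [107.0, 107.5, 107.75]
```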
  • 14. Components of ARIMA • p: the order of the autoregressive (AR) term, i.e. the number of lags of Y used as predictors. • d: the minimum number of differencing operations needed to make the series stationary; if the time series is already stationary, d = 0. • q: the order of the moving average (MA) term, i.e. the number of lagged forecast errors that go into the ARIMA model. • Lags: lagging a time series means shifting its values forward one or more time steps or, equivalently, shifting the times in its index backward one or more steps.
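A small sketch of lagging, and of how lags 1 to p become the predictors of an AR(p) regression. Plain Python with hypothetical helper names and illustrative values; `None` marks positions that have no earlier value.

```python
def lag(series, k):
    """Shift values forward by k steps: position t holds series[t - k]; the first k positions are None."""
    if k < 0:
        raise ValueError("k must be non-negative")
    k = min(k, len(series))
    return [None] * k + series[:len(series) - k]


def lagged_rows(series, p):
    """(y_t, [y_{t-1}, ..., y_{t-p}]) pairs: the target and predictors of an AR(p) regression."""
    return [(series[t], [series[t - i] for i in range(1, p + 1)]) for t in range(p, len(series))]


y = [10, 20, 30, 40]
print(lag(y, 1))          # [None, 10, 20, 30]
print(lagged_rows(y, 2))  # [(30, [20, 10]), (40, [30, 20])]
```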
  • 15. Seasonal ARIMA models (SARIMA) ARIMA models are also capable of modelling a wide range of seasonal data. A seasonal ARIMA model is formed by including additional seasonal terms in the ARIMA models we have seen so far. It is written as ARIMA(p,d,q)(P,D,Q)[m], where (p,d,q) is the non-seasonal part of the model, (P,D,Q) is the seasonal part, and m is the number of observations per year.
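For reference, the standard textbook way to write the full seasonal model uses the backshift operator B, with $B y_t = y_{t-1}$ and $B^m y_t = y_{t-m}$. This general form is added here for clarity and was not on the original slide; uppercase letters denote the seasonal counterparts of p, d, q and of the coefficients φ and θ.

$$\phi_p(B)\,\Phi_P(B^m)\,(1-B)^d\,(1-B^m)^D\,y_t = c + \theta_q(B)\,\Theta_Q(B^m)\,\varepsilon_t$$

$$\phi_p(B) = 1 - \phi_1 B - \dots - \phi_p B^p, \qquad \Phi_P(B^m) = 1 - \Phi_1 B^m - \dots - \Phi_P B^{Pm}$$

$$\theta_q(B) = 1 + \theta_1 B + \dots + \theta_q B^q, \qquad \Theta_Q(B^m) = 1 + \Theta_1 B^m + \dots + \Theta_Q B^{Qm}$$

With P = D = Q = 0 this reduces to the non-seasonal ARIMA(p, d, q) equation. For example, the domestic model (1,1,1)(0,0,1)[12] on slide 31 has p = 1, d = 1, q = 1, P = 0, D = 0, Q = 1 and m = 12.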
  • 16. Autocorrelation Function (ACF) The ACF is the (complete) autocorrelation function: it gives the autocorrelation of a series with its lagged values. In simple terms, it describes how well the present value of the series is related to its past values. A time series can have components such as trend, seasonality, cycles and residuals. The ACF considers all of these components when computing correlations, hence it is a "complete" autocorrelation plot.
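A minimal sketch of the usual sample ACF estimator, $r_k = \sum_{t=k+1}^{n} (y_t - \bar{y})(y_{t-k} - \bar{y}) \big/ \sum_{t=1}^{n} (y_t - \bar{y})^2$, in plain Python. The function name and the example series are illustrative, not from the project.

```python
def acf(series, max_lag):
    """Sample autocorrelations r_0..r_max_lag (r_0 is always 1)."""
    n = len(series)
    if not 0 <= max_lag < n:
        raise ValueError("max_lag must be between 0 and len(series) - 1")
    mean = sum(series) / n
    dev = [y - mean for y in series]
    denom = sum(d * d for d in dev)
    if denom == 0:
        raise ValueError("series is constant; autocorrelation is undefined")
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / denom for k in range(max_lag + 1)]


# A series repeating every 4 steps correlates most strongly with itself at lag 4,
# the kind of spike that reveals seasonality (lag 12 for monthly data).
pattern = [1.0, 3.0, 2.0, 5.0] * 6
for k, r in enumerate(acf(pattern, 4)):
    print(k, round(r, 3))
```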
  • 17. Partial Autocorrelation Function (PACF) Instead of correlating the present value with each lag directly, as the ACF does, the PACF at lag k measures the correlation between $y_t$ and $y_{t-k}$ after removing the effects already explained by the intermediate lags 1, ..., k-1. It is "partial" rather than "complete" because the variation already accounted for is removed before the next correlation is computed.
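The PACF can be computed from the autocorrelations with the Durbin-Levinson recursion. The sketch below is illustrative Python (one standard algorithm; statistical packages may use other estimators). Feeding it the theoretical ACF of an AR(1) process, $r_k = 0.5^k$, shows the property used to choose p: the PACF of an AR(p) process cuts off after lag p.

```python
def pacf_from_acf(r, max_lag):
    """Partial autocorrelations for lags 0..max_lag from autocorrelations r (r[0] == 1).

    Durbin-Levinson recursion, for j = 1..k-1:
      phi_kk = (r_k - sum_j phi_{k-1,j} r_{k-j}) / (1 - sum_j phi_{k-1,j} r_j)
      phi_kj = phi_{k-1,j} - phi_kk * phi_{k-1,k-j}
    """
    if len(r) <= max_lag:
        raise ValueError("need autocorrelations up to max_lag")
    pacf = [1.0]
    prev = []  # prev[j - 1] holds phi_{k-1, j}
    for k in range(1, max_lag + 1):
        num = r[k] - sum(prev[j - 1] * r[k - j] for j in range(1, k))
        den = 1.0 - sum(prev[j - 1] * r[j] for j in range(1, k))
        phi_kk = num / den
        prev = [prev[j - 1] - phi_kk * prev[k - j - 1] for j in range(1, k)] + [phi_kk]
        pacf.append(phi_kk)
    return pacf


# Theoretical ACF of an AR(1) process with phi_1 = 0.5 is r_k = 0.5**k.
print(pacf_from_acf([0.5 ** k for k in range(4)], 3))  # [1.0, 0.5, 0.0, 0.0]
```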
  • 18. AIC values and their interpretation • The Akaike information criterion (AIC) is a mathematical method for evaluating how well a model fits the data it was generated from, while penalizing the number of parameters. In statistics, AIC is used to compare different candidate models and determine which one fits the data best; a lower AIC is better. • To compare AIC values, we use the following rule of thumb: if a model's AIC is more than 2 units lower than another's, it is considered significantly better than that model. • Formula: $\mathrm{AIC} = 2k - 2\ln(L)$, where k is the number of estimated parameters in the model and L is the maximum value of the likelihood function for the model.
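A short sketch of the formula and the 2-unit rule of thumb, applied to the AIC values of the three domestic candidate models reported on slide 31. The helper names are hypothetical; the AIC values themselves would normally come from the fitted models in R.

```python
def aic(k, log_likelihood):
    """AIC = 2k - 2 ln(L), where log_likelihood is ln(L) at the fitted parameters."""
    return 2 * k - 2 * log_likelihood


def compare_by_aic(models):
    """Sort {name: AIC} best-first, returning (name, AIC, difference from the best AIC)."""
    best = min(models.values())
    return sorted(((name, a, a - best) for name, a in models.items()), key=lambda row: row[1])


# AIC values of the three domestic candidates reported on slide 31.
domestic = {
    "Model 1 (0,1,0)": 1201.687,
    "Model 2 (1,1,1)(0,0,1)[12]": 1183.11,
    "Model 3 (0,1,0)(0,0,1)[12]": 1199.731,
}
for name, value, delta in compare_by_aic(domestic):
    verdict = "best" if delta == 0 else ("clearly worse" if delta > 2 else "comparable")
    print(f"{name}: AIC={value:.3f}, delta={delta:.3f} ({verdict})")
```

Both alternatives are more than 2 units above Model 2, so under the rule of thumb Model 2 is clearly preferred.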
  • 19. Indian aviation sector The civil aviation industry in India has emerged as one of the fastest-growing industries in the country during the last three years, but it was severely affected by the COVID-19 pandemic in 2020-2021 because air travel was almost completely restricted during 2020. The losses incurred during the pandemic were approximately Rs. 19,564 crores, a big hit to the aviation industry. The graph shows that the number of passengers fell by almost 50% in 2020 compared with 2019. Since then, the industry has been booming, with traffic up 238 percent in 2021. The Indian aviation industry is expected to overtake the UK's, and observers say that the sky is the limit for the aviation sector in India.
  • 20. WHY SARIMA? • ARIMA differs from regression in that it also takes past errors into account, which regression does not; SARIMA additionally takes the seasonality of the dataset into account. • ARIMA can also outperform moving-average methods because it includes an autoregressive part, which uses past values to predict future values; moving-average methods do not. • Our data has a seasonal component and is not stationary, and time is the influencing factor for our prediction, so we use SARIMA.
  • 21. Requirements for ARIMA & SARIMA 1. The data is univariate: the dataset contains only one variable. 2. The series being modelled is stationary (after any differencing). 3. The error terms are white noise: they are independent and identically distributed, with no autocorrelation.
  • 22. OBJECTIVES • OBJECTIVE 1: Predict the passengers per plane for the 4 years from January 2020 onwards, as if COVID had not happened. • OBJECTIVE 2: Predict the passengers per plane for the next 12 months. • OBJECTIVE 3: Predict the losses suffered due to COVID from 2020 through 2023.
  • 23. OUR RESOURCES • RStudio: libraries readxl, tseries, forecast • Python: libraries matplotlib, pandas • Microsoft Excel: loading the data and exporting it into R and Python • Internet: dataset, YouTube, research papers, articles • Faculty: expertise of the professors
  • 24. Data overview • Source: secondary data from the Airports Authority of India • Time frame: January 2010 to July 2022 • Variables used: for domestic flights, domestic departures and number of people carried; for international flights, international departures and number of people carried
  • 25. Domestic Data • Trend: increasing until 2020 • Seasonality: high demand during holiday seasons • Crisis: the COVID-19 crisis hit in 2020 and the number of people travelling decreased
  • 26. International Data • Trend: increasing steadily from 2010 until 2020 • Seasonality: high demand during holiday seasons • Crisis: the COVID-19 crisis hit in 2020 and the number of people travelling decreased
  • 28. Process of ARIMA Modelling • Checking stationarity: using the Augmented Dickey-Fuller test • Model identification: using the auto.arima function from the forecast package in RStudio and selecting the model with the lowest AIC value • Predicting values: using the forecast package in RStudio
  • 29. ADF Test • The Augmented Dickey-Fuller (ADF) test is a common statistical test of whether a given time series is stationary. It is a type of unit root test. • A Dickey-Fuller test is a unit root test of the null hypothesis that α = 1 in the model equation below, where α is the coefficient of the first lag of Y. • The presence of a unit root means the time series is non-stationary. • Hypotheses: H0: α = 1 (unit root, non-stationary) vs H1: α < 1 (stationary). • Formula:
    $$y_t = c + \beta t + \alpha y_{t-1} + \phi_1 \Delta y_{t-1} + \dots + \phi_p \Delta y_{t-p} + e_t$$
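A simplified sketch of the idea behind the test: subtracting $y_{t-1}$ from both sides of the ADF equation and dropping the trend and lagged-difference terms gives the plain Dickey-Fuller regression $\Delta y_t = c + \gamma y_{t-1} + e_t$ with $\gamma = \alpha - 1$, so testing α = 1 means testing γ = 0. The t-statistic of γ is compared against Dickey-Fuller critical values rather than the usual t-table (about -2.86 at the 5% level for a model with a constant, asymptotically). This is illustrative standard-library Python on seeded simulated data, not a reimplementation of `adf.test` from the R tseries package (which also includes a trend and lagged differences). White noise typically gives a strongly negative statistic; a random walk typically does not.

```python
import math
import random


def dickey_fuller_t(series):
    """t-statistic of gamma in dy_t = c + gamma * y_{t-1} + e_t (constant, no trend, no lags)."""
    x = series[:-1]                                               # y_{t-1}
    dy = [series[t] - series[t - 1] for t in range(1, len(series))]
    n = len(dy)
    if n < 3:
        raise ValueError("series too short")
    mx, my = sum(x) / n, sum(dy) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    gamma = sum((xi - mx) * (yi - my) for xi, yi in zip(x, dy)) / sxx  # OLS slope
    c = my - gamma * mx                                                # OLS intercept
    resid = [yi - c - gamma * xi for xi, yi in zip(x, dy)]
    sigma2 = sum(e * e for e in resid) / (n - 2)                       # residual variance
    return gamma / math.sqrt(sigma2 / sxx)


CRITICAL_5PCT = -2.86  # asymptotic Dickey-Fuller 5% critical value, constant and no trend

rng = random.Random(42)
noise = [rng.gauss(0, 1) for _ in range(200)]  # stationary white noise
walk = []
level = 0.0
for e in noise:                                # random walk: unit root, non-stationary
    level += e
    walk.append(level)

for name, s in (("white noise", noise), ("random walk", walk)):
    stat = dickey_fuller_t(s)
    decision = "reject H0 (stationary)" if stat < CRITICAL_5PCT else "fail to reject H0 (unit root)"
    print(f"{name}: t = {stat:.2f} -> {decision}")
```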
  • 30. Checking stationarity using the ADF test
    H0: the time series is not stationary. H1: the time series is stationary.
    Data | p-value | Decision (5% level) | Conclusion
    Domestic data | 0.2622 | Fail to reject H0 | Not stationary
    International data | 0.3371 | Fail to reject H0 | Not stationary
  • 31. ARIMA & AIC
    Domestic models:
    Model | Order (p,d,q)(P,D,Q)[m] | AIC
    Model 1 | (0,1,0) | 1201.687
    Model 2 | (1,1,1)(0,0,1)[12] | 1183.11
    Model 3 | (0,1,0)(0,0,1)[12] | 1199.731
    International models:
    Model | Order (p,d,q)(P,D,Q)[m] | AIC
    Model 1 | (1,1,3) | 1232.58
    Model 2 | (0,1,2)(1,0,0)[12] | 1240.293
    Model 3 | (2,1,2) | 1235.78
    The models highlighted in red on the original slide were selected as the best-fitting models for the respective aviation types because they have the lowest AIC values of all candidates: domestic Model 2 and international Model 1.
  • 32. Domestic ACF & PACF plots
  • 33. International ACF & PACF plots
  • 34. Ljung-Box test • The test determines whether the residuals (errors) are independent or whether there is structure left in them, i.e. whether the autocorrelations of the residuals are non-zero. • Essentially, it is a test of lack of fit: if the autocorrelations of the residuals are very small, we say that the model does not show "significant lack of fit". • The hypotheses are: H0: the model does not show lack of fit; H1: the model does show lack of fit.
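The Ljung-Box statistic is $Q = n(n+2)\sum_{k=1}^{h} r_k^2/(n-k)$, where $r_k$ is the residual autocorrelation at lag k and h is the number of lags tested; under H0 it approximately follows a chi-squared distribution with h degrees of freedom (often reduced by the number of fitted ARMA parameters). Below is a standard-library Python sketch on simulated residuals; the chi-squared tail probability is computed with the regularized incomplete gamma function (series and continued-fraction forms, as in standard numerical references) instead of a statistics library. The `fitted_params` argument is a hypothetical name for the degrees-of-freedom adjustment.

```python
import math
import random


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) of a chi-squared distribution with df degrees of freedom."""
    a, z = df / 2.0, x / 2.0
    if z <= 0:
        return 1.0
    log_prefactor = -z + a * math.log(z) - math.lgamma(a)
    if z < a + 1:
        # Series for the regularized lower incomplete gamma P(a, z); the tail is 1 - P.
        term = total = 1.0 / a
        n = 1
        while abs(term) > 1e-16 * abs(total):
            term *= z / (a + n)
            total += term
            n += 1
        return max(0.0, 1.0 - total * math.exp(log_prefactor))
    # Continued fraction (modified Lentz method) for the regularized upper gamma Q(a, z).
    tiny = 1e-300
    b = z + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, 10_000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        if abs(d) < tiny:
            d = tiny
        c = b + an / c
        if abs(c) < tiny:
            c = tiny
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return math.exp(log_prefactor) * h


def ljung_box(residuals, h, fitted_params=0):
    """Ljung-Box Q over lags 1..h and its p-value with h - fitted_params degrees of freedom."""
    n = len(residuals)
    df = h - fitted_params
    if not 1 <= h < n or df < 1:
        raise ValueError("need 1 <= h < n and h > fitted_params")
    mean = sum(residuals) / n
    dev = [e - mean for e in residuals]
    denom = sum(d * d for d in dev)
    q = 0.0
    for k in range(1, h + 1):
        r_k = sum(dev[t] * dev[t - k] for t in range(k, n)) / denom  # residual autocorrelation at lag k
        q += r_k * r_k / (n - k)
    q *= n * (n + 2)
    return q, chi2_sf(q, df)


# Simulated white-noise "residuals": the p-value is usually large (no evidence of lack of fit).
rng = random.Random(0)
white = [rng.gauss(0, 1) for _ in range(300)]
q_white, p_white = ljung_box(white, 10)
print(f"white noise: Q = {q_white:.2f}, p = {p_white:.3f}")

# Strongly autocorrelated "residuals" (alternating signs): the p-value is essentially zero.
q_alt, p_alt = ljung_box([1.0, -1.0] * 50, 10)
print(f"alternating: Q = {q_alt:.2f}, p = {p_alt:.2e}")
```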
  • 35. Checking the fit of the domestic model using the Ljung-Box test
    Lags used | p-value | Decision | Conclusion
    5 | 0.8784 | Fail to reject H0 | Good fit
    10 | 0.9678 | Fail to reject H0 | Good fit
    15 | 0.6549 | Fail to reject H0 | Good fit
    25 | 0.7494 | Fail to reject H0 | Good fit
  • 36. Checking the fit of the international model using the Ljung-Box test
    Lags used | p-value | Decision | Conclusion
    5 | 0.9993 | Fail to reject H0 | Good fit
    10 | 0.9972 | Fail to reject H0 | Good fit
    15 | 0.8501 | Fail to reject H0 | Good fit
    25 | 0.9514 | Fail to reject H0 | Good fit
  • 37. Forecasting domestic data using the best model
    Month (DD-MM-YYYY) | Forecast (passengers per plane)
    01-08-2022 | 116.46
    01-09-2022 | 115.88
    01-10-2022 | 116.70
    01-11-2022 | 117.72
    01-12-2022 | 117.64
    01-01-2023 | 112.85
    01-02-2023 | 117.97
    01-03-2023 | 118.10
    01-04-2023 | 117.28
    01-05-2023 | 119.28
    01-06-2023 | 117.66
    01-07-2023 | 116.85
  • 38. Forecasting international data using the best model
    Month (DD-MM-YYYY) | Forecast (passengers per plane)
    01-08-2022 | 147.91
    01-09-2022 | 145.86
    01-10-2022 | 145.41
    01-11-2022 | 144.99
    01-12-2022 | 144.60
    01-01-2023 | 144.24
    01-02-2023 | 143.90
    01-03-2023 | 143.58
    01-04-2023 | 143.28
    01-05-2023 | 143.003
    01-06-2023 | 142.74
    01-07-2023 | 142.50
  • 40. Slicing the dataset [Plots: domestic data without COVID; international data without COVID]
  • 41. Checking stationarity using the ADF test (data without COVID)
    H0: the time series is not stationary. H1: the time series is stationary.
    Data | p-value | Decision (5% level) | Conclusion
    Domestic data | 0.09633 | Fail to reject H0 | Not stationary
    International data | 0.0698 | Fail to reject H0 | Not stationary
  • 42. ARIMA & AIC (data without COVID)
    Domestic models:
    Model | Order (p,d,q)(P,D,Q)[m] | AIC
    Model 1 | (4,0,0)(0,1,1)[12] | 633.6514
    Model 2 | (2,0,0)(1,1,2)[12] | 640.62
    Model 3 | (3,0,0)(0,1,1)[12] | 635.2183
    International models:
    Model | Order (p,d,q)(P,D,Q)[m] | AIC
    Model 1 | (1,1,2)(0,1,1)[12] | 647.2197
    Model 2 | (3,1,2)(0,1,1)[12] | 640.4121
    Model 3 | (2,1,3)(0,1,1)[12] | 638.3682
    The models highlighted in red on the original slide were selected as the best-fitting models for the respective aviation types because they have the lowest AIC values of all candidates: domestic Model 1 and international Model 3.
  • 43. Domestic ACF & PACF plots
  • 44. International ACF & PACF plots
  • 45. Checking the fit of the domestic model (data without COVID) using the Ljung-Box test
    Lags used | p-value | Decision | Conclusion
    5 | 0.7941 | Fail to reject H0 | Good fit
    10 | 0.7305 | Fail to reject H0 | Good fit
    15 | 0.4957 | Fail to reject H0 | Good fit
    25 | 0.2979 | Fail to reject H0 | Good fit
  • 46. Checking the fit of the international model (data without COVID) using the Ljung-Box test
    Lags used | p-value | Decision | Conclusion
    5 | 0.8032 | Fail to reject H0 | Good fit
    10 | 0.7952 | Fail to reject H0 | Good fit
    15 | 0.749 | Fail to reject H0 | Good fit
    25 | 0.07807 | Fail to reject H0 | Good fit
  • 47. Forecasting domestic data using the best model • If COVID had not occurred, the domestic aviation industry would have continued to thrive and expand, as the forecast shows. • The seasonality would continue as it is.
  • 48. Forecasting international data using the best model • The international aviation industry stagnates, with little to no growth. • Some seasonality is still expected.
  • 50. Domestic Losses • COVID obstructed a blooming domestic aviation sector. • The loss due to COVID is still visible in 2022, as shown by the gap between the blue and red lines.
  • 51. International Losses • Huge losses were borne due to COVID. • International travel is almost back on track, as the gap has almost closed. • The biggest takeaway is that the industry took 2 years and 4 months to recover from a crisis such as a travel ban.
  • 52. ₹82,24,772: losses incurred by a single domestic plane from customers lost due to COVID
  • 53. Monthly domestic losses • The spikes in losses are due to the first and second waves of COVID. • The losses have finally plateaued, and the industry is likely to see some stability now.
  • 54. A few assumptions (domestic loss calculation) • Bombay to Delhi is the busiest domestic air route. • The average ticket price on this route is Rs. 5,000. • Tickets are booked one month in advance. • There is no inflation in ticket prices. • Ticket prices do not fluctuate.
  • 55. ₹5,33,39,335: losses borne by a single international plane from customers lost due to COVID
  • 56. Monthly international losses • The international aviation industry is very unstable. • International aviation has almost caught up with our simulated (no-COVID) forecast. • Currently the industry is running at negligible losses.
  • 57. A few assumptions (international loss calculation) • Bombay to Dubai is the busiest international air route. • The average ticket price on this route is Rs. 35,000. • Tickets are booked one month in advance. • There is no inflation in ticket prices. • Ticket prices do not fluctuate.
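The slides do not spell out the loss formula. One reading consistent with the assumptions listed for the domestic and international calculations (a fixed average fare, no inflation, no price fluctuation) is: monthly revenue lost per plane = (forecast passengers per plane without COVID - actual passengers per plane) × average ticket price, summed over the months. The sketch below encodes that reading only; the function name, the choice to count months where actual traffic exceeds the forecast as zero loss, and the example numbers are all hypothetical.

```python
def revenue_losses(forecast, actual, ticket_price):
    """Per-plane revenue lost each month: passenger shortfall x average fare.

    Assumption: months where actual passengers meet or exceed the no-COVID forecast count as zero loss.
    """
    if len(forecast) != len(actual):
        raise ValueError("forecast and actual must cover the same months")
    return [max(0.0, f - a) * ticket_price for f, a in zip(forecast, actual)]


DOMESTIC_FARE = 5_000        # Rs., Bombay-Delhi fare assumed on the slide
INTERNATIONAL_FARE = 35_000  # Rs., Bombay-Dubai fare assumed on the slide

# Hypothetical passengers-per-plane values for three months, for illustration only.
forecast = [120.0, 118.0, 121.0]
actual = [60.0, 110.0, 125.0]
monthly = revenue_losses(forecast, actual, DOMESTIC_FARE)
print(monthly)       # [300000.0, 40000.0, 0.0]
print(sum(monthly))  # 340000.0
```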
  • 58. LIMITATIONS • ARIMA performs poorly for long-term forecasts. • Choosing the p, d, q values involves quite a bit of subjectivity. • There are better models for predicting data of this kind. • Well-organized data is hard to find.
  • 59. FUTURE SCOPE • Predictions for different brands using market share • Using better models to predict the data • Checking the accuracy of the ARIMA and SARIMA predictions by comparing predicted values with actual values
  • 60. References
    [1] International Journal of Applied Engineering Research, ISSN 0973-4562, Vol. 14, No. 3 (2019), pp. 646-650, 10.37622/000000
    [2] International Journal of Innovative Technology and Exploring Engineering (IJITEE), ISSN 2278-3075, Vol. 8, Issue 11S, September 2019, https://doi.org/10.35940/ijitee.K1216.09811S19
    [3] https://doi.org/10.1109/icmlc.2009.5212785
    [4] https://www.aai.aero/en/business-opportunities/aai-traffic-news
    [5] https://www.dgca.gov.in/digigov-portal/?page=4267/4210/servicename
  • 61. Acknowledgements We would like to thank Professor Anwesha for her patient instruction, passionate support and constructive criticism of this effort. We would like to thank Dr. Santosha C.D. and Professor Kavya S for giving us this opportunity and for their continued support in the development of this project. We would like to thank our fellow classmates and peers for their valuable inputs, their help with the project and for patiently listening to us.
  • 62. Thank You Akarsh Avinash Palak Bansal Sai Teja Dharlapudi