2. INTRODUCTION
• Rainfall is an important element of the Indian economy.
Although the monsoon affects most parts of India, the amount
of rainfall varies from heavy to scanty across regions.
Rainfall also varies greatly over time: over 80% of the annual
rainfall is received in the four rainy months of June to
September. The average annual rainfall is about 125 cm, but it
is unevenly distributed across the country.
3. About the Data
The data is collected from
https://www.kaggle.com/rajanand/rainfall.in.india/data
The data consists of monthly rainfall in India from 1901 to 2014.
The values are in millimetres.
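As a rough sketch of how such a table might be turned into one monthly series per region, the snippet below reads a small in-memory sample. The column names (SUBDIVISION, YEAR, JAN to DEC), the region labels and all sample values are assumptions made for illustration only; they are not taken from the study's data file.

```python
import csv
import io

MONTHS = ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"]

# Made-up sample in an assumed wide layout: one row per region and year.
SAMPLE_CSV = """SUBDIVISION,YEAR,JAN,FEB,MAR,APR,MAY,JUN,JUL,AUG,SEP,OCT,NOV,DEC
STATE_A,1902,2.0,,4.0,5.0,6.0,7.0,8.0,9.0,10.0,11.0,12.0,13.0
STATE_A,1901,1.0,2.0,3.0,4.0,5.0,6.0,7.0,8.0,9.0,10.0,11.0,12.0
STATE_B,1901,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5
"""


def monthly_series(csv_text, subdivision):
    """Return [(year, month, rainfall_mm or None), ...] in time order."""
    rows = [r for r in csv.DictReader(io.StringIO(csv_text))
            if r["SUBDIVISION"] == subdivision]
    rows.sort(key=lambda r: int(r["YEAR"]))
    series = []
    for r in rows:
        for month, name in enumerate(MONTHS, start=1):
            raw = r[name].strip()
            # An empty cell is kept as None (missing) rather than guessed.
            series.append((int(r["YEAR"]), month, float(raw) if raw else None))
    return series


series = monthly_series(SAMPLE_CSV, "STATE_A")
print(len(series), series[0], series[13])
```

Missing cells are kept as None so that any later imputation is an explicit choice.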
5. Objectives of Study:
The main objectives of the study are to:
• Study the trend in the rainfall of India over the years 1901 to 2014.
• Compare the variation in rainfall among the four south Indian states.
• Analyze the rainfall data using time series models and forecast the
next few years using the fitted model.
• Compare the rainfall pattern in different states of India.
7. Time Series Models:
A time series is a collection of observations of well-defined data items
obtained through repeated measurements over time.
Moving Average Model:
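A time series $\{X_t,\ t = 0, \pm 1, \pm 2, \ldots\}$ is said to be a moving average process of order q, denoted MA(q), if it can be expressed as
$$X_t = \varepsilon_t - \alpha_1 \varepsilon_{t-1} - \cdots - \alpha_q \varepsilon_{t-q}$$
where $\{\varepsilon_t\}$ is a white noise process and $\alpha_1, \ldots, \alpha_q$ are constants.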
Autoregressive Model:
A process $\{X_t\}$ expressed in the form
$$X_t = \beta_1 X_{t-1} + \cdots + \beta_p X_{t-p} + \varepsilon_t$$
is referred to as an AR(p) process. Here $\beta_1, \ldots, \beta_p$ are constants and $\{\varepsilon_t\}$ is a white
noise process.
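The two recursions above can be sketched directly in code. This is a minimal illustration, assuming that values before the start of the sample are zero; the function names ma_series and ar_series are made up for this example.

```python
import random


def ma_series(eps, alpha):
    """MA(q): X_t = eps_t - alpha_1*eps_{t-1} - ... - alpha_q*eps_{t-q}."""
    x = []
    for t in range(len(eps)):
        value = eps[t]
        for k in range(1, len(alpha) + 1):
            if t - k >= 0:  # pre-sample shocks are assumed to be zero
                value -= alpha[k - 1] * eps[t - k]
        x.append(value)
    return x


def ar_series(eps, beta):
    """AR(p): X_t = beta_1*X_{t-1} + ... + beta_p*X_{t-p} + eps_t."""
    x = []
    for t in range(len(eps)):
        value = eps[t]
        for k in range(1, len(beta) + 1):
            if t - k >= 0:  # pre-sample values are assumed to be zero
                value += beta[k - 1] * x[t - k]
        x.append(value)
    return x


# Gaussian white noise with a fixed seed so the run is repeatable.
random.seed(0)
noise = [random.gauss(0.0, 1.0) for _ in range(200)]
ma1 = ma_series(noise, [0.5])
ar1 = ar_series(noise, [0.5])
print(len(ma1), len(ar1))
```

Feeding a single unit shock shows the difference between the two models: the MA(1) response dies after one lag, while the AR(1) response decays geometrically.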
8. Autoregressive Moving Average Process:
A time series $\{X_t,\ t = 0, \pm 1, \pm 2, \ldots\}$ is said to be an autoregressive moving average process of order (p, q),
denoted ARMA(p, q), if it can be expressed as
$$X_t = \beta_1 X_{t-1} + \cdots + \beta_p X_{t-p} + \varepsilon_t - \alpha_1 \varepsilon_{t-1} - \cdots - \alpha_q \varepsilon_{t-q}$$
Here $\beta_1, \ldots, \beta_p$ and $\alpha_1, \ldots, \alpha_q$ are constants and $\{\varepsilon_t\}$ is a white noise process.
Using the backward shift operator B, defined by $B X_t = X_{t-1}$, we can write it as
$$\phi(B) X_t = \theta(B) \varepsilon_t$$
where
$$\phi(B) = 1 - \beta_1 B - \beta_2 B^2 - \cdots - \beta_p B^p$$
and
$$\theta(B) = 1 - \alpha_1 B - \alpha_2 B^2 - \cdots - \alpha_q B^q$$
Autoregressive Integrated Moving Average Process:
Let $\{X_t,\ t \in I\}$ denote a time series that is non-stationary because of a trend component. Let
$\nabla = 1 - B$ denote the difference operator, and let the original series $\{X_t\}$ be differenced d times so that
the resulting series is stationary, i.e., let
$$Z_t = \nabla^d X_t$$
If $Z_t$ follows an ARMA(p, q) process, the original series $\{X_t,\ t \in I\}$ is said to be an
autoregressive integrated moving average process of order (p, d, q):
$$\phi(B) (1 - B)^d X_t = \theta(B) \varepsilon_t$$
This is the representation of the ARIMA(p, d, q) process.
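The differencing step can be illustrated with a few lines of code. This is only a sketch; the helper name difference is chosen for this example.

```python
def difference(x, d=1):
    """Apply the difference operator d times: each pass maps X_t to X_t - X_{t-1}."""
    for _ in range(d):
        x = [x[t] - x[t - 1] for t in range(1, len(x))]
    return x


linear = [2, 4, 6, 8, 10]      # linear trend
quadratic = [1, 4, 9, 16, 25]  # quadratic trend

print(difference(linear, 1))     # one difference removes a linear trend
print(difference(quadratic, 2))  # two differences remove a quadratic trend
```

Each pass shortens the series by one observation, so a series of length n leaves n - d values after d differences.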
9. Seasonal Autoregressive Integrated Moving Average Process:
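Consider a time series that contains a trend, a stochastic seasonal component and a trend in
the seasonal component. For such a series we use the integrated multiplicative model written as
SARIMA(p, d, q)(P, D, Q)_s, where p and q are the orders of the non-seasonal AR and MA parts, d is the
number of differences required to remove the trend, P is the number of multiplicative (seasonal) AR
coefficients, Q is the number of multiplicative (seasonal) MA coefficients, D is the number of seasonal
differences required to remove the trend in the seasonal component, and s is the seasonal period.
The multiplicative seasonal ARIMA(p, d, q)(P, D, Q)_s model has the representation
$$\phi(B)\, \Psi(B^s)\, (1 - B)^d\, (1 - B^s)^D X_t = \theta(B)\, \Theta(B^s)\, \varepsilon_t$$
where
$$\phi(B) = 1 - \beta_1 B - \cdots - \beta_p B^p$$
$$\theta(B) = 1 - \alpha_1 B - \cdots - \alpha_q B^q$$
$$\Psi(B^s) = 1 - \phi_1 B^s - \cdots - \phi_P B^{sP}$$
$$\Theta(B^s) = 1 - \Theta_1 B^s - \cdots - \Theta_Q B^{sQ}$$
Here $\phi_1, \ldots, \phi_P$ are the seasonal AR coefficients and $\Theta_1, \ldots, \Theta_Q$ are the seasonal MA coefficients.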
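The seasonal differencing factor $(1 - B^s)^D$ can be sketched in the same spirit. The example assumes monthly data with s = 12 and uses a made-up repeating pattern; the helper name seasonal_difference is chosen for this illustration.

```python
def seasonal_difference(x, s, D=1):
    """Apply (1 - B^s) D times: each pass maps X_t to X_t - X_{t-s}."""
    for _ in range(D):
        x = [x[t] - x[t - s] for t in range(s, len(x))]
    return x


# Made-up monthly pattern repeated for three years (s = 12 is an assumption).
pattern = [2, 3, 10, 40, 90, 150, 240, 200, 150, 140, 50, 10]
periodic = pattern * 3

# Same pattern with a steady increase of 1 unit per year added on top.
drifting = [value + year for year in range(3) for value in pattern]

print(seasonal_difference(periodic, 12))
print(seasonal_difference(drifting, 12))
```

A purely periodic series differences to zeros, and a series whose seasonal pattern shifts by a constant each year differences to that constant.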
10. Testing for the presence of trend component:
Mann-Kendall trend test:
The Mann-Kendall trend test is a nonparametric test used to identify a trend in a series, even if
there is a seasonal component in the series.
𝐻0: there is no trend in the series
𝐻1: there is a monotonic trend in the series.
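A minimal sketch of the Mann-Kendall computation is given below. It assumes the usual normal approximation with a tie correction for the variance, and it reports tau as S divided by the number of pairs, which can differ slightly from the tau reported by statistical packages when there are many ties. The function name mann_kendall is chosen for this example.

```python
import math
from collections import Counter


def mann_kendall(x):
    """Return (S, tau, z, two-sided p value) for the Mann-Kendall trend test."""
    n = len(x)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = x[j] - x[i]
            s += (diff > 0) - (diff < 0)  # sign of the pairwise difference
    tau = s / (n * (n - 1) / 2)

    # Variance of S under H0, reduced for groups of tied values.
    var_s = n * (n - 1) * (2 * n + 5) / 18
    for count in Counter(x).values():
        var_s -= count * (count - 1) * (2 * count + 5) / 18

    # Continuity-corrected normal approximation.
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return s, tau, z, p_value


s_stat, tau, z, p_value = mann_kendall(list(range(10)))
print(s_stat, tau, round(z, 3), p_value < 0.05)
```

The double loop is quadratic in the series length, which is still manageable for a monthly series covering about a century.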
Testing for the presence of seasonality component:
𝐻0: time series is free from seasonal variation
𝐻1: time series contains seasonal variation
The test statistic is
$$\chi_0^2 = \frac{12 \sum_{j=1}^{D} \left( M_j - \frac{C(D+1)}{2} \right)^2}{C D (D+1)} \sim \chi^2_{(D-1)}$$
where D is the number of seasonal periods, C is the total number of years, and $M_j$ is the sum of the ranks for the j-th period;
$C(D+1)/2$ is the expected rank sum of a period under 𝐻0.
If the calculated chi-square value exceeds the chi-square table value, we reject 𝐻0 and conclude that
there is seasonal variation in the data.
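The rank-based statistic can be sketched as follows. The sketch assumes that values are ranked within each year (with average ranks for ties) and that the expected rank sum of a period under 𝐻0 is C(D+1)/2, which is the form of the Friedman rank statistic; the helper names are chosen for this example.

```python
def average_ranks(values):
    """Ranks starting at 1 for the smallest value; ties share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def seasonality_statistic(table):
    """table[c][j] is the value for year c and period j (C years, D periods)."""
    C, D = len(table), len(table[0])
    rank_sums = [0.0] * D  # M_j: sum over years of the rank of period j
    for year in table:
        for j, rank in enumerate(average_ranks(year)):
            rank_sums[j] += rank
    expected = C * (D + 1) / 2
    return 12 * sum((m - expected) ** 2 for m in rank_sums) / (C * D * (D + 1))


# Same ordering of periods every year: strongest possible seasonality.
seasonal = [[1, 2, 3, 4] for _ in range(5)]
# Ordering rotates every year: rank sums are equal, no seasonality.
balanced = [[1, 2, 3, 4], [2, 3, 4, 1], [3, 4, 1, 2], [4, 1, 2, 3]]

print(seasonality_statistic(seasonal), seasonality_statistic(balanced))
```

The statistic is compared with the chi-square table value on D - 1 degrees of freedom; with identical ordering in every year it reaches its maximum C(D-1).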
11. Artificial Neural Network:
An Artificial Neural Network is based on a collection of connected units or nodes called artificial
neurons (a simplified version of biological neurons in an animal brain). Each connection between artificial
neurons can transmit a signal from one to another. The artificial neuron that receives the signal can process it and
then signal artificial neurons connected to it. In common ANN implementations, the signal at a connection
between artificial neurons is a real number, and the output of each artificial neuron is calculated by a non-linear
function of the sum of its inputs. Artificial neurons and connections typically have a weight that adjusts as
learning proceeds. The weight increases or decreases the strength of the signal at a connection. Artificial neurons
may have a threshold such that only if the aggregate signal crosses that threshold is the signal sent. Typically,
artificial neurons are organized in layers. Different layers may perform different kinds of transformations on their
inputs. Signals travel from the first (input), to the last (output) layer, possibly after traversing the layers multiple
times.
MLP-Neural Network:
The MLP neural network model can be written as
$$y_t = \beta_0 + \sum_{j=1}^{q} \beta_j\, f\!\left( \sum_{i=1}^{p} \gamma_{ij}\, y_{t-i} + \gamma_{0j} \right) + \varepsilon_t, \quad \text{for all } t$$
where p is the number of input nodes, q is the number of hidden nodes, and f is a sigmoid transfer function. Here
the logistic function $f(x) = \frac{1}{1 + e^{-x}}$ is applied as the non-linear activation function. $\varepsilon_t$ is the random shock; $\beta_0$ and $\gamma_{0j}$ are the
bias terms.
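The deterministic part of this model (everything except the random shock) can be sketched as a single forward pass. The weights below are arbitrary illustrative numbers, not fitted values, and the function name mlp_predict is chosen for this example.

```python
import math


def logistic(x):
    """Logistic activation f(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def mlp_predict(lags, beta0, beta, gamma0, gamma):
    """One-step prediction of y_t from lags = [y_{t-1}, ..., y_{t-p}].

    beta[j] and gamma0[j] are the output weight and bias of hidden node j;
    gamma[i][j] is the weight from lag i to hidden node j.
    """
    total = beta0
    for j in range(len(beta)):
        hidden_input = gamma0[j] + sum(gamma[i][j] * lags[i]
                                       for i in range(len(lags)))
        total += beta[j] * logistic(hidden_input)
    return total


# p = 2 inputs and q = 2 hidden nodes, with arbitrary weights.
prediction = mlp_predict(
    lags=[0.4, 0.1],
    beta0=1.0,
    beta=[2.0, 4.0],
    gamma0=[0.0, 0.0],
    gamma=[[0.5, -0.5], [1.0, 1.0]],
)
print(prediction)
```

When all hidden-layer weights are zero every hidden node outputs f(0) = 0.5, so the prediction reduces to the output bias plus half the sum of the output weights, which gives a quick sanity check.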
12. Cluster Analysis:
Cluster analysis or clustering is the task of grouping a set of objects in such a way
that objects in the same group (called a cluster) are more similar (in some sense) to each
other than to those in other groups (clusters). Cluster analysis is a more primitive technique
in that no assumptions are made concerning the number of groups. Grouping is done on the
basis of similarities or distances (dissimilarities). The inputs required are similarity
measures or data from which similarities can be computed. The primary objective of
cluster analysis is to define the structure of the data by placing the most similar
observations into groups. Here hierarchical clustering is used. It has two types:
agglomerative and divisive hierarchical methods.
Dendrogram:
The results of both agglomerative and divisive methods may be displayed in the form of a
two-dimensional diagram known as a dendrogram. It is a tree-like diagram frequently used to
illustrate the arrangement of clusters produced by hierarchical clustering.
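The agglomerative procedure can be sketched in a few lines. The sketch assumes Euclidean distance and single linkage (distance between the closest members of two clusters); the study does not state which distance or linkage it used, so both are illustrative choices, as are the function names and sample points.

```python
def euclidean(a, b):
    """Euclidean distance between two points of equal dimension."""
    return sum((u - v) ** 2 for u, v in zip(a, b)) ** 0.5


def agglomerative(points, n_clusters):
    """Merge the two closest clusters until n_clusters remain.

    Returns (clusters, merges); each merge is (members_a, members_b, distance),
    and the merge distances are the heights at which a dendrogram would join.
    """
    clusters = [[i] for i in range(len(points))]  # start with singletons
    merges = []
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(euclidean(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((sorted(clusters[a]), sorted(clusters[b]), d))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters, merges


points = [(0, 0), (0, 1), (10, 10), (10, 11)]
clusters, merges = agglomerative(points, n_clusters=2)
print(clusters)
print(merges)
```

In an application like the one described here, each point would be the vector of rainfall features for one region, and cutting the dendrogram at a chosen height gives the final number of clusters.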
13. Analysis and Findings:
Karnataka:
Descriptive statistics for observed data (monthly rainfall in mm):
State            Mean       Variance   Minimum   Maximum   Skewness   Kurtosis
Karnataka        86.64561   7228.491   0         492.7     0.897548   0.295008
Kerala           244.029    66578.2    0         1526.5    1.30289    1.2518
Tamil Nadu       78.45329   4915.799   0         436.1     1.43313    2.374405
Andhra Pradesh   87.77259   8078.873   0         527.2     1.00761    0.570974
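Summary measures of this kind can be computed as sketched below. The sketch assumes the sample variance with divisor n - 1 and simple moment-based skewness and excess kurtosis; statistical packages apply different small-sample corrections, so results may differ slightly from the values in the table. The function name describe is chosen for this example.

```python
def describe(x):
    """Mean, sample variance, range, moment skewness and excess kurtosis."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n  # central moments
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    return {
        "mean": mean,
        "variance": m2 * n / (n - 1),
        "minimum": min(x),
        "maximum": max(x),
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2 - 3,  # excess kurtosis (normal = 0)
    }


summary = describe([1, 2, 3, 4, 5])
print(summary)
```

A positive skewness, as reported for all four states, indicates a long right tail: many dry or low-rainfall months and a few very wet ones.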
15. ACF plot of rainfall of Karnataka
[Figure: ACF of the Rainfall series; x-axis: Lag (0 to 30), y-axis: ACF (-0.5 to 1.0)]
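16. Test for trend:
Mann-Kendall test
𝐻0: there is no trend in the series
𝐻1: there is a monotonic trend in the series.
tau = 0.0194, 2-sided p value = 0.2832
Since the p value exceeds 0.05, 𝐻0 is not rejected at the 5% level: there is no significant monotonic trend.
Test for seasonality:
𝐻0: time series is free from seasonal variation
𝐻1: time series contains seasonal variation
Test statistic = 1130.032
Chi-square critical value = 19.67514
Since the test statistic exceeds the critical value, 𝐻0 is rejected: the series contains seasonal variation.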
18. ACF of seasonal differenced series.
[Figure: x-axis: Time point (0 to 60), y-axis: ACF (-0.5 to 1.0)]
19. PACF of first seasonal differenced series.
[Figure: x-axis: Time point (0 to 60), y-axis: PACF (-0.4 to 0.1)]
20. The table shows the different SARIMA models:
(P,D,Q)   AIC        Box-Pierce p value   Ljung-Box p value
2,1,1     14008.43   0.892                0.859
0,1,1     14018.91   0.5.33               0.468
2,1,0     14294.29   1.54 × 10^-9         4.15 × 10^-10
1,1,0     14400.83   1.49 × 10^-14        3.55 × 10^-15
1,1,1     14009.43   0.817                0.771
The AIC value is minimum for SARIMA (0, 0, 0) (2, 1, 1), and its Box-Pierce and Ljung-Box p values give no evidence of
residual autocorrelation. Thus this is the best fitted model for the rainfall of Karnataka.
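The selection step and the two residual diagnostics can be sketched as follows. The AIC values are copied from the table; the residual series is a made-up example, and the function names are chosen for this illustration. Converting the portmanteau statistics to p values needs a chi-square distribution function, which is left out here.

```python
# Seasonal orders (P, D, Q) and AIC values copied from the table.
aic_by_order = {
    (2, 1, 1): 14008.43,
    (0, 1, 1): 14018.91,
    (2, 1, 0): 14294.29,
    (1, 1, 0): 14400.83,
    (1, 1, 1): 14009.43,
}
best_order = min(aic_by_order, key=aic_by_order.get)


def autocorrelation(x, k):
    """Sample autocorrelation of x at lag k."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    num = sum((x[t] - mean) * (x[t - k] - mean) for t in range(k, n))
    return num / denom


def portmanteau(residuals, max_lag):
    """Return (Box-Pierce Q, Ljung-Box Q) using lags 1..max_lag."""
    n = len(residuals)
    r = [autocorrelation(residuals, k) for k in range(1, max_lag + 1)]
    box_pierce = n * sum(rk ** 2 for rk in r)
    ljung_box = n * (n + 2) * sum(rk ** 2 / (n - k)
                                  for k, rk in enumerate(r, start=1))
    return box_pierce, ljung_box


# Made-up, strongly autocorrelated residuals, for illustration only.
bp, lb = portmanteau([1.0, -1.0] * 10, max_lag=3)
print(best_order, round(bp, 2), round(lb, 2))
```

Both statistics grow with residual autocorrelation, so large p values, as in the first and last rows of the table, indicate residuals that are consistent with white noise.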
21. ACF of residuals of SARIMA (0, 0, 0) (2, 1, 1)
[Figure: ACF of the Residual series; x-axis: Lag (0 to 30), y-axis: ACF (0.0 to 1.0)]
22. Forecasted value for the year 2015:
Month   Original value (mm)   Forecasted value (mm)
Jan     1.7                   2.084739
Feb     0.2                   4.477405
Mar     24.4                  12.825288
Apr     80.5                  43.870151
May     125.3                 91.090671
Jun     218.7                 147.156772
Jul     112                   240.710109
Aug     136.6                 196.709958
Sep     164.5                 147.887774
Oct     106.1                 138.214296
Nov     138.1                 46.427907
Dec     4.4                   10.733744
23. Plot of observed series and forecasted series
[Figure: x-axis: Year (1995 to 2015), y-axis: Rainfall (0 to 300)]
24. Artificial Neural Network:
Forecasted value:
Month Observed series Forecasted value
Jan 1.7 11.3040296
Feb 0.2 15.7620505
Mar 24.4 24.8821405
Apr 80.5 52.4991536
May 125.3 123.852219
Jun 218.7 154.8783145
Jul 112 296.8545399
Aug 136.6 213.8790608
Sep 164.5 210.278565
Oct 106.1 141.131712
Nov 138.1 42.71626907
Dec 4.4 24.95601203
30. Conclusion:
From the time profile of the data it is clear that the series is non-stationary, but the
analysis requires a stationary series. Because the data contain a seasonal effect, we made
the series stationary by differencing. We fitted the best SARIMA model and an Artificial
Neural Network to the observed series and forecasted values from the fitted models
for the next one year. We compared the accuracy of the two models using three error
statistics: root mean square error, mean absolute error and mean absolute percentage error.
The error statistics are smaller for the SARIMA model than for the Artificial Neural
Network for these data. Because of the strong seasonal effect, the SARIMA model is
the best. Cluster analysis was done to compare the similarity of different states of India. It
shows that Karnataka, Kerala, Andaman and Andhra Pradesh fall in cluster 1;
Assam Meghalaya, Naga Mani Mizo Tripura and Himalaya Sikkim fall in cluster 2;
West Bengal, Orissa, Jharkhand, Bihar, Uttar Pradesh, Uttarakhand and Haryana Delhi Chandigarh
belong to cluster 3; and Tamil Nadu belongs to cluster 4.
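The three error statistics can be sketched as below, using only the twelve monthly values listed in the two forecast tables of this presentation. The study may have evaluated the models over a different window, so these numbers illustrate the calculation rather than reproduce the reported comparison.

```python
import math

# Monthly values (Jan to Dec) copied from the two forecast tables.
observed = [1.7, 0.2, 24.4, 80.5, 125.3, 218.7,
            112, 136.6, 164.5, 106.1, 138.1, 4.4]
sarima = [2.084739, 4.477405, 12.825288, 43.870151, 91.090671, 147.156772,
          240.710109, 196.709958, 147.887774, 138.214296, 46.427907, 10.733744]
ann = [11.3040296, 15.7620505, 24.8821405, 52.4991536, 123.852219, 154.8783145,
       296.8545399, 213.8790608, 210.278565, 141.131712, 42.71626907, 24.95601203]


def rmse(obs, pred):
    """Root mean square error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def mae(obs, pred):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def mape(obs, pred):
    """Mean absolute percentage error; requires every observed value to be non-zero."""
    return 100.0 * sum(abs(o - p) / abs(o) for o, p in zip(obs, pred)) / len(obs)


for name, forecast in (("SARIMA", sarima), ("ANN", ann)):
    print(name,
          round(rmse(observed, forecast), 2),
          round(mae(observed, forecast), 2),
          round(mape(observed, forecast), 2))
```

The percentage error is dominated by months with very small observed rainfall, such as February (0.2 mm), so RMSE and MAE are the more stable measures for this kind of series.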
31. References:
• Box, G.E.P. and Jenkins, G.M. (1976): Time Series Analysis: Forecasting and
Control, Holden-Day, San Francisco.
• Chatfield, C. (1996): The Analysis of Time Series: An
Introduction, Chapman & Hall.