This article provides a brief discussion of the statistical parameters most commonly used in measurement and analysis. There are a plethora of such parameters, but the most important and widely used ones are summarized here.
2. Contents:
Dispersion (also called Variability, Scatter, Spread)
Coefficient of Dispersion (COD)
Variance
Standard Deviation (SD) σ
Root Mean Squared Error (RMSE)
Absolute Error (AE)
Mean Absolute Error (MAE)
Percentage Error of Estimate (PE)
3. Contents:
Mean Square Error (MSE)
Mean Absolute Percentage Error (MAPE)
Mean Absolute Deviation (MAD)/ Mean Ratio
Residuals
Sum of Squares of Error (SSE)
Factor Analysis
Eigen Value (λ)
Eigen Vector
4. Dispersion (also called Variability, Scatter, Spread)
It is the extent to which a distribution is stretched or squeezed.
Common examples of Statistical Dispersion are the variance, standard
deviation and interquartile range.
Coefficient of Dispersion (COD)
It is a measure of spread that describes the amount of variability relative
to the mean, and it is unitless.
COD = (σ / μ) × 100
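As a minimal sketch, the COD can be computed directly with NumPy; the data values below are purely illustrative:

```python
import numpy as np

# Illustrative sample values (hypothetical data, not from the text).
data = np.array([12.0, 15.0, 11.0, 14.0, 13.0])

sigma = data.std()       # population standard deviation σ
mu = data.mean()         # mean μ
cod = sigma / mu * 100   # coefficient of dispersion, in percent
```

Because σ and μ carry the same units, their ratio is unitless, which is what makes the COD convenient for comparing spread across data sets measured on different scales.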
5. Variance:
It is the expectation of the squared deviation of a random variable from
its mean and it informally measures how far a set of random numbers
are spread out from the mean.
It is calculated by taking the differences between each number in the set
and the mean, squaring the differences (to make them positive) and
dividing the sum of the squares by the number of values in the set.
The variance provides the user with a numerical measure of the scatter
of the data.
σ² = Σ(X − μ)² / N = ΣX² / N − μ²,  where μ (Mean) = ΣX / N
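Both forms of the variance formula can be checked numerically; this is a brief sketch using made-up data:

```python
import numpy as np

X = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])  # illustrative data
N = len(X)

mu = X.sum() / N                         # μ = ΣX / N
var_def = ((X - mu) ** 2).sum() / N      # Σ(X − μ)² / N
var_alt = (X ** 2).sum() / N - mu ** 2   # ΣX² / N − μ²  (equivalent form)
```

Note that NumPy's `np.var` uses the same population (divide-by-N) convention by default, so all three values agree.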
6. Standard Deviation (SD) σ
It is a measure used to quantify the amount of variation or dispersion of
a set of data values.
It is a number that tells how measurement for a group are spread out
from the average (mean) or expected value.
A low standard deviation means most of the numbers are very close to
the average, while a high value indicates that the data are spread out.
The SD provides the user with a numerical measure of the scatter of the
data.
σ = √( (1/N) Σ(X − μ)² )
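The formula translates directly to code; a minimal sketch with illustrative values:

```python
import numpy as np

X = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])  # hypothetical data
sd = np.sqrt(((X - X.mean()) ** 2).sum() / len(X))       # σ = √((1/N) Σ(X − μ)²)
```

This matches NumPy's built-in `X.std()`, which likewise divides by N by default.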
7. Root Mean Squared Error (RMSE)
It is also termed Root Mean Square Deviation (RMSD).
It is used to measure the differences between values (sample and
population values) predicted by a model or an estimator and the values
actually observed.
RMSE = √( Σ(X_observed − X_modelled)² / N )
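A short sketch of the RMSE calculation; the observed and modelled values are hypothetical:

```python
import numpy as np

# Hypothetical observed values and corresponding model predictions.
observed = np.array([3.0, -0.5, 2.0, 7.0])
modelled = np.array([2.5,  0.0, 2.0, 8.0])

rmse = np.sqrt(((observed - modelled) ** 2).mean())  # √(Σ(obs − mod)² / N)
```

Squaring before averaging means large individual errors dominate the RMSE, which is why it penalizes outliers more heavily than the MAE discussed below.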
8. Absolute Error (AE)
It is the magnitude of the difference between the exact value and the
approximation.
The relative error is the absolute error divided by the magnitude of the
exact value.
AE = |X_measured − X_actual|
9. Mean Absolute Error (MAE)
It is a quantity to measure how close forecasts or predictions are to the
eventual outcomes.
It is an average of the absolute errors.
MAE is the simplest measure of forecast accuracy, although the relative
size of the error is not always obvious from it.
MAE = (1/N) Σᵢ₌₁ᴺ AEᵢ
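The absolute errors and their mean can be sketched together; the measured and actual values are illustrative:

```python
import numpy as np

# Hypothetical measurements vs. actual values.
actual   = np.array([3.0, -0.5, 2.0, 7.0])
measured = np.array([2.5,  0.0, 2.0, 8.0])

ae = np.abs(measured - actual)   # absolute error of each measurement
mae = ae.mean()                  # mean absolute error
```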
10. Percentage Error of Estimate (PE)
It is the difference between the approximate and the exact values as a
percentage of the exact value.
%Error = ((Exact Value − Approximate Value) / Exact Value) × 100
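A one-line sketch with hypothetical values; note the sign convention follows the formula above (some texts take the absolute value instead):

```python
exact = 9.8     # hypothetical exact value
approx = 10.0   # hypothetical approximate value

pct_error = (exact - approx) / exact * 100  # negative here: approx overshoots
```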
11. Mean Square Error (MSE)
Also termed Mean Square Deviation (MSD).
It measures the average of the squares of the errors or deviations, i.e. the
difference between the estimator and what is estimated.
MSE = (1/N) Σᵢ₌₁ᴺ (μ − Xᵢ)²
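A minimal sketch of the MSE against a reference value; both the reference μ and the observations are made up:

```python
import numpy as np

mu = 5.0                             # estimator / reference value (illustrative)
X = np.array([2.0, 4.0, 6.0, 8.0])   # illustrative observations

mse = ((mu - X) ** 2).mean()         # (1/N) Σ (μ − Xᵢ)²
```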
12. Mean Absolute Percentage Error (MAPE)
Also termed Mean Absolute Percentage Deviation (MAPD).
It is a measure of the prediction accuracy of a forecasting method in
statistics.
MAPE = (100/N) Σᵢ₌₁ᴺ |Actual Valueᵢ − Forecasted Valueᵢ| / Actual Valueᵢ
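A brief sketch of the MAPE; the actual and forecasted values are hypothetical, and note the actual values must be non-zero for the formula to be defined:

```python
import numpy as np

# Hypothetical actual vs. forecasted values (actuals must be non-zero).
actual   = np.array([100.0, 200.0, 400.0])
forecast = np.array([110.0, 190.0, 400.0])

mape = 100.0 / len(actual) * (np.abs(actual - forecast) / actual).sum()
```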
13. Mean Absolute Deviation (MAD)/ Mean Ratio
Ratio is the relationship between two numbers indicating how many
times the first number contains the second.
It is an alternative to MAPE, better suited to intermittent and low-volume
data.
It is a robust measure of the variability of a univariate sample of
quantitative data.
The MAD of a set of data is the average distance between each data
value and the mean.
MAD = (1/N) Σᵢ₌₁ᴺ |Xᵢ − μ|
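The MAD is a one-liner in NumPy; the data values are again illustrative:

```python
import numpy as np

X = np.array([2.0, 4.0, 6.0, 8.0])   # illustrative data
mad = np.abs(X - X.mean()).mean()    # (1/N) Σ |Xᵢ − μ|
```

Unlike the variance, the MAD does not square the deviations, so a single outlier influences it far less.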
14. Residuals
It is the difference between the observed value of the dependent variable
(y) and the predicted value (y’).
Each data point has one residual. In a least-squares fit with an intercept,
both the sum and the mean of the residuals are equal to zero.
𝑹 = 𝑶𝒃𝒔𝒆𝒓𝒗𝒆𝒅 𝒀 𝒗𝒂𝒍𝒖𝒆 − 𝑷𝒓𝒆𝒅𝒊𝒄𝒕𝒆𝒅 𝒀 𝒗𝒂𝒍𝒖𝒆
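A sketch using a least-squares straight-line fit on hypothetical (x, y) data, which also demonstrates the zero-sum property:

```python
import numpy as np

# Hypothetical (x, y) data and a least-squares straight-line fit.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 7.8])

slope, intercept = np.polyfit(x, y, 1)    # degree-1 least-squares fit
residuals = y - (slope * x + intercept)   # observed y − predicted y'
```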
15. Sum of Squares of Error (SSE)
Also termed Residual Sum of Squares (RSS), Sum of Squared
Residuals (SSR), or Sum of Squared Errors (SSE).
It is the sum of the squares of residuals (deviations predicted from actual
empirical values of data).
It is a measure of the discrepancy between the data and an estimation
model.
A small SSE or RSS indicates a tight fit of the model to the data.
It is used as an optimal criterion in parameter selection and model
selection.
SSE = Σᵢ₌₁ᴺ (Xᵢ − X̂ᵢ)², where X̂ᵢ is the value predicted by the model
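A sketch computing the SSE for a hypothetical straight-line fit; the (x, y) values are made up:

```python
import numpy as np

# Hypothetical data and a least-squares line; SSE is the sum of squared residuals.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 7.8])

slope, intercept = np.polyfit(x, y, 1)
predicted = slope * x + intercept
sse = ((y - predicted) ** 2).sum()   # Σ (yᵢ − ŷᵢ)²
```

Minimizing exactly this quantity is what defines the least-squares fit, which is why SSE serves as an optimality criterion in parameter and model selection.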
16. Factor Analysis
It is a useful tool for investigating relationships among variables in
complex concepts: it allows researchers to study concepts that are not
easily measured directly by collapsing a large number of variables into a
few interpretable underlying factors.
17. Eigen Vector
It is a vector which when operated on by a given operator gives a scalar
multiple of itself.
The Eigen vectors e1, e2, …. ep for a matrix A are obtained by solving
the expression
(A − λⱼ I) eⱼ = 0
Here we take the matrix A minus the jth Eigen value times the identity
matrix, multiply this quantity by the jth Eigen vector, and set the result
equal to zero.
This will obtain the Eigen vector ej associated with the Eigen value λj.
18. Eigen Value (λ)
If we define a P X P matrix A, we are going to have P Eigen values, λ1,
λ2, ……. λp.
These values are obtained by solving the equation
|A − λI| = 0
On the left hand side, we have the matrix A minus λ times the Identity
matrix.
When we calculate the determinant of the resulting matrix, we end up
with a polynomial of order P.
Setting the polynomial equal to zero, and solving for λ, we obtain the
desired Eigen values.
We will have P solutions in general which are not necessarily unique.
19. Eigen Value & Eigen Vector
Eigen values are a special set of scalars associated with a linear system
of equations; they are sometimes known as characteristic roots,
characteristic values, proper values, or latent roots.
The determination of Eigen values and Eigen vectors is extremely
important: it is equivalent to matrix diagonalization and arises in
applications such as stability analysis.
20. Use of Eigen Value and Eigen Vector
Eigen values and Eigen vectors are used for
• Computing prediction and confidence ellipses.
• Principal Components Analysis
• Factor Analysis
21. Example: Determination of Eigen Value:
Consider a 2×2 matrix R
R = | 1  ρ |
    | ρ  1 |

R − λI = | 1  ρ | − λ | 1  0 |
         | ρ  1 |     | 0  1 |

Calculating the determinant of the matrix with 1 − λ on the diagonal and
ρ on the off-diagonal:

| 1 − λ    ρ   |
|   ρ    1 − λ | = (1 − λ)² − ρ² = 0   →   λ = 1 ± ρ
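The result λ = 1 ± ρ is easy to verify numerically; here ρ = 0.6 is an arbitrary illustrative choice:

```python
import numpy as np

rho = 0.6  # arbitrary illustrative value
R = np.array([[1.0, rho],
              [rho, 1.0]])

# eigvalsh handles symmetric matrices and returns eigenvalues in ascending order.
eigvals = np.linalg.eigvalsh(R)   # expect [1 − ρ, 1 + ρ] = [0.4, 1.6]
```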
23. Determination of Eigen Vector:
Next, to obtain the corresponding Eigen vectors, we must solve the
expression
(R − λI) e = 0

Substituting R and λI gives

| 1 − λ    ρ   | | e₁ |   | 0 |
|   ρ    1 − λ | | e₂ | = | 0 |
24. Determination of Eigen Vector:
This yields
(1 − λ) e₁ + ρ e₂ = 0
ρ e₁ + (1 − λ) e₂ = 0
This does not have a unique solution.
The additional condition is that the sum of the squared values of e1 and
e2 is equal to 1. i.e.
e₁² + e₂² = 1
25. Determination of Eigen Vector:
Considering the first equation,
(1 − λ) e₁ + ρ e₂ = 0

we get

e₂ = − ((1 − λ) / ρ) e₁

Therefore,

e₁² + ((1 − λ)² / ρ²) e₁² = 1

Since λ = 1 ± ρ, we have (1 − λ)² = ρ², so 2e₁² = 1 and e₁ = 1/√2.
Correspondingly, e₂ = 1/√2 for λ = 1 + ρ and e₂ = −1/√2 for λ = 1 − ρ.
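These unit Eigen vectors can likewise be confirmed with NumPy; ρ = 0.6 is again an arbitrary illustrative choice (the signs of the returned vectors may differ, which is why the check below compares magnitudes):

```python
import numpy as np

rho = 0.6  # arbitrary illustrative value
R = np.array([[1.0, rho],
              [rho, 1.0]])

# eigh returns ascending eigenvalues; columns of eigvecs are unit eigenvectors.
eigvals, eigvecs = np.linalg.eigh(R)
```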