WELCOME TO MY
PRESENTATION
ON
STATISTICAL DISTANCE
Md. Menhazul Abedin
M.Sc. Student
Dept. of Statistics
Rajshahi University
Mob: 01751385142
Email: menhaz70@gmail.com
Objectives
• To know the meaning of statistical distance, and its relation to and difference from ordinary (Euclidean) distance
Content
• Definition of Euclidean distance
• Concept & intuition of statistical distance
• Definition of statistical distance
• Necessity of statistical distance
• Concept of Mahalanobis distance (population & sample)
• Distribution of Mahalanobis distance
• Mahalanobis distance in R
• Acknowledgement
Euclidean Distance from origin
(0,0)
(X,Y)
X
Y
Euclidean Distance
P(X,Y)
Y
O (0,0) X
By the Pythagorean theorem,
d(O, P) = √(X² + Y²)
Euclidean Distance
Specific point
We see two specific points in each picture.
Our problem is to determine the distance between the two points.
But how?
Assume the pictures are placed in a two-dimensional space and the points are joined by a straight line.
Let the 1st point be (x1, y1) and the 2nd point be (x2, y2); then the distance is
D = √( (x1 − x2)² + (y1 − y2)² )
What happens when the dimension is three?
Distance in R³
• The points are (x1, x2, x3) and (y1, y2, y3), and the distance is given by
√( (x1 − y1)² + (x2 − y2)² + (x3 − y3)² )
For n dimensions it can be written as the following expression, called the Euclidean distance:
P = (x1, x2, …, xp), Q = (y1, y2, …, yp)
d(P, Q) = √( (x1 − y1)² + (x2 − y2)² + ⋯ + (xp − yp)² )
12/12/2016
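The n-dimensional formula is easy to sanity-check in code. A minimal Python sketch (illustrative only — the slides' own code, later on, is in R):

```python
import math

def euclidean(p, q):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

# 2-D: the classic 3-4-5 right triangle
print(euclidean((0, 0), (3, 4)))        # 5.0
# 3-D pair of points
print(euclidean((1, 2, 3), (4, 6, 3)))  # 5.0
```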
Properties of Euclidean Distance and
Mathematical Distance
• The usual human concept of distance is Euclidean distance.
• Each coordinate contributes equally to the distance:
P = (x1, x2, …, xp), Q = (y1, y2, …, yp)
d(P, Q) = √( (x1 − y1)² + (x2 − y2)² + ⋯ + (xp − yp)² )
Mathematicians define distance on any set by generalizing three properties:
1) d(P,Q) = d(Q,P);
2) d(P,Q) = 0 if and only if P = Q; and
3) d(P,Q) ≤ d(P,R) + d(R,Q) for all R (the triangle inequality).
P(X1,Y1)   Q(X2,Y2)   R(Z1,Z2)
Taxicab Distance: Notion
(Figure: red — Manhattan distance; green — diagonal, straight-line distance; blue and yellow — equivalent Manhattan distances.)
• The Manhattan distance is the simple sum of the horizontal and vertical components, whereas the diagonal distance can be computed by applying the Pythagorean theorem.
• Red: Manhattan distance.
• Green: diagonal, straight-line distance.
• Blue, yellow: equivalent Manhattan distances.
• Manhattan distance: 12 units.
• Diagonal (straight-line, Euclidean) distance: √(6² + 6²) = 6√2 ≈ 8.49 units.
We observe that the Euclidean distance is less than the Manhattan distance.
Taxicab/Manhattan distance :Definition
(p1, p2) and (q1, q2)
│p1 − q1│ (horizontal), │p2 − q2│ (vertical)
Manhattan Distance
• The taxicab distance between (p1, p2) and (q1, q2) is │p1 − q1│ + │p2 − q2│.
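The definition can be checked with a small Python sketch (illustrative; the slides' own code is in R), comparing taxicab and Euclidean distance on a 6 × 6-block diagonal:

```python
import math

def taxicab(p, q):
    """Manhattan (taxicab) distance: sum of absolute coordinate differences."""
    return sum(abs(pi - qi) for pi, qi in zip(p, q))

def euclidean(p, q):
    """Straight-line (Euclidean) distance."""
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

# 6 blocks east and 6 blocks north
print(taxicab((0, 0), (6, 6)))              # 12
print(round(euclidean((0, 0), (6, 6)), 3))  # 8.485  (= 6*sqrt(2))
```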
Relationship between Manhattan &
Euclidean distance.
7 Block
6 Block
Relationship between Manhattan &
Euclidean distance.
• It now seems that the distance from A to C is 7 blocks, while the distance from A to B is 6 blocks.
• Unless we choose to go off-road, B is now closer to A than C.
• Taxicab distance is sometimes equal to Euclidean distance, but otherwise it is greater than Euclidean distance:
Euclidean distance ≤ Taxicab distance
Is this always true? Does it hold in n dimensions?
Proofโ€ฆโ€ฆ..
Absolute values guarantee non-negative value
Addition property of inequality
Continuedโ€ฆโ€ฆโ€ฆ..
Continuedโ€ฆโ€ฆโ€ฆ..
For higher dimensions
• The inequality also holds in the n-dimensional case:
Σᵢ (xi − yi)² ≤ Σᵢ │xi − yi│² + 2 Σᵢ<ⱼ │xi − yi││xj − yj│ = ( Σᵢ │xi − yi│ )²
which implies
√( Σᵢ (xi − yi)² ) ≤ Σᵢ │xi − yi│, i.e. d_E ≤ d_T
Statistical Distance
• Weight coordinates subject to a great deal of variability less heavily than those that are not highly variable.
(Figure: who is nearer to the data set, if the data set were a point? Both candidate points are the same distance from the origin.)
• Here, variability along the x1 axis > variability along the x2 axis.
• Is the same distance from the origin meaningful?
Ans: No.
But how do we take the different variability into account?
Ans: Give different weights to the axes.
Statistical Distance for Uncorrelated Data
P = (x1, x2), O = (0, 0)
Standardize: x1* = x1/√s11, x2* = x2/√s22 (the weights are 1/√s11 and 1/√s22)
d(O, P) = √( (x1*)² + (x2*)² ) = √( x1²/s11 + x2²/s22 )
All points (x1, x2) that lie at a constant squared distance c² from the origin must satisfy
x1²/s11 + x2²/s22 = c²
But how do we choose c? That is a problem. One choice: take c so that 95% of the observations fall inside this area.
s11 > s22  ⟹  1/s11 < 1/s22
Ellipse of Constant Statistical Distance for
Uncorrelated Data
(Figure: ellipse centered at 0 with semi-axis c√s11 along x1 and semi-axis c√s22 along x2.)
• This expression generalizes to the statistical distance from an arbitrary point P = (x1, x2) to any fixed point Q = (y1, y2):
d(P, Q) = √( (x1 − y1)²/s11 + (x2 − y2)²/s22 )
For p dimensions:
d(P, Q) = √( (x1 − y1)²/s11 + (x2 − y2)²/s22 + ⋯ + (xp − yp)²/spp )
Remarks:
1) The distance of P to the origin O is obtained by setting all yᵢ = 0.
2) If all sᵢᵢ are equal, the Euclidean distance formula is appropriate.
Scattered Plot for
Correlated Measurements
• How do you measure the statistical distance of the above data set?
• Ans: First make it uncorrelated.
• But why, and how?
• Ans: Rotate the axes, keeping the origin fixed.
Scattered Plot for
Correlated Measurements
Rotation of axes keeping the origin fixed
(Figure: original axes x1, x2; rotated axes x̃1, x̃2 at angle θ; point P = (x1, x2); construction points O, M, N, Q, R.)
From the figure,
x1 = OM = OR − MR = x̃1 cos θ − x̃2 sin θ ……. (i)
x2 = MP = QR + NP = x̃1 sin θ + x̃2 cos θ ……….(ii)
• Solving the above equations for the rotated coordinates gives
x̃1 = x1 cos θ + x2 sin θ
x̃2 = −x1 sin θ + x2 cos θ
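The rotated coordinates can be sketched in Python (illustrative; the slides do the same computation in R further on). Note that rotation preserves ordinary lengths — it only re-expresses the point in new axes:

```python
import math

def rotate(x1, x2, theta):
    """Coordinates of the point (x1, x2) relative to axes rotated by theta."""
    xt1 = x1 * math.cos(theta) + x2 * math.sin(theta)
    xt2 = -x1 * math.sin(theta) + x2 * math.cos(theta)
    return xt1, xt2

# rotating the axes by 90 degrees carries the x2 axis onto the new first axis
xt1, xt2 = rotate(0.0, 1.0, math.pi / 2)
print(round(xt1, 10), round(xt2, 10))   # 1.0 0.0

# lengths are unchanged by rotation
print(round(math.hypot(*rotate(3.0, 4.0, 0.7)), 10))  # 5.0
```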
Choice of ๐œƒ
๏ถWhat ๐œƒ will you choice ?
๏ถHow will you do it ?
๏ถ Data matrix โ†’ Centeralized data matrix โ†’ Covariance of
data matrix โ†’ Eigen vector
๏ถTheta = angle between 1st eigen vector and [1,0]
or
angle between 2nd eigen vector and [0,1]
Why is that angle between 1st eigen vector and
[0,1] or angle between 2nd eigen vector and [1,0]
??
Ans: Let B be a (p × p) positive definite matrix with eigenvalues λ1 ≥ λ2 ≥ λ3 ≥ ……… ≥ λp > 0 and associated normalized eigenvectors e1, e2, ………, ep. Then
max_{x≠0} x′Bx / x′x = λ1, attained when x = e1
min_{x≠0} x′Bx / x′x = λp, attained when x = ep
max_{x⊥e1,…,ek} x′Bx / x′x = λ(k+1), attained when x = e(k+1), k = 1, 2, …, p − 1
Choice of ๐œƒ
#### Exercise 16, page 309: heights in inches (x) and weights in
#### pounds (y). "An Introduction to Statistics and Probability",
#### M. Nurul Islam ####
x=c(60,60,60,60,62,62,62,64,64,64,66,66,66,66,68,68,68,70,70,70); x
y=c(115,120,130,125,130,140,120,135,130,145,135,170,140,155,150,160,175,180,160,175); y
############
plot(x,y)
data=data.frame(x,y); data
as.matrix(data)
colMeans(data)
xmv=c(rep(64.8,20)); xmv    ### x mean vector
ymv=c(rep(144.5,20)); ymv   ### y mean vector
meanmatrix=cbind(xmv,ymv); meanmatrix
cdata=data-meanmatrix; cdata   ### mean-centred data
plot(cdata)
abline(h=0,v=0)
cor(cdata)
##################
cov(cdata)
eigen(cov(cdata))
xx1=c(1,0);xx1
xx2=c(0,1);xx2
vv1=eigen(cov(cdata))$vectors[,1];vv1
vv2=eigen(cov(cdata))$vectors[,2];vv2
################
theta = acos( sum(xx1*vv1) / ( sqrt(sum(xx1*xx1)) * sqrt(sum(vv1*vv1)) ) ); theta
theta = acos( sum(xx2*vv2) / ( sqrt(sum(xx2*xx2)) * sqrt(sum(vv2*vv2)) ) ); theta
###############
xx=cdata[,1]*cos( 1.41784)+cdata[,2]*sin( 1.41784);xx
yy=-cdata[,1]*sin( 1.41784)+cdata[,2]*cos( 1.41784);yy
plot(xx,yy)
abline(h=0,v=0)
V=eigen(cov(cdata))$vectors;V
tdata=as.matrix(cdata)%*%V;tdata
### transformed data
cov(tdata)
round(cov(tdata),14)
cor(tdata)
plot(tdata)
abline(h=0,v=0)
round(cor(tdata),16)
################ comparison of both methods ############
comparison=tdata - as.matrix(cbind(xx,yy)); comparison
round(comparison,4)
########### using package: md from original data #####
md=mahalanobis(data,colMeans(data),cov(data),inverted=F); md   ## md = Mahalanobis distance
######## Mahalanobis distance from transformed data ########
tmd=mahalanobis(tdata,colMeans(tdata),cov(tdata),inverted=F); tmd
###### comparison ############
md-tmd
Mahalanobis distance: Manually
mu=colMeans(tdata); mu
incov=solve(cov(tdata)); incov
md1=t(tdata[1,]-mu)%*%incov%*%(tdata[1,]-mu); md1
md2=t(tdata[2,]-mu)%*%incov%*%(tdata[2,]-mu); md2
md3=t(tdata[3,]-mu)%*%incov%*%(tdata[3,]-mu); md3
…………. ……………… ……………..
md20=t(tdata[20,]-mu)%*%incov%*%(tdata[20,]-mu); md20
The md values from the package and from the manual computation are equal.
tdata
s1=sd(tdata[,1]);s1
s2=sd(tdata[,2]);s2
xstar=c(tdata[,1])/s1;xstar
ystar=c(tdata[,2])/s2;ystar
md1=sqrt((-1.46787309)^2 + (0.1484462)^2); md1
md2=sqrt((-1.22516896)^2 + (0.6020111)^2); md2
………. …………… ………………..
These are not equal to the distances above. Why?
Because the mean must be taken into account.
Statistical Distance under Rotated
Coordinate System
O = (0, 0), P = (x̃1, x̃2), where
x̃1 = x1 cos θ + x2 sin θ
x̃2 = −x1 sin θ + x2 cos θ
d(O, P) = √( x̃1² / s̃11 + x̃2² / s̃22 )
(s̃11 and s̃22 are the sample variances of the rotated coordinates x̃1 and x̃2.)
• After some manipulation this can be written in terms of the original variables as
d(O, P) = √( a11 x1² + 2a12 x1x2 + a22 x2² )
where the weights a11, a12, a22 are determined by θ and the sample variances.
Proof………
โ€ข ๐‘ 11=
1
๐‘›โˆ’1
ฮฃ( ๐‘ฅ1 โˆ’ ๐‘ฅ1 )
2
=
1
๐‘›โˆ’1
ฮฃ (๐‘ฅ1 cos ๐œƒ + ๐‘ฅ2 sin ๐œƒ โˆ’ ๐‘ฅ1 cos ๐œƒ โˆ’ ๐‘ฅ2 sin ๐œƒ )2
= ๐‘๐‘œ๐‘ 2(๐œƒ)๐‘ 11 + 2 sin ๐œƒ cos ๐œƒ ๐‘ 12 + ๐‘ ๐‘–๐‘›2(๐œƒ)๐‘ 22
๐‘ 22 =
1
๐‘›โˆ’1
ฮฃ( ๐‘ฅ2 โˆ’ ๐‘ฅ2 )
2
= ฮฃ
1
๐‘›โˆ’1
( โˆ’ ๐‘ฅ1 sin ๐œƒ + ๐‘ฅ2 cos ๐œƒ + ๐‘ฅ1 sin(๐œƒ) + ๐‘ฅ2 cos ๐œƒ ) 2
= ๐‘๐‘œ๐‘ 2(๐œƒ)๐‘ 22 - 2 sin ๐œƒ cos ๐œƒ ๐‘ 12 + ๐‘ ๐‘–๐‘›2(๐œƒ)๐‘ 11
Continuedโ€ฆโ€ฆโ€ฆโ€ฆ.
d(O, P) = √( (x1 cos θ + x2 sin θ)² / s̃11 + (−x1 sin θ + x2 cos θ)² / s̃22 )
Continuedโ€ฆโ€ฆโ€ฆโ€ฆ.
General Statistical Distance
P = (x1, x2, …, xp), O = (0, 0, …, 0), Q = (y1, y2, …, yp)
d(O, P) = √( a11 x1² + a22 x2² + ⋯ + app xp²
           + 2a12 x1x2 + 2a13 x1x3 + ⋯ + 2a(p−1)p x(p−1)xp )
d(P, Q) = √( a11(x1 − y1)² + a22(x2 − y2)² + ⋯ + app(xp − yp)²
           + 2a12(x1 − y1)(x2 − y2) + 2a13(x1 − y1)(x3 − y3) + ⋯
           + 2a(p−1)p(x(p−1) − y(p−1))(xp − yp) )
• The above distances are completely determined by the coefficients (weights) aik; i, k = 1, 2, 3, ……… p. These can be arranged in a rectangular array (matrix) A = [aik]; this matrix must be symmetric positive definite.
Why positive definite?
Let A be a positive definite matrix. Then A = C′C, so
x′Ax = x′C′Cx = (Cx)′(Cx) = y′y ≥ 0,
which obeys all the distance properties. Thus x′Ax is a (squared) distance, and different choices of A give different distances.
• Why a positive definite matrix?
• Ans: Spectral decomposition. The spectral decomposition of a k×k symmetric matrix A is
A = Σᵢ₌₁ᵏ λᵢ eᵢeᵢ′
where (λᵢ, eᵢ); i = 1, 2, ………, k are the pairs of eigenvalues and eigenvectors, with λ1 ≥ λ2 ≥ λ3 ≥ ………. If A is positive definite, every λᵢ > 0 and A is invertible.
(Figure: scatter of the data with principal axes e1 and e2 and corresponding eigenvalues λ1 and λ2.)
• Suppose p = 2. The points at constant distance c from the origin satisfy x′Ax = c², an ellipse. By the spectral decomposition its axes lie along the eigenvectors e1 and e2, with half-lengths c/√λ1 and c/√λ2.
Another property: if A is positive definite, A⁻¹ = Σᵢ (1/λᵢ) eᵢeᵢ′.
Thus we use this property in the Mahalanobis distance.
Necessity of Statistical Distance
Center of
gravity
Another
point
• Consider the Euclidean distances from the point Q to the point P and to the origin O.
• Obviously d(P,Q) > d(Q,O).
• But P appears to be more like the points in the cluster than the origin does.
• If we take into account the variability of the points in the cluster and measure distance by statistical distance, then Q will be closer to P than to O.
Mahalanobis distance
• The Mahalanobis distance is a descriptive statistic that provides a relative measure of a data point's distance from a common point. It is a unitless measure introduced by P. C. Mahalanobis in 1936.
Intuition of Mahalanobis Distance
• Recall the equation
d(O, P) = √(x′Ax)  ⟹  d²(O, P) = x′Ax
where x = (x1, x2)′ and A = [[a11, a12], [a21, a22]].
Intuition of Mahalanobis Distance
d(O,P)= ๐‘ฅโ€ฒ ๐ด๐‘ฅ
๐‘‘2
๐‘‚, ๐‘ƒ = ๐‘ฅโ€ฒ
๐ด๐‘ฅ
Where ๐‘ฅโ€ฒ
= ๐‘ฅ1 ๐‘ฅ2 ๐‘ฅ3 โ‹ฏ ๐‘ฅ ๐‘ ; A=
Intuition of Mahalanobis Distance
๐‘‘2
(๐‘ƒ, ๐‘„) = ๐‘ฅ โˆ’ ๐‘ฆ โ€ฒ
๐ด(๐‘ฅ โˆ’ ๐‘ฆ)
where, ๐‘ฅโ€ฒ
= ๐‘ฅ1, ๐‘ฅ2, โ€ฆ , ๐‘ฅ ๐‘ ; ๐‘ฆโ€ฒ
= (๐‘ฆ1, ๐‘ฆ2, โ€ฆ ๐‘ฆ๐‘)
A=
Mahalanobis Distance
• Mahalanobis used the inverse of the covariance matrix, Σ⁻¹, instead of A.
• Thus d²(O, P) = x′Σ⁻¹x ………………..(1)
• And he used μ (the center of gravity) instead of y:
d²(P, μ) = (x − μ)′Σ⁻¹(x − μ)………..(2)
Mahalanobis Distance
• The above equations are nothing but the Mahalanobis distance.
• For example, suppose we took a single observation from a bivariate population with variable X and variable Y, whose means, variances and covariance are as given (the table is not reproduced here).
• Single observation: X = 410 and Y = 400.
• The Mahalanobis distance for that single observation works out to 1.825.
• Therefore, our single observation lies at a distance of 1.825 standardized units from the mean (the mean is at X = 500, Y = 500).
• If we took many such observations, graphed them and colored them according to their Mahalanobis values, we could see the elliptical Mahalanobis regions come out.
• The points are actually distributed along two primary axes.
• If we calculate Mahalanobis distances for each of these points and shade them according to their distance value, we see clear elliptical patterns emerge.
• We can also draw actual ellipses at regions of constant Mahalanobis values:
(Ellipses containing 68%, 95% and 99.7% of the observations.)
• Which ellipse do you choose?
• Ans: Use the 68–95–99.7 rule (valid if the data are normal):
1) about two-thirds (68%) of the points should be within 1 unit of the origin (along each axis);
2) about 95% should be within 2 units;
3) about 99.7% should be within 3 units.
Sample Mahalanobis Distance
• The sample Mahalanobis distance is obtained by replacing Σ with S and μ with X̄,
i.e. (X − X̄)′S⁻¹(X − X̄).
For a sample, the ellipsoid (X − X̄)′S⁻¹(X − X̄) ≤ χ²p(α) contains the corresponding proportion of the observations.
Distribution of Mahalanobis distance
Let X1, X2, X3, ………, Xn be independent observations from any population with mean μ and finite (nonsingular) covariance Σ. Then
• √n (X̄ − μ) is approximately Np(0, Σ), and
• n (X̄ − μ)′S⁻¹(X̄ − μ) is approximately χ²p for n − p large.
This is nothing but the central limit theorem.
Mahalanobis distance in R
########### Mahalanobis Distance ##########
x=rnorm(100); x
dm=matrix(x,nrow=20,ncol=5,byrow=F); dm   ## dm = data matrix
cm=colMeans(dm); cm                       ## cm = column means
cov=cov(dm); cov                          ## cov = covariance matrix
incov=solve(cov); incov                   ## incov = inverse of covariance matrix
Mahalanobis distance in R
####### MAHALANOBIS DISTANCE: MANUALLY ######
### Mahalanobis distance of first observation
ob1=dm[1,]; ob1                 ## first observation
mv1=ob1-cm; mv1                 ## deviation of first observation from center of gravity
md1=t(mv1)%*%incov%*%mv1; md1   ## Mahalanobis distance of first observation
Mahalanobis distance in R
### Mahalanobis distance of second observation
ob2=dm[2,]; ob2                 ## second observation
mv2=ob2-cm; mv2                 ## deviation of second observation from center of gravity
md2=t(mv2)%*%incov%*%mv2; md2   ## Mahalanobis distance of second observation
………… …………
Mahalanobis distance in R
### Mahalanobis distance of 20th observation
ob20=dm[20,]; ob20                   ## 20th observation
mv20=ob20-cm; mv20                   ## deviation of 20th observation from center of gravity
md20=t(mv20)%*%incov%*%mv20; md20    ## Mahalanobis distance of 20th observation
Mahalanobis distance in R
####### MAHALANOBIS DISTANCE: PACKAGE ########
md=mahalanobis(dm,cm,cov,inverted=F); md   ## md = Mahalanobis distance
md=mahalanobis(dm,cm,cov); md
Another example
x <- matrix(rnorm(100*3), ncol = 3)
Sx <- cov(x)
D2 <- mahalanobis(x, colMeans(x), Sx)
plot(density(D2, bw = 0.5),
     main="Squared Mahalanobis distances, n=100, p=3")
qqplot(qchisq(ppoints(100), df = 3), D2,
       main = expression("Q-Q plot of Mahalanobis" * ~D^2 *
                         " vs. quantiles of" * ~ chi[3]^2))
abline(0, 1, col = 'gray')
??mahalanobis
Acknowledgement
Prof. Mohammad Nasser,
Richard A. Johnson & Dean W. Wichern,
and others
THANK YOU
ALL
Necessity of Statistical Distance
In home
Mother
In mess
Female
maid
Student
in mess
ย 
Physiochemical properties of nanomaterials and its nanotoxicity.pptx
Physiochemical properties of nanomaterials and its nanotoxicity.pptxPhysiochemical properties of nanomaterials and its nanotoxicity.pptx
Physiochemical properties of nanomaterials and its nanotoxicity.pptx
ย 
Biopesticide (2).pptx .This slides helps to know the different types of biop...
Biopesticide (2).pptx  .This slides helps to know the different types of biop...Biopesticide (2).pptx  .This slides helps to know the different types of biop...
Biopesticide (2).pptx .This slides helps to know the different types of biop...
ย 
โคJammu Kashmir Call Girls 8617697112 Personal Whatsapp Number ๐Ÿ’ฆโœ….
โคJammu Kashmir Call Girls 8617697112 Personal Whatsapp Number ๐Ÿ’ฆโœ….โคJammu Kashmir Call Girls 8617697112 Personal Whatsapp Number ๐Ÿ’ฆโœ….
โคJammu Kashmir Call Girls 8617697112 Personal Whatsapp Number ๐Ÿ’ฆโœ….
ย 
Animal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptxAnimal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptx
ย 
VIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C PVIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C P
ย 
Natural Polymer Based Nanomaterials
Natural Polymer Based NanomaterialsNatural Polymer Based Nanomaterials
Natural Polymer Based Nanomaterials
ย 
Isotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoIsotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on Io
ย 
Botany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfBotany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdf
ย 
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bAsymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
ย 
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
ย 
DIFFERENCE IN BACK CROSS AND TEST CROSS
DIFFERENCE IN  BACK CROSS AND TEST CROSSDIFFERENCE IN  BACK CROSS AND TEST CROSS
DIFFERENCE IN BACK CROSS AND TEST CROSS
ย 
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
ย 
Green chemistry and Sustainable development.pptx
Green chemistry  and Sustainable development.pptxGreen chemistry  and Sustainable development.pptx
Green chemistry and Sustainable development.pptx
ย 
CALL ON โžฅ8923113531 ๐Ÿ”Call Girls Kesar Bagh Lucknow best Night Fun service ๐Ÿชก
CALL ON โžฅ8923113531 ๐Ÿ”Call Girls Kesar Bagh Lucknow best Night Fun service  ๐ŸชกCALL ON โžฅ8923113531 ๐Ÿ”Call Girls Kesar Bagh Lucknow best Night Fun service  ๐Ÿชก
CALL ON โžฅ8923113531 ๐Ÿ”Call Girls Kesar Bagh Lucknow best Night Fun service ๐Ÿชก
ย 
Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )
ย 
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: โ€œEg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: โ€œEg...All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: โ€œEg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: โ€œEg...
ย 
CELL -Structural and Functional unit of life.pdf
CELL -Structural and Functional unit of life.pdfCELL -Structural and Functional unit of life.pdf
CELL -Structural and Functional unit of life.pdf
ย 
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticsPulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
ย 
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
ย 

Different kind of distance and Statistical Distance

  • 2. Md. Menhazul Abedin, M.Sc. student, Dept. of Statistics, Rajshahi University. Mob: 01751385142. Email: menhaz70@gmail.com
  • 3. Objectives • To know the meaning of statistical distance and its relation to, and difference from, ordinary (Euclidean) distance
  • 4. Content: definition of Euclidean distance; concept & intuition of statistical distance; definition of statistical distance; necessity of statistical distance; concept of Mahalanobis distance (population & sample); distribution of Mahalanobis distance; Mahalanobis distance in R; acknowledgement
  • 5. Euclidean distance from the origin: a point (X, Y) and the origin (0, 0) plotted on the X and Y axes
  • 6. Euclidean distance: for P(X, Y) and O(0, 0), by Pythagoras, d(O, P) = √(X² + Y²)
  • 8.
  • 9. We see two specific points in each picture. Our problem is to determine the distance between the two points. But how? Assume that the pictures are placed in a two-dimensional space and the points are joined by a straight line
  • 10. Let the 1st point be (x1, y1) and the 2nd point be (x2, y2); then the distance is D = √((x1 − x2)² + (y1 − y2)²). What happens when the dimension is three?
  • 12. Distance in R³: for points (x1, x2, x3) and (y1, y2, y3) the distance is √((x1 − y1)² + (x2 − y2)² + (x3 − y3)²)
  • 13. For p dimensions it can be written as the following expression, named the Euclidean distance: for P = (x1, x2, …, xp) and Q = (y1, y2, …, yp), d(P, Q) = √((x1 − y1)² + (x2 − y2)² + ⋯ + (xp − yp)²)
  • 14. Properties of Euclidean distance and mathematical distance • The usual human concept of distance is Euclidean distance • Each coordinate contributes equally to the distance: d(P, Q) = √((x1 − y1)² + (x2 − y2)² + ⋯ + (xp − yp)²). Mathematicians, generalizing its three properties, define distance on any set: 1) d(P, Q) = d(Q, P); 2) d(P, Q) = 0 if and only if P = Q; 3) d(P, Q) ≤ d(P, R) + d(R, Q) for all R
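The p-dimensional Euclidean distance just defined can be sketched in code. The deck's own code is in R; this is a minimal Python stand-in, not part of the slides:

```python
import math

def euclidean(p, q):
    """Euclidean distance between two points of any (equal) dimension."""
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

# 2-D case from the slides: d = sqrt((x1-x2)^2 + (y1-y2)^2)
print(euclidean((0, 0), (3, 4)))  # 5.0
```

The three defining properties are easy to spot here: the sum of squares is symmetric in p and q, is zero only when the points coincide, and satisfies the triangle inequality.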
  • 17. • The Manhattan distance is the simple sum of the horizontal and vertical components, whereas the diagonal distance may be computed by applying the Pythagorean theorem
  • 18. Red: Manhattan distance. Green: diagonal, straight-line distance. Blue, yellow: equivalent Manhattan distances.
  • 19. • Manhattan distance: 12 units • Diagonal, straight-line (Euclidean) distance: √(6² + 6²) = 6√2 ≈ 8.49 units. We observe that the Euclidean distance is less than the Manhattan distance
  • 20. Taxicab/Manhattan distance, definition: points (p1, p2) and (q1, q2), with horizontal component │p1 − q1│ and vertical component │p2 − q2│
  • 21. Manhattan distance • The taxicab distance between (p1, p2) and (q1, q2) is │p1 − q1│ + │p2 − q2│
  • 22. Relationship between Manhattan & Euclidean distance: one route is 7 blocks, the other is 6 blocks
  • 23. Relationship between Manhattan & Euclidean distance • It now seems that the distance from A to C is 7 blocks, while the distance from A to B is 6 blocks • Unless we choose to go off-road, B is now closer to A than C • Taxicab distance is sometimes equal to Euclidean distance, but otherwise it is greater: Euclidean distance ≤ taxicab distance. Is this always true? Does it hold in n dimensions?
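The claim that Euclidean distance never exceeds taxicab distance can be checked numerically. A small Python sketch (the slides contain no code at this point; the 6×6 example is the one from the grid picture):

```python
import math

def manhattan(p, q):
    """Taxicab distance: sum of absolute coordinate differences."""
    return sum(abs(pi - qi) for pi, qi in zip(p, q))

def euclidean(p, q):
    """Straight-line distance."""
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

a, b = (0, 0), (6, 6)
print(manhattan(a, b))   # 12
print(euclidean(a, b))   # 6*sqrt(2), about 8.485
# Euclidean never exceeds taxicab, in any dimension
assert euclidean(a, b) <= manhattan(a, b)
```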
  • 24. Proof: the absolute values guarantee non-negative terms, and the addition property of inequality completes the argument
  • 27. For higher dimensions it also holds: Σ│xi − yi│² ≤ Σ│xi − yi│² + 2Σi<j│xi − yi││xj − yj│ = (Σ│xi − yi│)², which implies √(Σ(xi − yi)²) ≤ Σ│xi − yi│, i.e. d_E ≤ d_T
  • 28. Statistical distance • Weight coordinates that are subject to a great deal of variability less heavily than those that are not highly variable. Which of two points at the same Euclidean distance from the origin is nearer to the data set?
  • 29. • Here the variability along the x1 axis is greater than the variability along the x2 axis. Is the same distance from the origin meaningful? Ans: no. But how do we take the different variability into account? Ans: give different weights to the axes.
  • 30. Statistical distance for uncorrelated data: standardize x1* = x1/√s11 and x2* = x2/√s22 (the weights), so that for P(x1, x2) and O(0, 0), d(O, P) = √((x1*)² + (x2*)²) = √(x1²/s11 + x2²/s22)
  • 31. All points with coordinates (x1, x2) at a constant squared distance c² from the origin must satisfy x1²/s11 + x2²/s22 = c². But how to choose c? That is a problem. Choose c so that 95% of the observations fall within this area. Note that s11 > s22 implies 1/s11 < 1/s22
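The weighted distance for uncorrelated data can be sketched directly from the formula above. A minimal Python illustration (the variances 4 and 1 are made-up values, not from the slides):

```python
import math

def stat_distance(point, variances):
    """Statistical distance from the origin for uncorrelated data:
    d = sqrt(x1^2/s11 + x2^2/s22 + ...)."""
    return math.sqrt(sum(x * x / s for x, s in zip(point, variances)))

# With s11 = 4 and s22 = 1, the points (2, 0) and (0, 1) are equally far
# statistically, even though their Euclidean distances from the origin differ.
print(stat_distance((2, 0), (4, 1)))  # 1.0
print(stat_distance((0, 1), (4, 1)))  # 1.0
```

This is exactly why the contour of constant statistical distance is an ellipse rather than a circle: the high-variance axis is down-weighted.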
  • 32. Ellipse of constant statistical distance for uncorrelated data: it crosses the x1 axis at ±c√s11 and the x2 axis at ±c√s22
  • 33. • This expression can be generalized to the statistical distance from an arbitrary point P = (x1, x2) to any fixed point Q = (y1, y2): d(P, Q) = √((x1 − y1)²/s11 + (x2 − y2)²/s22), and analogously for p dimensions
  • 34. Remark: 1) the distance from P to the origin O is obtained by setting all yi = 0; 2) if all sii are equal, the Euclidean distance formula is appropriate
  • 36. • How do you measure the statistical distance of the above (correlated) data set? • Ans: first make it uncorrelated. • But why, and how? • Ans: rotate the axes keeping the origin fixed.
  • 38. Rotation of axes keeping the origin fixed: original axes x1, x2 and rotated axes x̃1, x̃2 at angle θ, for a point P(x1, x2)
  • 39. x = OM = OR − MR = x1 cos θ − x2 sin θ ……. (i); y = MP = QR + NP = x1 sin θ + x2 cos θ ………. (ii)
  • 40. โ€ข The solution of the above equations
  • 41. Choice of θ • What θ will you choose? • How will you do it? • Data matrix → centralized (mean-centred) data matrix → covariance matrix of the data → eigenvectors • θ = angle between the 1st eigenvector and [1, 0], or the angle between the 2nd eigenvector and [0, 1]
  • 42. Why is θ the angle between the 1st eigenvector and [1, 0], or between the 2nd eigenvector and [0, 1]? Ans: Let B be a (p × p) positive definite matrix with eigenvalues λ1 ≥ λ2 ≥ λ3 ≥ … ≥ λp > 0 and associated normalized eigenvectors e1, e2, …, ep. Then max_{x ≠ 0} x′Bx/x′x = λ1, attained when x = e1, and min_{x ≠ 0} x′Bx/x′x = λp, attained when x = ep
  • 43. max_{x ⊥ e1, e2, …, ek} x′Bx/x′x = λk+1, attained when x = ek+1, for k = 1, 2, …, p − 1
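The maximization result quoted above (the Rayleigh quotient x′Bx/x′x is maximized at the leading eigenvector) can be verified numerically. A Python sketch with an arbitrary, made-up positive definite B:

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))
B = M @ M.T + 4 * np.eye(4)           # symmetric positive definite by construction

eigvals, eigvecs = np.linalg.eigh(B)  # eigenvalues in ascending order
lam_max, e_max = eigvals[-1], eigvecs[:, -1]

def rayleigh(x):
    """Quotient x'Bx / x'x from the slide."""
    return (x @ B @ x) / (x @ x)

# The quotient at the leading eigenvector equals the largest eigenvalue...
print(np.isclose(rayleigh(e_max), lam_max))  # True
# ...and no random direction exceeds it
xs = rng.standard_normal((1000, 4))
print(all(rayleigh(x) <= lam_max + 1e-9 for x in xs))  # True
```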
  • 44. Choice of θ
#### Exercise 16, page 309: heights in inches (x) & weights in pounds (y). An Introduction to Statistics and Probability, M. Nurul Islam ####
x=c(60,60,60,60,62,62,62,64,64,64,66,66,66,66,68,68,68,70,70,70);x
y=c(115,120,130,125,130,140,120,135,130,145,135,170,140,155,150,160,175,180,160,175);y
V=eigen(cov(cdata))$vectors;V
as.matrix(cdata)%*%V
plot(x,y)
  • 45. data=data.frame(x,y);data
as.matrix(data)
colMeans(data)
xmv=c(rep(64.8,20));xmv ### x mean vector
ymv=c(rep(144.5,20));ymv ### y mean vector
meanmatrix=cbind(xmv,ymv);meanmatrix
cdata=data-meanmatrix;cdata ### mean-centred data
plot(cdata)
abline(h=0,v=0)
cor(cdata)
  • 47. ################
theta = acos( sum(xx1*vv1) / ( sqrt(sum(xx1 * xx1)) * sqrt(sum(vv1 * vv1)) ) );theta
theta = acos( sum(xx2*vv2) / ( sqrt(sum(xx2 * xx2)) * sqrt(sum(vv2 * vv2)) ) );theta
###############
xx=cdata[,1]*cos(1.41784)+cdata[,2]*sin(1.41784);xx
yy=-cdata[,1]*sin(1.41784)+cdata[,2]*cos(1.41784);yy
plot(xx,yy)
abline(h=0,v=0)
  • 49. ################ comparison of both methods ############
comparison=tdata - as.matrix(cbind(xx,yy));comparison
round(comparison,4)
  • 50. ########### using the package: md from the original data #####
md=mahalanobis(data,colMeans(data),cov(data),inverted=F);md ## md = Mahalanobis distance
######## Mahalanobis distance from the transformed data ########
tmd=mahalanobis(tdata,colMeans(tdata),cov(tdata),inverted=F);tmd
###### comparison ############
md-tmd
  • 51. Mahalanobis distance: manually
mu=colMeans(tdata);mu
incov=solve(cov(tdata));incov
md1=t(tdata[1,]-mu)%*%incov%*%(tdata[1,]-mu);md1
md2=t(tdata[2,]-mu)%*%incov%*%(tdata[2,]-mu);md2
md3=t(tdata[3,]-mu)%*%incov%*%(tdata[3,]-mu);md3
…………
md20=t(tdata[20,]-mu)%*%incov%*%(tdata[20,]-mu);md20
The md values from the package and computed manually are equal
  • 52. tdata
s1=sd(tdata[,1]);s1
s2=sd(tdata[,2]);s2
xstar=c(tdata[,1])/s1;xstar
ystar=c(tdata[,2])/s2;ystar
md1=sqrt((-1.46787309)^2 + (0.1484462)^2);md1
md2=sqrt((-1.22516896)^2 + (0.6020111)^2);md2
…………
These are not equal to the distances above. Why? Take the mean into account
  • 53. Statistical distance under a rotated coordinate system: for O(0, 0) and P(x̃1, x̃2), d(O, P) = √(x̃1²/s̃11 + x̃2²/s̃22), where x̃1 = x1 cos θ + x2 sin θ and x̃2 = −x1 sin θ + x2 cos θ, so that d(O, P) = √(a11 x1² + 2a12 x1 x2 + a22 x2²); here s̃11 and s̃22 are the sample variances of the rotated coordinates
  • 54. • After some manipulation this can be written in terms of the original variables, where the coefficients a11, a12, a22 are determined by s11, s12, s22 and θ
  • 55. Proof: s̃11 = 1/(n−1) Σ(x̃1 − mean(x̃1))² = 1/(n−1) Σ(x1 cos θ + x2 sin θ − x̄1 cos θ − x̄2 sin θ)² = cos²(θ) s11 + 2 sin θ cos θ s12 + sin²(θ) s22, and s̃22 = 1/(n−1) Σ(x̃2 − mean(x̃2))² = 1/(n−1) Σ(−x1 sin θ + x2 cos θ + x̄1 sin θ − x̄2 cos θ)² = cos²(θ) s22 − 2 sin θ cos θ s12 + sin²(θ) s11
  • 56. Continued: d(O, P) = √((x1 cos θ + x2 sin θ)²/s̃11 + (−x1 sin θ + x2 cos θ)²/s̃22)
  • 59. • The above distances are completely determined by the coefficients (weights) aik, i, k = 1, 2, …, p. These can be arranged in a rectangular array (a matrix); this matrix must be symmetric positive definite
  • 60. Why positive definite? Let A be a positive definite matrix; then A = C′C, so x′Ax = x′C′Cx = (Cx)′(Cx) = y′y ≥ 0. It obeys all the distance properties, so √(x′Ax) is a distance, and different choices of A give different distances
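The factorization argument on this slide (A = C′C makes x′Ax a valid squared distance) can be checked numerically. A Python sketch with a small, made-up positive definite matrix, using the Cholesky factor for C:

```python
import numpy as np

rng = np.random.default_rng(1)
A = np.array([[2.0, 0.5],
              [0.5, 1.0]])        # symmetric positive definite (det > 0)

C = np.linalg.cholesky(A)         # lower-triangular C with A = C @ C.T
# x'Ax = (C'x)'(C'x) = ||C'x||^2 >= 0 for every x
for x in rng.standard_normal((1000, 2)):
    q = x @ A @ x
    assert q >= 0
    assert np.isclose(q, np.sum((C.T @ x) ** 2))
print("x'Ax is a valid squared distance")
```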
  • 61. • Why a positive definite matrix? • Ans: spectral decomposition: the spectral decomposition of a k×k symmetric matrix A is A = Σi λi ei ei′, where (λi, ei), i = 1, 2, …, k, are the pairs of eigenvalues and eigenvectors, with λ1 ≥ λ2 ≥ λ3 ≥ …; if A is positive definite, each λi > 0 and A is invertible
  • 62. [Figure: scatter of data with eigenvectors e1, e2 and eigenvalues λ1, λ2 marking the principal axes]
  • 63. • Suppose p = 2. The distance from the origin satisfies x′Ax = c²; by the spectral decomposition, this ellipse has semi-axes c/√λ1 and c/√λ2 along the eigenvector directions
  • 64. Another property is the spectral form of the inverse, A⁻¹ = Σi (1/λi) ei ei′; thus the same ellipse description carries over to Σ⁻¹, and we use this property in the Mahalanobis distance
  • 65. Necessity of statistical distance: [figure: a cluster of points with its center of gravity, and another point outside the cluster]
  • 66. • Consider the Euclidean distances from the point Q to the point P and to the origin O. • Obviously d(P, Q) > d(Q, O). • But P appears to be more like the points in the cluster than the origin does. • If we take into account the variability of the points in the cluster and measure distance by statistical distance, then Q will be closer to P than to O.
  • 67. Mahalanobis distance • The Mahalanobis distance is a descriptive statistic that provides a relative measure of a data point's distance from a common point. It is a unitless measure introduced by P. C. Mahalanobis in 1936
  • 68. Intuition of the Mahalanobis distance • Recall the equation d(O, P) = √(x′Ax), so d²(O, P) = x′Ax, where x = (x1, x2)′ and A = (a11 a12; a21 a22)
  • 69. Intuition of the Mahalanobis distance: d(O, P) = √(x′Ax) and d²(O, P) = x′Ax, where x′ = (x1, x2, x3, …, xp) and A is the corresponding p × p matrix of weights
  • 70. Intuition of the Mahalanobis distance: d²(P, Q) = (x − y)′A(x − y), where x′ = (x1, x2, …, xp), y′ = (y1, y2, …, yp), and A is the p × p weight matrix
  • 71. Mahalanobis distance • Mahalanobis used the inverse of the covariance matrix, Σ⁻¹, instead of A • Thus d²(O, P) = x′Σ⁻¹x ………………..(1) • And used μ (the center of gravity) instead of y: d²(P, Q) = (x − μ)′Σ⁻¹(x − μ)………..(2)
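Equation (2) can be sketched directly. The deck computes this in R; here is a Python stand-in with numpy, where the sample, its mean of about (500, 500), and the covariance are made-up illustrations rather than the slide's (image-only) table:

```python
import numpy as np

rng = np.random.default_rng(2)
# Correlated bivariate sample standing in for the population
data = rng.multivariate_normal([500, 500],
                               [[10000, 7000], [7000, 10000]], size=500)

mu = data.mean(axis=0)                              # center of gravity
inv_cov = np.linalg.inv(np.cov(data, rowvar=False)) # Sigma^{-1} (estimated)

def mahalanobis(x):
    """d(x, mu) = sqrt((x - mu)' Sigma^{-1} (x - mu)), eq. (2)."""
    d = x - mu
    return float(np.sqrt(d @ inv_cov @ d))

print(mahalanobis(np.array([410.0, 400.0])))
```

Because Σ⁻¹ down-weights high-variance and correlated directions, the result is in standardized units rather than the units of X and Y.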
  • 72. Mahalanobis distance • The above equations are exactly the Mahalanobis distance • For example, suppose we took a single observation from a bivariate population with variables X and Y, where the two variables had the following characteristics
  • 73. • For the single observation X = 410 and Y = 400, the Mahalanobis distance for that single value is computed as:
  • 75. • Therefore, our single observation would have a distance of 1.825 standardized units from the mean (the mean is at X = 500, Y = 500). • If we took many such observations, graphed them, and colored them according to their Mahalanobis values, we would see the elliptical Mahalanobis regions come out
  • 76. โ€ข The points are actually distributed along two primary axes:
  • 77.
  • 78. If we calculate the Mahalanobis distance for each of these points and shade them according to their distance value, we see clear elliptical patterns emerge:
  • 79.
  • 80. • We can also draw actual ellipses at regions of constant Mahalanobis values, containing 68%, 95%, and 99.7% of the observations
  • 81. • Which ellipse do you choose? Ans: use the 68–95–99.7 rule: 1) about two-thirds (68%) of the points should be within 1 unit of the origin (along the axis); 2) about 95% should be within 2 units; 3) about 99.7% should be within 3 units
  • 83. Sample Mahalanobis distance • The sample Mahalanobis distance is obtained by replacing Σ by S and μ by X̄, i.e. (X − X̄)′S⁻¹(X − X̄)
  • 84. For a sample, (X − X̄)′S⁻¹(X − X̄) ≤ χ²p(α) — this leads to the distribution of the Mahalanobis distance
  • 85. Distribution of the Mahalanobis distance: let X1, X2, X3, …, Xn be independent observations from any population with mean μ and finite (nonsingular) covariance Σ. Then √n(X̄ − μ) is approximately Np(0, Σ), and n(X̄ − μ)′S⁻¹(X̄ − μ) is approximately χ²p for n − p large. This is nothing but the central limit theorem
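The chi-square behavior of squared Mahalanobis distances can be checked by simulation. A Python sketch with made-up normal data (for multivariate normal observations, each squared distance is approximately χ²p, so the distances should average about p with variance about 2p):

```python
import numpy as np

rng = np.random.default_rng(3)
p, n = 3, 2000
# Normal data with unequal scales and a shifted mean
data = rng.standard_normal((n, p)) @ np.diag([1.0, 2.0, 0.5]) + 5.0

mu = data.mean(axis=0)
inv_S = np.linalg.inv(np.cov(data, rowvar=False))
# Squared Mahalanobis distance of every row from the sample mean
d2 = np.einsum('ij,jk,ik->i', data - mu, inv_S, data - mu)

# For chi-square with p degrees of freedom: mean p, variance 2p
print(d2.mean())  # close to 3
print(d2.var())   # close to 6
```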
  • 86. Mahalanobis distance in R
########### Mahalanobis distance ##########
x=rnorm(100);x
dm=matrix(x,nrow=20,ncol=5,byrow=F);dm ## dm = data matrix
cm=colMeans(dm);cm ## cm = column means
cov=cov(dm);cov ## cov = covariance matrix
incov=solve(cov);incov ## incov = inverse of covariance matrix
  • 87. Mahalanobis distance in R
####### MAHALANOBIS DISTANCE: MANUALLY ######
@@@ Mahalanobis distance of the first observation @@@
ob1=dm[1,];ob1 ## first observation
mv1=ob1-cm;mv1 ## deviation of the first observation from the center of gravity
md1=t(mv1)%*%incov%*%mv1;md1 ## Mahalanobis distance of the first observation from the center of gravity
  • 88. Mahalanobis distance in R
@@@ Mahalanobis distance of the second observation @@@
ob2=dm[2,];ob2 ## second observation
mv2=ob2-cm;mv2 ## deviation of the second observation from the center of gravity
md2=t(mv2)%*%incov%*%mv2;md2 ## Mahalanobis distance of the second observation from the center of gravity
…………
  • 89. Mahalanobis distance in R
…………
@@@ Mahalanobis distance of the 20th observation @@@
ob20=dm[20,];ob20 ## 20th observation
mv20=ob20-cm;mv20 ## deviation of the 20th observation from the center of gravity
md20=t(mv20)%*%incov%*%mv20;md20 ## Mahalanobis distance of the 20th observation from the center of gravity
  • 90. Mahalanobis distance in R
####### MAHALANOBIS DISTANCE: PACKAGE ########
md=mahalanobis(dm,cm,cov,inverted=F);md ## md = Mahalanobis distance
md=mahalanobis(dm,cm,cov);md
  • 91. Another example
x <- matrix(rnorm(100*3), ncol = 3)
Sx <- cov(x)
D2 <- mahalanobis(x, colMeans(x), Sx)
  • 92. plot(density(D2, bw = 0.5), main="Squared Mahalanobis distances, n=100, p=3")
qqplot(qchisq(ppoints(100), df = 3), D2, main = expression("Q-Q plot of Mahalanobis" * ~D^2 * " vs. quantiles of" * ~ chi[3]^2))
abline(0, 1, col = 'gray')
?mahalanobis
  • 93. Acknowledgement: Prof. Mohammad Nasser; Richard A. Johnson & Dean W. Wichern; and others
  • 95. Necessity of Statistical Distance In home Mother In mess Female maid Student in mess