A short overview of distance and statistical distance, which are at the core of multivariate analysis. Here you will find some simple concepts about distances and statistical distance.
2. Md. Menhazul Abedin
M.Sc. Student
Dept. of Statistics
Rajshahi University
Mob: 01751385142
Email: menhaz70@gmail.com
3. Objectives
• To know the meaning of statistical distance, and its relation to and difference from the general (Euclidean) distance
4. Content
• Definition of Euclidean distance
• Concept & intuition of statistical distance
• Definition of statistical distance
• Necessity of statistical distance
• Concept of Mahalanobis distance (population & sample)
• Distribution of Mahalanobis distance
• Mahalanobis distance in R
• Acknowledgement
9. We see two specific points in each picture.
Our problem is to determine the distance between the two points.
But how?
Assume that the pictures lie in a two-dimensional space and that the points are joined by a straight line.
10. Let the 1st point be (x1, y1) and the 2nd point be (x2, y2). Then the distance is
D = √((x1 − x2)² + (y1 − y2)²)
What happens when the dimension is three?
12. For points (x1, x2, x3) and (y1, y2, y3) the distance is given by
√((x1 − y1)² + (x2 − y2)² + (x3 − y3)²)
13. For n dimensions it can be written as the following expression, named the Euclidean distance:
P = (x1, x2, …, xp), Q = (y1, y2, …, yp)
d(P, Q) = √((x1 − y1)² + (x2 − y2)² + … + (xp − yp)²)
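As a quick check of the formula above, the Euclidean distance can be computed in R either directly or with the built-in dist() function (a small sketch; the points P and Q are made-up examples):

```r
## Euclidean distance between two points (illustrative values)
P <- c(1, 2, 3)
Q <- c(4, 6, 3)

d.manual  <- sqrt(sum((P - Q)^2))           ## direct use of the formula
d.builtin <- as.numeric(dist(rbind(P, Q)))  ## built-in dist() on the stacked points

d.manual   ## 5
d.builtin  ## 5
```

Both give √(3² + 4² + 0²) = 5, so the manual formula and dist() agree.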
14. 12/12/2016
Properties of Euclidean Distance and Mathematical Distance
• The usual human concept of distance is Euclidean distance.
• Each coordinate contributes equally to the distance.
For P = (x1, x2, …, xp) and Q = (y1, y2, …, yp):
d(P, Q) = √((x1 − y1)² + (x2 − y2)² + … + (xp − yp)²)
Mathematicians, generalizing its three properties, define distance on any set:
1) d(P,Q) = d(Q,P),
2) d(P,Q) = 0 if and only if P = Q, and
3) d(P,Q) ≤ d(P,R) + d(R,Q) for all R.
17. • The Manhattan distance is the simple sum of the horizontal and vertical components, whereas the diagonal (straight-line) distance is computed by applying the Pythagorean theorem.
19. • Manhattan distance: 12 units
• Diagonal or straight-line (Euclidean) distance: √(6² + 6²) = 6√2 ≈ 8.49 units
We observe that the Euclidean distance is less than the Manhattan distance.
23. Relationship between Manhattan & Euclidean distance
• It now seems that the distance from A to C is 7 blocks, while the distance from A to B is 6 blocks.
• Unless we choose to go off-road, B is now closer to A than C.
• Taxicab distance is sometimes equal to Euclidean distance, but otherwise it is greater than Euclidean distance.
Euclidean distance ≤ Taxicab distance
Is it always true? Does it hold in n dimensions?
27. For high dimensions
• The inequality also holds in the high-dimensional case:
Σ (xi − yi)² ≤ Σ (xi − yi)² + 2 Σ_{i<j} |xi − yi||xj − yj| = (Σ |xi − yi|)²
which implies
√(Σ (xi − yi)²) ≤ Σ |xi − yi|
i.e. d_E ≤ d_M.
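The inequality d_E ≤ d_M can be verified numerically in R on random vectors (a sketch; the dimension and the points are arbitrary choices):

```r
set.seed(1)
p <- 10
x <- rnorm(p)
y <- rnorm(p)

d.E <- sqrt(sum((x - y)^2))  ## Euclidean distance
d.M <- sum(abs(x - y))       ## Manhattan (taxicab) distance

d.E <= d.M                   ## TRUE
```

Equality occurs only when at most one coordinate differs; otherwise the Euclidean distance is strictly smaller.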
28. Statistical Distance
• Weight coordinates subject to a great deal of variability less heavily than those that are not highly variable.
(Figure: two points at the same Euclidean distance from the origin; which one is nearer to the data set?)
29. • Here, variability along the x1 axis > variability along the x2 axis.
Is the same distance from the origin meaningful?
Ans: No.
But how do we take the different variability into account?
Ans: Give different weights to the axes.
30. Statistical Distance for Uncorrelated Data
P = (x1, x2), O = (0, 0)
Standardization: x1* = x1/√s11, x2* = x2/√s22
d(O, P) = √(x1*² + x2*²) = √(x1²/s11 + x2²/s22)
The 1/sii act as weights.
31. All points that have coordinates (x1, x2) and are a constant squared distance c² from the origin must satisfy
x1²/s11 + x2²/s22 = c²
But how to choose c? That is a problem.
Choose c so that 95% of the observations fall in this area.
If s11 > s22, then 1/s11 < 1/s22, so the ellipse extends farther along the x1 axis.
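The weighted distance above can be sketched in R for uncorrelated data with unequal variances; the data and the point P are made up for illustration:

```r
set.seed(2)
x1 <- rnorm(200, sd = 4)   ## coordinate with large variability
x2 <- rnorm(200, sd = 1)   ## coordinate with small variability
s11 <- var(x1)             ## sample variance of x1
s22 <- var(x2)             ## sample variance of x2

P <- c(4, 1)               ## an arbitrary point
d.stat <- sqrt(P[1]^2 / s11 + P[2]^2 / s22)  ## statistical distance from the origin
d.eucl <- sqrt(sum(P^2))                     ## unweighted Euclidean distance
```

Because s11 is large, the first coordinate is down-weighted and d.stat comes out smaller than d.eucl; every point on the ellipse x1²/s11 + x2²/s22 = c² has the same statistical distance c from the origin.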
33. • This expression can be generalized as the statistical distance from an arbitrary point P = (x1, x2) to any fixed point Q = (y1, y2):
d(P, Q) = √((x1 − y1)²/s11 + (x2 − y2)²/s22)
For p dimensions:
d(P, Q) = √((x1 − y1)²/s11 + (x2 − y2)²/s22 + … + (xp − yp)²/spp)
34. Remarks:
1) The distance from P to the origin O is obtained by setting all yi = 0.
2) If all the sii are equal, the Euclidean distance formula is appropriate.
36. • How do you measure the statistical distance of the above data set?
• Ans: First make it uncorrelated.
• But why, and how?
• Ans: Rotate the axes keeping the origin fixed.
41. Choice of θ
• Which θ will you choose?
• How will you do it?
• Data matrix → centered data matrix → covariance matrix of the data → eigenvectors
• θ = angle between the 1st eigenvector and [1,0], or the angle between the 2nd eigenvector and [0,1]
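The chain above (data matrix → centered matrix → covariance → eigenvectors → angle) can be sketched in R; the toy data here are made up for illustration:

```r
set.seed(3)
x <- rnorm(100)
y <- 0.8 * x + rnorm(100, sd = 0.4)                 ## correlated toy data
data  <- cbind(x, y)
cdata <- scale(data, center = TRUE, scale = FALSE)  ## centered data matrix

V  <- eigen(cov(cdata))$vectors   ## columns are the eigenvectors
e1 <- V[, 1]                      ## 1st eigenvector (major-axis direction)

theta <- atan2(e1[2], e1[1])      ## angle between e1 and [1, 0]
theta * 180 / pi                  ## the same angle in degrees
```

Rotating the centered data by this angle (i.e. multiplying by V) aligns the axes with the directions of greatest and least variability, making the coordinates uncorrelated.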
42. Why is θ the angle between the 1st eigenvector and [1,0], or between the 2nd eigenvector and [0,1]?
Ans: Let B be a (p × p) positive definite matrix with eigenvalues λ1 ≥ λ2 ≥ λ3 ≥ … ≥ λp > 0 and associated normalized eigenvectors e1, e2, …, ep. Then
max_{x≠0} x′Bx / x′x = λ1, attained when x = e1
min_{x≠0} x′Bx / x′x = λp, attained when x = ep
44. Choice of θ
#### Exercise 16, page 309: heights in inches (x) & weights in pounds (y). An Introduction to Statistics and Probability, M. Nurul Islam ####
x=c(60,60,60,60,62,62,62,64,64,64,66,66,66,66,68,68,68,70,70,70);x
y=c(115,120,130,125,130,140,120,135,130,145,135,170,140,155,150,160,175,180,160,175);y
############
data=cbind(x,y)                            ## data matrix
cdata=scale(data,center=TRUE,scale=FALSE)  ## centered data matrix
V=eigen(cov(cdata))$vectors;V              ## eigenvectors of the covariance matrix
as.matrix(cdata)%*%V                       ## rotated (uncorrelated) data
plot(x,y)
49. • ############ comparison of both methods ############
comparison=tdata - as.matrix(cbind(xx,yy));comparison
round(comparison,4)
50. ########### using package: md from original data ###########
md=mahalanobis(data,colMeans(data),cov(data),inverted=F);md   ## md = Mahalanobis distance
######## Mahalanobis distance from transformed data ########
tmd=mahalanobis(tdata,colMeans(tdata),cov(tdata),inverted=F);tmd
###### comparison ######
md-tmd
51. Mahalanobis distance: manually
mu=colMeans(tdata);mu
incov=solve(cov(tdata));incov
md1=t(tdata[1,]-mu)%*%incov%*%(tdata[1,]-mu);md1
md2=t(tdata[2,]-mu)%*%incov%*%(tdata[2,]-mu);md2
md3=t(tdata[3,]-mu)%*%incov%*%(tdata[3,]-mu);md3
............. …………. ………..
md20=t(tdata[20,]-mu)%*%incov%*%(tdata[20,]-mu);md20
The md values from the package and from the manual computation are equal.
59. • The above distances are completely determined by the coefficients (weights) a_ik; i, k = 1, 2, 3, …, p. These can be arranged in a rectangular array; this array (matrix) must be symmetric and positive definite.
60. Why positive definite?
Let A be a positive definite matrix. Then A = C′C, so
x′Ax = x′C′Cx = (Cx)′(Cx) = y′y ≥ 0, where y = Cx.
It obeys all the distance properties, so x′Ax defines a squared distance.
Different choices of A give different distances.
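The factorization A = C′C can be checked numerically in R with chol(); this is a small sketch using a made-up positive definite matrix:

```r
A <- matrix(c(2, 1,
              1, 2), nrow = 2)    ## symmetric positive definite matrix
C <- chol(A)                      ## upper-triangular C with A = C'C
x <- c(3, -1)

q1 <- drop(t(x) %*% A %*% x)      ## quadratic form x'Ax
y  <- C %*% x
q2 <- sum(y^2)                    ## (Cx)'(Cx) = y'y, same value

q1; q2                            ## both equal 14, and always >= 0
```

Since x′Ax equals an ordinary squared Euclidean length y′y of the transformed vector y = Cx, it inherits the distance properties listed above.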
61. • Why a positive definite matrix?
• Ans: Spectral decomposition: the spectral decomposition of a k×k symmetric matrix A is given by
A = Σ_{i=1}^{k} λi ei ei′
• where (λi, ei); i = 1, 2, …, k are the pairs of eigenvalues and eigenvectors, with λ1 ≥ λ2 ≥ λ3 ≥ …. If A is positive definite, each λi > 0 and A is invertible.
66. • Consider the Euclidean distances from the point Q to the point P and to the origin O.
• Obviously d(Q,P) > d(Q,O).
• But P appears to be more like the points in the cluster than does the origin.
• If we take into account the variability of the points in the cluster and measure distance by statistical distance, then Q will be closer to P than to O.
67. Mahalanobis distance
• The Mahalanobis distance is a descriptive statistic that provides a relative measure of a data point's distance from a common point. It is a unitless measure introduced by P. C. Mahalanobis in 1936.
68. Intuition of Mahalanobis Distance
• Recall the equation
d(O,P) = √(x′Ax)  =>  d²(O,P) = x′Ax
where x = (x1, x2)′ and A = [a11 a12; a21 a22].
71. Mahalanobis Distance
• Mahalanobis used the inverse of the covariance matrix, Σ⁻¹, instead of A.
• Thus d²(O,P) = x′Σ⁻¹x ……….(1)
• And he used μ (the center of gravity) instead of y:
d²(P,μ) = (x − μ)′Σ⁻¹(x − μ) ……….(2)
72. Mahalanobis Distance
• The above equations are nothing but the Mahalanobis distance.
• For example, suppose we took a single observation from a bivariate population with variable X and variable Y, and that our two variables had the following characteristics.
73. • For a single observation with X = 410 and Y = 400, the Mahalanobis distance for that single value is computed as:
75. • Therefore, our single observation would have a distance of 1.825 standardized units from the mean (the mean is at X = 500, Y = 500).
• If we took many such observations, graphed them, and colored them according to their Mahalanobis values, we would see the elliptical Mahalanobis regions come out.
76. โข The points are actually distributed along two
primary axes:
77.
78. If we calculate Mahalanobis distances for each
of these points and shade them according to
their distance value, we see clear elliptical
patterns emerge:
79.
80. • We can also draw actual ellipses at regions of constant Mahalanobis values:
(Figure: ellipses containing 68%, 95%, and 99.7% of the observations)
81. • Which ellipse do you choose?
Ans: Use the 68-95-99.7 rule:
1) about two-thirds (68%) of the points should be within 1 unit of the origin (along each axis),
2) about 95% should be within 2 units,
3) about 99.7% should be within 3 units.
83. Sample Mahalanobis Distance
• The sample Mahalanobis distance is obtained by replacing Σ by S and μ by X̄,
• i.e. (X − X̄)′S⁻¹(X − X̄)
84. For a sample:
(X − X̄)′S⁻¹(X − X̄) ≤ χ²_p(α)
Distribution of Mahalanobis distance
85. Distribution of Mahalanobis distance
Let X1, X2, X3, …, Xn be independent observations from any population with mean μ and finite (nonsingular) covariance Σ. Then
• √n (X̄ − μ) is approximately N_p(0, Σ), and
• n (X̄ − μ)′S⁻¹(X̄ − μ) is approximately χ²_p for n − p large.
This is nothing but the central limit theorem.
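The χ² approximation can be checked by simulation in R: for multivariate normal data, the squared distances D² = (X − X̄)′S⁻¹(X − X̄) follow approximately a χ²_p distribution (a sketch with simulated data; n, p, and the seed are arbitrary):

```r
set.seed(4)
n <- 1000; p <- 3
X <- matrix(rnorm(n * p), nrow = n)        ## sample from N_p(0, I)

d2 <- mahalanobis(X, colMeans(X), cov(X))  ## squared Mahalanobis distances

## about 95% of the d2 values should fall below the chi-square 95% quantile
mean(d2 < qchisq(0.95, df = p))
```

The observed fraction should be close to 0.95; this is also the basis for flagging multivariate outliers by comparing D² with χ²_p quantiles.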
86. Mahalanobis distance in R
• ########### Mahalanobis Distance ###########
• x=rnorm(100);x
• dm=matrix(x,nrow=20,ncol=5,byrow=F);dm  ## dm = data matrix
• cm=colMeans(dm);cm                      ## cm = column means
• cov=cov(dm);cov                         ## cov = covariance matrix
• incov=solve(cov);incov                  ## incov = inverse of covariance matrix
87. Mahalanobis distance in R
• ####### MAHALANOBIS DISTANCE: MANUALLY #######
• @@@ Mahalanobis distance of first observation @@@
• ob1=dm[1,];ob1                ## first observation
• mv1=ob1-cm;mv1                ## deviation of first observation from the center of gravity
• md1=t(mv1)%*%incov%*%mv1;md1  ## Mahalanobis distance of first observation from the center of gravity
88. Mahalanobis distance in R
• @@@ Mahalanobis distance of second observation @@@
• ob2=dm[2,];ob2                ## second observation
• mv2=ob2-cm;mv2                ## deviation of second observation from the center of gravity
• md2=t(mv2)%*%incov%*%mv2;md2  ## Mahalanobis distance of second observation from the center of gravity
................ ………………… …..…………………
89. Mahalanobis distance in R
…………....... ………………… ………………
@@@ Mahalanobis distance of 20th observation @@@
• ob20=dm[20,];ob20                 ## 20th observation
• mv20=ob20-cm;mv20                 ## deviation of 20th observation from the center of gravity
• md20=t(mv20)%*%incov%*%mv20;md20  ## Mahalanobis distance of 20th observation from the center of gravity
90. Mahalanobis distance in R
####### MAHALANOBIS DISTANCE: PACKAGE #######
• md=mahalanobis(dm,cm,cov,inverted=F);md  ## md = Mahalanobis distance
• md=mahalanobis(dm,cm,cov);md
91. Another example
โข x <- matrix(rnorm(100*3), ncol = 3)
โข Sx <- cov(x)
โข D2 <- mahalanobis(x, colMeans(x), Sx)