An application of probability theory to Sachin Tendulkar's cricketing records. A combination of statistics with cricket. Have fun while you learn :-)
1. PROBABILITY ANALYSIS: Sachin Tendulkar’s Test Cricket Records
Presented by :
Sandipan Maiti
Lal Bahadur Shastri Institute of Management
2. FLOWLINE
Introduction
Data Summarization
Century Analysis
Half century analysis
Not out Analysis
Series Total
Key Learnings
3. INTRODUCTION
Statistics: Game of data.
Probability: Game of chances.
Cricket: Game close to Indian hearts.
An attempt to link the three games to reach some
meaningful conclusions.
4. WHY SACHIN !
Most Test matches played: Better applicability of
statistics.
Huge achievements: Solid base for discussion on
probability techniques.
Variety of data: Scope to cover different probability
techniques.
Finally: Because he is “THE SACHIN – GOD OF
CRICKET”
6. DATA SUMMARIZATION
Data Source: http://www.cricketarchive.com/
Raw data: Details of matches played against each
team in each season from 1989 to 2011 (excluding the
ongoing Test series).
Cross tabulations are used to structure the data in an
organized manner.
7. OPPONENT-WISE RECORDS
Team          Matches  Innings  Not out  Runs   HS   Avg     Century  Half century  Caught out
Pakistan      18       27       2        1057   194  42.28   2        7             8
New Zealand   22       36       5        1532   217  49.42   4        8             10
England       24       39       4        2150   193  61.43   7        10            19
Sri Lanka     25       36       3        1995   203  60.45   9        6             12
Australia     31       59       7        3151   241  60.60   11       13            19
Bangladesh    7        9        3        820    248  136.67  5        0             6
Zimbabwe      9        14       2        918    201  76.50   3        3             5
West Indies   16       25       2        1328   179  57.74   3        7             14
South Africa  25       45       4        1741   169  42.46   7        5             13
Total         177      290      32       14692  248  56.95   51       59            106
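The totals and averages in the opponent-wise records can be cross-checked with a short script. This is a minimal sketch, assuming the usual definition of batting average (runs divided by the number of dismissals, i.e. innings minus not outs); the names `records` and `batting_average` are illustrative and not part of the original slides.

```python
# Per-opponent rows from the slide: (matches, innings, not_outs, runs)
records = {
    "Pakistan": (18, 27, 2, 1057),
    "New Zealand": (22, 36, 5, 1532),
    "England": (24, 39, 4, 2150),
    "Sri Lanka": (25, 36, 3, 1995),
    "Australia": (31, 59, 7, 3151),
    "Bangladesh": (7, 9, 3, 820),
    "Zimbabwe": (9, 14, 2, 918),
    "West Indies": (16, 25, 2, 1328),
    "South Africa": (25, 45, 4, 1741),
}


def batting_average(innings, not_outs, runs):
    """Runs per dismissal (assumed definition of batting average)."""
    return runs / (innings - not_outs)


# Column-wise sums give the "Total" row: (matches, innings, not_outs, runs)
totals = tuple(sum(col) for col in zip(*records.values()))
overall_avg = batting_average(*totals[1:])

print("Totals:", totals)
print("Overall average:", round(overall_avg, 2))
```

Summing the rows this way gives 14692 runs in 290 innings with 32 not outs, which agrees with the overall average of 56.95.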
11. PROBABILITY OF CENTURY IN A MATCH
TEAM          MATCHES PLAYED  CENTURIES SCORED  PROBABILITY OF CENTURY
Pakistan      18              2                 0.111
New Zealand   22              4                 0.182
England       24              7                 0.292
Sri Lanka     25              9                 0.360
Australia     31              11                0.355
Bangladesh    7               5                 0.714
Zimbabwe      9               3                 0.333
West Indies   16              3                 0.188
South Africa  25              7                 0.280
OVERALL       177             51                0.288
12. PROBABILITY OF CENTURY IN A TEST MATCH
[Chart: probability of century; y-axis from 0 to 0.8]
13. IF A CENTURY, PROBABLE TEAM
TEAM          CENTURIES SCORED  PROBABILITY
Pakistan      2                 0.039
New Zealand   4                 0.078
England       7                 0.137
Sri Lanka     9                 0.176
Australia     11                0.216
Bangladesh    5                 0.098
Zimbabwe      3                 0.059
West Indies   3                 0.059
South Africa  7                 0.137
TOTAL         51                1.000
14. IF A CENTURY, PROBABLE TEAM
[Bar chart: probability by opponent team; y-axis from 0 to 0.25]
15. OBSERVATIONS
If Sachin plays a test match, then he is most likely to
score a century when opponent is Bangladesh.
If Sachin scores a century in a test match it is most
likely that the opponent is Australia.
In both of the above situations, Pakistan is the least
likely opponent for a Sachin century.
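The two kinds of probability behind these observations can be sketched in a few lines. This is only an illustration under the slides' own simplification that centuries divided by matches approximates the chance of a century in a match (it ignores the possibility of two centuries in one match); the variable names are hypothetical.

```python
# (matches played, centuries scored) per opponent, from the slides
data = {
    "Pakistan": (18, 2),
    "New Zealand": (22, 4),
    "England": (24, 7),
    "Sri Lanka": (25, 9),
    "Australia": (31, 11),
    "Bangladesh": (7, 5),
    "Zimbabwe": (9, 3),
    "West Indies": (16, 3),
    "South Africa": (25, 7),
}

total_centuries = sum(c for _, c in data.values())

# P(century | opponent): centuries per match against that team
p_century_given_team = {t: c / m for t, (m, c) in data.items()}

# P(opponent | century): share of all centuries scored against that team
p_team_given_century = {t: c / total_centuries for t, (_, c) in data.items()}

best_opponent_to_score = max(p_century_given_team, key=p_century_given_team.get)
likeliest_opponent_if_century = max(p_team_given_century, key=p_team_given_century.get)
least_in_both = (
    min(p_century_given_team, key=p_century_given_team.get),
    min(p_team_given_century, key=p_team_given_century.get),
)

print(best_opponent_to_score, likeliest_opponent_if_century, least_in_both)
```

The first dictionary conditions on the opponent, the second on the century having happened, which is why the two rankings put different teams on top.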
17. NUMBER OF FIFTIES IN A SERIES
Fifties (x)  Number of series  f(x)
0            29                0.433
1            19                0.284
2            17                0.254
3            2                 0.030
Total        67                ∑f(x) = 1.000
18. Probability distribution for half centuries; Cumulative probability distribution
[Two charts. Left: probability distribution f(x), y-axis "Probability" from 0 to 0.45. Right: cumulative probability distribution, y-axis from 0 to 1.2. x-axis in both: number of fifties in a series, 0 to 3]
19. VARIANCE AND STD. DEVIATION
x      x-µ    (x-µ)^2   f(x)   f(x)*(x-µ)^2
0      -0.88  0.775451  0.433  0.3356
1      0.12   0.014257  0.284  0.0040
2      1.12   1.253063  0.254  0.3179
3      2.12   4.491869  0.030  0.1341
Total                          0.7917 = σ^2
The variance of the random variable x (fifties in a
series) is 0.7917 squared fifties.
The standard deviation of the number of fifties in a
series (σ) is 0.8898 fifties.
20. EXPECTED FIFTIES IN A SERIES
x      f(x)   x*f(x)
0      0.433  0.00
1      0.284  0.28
2      0.254  0.51
3      0.030  0.09
Total  1.000  0.88
Thus the expected number of half centuries E(x) for
Sachin in a series is 0.88, or almost 1.
In every Test series he plays, he is expected to
score about one half century.
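The expected value, variance and standard deviation of the number of fifties in a series can be reproduced from the series counts. This is a minimal sketch of the discrete-distribution formulas E(x) = ∑x·f(x) and σ² = ∑f(x)·(x-µ)²; the dictionary name `series_counts` is an assumption for illustration.

```python
import math

# Number of series in which Sachin scored x fifties (x = 0, 1, 2, 3)
series_counts = {0: 29, 1: 19, 2: 17, 3: 2}

n_series = sum(series_counts.values())

# Relative frequencies serve as the probability function f(x)
f = {x: count / n_series for x, count in series_counts.items()}

# Expected value: sum of x * f(x)
mean = sum(x * fx for x, fx in f.items())

# Variance: probability-weighted squared deviation from the mean
variance = sum(fx * (x - mean) ** 2 for x, fx in f.items())
std_dev = math.sqrt(variance)

print(f"E(x) = {mean:.2f}, variance = {variance:.4f}, sigma = {std_dev:.4f}")
```

Because the script works from the unrounded frequencies, small differences in the last digit against hand-rounded table entries are expected.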
22. TOTAL RUNS IN A SERIES
Bin (Total in a series)  Frequency
0                        2
60                       7
120                      8
180                      7
240                      13
300                      11
360                      7
420                      8
More than 420            4
[Histogram: frequency (y-axis, 0 to 14) against bin (x-axis, 0 to More than 420)]
23. TOTAL RUNS IN A SERIES
Statistical Summary
Mean                219.2836
Median              213
Mode                199
Standard Deviation  126.9055
Sample Variance     16104.99
Skewness            0.068918
Range               493
Minimum             0
Maximum             493
Sum                 14692
Count               67
Slightly skewed towards the right.
Skewness is just 0.07 approximately.
Data can be considered to be normally distributed for
analysis purposes.
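24. PROBABILITY OF A TOTAL OF 250 IN A SERIES
Std Deviation σ = 127
Mean µ = 219
x = 250
f(x) = 0.00314 * exp(-(250-219)^2/(2*127^2))
= 0.003
= 0.3 %
Here 0.00314 = 1/(127*sqrt(2π)), and f(x) is the normal density at x,
used as the approximate probability of a total of exactly x runs.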
25. PROBABILITY OF A TOTAL OF 100 IN A SERIES
Std Deviation σ = 127
Mean µ = 219
x = 100
f(x) = 0.00314 * exp(-(100-219)^2/(2*127^2))
= 0.002
= 0.2 %
26. PROBABILITY OF A TOTAL OF 400 IN A SERIES
Std Deviation σ = 127
Mean µ = 219
x = 400
f(x) = 0.00314 * exp(-(400-219)^2/(2*127^2))
= 0.001
= 0.1 %
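The three normal-distribution calculations can be checked with a short script. This is a sketch assuming, as the slides do, that series totals are roughly normal with µ = 219 and σ = 127; `normal_pdf` and `normal_cdf` are illustrative helper names. The second helper shows that the density at x is close to the probability of the total falling in the one-run interval around x, which is how the density is read as a probability here.

```python
import math


def normal_pdf(x, mu, sigma):
    """Normal density: 1/(sigma*sqrt(2*pi)) * exp(-(x-mu)^2 / (2*sigma^2))."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))


def normal_cdf(x, mu, sigma):
    """Normal cumulative distribution, written with the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


MU, SIGMA = 219, 127  # rounded mean and standard deviation from the slides

densities = {x: normal_pdf(x, MU, SIGMA) for x in (100, 250, 400)}
for x, fx in densities.items():
    print(f"f({x}) = {fx:.3f}")

# Probability of a total between 249.5 and 250.5, for comparison with f(250)
interval_250 = normal_cdf(250.5, MU, SIGMA) - normal_cdf(249.5, MU, SIGMA)
print(f"P(249.5 < X < 250.5) = {interval_250:.3f}")
```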
28. NOT OUT INNINGS: BINOMIAL
DISTRIBUTION
Only two possibilities in an innings - out or not out.
Remaining not out in any innings is independent of
being out or not out in any other innings.
Probability of remaining not out = 1- Probability of
being out.
=> Binomial Probability distribution
$$P(X = x) = \frac{n!}{x!\,(n-x)!}\, p^{x} (1-p)^{n-x}$$
29. NOT OUT INNINGS
Probability of being not out in an innings
P(N) = Total not out innings / Total innings
= 32/290
= 0.11
Probability of being out in an innings
P(O) = 1 - Probability of being not out in an innings
= 1 - 0.11
= 0.89
30. PROBABILITY OF A SINGLE NOT OUT IN 10 INNINGS
X = 1
n = 10
p = 0.11
1 - p = 0.89
Thus,
$$P(X = x) = \frac{n!}{x!\,(n-x)!}\, p^{x} (1-p)^{n-x}$$
= (10!/(1! * 9!)) * (0.11^1) * (0.89^9)
= 0.39
31. PROBABILITY OF TWO NOT OUTS IN 20 INNINGS
X = 2
n = 20
p = 0.11
1 - p = 0.89
Thus,
$$P(X = x) = \frac{n!}{x!\,(n-x)!}\, p^{x} (1-p)^{n-x}$$
= (20!/(2! * 18!)) * (0.11^2) * (0.89^18)
= 0.28
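Both binomial results can be reproduced with a small helper. This is a minimal sketch assuming independent trials with a constant not-out probability p = 32/290 rounded to 0.11, as in the slides; `binomial_pmf` is an illustrative name.

```python
from math import comb


def binomial_pmf(x, n, p):
    """P(X = x) = C(n, x) * p^x * (1 - p)^(n - x) for n independent trials."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


# Not-out probability per trial: 32 not outs in 290 innings, rounded as in the slides
p_not_out = round(32 / 290, 2)

one_in_ten = binomial_pmf(1, 10, p_not_out)
two_in_twenty = binomial_pmf(2, 20, p_not_out)

print(f"p = {p_not_out}")
print(f"P(exactly 1 not out in 10) = {one_in_ten:.2f}")
print(f"P(exactly 2 not outs in 20) = {two_in_twenty:.2f}")
```

Although two not outs in 20 trials is the same rate as one in 10, the probability of hitting that exact count is lower for the longer run, because the probability mass is spread over more possible outcomes.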
32. KEY LEARNINGS
1. Reveals trends that contrast with preconceived notions.
(Binomial Probability Distribution)
2. Supports general statements. (About one half century every
series)
3. Great tool for analysis.
4. Easy to use, apply and understand.