Worked with data from all 30 NBA teams to build statistical models and draw insights, using RStudio and standard statistical methods.
Topics include:
-Importance of a Lead in the NBA
-Overdispersion (Free Throws)
-Free Throw Momentum
-Linear Regression Predicting Wins
4. WIENER PROCESS
• Stochastic Process: A collection of random variables indexed by t (time).
• Wiener Process: A type of Stochastic Process with the following properties:
i. W(0) = 0
ii. $W(t) - W(s)$ has a normal distribution with mean 0 and variance $\sigma^2 (t - s)$, for $s \le t$
iii. $W(t_2) - W(t_1),\ W(t_3) - W(t_2),\ \ldots,\ W(t_n) - W(t_{n-1})$ are independent for $t_1 \le t_2 \le \cdots \le t_n$
• Let W(t) be a Wiener process that represents the score differential at time t, where
t is the proportion of the game that has been played.
6. FINDING $E[W(t)]$ UNDER THE ASSUMPTION THAT BOTH TEAMS ARE EQUAL IN ABILITY
• What this means:
• What is the average point differential at time t, given both teams are equal
in ability?
• $E[W(t)] = E[W(t) - W(0)] = 0$ (tied)
7. FINDING $\mathrm{Var}[W(t)]$, GIVEN $\mathrm{Var}[W(1)] = \sigma^2$
(VARIANCE OF THE FINAL SCORE DIFFERENTIAL)
• $\mathrm{Var}[W(t)] = \mathrm{Var}[W(t) - W(0)] = \sigma^2 (t - 0) = \sigma^2 t$
• Note: the variance of the score differential at time t is proportional to
the variance of the final score differential
• Ex: $\mathrm{Var}[W(1/2)] = \tfrac{1}{2} \sigma^2$ (variance of the score differential at half-time)
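The variance property can be checked numerically. The following is a minimal sketch, not the authors' R code: it builds W(t) from independent normal increments and compares the empirical variance at half-time with half of the final variance. The value of `SIGMA`, the step count, and the number of simulated games are all hypothetical choices made for illustration.

```python
import random
import statistics

def simulate_differential(t, sigma, rng, n_steps=20):
    """Return one simulated W(t), built from n_steps independent normal increments."""
    dt = t / n_steps
    # Each increment is N(0, sigma^2 * dt), matching property (ii) of the Wiener process.
    return sum(rng.gauss(0.0, sigma * dt ** 0.5) for _ in range(n_steps))

def empirical_variance(t, sigma, n_paths=5000, seed=0):
    """Estimate Var[W(t)] from n_paths simulated games."""
    rng = random.Random(seed)
    samples = [simulate_differential(t, sigma, rng) for _ in range(n_paths)]
    return statistics.pvariance(samples)

SIGMA = 12.0  # hypothetical standard deviation of the final score differential
half_time_var = empirical_variance(0.5, SIGMA)
print(f"simulated Var[W(1/2)] = {half_time_var:.1f}, theory = {0.5 * SIGMA ** 2:.1f}")
```

The simulated value should land close to the theoretical one, with the gap shrinking as `n_paths` grows.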
14. COMPARING MODEL TO REALITY
Lead after   BVN     Actual   Model   Difference (Actual minus Model, pct. points)
1Q           64.5%   65.1%    66.7%   -1.6
2Q           72.4%   72.5%    75.0%   -2.5
3Q           81.1%   82.0%    83.3%   -1.3
• Conclusions:
• The model is within 2.5 percentage points of the actual percentage after each quarter, and it slightly overestimates the actual percentage in all three cases.
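The Model column is consistent with a standard closed-form result for a driftless Wiener process: the probability that W(t) and W(1) have the same sign is 1/2 + arcsin(√t)/π. That formula does not appear on the surviving slides, so treating it as the source of the Model column is an assumption. The function name below is hypothetical.

```python
import math

def prob_leader_wins(t):
    """P(the team leading at game fraction t also leads at t = 1).

    Assumes a driftless Wiener process (two equally matched teams).
    """
    if not 0.0 < t <= 1.0:
        raise ValueError("t must be in (0, 1]")
    return 0.5 + math.asin(math.sqrt(t)) / math.pi

for label, t in [("1Q", 0.25), ("2Q", 0.50), ("3Q", 0.75)]:
    print(f"{label}: {100 * prob_leader_wins(t):.1f}%")
# prints 66.7%, 75.0%, 83.3%, matching the Model column above
```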
16. OVERDISPERSION
• Definition: “In statistics, overdispersion is the presence of greater variability
(statistical dispersion) in a data set than would be expected based on a given
statistical model.”
• In our case, we check for overdispersion in game-by-game free-throw success for certain players, based on the model $x \sim \mathrm{Bin}(n, \pi)$
• Where: n = number of free throws attempted
𝜋 = free-throw percentage
x = number of makes
• $\mathrm{Std}(x) = \sqrt{n \pi q}$, where $q = 1 - \pi$
• We collected free-throw data from the 2018-19 NBA season on a few players who averaged a large number of free-throw attempts per game.
17. CHI-SQUARED TEST
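• $\chi^2 = \sum_i \sum_j \frac{(\theta_{i,j} - E_{i,j})^2}{E_{i,j}}$, where $\theta_{i,j}$ is the observed count and $E_{i,j}$ the expected count for game i, column j (made or missed)
• Where $E_{i,j} = \frac{(\text{row total})(\text{col total})}{\text{grand total}}$
• $= (\text{total FTs in game } i) \times \hat{\pi}$ for the made column
• Test of homogeneity
• $H_0$: P(FT made) is the same in each game
• $H_0: \pi_1 = \pi_2 = \cdots = \pi_n = \pi$
• $\hat{\pi}$ = MLE of FT% (season)
• A large $\chi^2$ is evidence against our $H_0$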
James Harden free throws (2018-19 season)
Game     FT made   FT miss   FTA
1        3         1         4
2        11        4         15
3        5         1         6
4        6         3         9
5        6         2         8
6        9         0         9
7        1         2         3
⋮        ⋮         ⋮         ⋮
50       9         2         11
Total:   513       79        592
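As an illustration of the statistic, here is a minimal sketch that applies the homogeneity test to the eight rows visible in the table (games 1 to 7 and game 50). The elided games are not filled in, so the result will not match the p-value reported in the presentation, and the pooled percentage here comes from these eight rows rather than from the season totals. The function name is hypothetical.

```python
# Only the rows visible in the table; the elided games are deliberately left out.
made = [3, 11, 5, 6, 6, 9, 1, 9]
missed = [1, 4, 1, 3, 2, 0, 2, 2]

def chi_squared_homogeneity(made, missed):
    """Chi-squared statistic for H0: same make probability in every game."""
    attempts = [m + x for m, x in zip(made, missed)]
    pi_hat = sum(made) / sum(attempts)  # pooled FT%, the MLE under H0
    stat = 0.0
    for m, x, n in zip(made, missed, attempts):
        expected_made = n * pi_hat          # row total * col total / grand total
        expected_miss = n * (1.0 - pi_hat)
        stat += (m - expected_made) ** 2 / expected_made
        stat += (x - expected_miss) ** 2 / expected_miss
    df = len(made) - 1  # (rows - 1) * (cols - 1) with two columns
    return stat, df, pi_hat

stat, df, pi_hat = chi_squared_homogeneity(made, missed)
print(f"chi-squared = {stat:.2f} on {df} degrees of freedom (pooled FT% = {pi_hat:.3f})")
```

With so few attempts per game, several expected counts are small, so the chi-squared approximation is rough for this subset. The statistic would then be compared against a chi-squared distribution with `df` degrees of freedom to obtain a p-value.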
18. CHI-SQUARED TEST
(RESULTS)
• We chose players who averaged a
high number of free-throw attempts
per game.
• The data are consistent with the null hypothesis
• No player has a significant p-value
Player                  P-value
James Harden            0.1
Joel Embiid             0.86
Giannis Antetokounmpo   0.58
Blake Griffin           0.77
Kevin Durant            0.66
Damian Lillard          0.74
Paul George             0.34
Anthony Davis           0.20
20. WALD-WOLFOWITZ RUNS TEST
• We want to test for free-throw dependency within a game
• How one free throw affects the next
• $H_0$: the order of makes and misses in a game is random.
• To do this we use the Wald-Wolfowitz runs test
• Longer runs (and therefore fewer runs than expected) are evidence against $H_0$
• Data come from the 2016-17 season.
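The runs test can be sketched as follows, using the usual normal approximation for the number of runs. This is an illustrative implementation, not the authors' code. The example sequence is hypothetical ("M" for a make, "X" for a miss), and the approximation is unreliable for very short sequences.

```python
import math

def runs_test(sequence):
    """Wald-Wolfowitz runs test (normal approximation) for a two-valued sequence."""
    values = sorted(set(sequence))
    if len(values) != 2:
        raise ValueError("sequence must contain exactly two distinct outcomes")
    n1 = sum(1 for s in sequence if s == values[0])
    n2 = len(sequence) - n1
    n = n1 + n2
    # A new run starts wherever two neighbouring outcomes differ.
    runs = 1 + sum(1 for a, b in zip(sequence, sequence[1:]) if a != b)
    mean = 2 * n1 * n2 / n + 1
    var = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n ** 2 * (n - 1))
    z = (runs - mean) / math.sqrt(var)
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return runs, mean, z, p_two_sided

# Hypothetical order of free throws, for illustration only.
runs, mean, z, p = runs_test("MMXMMMXXMMMMXMM")
print(f"runs = {runs}, expected = {mean:.2f}, z = {z:.2f}, p = {p:.3f}")
```

A strongly negative `z` means fewer, longer runs than a random order would produce, which is the pattern streaky shooting would show.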
22. SIMPLE LINEAR REGRESSION
• Data Used:
• 2014-15 NBA Team Stats
• 2015-16 NBA Team Stats
• 2016-17 NBA Team Stats
• 2017-18 NBA Team Stats
• 2018-19 NBA Team Stats
• Response Variable:
• won (total games won during the season)
• Predictor Variables (one per model):
• 3pa (three-point attempts)
• 3pm (three-pointers made)
• 3p% (three-point percentage)
• fg% (field-goal percentage)
23. SIMPLE LINEAR REGRESSION: $R^2$ VALUES
Season     3pa      3pm      3p%      fg%
2018-19    9.74%    23.44%   29.74%   36.89%
2017-18    5.79%    11.52%   23.21%   42.39%
2016-17    7.61%    19.88%   40.34%   44.38%
2015-16    6.23%    15.80%   41.54%   42.62%
2014-15    22.25%   37.27%   48.10%   55.51%
• Using fg% as the predictor variable gives the highest $R^2$ value in every season
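For reference, each $R^2$ value in a table like this comes from a one-predictor least-squares fit. The sketch below shows the calculation in plain Python. The `fg_pct` and `won` lists are made-up numbers for illustration, not the team data used in the presentation.

```python
def simple_linear_regression(x, y):
    """Least-squares fit of y = intercept + slope * x; returns (slope, intercept, R^2)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)  # squared Pearson correlation
    return slope, intercept, r_squared

# Hypothetical values, for illustration only.
fg_pct = [0.44, 0.45, 0.46, 0.47, 0.48]
won = [30, 38, 41, 49, 55]
slope, intercept, r_squared = simple_linear_regression(fg_pct, won)
print(f"won = {intercept:.1f} + {slope:.1f} * fg%, R^2 = {r_squared:.2%}")
```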
26. MULTIPLE LINEAR REGRESSION
• Data Used:
• 2016-2017 NBA Team Stats
• Response Variable:
• won (total games won during the season)
• Predictor Variables:
• 2p%, 3p%, to (turnovers), tr (total rebounds), bk (blocks), O 2p%, O 3p%, O tr, O to
27. Regression Analysis: won versus 2p%, 3p%, tr, to, bk, O 2p%, O 3p%, O tr, O to
Analysis of Variance
Source       DF   Adj SS    Adj MS   F-Value   P-Value
Regression    9   3393.07   377.01     31.82     0.000
  2p%         1    293.05   293.05     24.74     0.000
  3p%         1    119.74   119.74     10.11     0.005
  tr          1    133.91   133.91     11.30     0.003
  to          1    249.82   249.82     21.09     0.000
  bk          1     79.57    79.57      6.72     0.017
  O 2p%       1    184.10   184.10     15.54     0.001
  O 3p%       1    207.14   207.14     17.49     0.000
  O tr        1    161.41   161.41     13.63     0.001
  O to        1    166.80   166.80     14.08     0.001
Error        20    236.93    11.85
Total        29   3630.00
Model Summary
S         R-sq     R-sq(adj)   R-sq(pred)
3.44187   93.47%   90.54%      85.32%
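The summary statistics follow directly from the sums of squares and degrees of freedom in the table. The short check below recomputes them from the reported values; it is an arithmetic illustration, and the variable names are ours.

```python
# Values copied from the Analysis of Variance table above.
ss_regression, df_regression = 3393.07, 9
ss_error, df_error = 236.93, 20
ss_total, df_total = 3630.00, 29

ms_regression = ss_regression / df_regression   # Adj MS for Regression
ms_error = ss_error / df_error                  # Adj MS for Error
f_value = ms_regression / ms_error              # overall F statistic
r_sq = ss_regression / ss_total                 # R-sq
r_sq_adj = 1 - ms_error / (ss_total / df_total) # R-sq(adj)
s = ms_error ** 0.5                             # S, the residual standard error

print(f"F = {f_value:.2f}, S = {s:.5f}, R-sq = {r_sq:.2%}, R-sq(adj) = {r_sq_adj:.2%}")
```

The recomputed values agree with the reported F-Value, S, R-sq and R-sq(adj) up to rounding. R-sq(pred) cannot be recovered this way because it depends on the leave-one-out residuals.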
28. Regression Analysis: won versus 2p%, 3p%, tr, to, bk, O 2p%, O 3p%, O tr, O to
Coefficients
Term       Coef       SE Coef   T-Value   P-Value   VIF
Constant   86.5       65.2        1.33     0.200
2p%        238.7      48.0        4.97     0.000    2.00
3p%        168.0      52.8        3.18     0.005    2.23
tr         0.02197    0.00654     3.36     0.003    2.05
to         -0.03498   0.00762    -4.59     0.000    1.59
bk         -0.0418    0.0161     -2.59     0.017    2.11
O 2p%      -256.6     65.1       -3.94     0.001    1.80
O 3p%      -283.3     67.8       -4.18     0.000    1.94
O tr       -0.01723   0.00467    -3.69     0.001    1.19
O to       0.0387     0.0103      3.75     0.001    1.70
Regression Equation
won = 86.5 + 238.7 2p% + 168.0 3p% + 0.02197 tr - 0.03498 to - 0.0418 bk - 256.6 O 2p%
- 283.3 O 3p% - 0.01723 O tr + 0.0387 O to
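To show how the fitted equation is used, the sketch below evaluates it for one set of inputs. The coefficients are copied from the regression equation above. The input team is hypothetical, and the units are assumptions inferred from the size of the coefficients: percentages entered as proportions (0.50 for 50%) and counting stats entered as season totals. The `O` prefix is kept exactly as it appears on the slide.

```python
INTERCEPT = 86.5
COEFFICIENTS = {
    "2p%": 238.7,
    "3p%": 168.0,
    "tr": 0.02197,
    "to": -0.03498,
    "bk": -0.0418,
    "O 2p%": -256.6,
    "O 3p%": -283.3,
    "O tr": -0.01723,
    "O to": 0.0387,
}

def predict_wins(team_stats):
    """Evaluate the fitted equation; missing predictors are treated as an error."""
    missing = set(COEFFICIENTS) - set(team_stats)
    if missing:
        raise KeyError(f"missing predictors: {sorted(missing)}")
    return INTERCEPT + sum(coef * team_stats[name] for name, coef in COEFFICIENTS.items())

# Hypothetical team, assuming proportions for percentages and season totals for counts.
example_team = {
    "2p%": 0.50, "3p%": 0.36, "tr": 3600, "to": 1150, "bk": 400,
    "O 2p%": 0.50, "O 3p%": 0.36, "O tr": 3600, "O to": 1150,
}
print(f"predicted wins: {predict_wins(example_team):.1f}")
```

Because the model was fit on a single season of 30 teams, predictions for inputs far from that season's range should be read with caution.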
29. Correlation: 2p%, 3p%, tr, to, bk, O 2p%, O 3p%, O tr, O to
Correlations
          2p%      3p%      tr       to       bk       O 2p%    O 3p%    O tr
3p%       0.478
tr        0.046   -0.332
to        0.283   -0.178    0.262
bk        0.132    0.172    0.204    0.120
O 2p%     0.008   -0.261   -0.136    0.148   -0.560
O 3p%    -0.413   -0.313   -0.121    0.010   -0.546    0.396
O tr     -0.141   -0.256   -0.126    0.075    0.078    0.065    0.020
O to      0.134   -0.002   -0.352    0.288    0.170    0.137   -0.171    0.136
Cell Contents: Pearson correlation
31. OVERVIEW
• Importance of a lead in the NBA.
• It’s important to get off to a good start in a game.
• You don’t want to be behind going into the 4th quarter.
• Overdispersion?
• Found no evidence of overdispersion in game-by-game free-throw percentages.
• Free-throw momentum?
• Found no evidence that one free throw affects the next within a game.
• Predicting wins using team data (regression)
• fg% had the highest $R^2$ value of the four predictors when using simple linear regression on our data.
• Multiple linear regression equation:
• won = 86.5 + 238.7 2p% + 168.0 3p% + 0.02197 tr - 0.03498 to - 0.0418 bk -
256.6 O 2p% - 283.3 O 3p% - 0.01723 O tr + 0.0387 O to