ShootingStars_Powerpoint for march madness
1. MARCH DATA CRUNCH MADNESS
The Shooting Stars
Nan (Miya) Wang
John De Martino
Pritha Sinha
Armi Thassim
2. INTRODUCTION
Background: The National Collegiate Athletic Association (NCAA)
basketball tournament is played every spring in the US, with 68 college
teams competing in a single-elimination format.
Objective: Create an optimized model to predict the 2016 NCAA Finals,
based on historical regular season data from 2002 to 2015, by
applying various machine learning techniques.
Results:
http://shootingstarsnyc.azurewebsites.net/
The link above to our machine learning web API can help you make your own
2016 NCAA predictions!
3. ANALYSIS KPI
Model Performance Evaluation Metrics
Find a set of predictions that minimizes log loss.
Heavily penalize predictions that are both confident and wrong.
Balance between being too conservative and too confident.
Log loss is computed from:
the actual number of games played in the tournament
the predicted probability that team A beats team B
the actual binary outcome of each game
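The metric above can be sketched in a few lines. This is a minimal illustration that assumes the standard binary log loss, -(1/n) * sum(y * ln(p) + (1 - y) * ln(1 - p)); the function name `log_loss` and the clipping constant `eps` are illustrative choices, not taken from the slides.

```python
import math

def log_loss(outcomes, probs, eps=1e-15):
    """Mean binary log loss over all games.

    outcomes: 1 if team A actually beat team B, else 0.
    probs:    predicted probability that team A beats team B.
    eps:      assumed clipping constant that keeps log() finite.
    """
    if len(outcomes) != len(probs):
        raise ValueError("outcomes and probs must have the same length")
    total = 0.0
    for y, p in zip(outcomes, probs):
        p = min(max(p, eps), 1 - eps)  # avoid log(0)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(outcomes)
```

A coin-flip prediction of 0.5 costs ln 2 per game, while a confident miss such as predicting 0.99 for a game that was lost costs far more, which is the "confident and wrong" penalty described above.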
5. DATA PREPARATION
Feature Transformation and Normalization
Rank to Score: Team 1 Adjusted Seed = 0.5 + 0.03 * (Team 2 Seed - Team 1 Seed)
Normalization: MinMax Scaler
Derive differences: Team 1 score of an attribute - Team 2 score of an attribute
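The three preparation steps above can be sketched as follows. This is a minimal pure-Python illustration; the function names and the dictionary layout for team attributes are assumptions, and the slides do not say which library performed the min-max scaling.

```python
def adjusted_seed(team1_seed, team2_seed):
    """Rank to score: the formula given on the slide."""
    return 0.5 + 0.03 * (team2_seed - team1_seed)

def min_max_scale(values):
    """Rescale a column of values to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; map it to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def attribute_differences(team1, team2):
    """Team 1 score minus team 2 score for every shared attribute."""
    return {name: team1[name] - team2[name] for name in team1}
```

For a 1 seed facing a 16 seed the adjusted seed comes out at roughly 0.95, and at roughly 0.05 with the teams swapped, so the score behaves like a prior win probability centered on 0.5.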
7. FEATURE SELECTION
Importance Plotting and Recursive Elimination
Figure: Log Loss for Different Feature Numbers
Figure: Feature Importance
Optimal Number of Features: 9
● 97 Features to 9 Features
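Recursive elimination, as named above, can be sketched generically: score the remaining features, drop the weakest, and repeat until the target count is reached. The slides do not state which estimator supplied the importances, so `importance_fn` is a hypothetical callback, and the toy importances below are made-up numbers for illustration only. In practice a library routine such as scikit-learn's `RFE` plays this role.

```python
def recursive_elimination(features, importance_fn, n_keep):
    """Drop the least important feature until n_keep remain.

    importance_fn(names) must return {name: importance} for the
    features still in play; it is re-evaluated every round.
    Returns (kept_features, dropped_in_order).
    """
    if n_keep < 1:
        raise ValueError("n_keep must be at least 1")
    remaining = list(features)
    dropped = []
    while len(remaining) > n_keep:
        scores = importance_fn(remaining)
        weakest = min(remaining, key=lambda name: scores[name])
        remaining.remove(weakest)
        dropped.append(weakest)
    return remaining, dropped

# Made-up importances standing in for a refitted model (assumption).
TOY_IMPORTANCE = {"off_eff": 0.4, "def_eff": 0.3, "blocks": 0.2, "distance": 0.1}

def toy_importance(names):
    return {name: TOY_IMPORTANCE[name] for name in names}
```

Running the loop once for each candidate feature count and recording the validation log loss each time would produce a curve like the "Log Loss for Different Feature Numbers" plot.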
8. PERFORMANCE VALIDATION
Cross Validation and Different Training Size
Grid Searching/Parameter Tuning
Acceptable Model Performance Variation
Learning Curve
Overfitting when Training Size under 45%
Partition Size: 50% - 50%
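The cross-validation step above relies on splitting the games into training and held-out folds. The following is a minimal pure-Python sketch of k-fold index generation; the slides do not state the number of folds or the tooling, so `k`, `seed`, and the function name are assumptions.

```python
import random

def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_indices, test_indices) for each of k folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # reproducible shuffle
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test
```

With `k=2` each round trains on half of the games and tests on the other half, which matches the 50% - 50% partition size noted above.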
9. PERFORMANCE VALIDATION
Model Fusion
Random forest (RF), gradient boosted trees (GBT) and logistic regression are the top 3 models
Majority Voting
Leverage the information gleaned from different methods.
Minimize the flaws in each model.
Increase stability and help maintain accuracy.
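Majority voting across the three models can be sketched as below. This is a minimal illustration that assumes each model outputs a hard 0/1 label per game; the slides do not say whether labels or probabilities were fused, and the function name is illustrative.

```python
from collections import Counter

def majority_vote(model_predictions):
    """Fuse per-model label lists into one label per game.

    model_predictions: one list of labels per model, all the same
    length. With an odd number of binary models no tie can occur.
    """
    lengths = {len(preds) for preds in model_predictions}
    if len(lengths) != 1:
        raise ValueError("every model must predict the same games")
    fused = []
    for votes in zip(*model_predictions):
        fused.append(Counter(votes).most_common(1)[0][0])
    return fused
```

For example, if the RF and GBT models pick team A and logistic regression picks team B, the fused prediction is team A, so a single model's mistake is outvoted.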
10. PREDICTION REVIEW
Predicted Probability Distribution for 2016 NCAA
Our model is more confident about “Gonna Win” teams while
remaining less certain about “Gonna Lose” teams.
14. INTERESTING ANALYSIS
Top Teams and Cinderella Teams
Top Eight Teams from 2002 to 2015
Detailed performance of the eight top teams in each season?
15. INTERESTING ANALYSIS
Eight Top Teams
UNC, Michigan St., Connecticut, Kansas, Kentucky, Duke, Louisville, Florida
Championship Count:
1. Connecticut (3 times)
2. Duke; UNC; Florida (twice each)
3. Kansas; Kentucky; Louisville (once each)
Years Count:
1. Kansas (12 years)
2. Duke; UNC; Kentucky (11 years each)
3. Florida; Michigan St. (10 years each)
No Championship: Michigan St.
16. INTERESTING ANALYSIS
Top Teams and Cinderella Teams
Most Frequent “Cinderella” from 2002 to 2015
We define a Cinderella as the winning team in a game when it has a higher seed and a lower RPI than its opponent.
Top Teams being Cinderella:
Michigan St.
Connecticut
Kentucky
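The Cinderella definition can be written as a simple rule and counted per team. This sketch assumes that "higher seed" means a numerically larger seed number and "lower RPI" means a smaller RPI rating value; the slides do not spell this out. The record field names are hypothetical.

```python
from collections import Counter

def is_cinderella(winner_seed, loser_seed, winner_rpi, loser_rpi):
    """True when the winner had the higher seed and the lower RPI.

    Assumption: seeds and RPI values are compared as plain numbers.
    """
    return winner_seed > loser_seed and winner_rpi < loser_rpi

def count_cinderellas(games):
    """Count Cinderella wins per team.

    games: iterable of dicts with the hypothetical keys 'winner',
    'winner_seed', 'loser_seed', 'winner_rpi', 'loser_rpi'.
    """
    counts = Counter()
    for game in games:
        if is_cinderella(game["winner_seed"], game["loser_seed"],
                         game["winner_rpi"], game["loser_rpi"]):
            counts[game["winner"]] += 1
    return counts
```

Calling `count_cinderellas(games).most_common()` on the 2002 to 2015 results would rank the teams by how often they won as a Cinderella.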
17. INTERESTING ANALYSIS
Cinderella Teams
We define a Cinderella as the winning team in a game when it has a higher seed and a lower RPI than its opponent.
Model Prediction for Cinderella
Our model accurately identified all Cinderella teams.
Mean Score: 80%
18. CONCLUSION
Primary Factors for Win-Lose:
Self Attribute (importance descending): offensive efficiency, defensive efficiency, blocked shots
Opponent Attribute: 2-point field goal shooting, 3-point field goal shooting
Outer Factor: distance
Useful Indicator: RPI, seed
Model Accuracy:
On Training Dataset: Log loss: 0.46; Accuracy: 81%
On 2016 Testing Dataset: Accuracy: 75%-78%