MARCH DATA CRUNCH MADNESS
The Shooting Stars
Nan (Miya) Wang
John De Martino
Pritha Sinha
Armi Thassim
INTRODUCTION
● Background: The National Collegiate Athletic Association (NCAA) basketball tournament is played every spring in the US, with 68 college teams competing in a single-elimination format.
● Objective: Create an optimized model to predict the 2016 NCAA Finals from historical regular-season data (2002 to 2015) by applying various machine learning techniques.
● Results:
http://shootingstarsnyc.azurewebsites.net/
The link above leads to our machine learning web API, which can help you make your own 2016 NCAA predictions!
ANALYSIS KPI
Model Performance Evaluation Metrics
● Find a set of predictions that minimizes log loss.
● Log loss heavily penalizes predictions that are simultaneously confident and wrong.
● Balance between being too conservative and too confident.
Terms of the log loss formula:
● Actual number of games played in the tournament
● Predicted probability that team A beats team B
● Actual binary outcome of each game
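A minimal sketch of the log loss metric described on this slide, in plain Python. The function name `log_loss`, the argument names, and the clipping constant `eps` are assumptions for illustration; the slide itself only names the three terms of the formula.

```python
import math


def log_loss(outcomes, probabilities, eps=1e-15):
    """Mean binary log loss over n games.

    outcomes: actual binary results (1 if team A won, else 0)
    probabilities: predicted probability that team A beats team B
    eps: clipping constant (an assumption) that keeps log() finite
    """
    if not outcomes or len(outcomes) != len(probabilities):
        raise ValueError("need two non-empty sequences of equal length")
    total = 0.0
    for y, p in zip(outcomes, probabilities):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(outcomes)


hedged = log_loss([0], [0.6])      # wrong, but cautious
confident = log_loss([0], [0.99])  # wrong and confident
print(round(hedged, 2), round(confident, 2))
```

A confident wrong call costs far more than a hedged one: predicting 0.99 for a team that loses gives a loss of about 4.61, while predicting 0.6 gives about 0.92.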
ANALYSIS PROCESS
Model Evaluation
DATA PREPARATION
Feature Transformation and Normalization
● Rank to Score: Team 1 Adjusted Seed = 0.5 + 0.03 * (Team 2 Seed - Team 1 Seed)
● Normalization: MinMax Scaler
● Derive differences: Team 1 score of an attribute - Team 2 score of an attribute
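The three preparation steps on this slide can be sketched as follows. The adjusted-seed formula is taken from the slide; the function names are hypothetical, and `min_max_scale` is a plain-Python stand-in that assumes the usual min-max scaling to the range [0, 1], since the slide does not state the scaler's settings.

```python
def adjusted_seed(team1_seed, team2_seed):
    """Rank to score: turn the seed gap into a value centred on 0.5."""
    return 0.5 + 0.03 * (team2_seed - team1_seed)


def min_max_scale(values):
    """Rescale a column to [0, 1] (assumed target range)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; map it to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def attribute_difference(team1_score, team2_score):
    """Matchup feature: Team 1 score minus Team 2 score of an attribute."""
    return team1_score - team2_score


print(adjusted_seed(8, 8))
print(min_max_scale([10, 20, 30]))
```

With this formula a 1 seed facing a 16 seed gets an adjusted seed of about 0.95, the reverse matchup gets about 0.05, and equal seeds give 0.5.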
FEATURE SELECTION
Correlation and Distribution
● Feature Correlation Heatmap: a few features are linearly correlated.
● Feature Distribution Histogram: most features are normally distributed.
FEATURE SELECTION
Importance Plotting and Recursive Elimination
● Feature Importance
● Log Loss for Different Feature Numbers
● Optimal Number of Features: 9 (reduced from 97 features to 9)
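One way to picture the "log loss for different feature numbers" search is a greedy backward elimination. This is a sketch under assumptions, not the team's actual procedure: it drops whichever feature hurts a user-supplied loss the least, whereas importance-based recursive elimination would drop the lowest-ranked feature of a fitted model. The feature names and the toy loss function are hypothetical.

```python
def backward_elimination(features, loss_fn, min_features=1):
    """Greedily remove features, recording the loss at each feature count.

    Returns (best_subset, history), where history maps
    number of features -> loss observed at that size.
    """
    current = list(features)
    best_loss = loss_fn(current)
    best_subset = list(current)
    history = {len(current): best_loss}
    while len(current) > min_features:
        # Try dropping each feature; keep the drop with the lowest loss.
        candidates = [[f for f in current if f != drop] for drop in current]
        current = min(candidates, key=loss_fn)
        loss = loss_fn(current)
        history[len(current)] = loss
        if loss < best_loss:
            best_loss, best_subset = loss, list(current)
    return best_subset, history


# Toy loss (hypothetical): useful features lower it, every feature adds a small cost.
USEFUL = {"off_eff", "def_eff", "blocks"}


def toy_loss(subset):
    return 1.0 - 0.2 * len(USEFUL & set(subset)) + 0.01 * len(subset)


best, history = backward_elimination(
    ["off_eff", "def_eff", "blocks", "noise_a", "noise_b"], toy_loss
)
print(sorted(best))
```

In a real run `loss_fn` would train a model on the given columns and return its cross-validated log loss; plotting `history` then gives a curve like the one on the slide.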
PERFORMANCE VALIDATION
Cross Validation and Different Training Size
Grid Searching/Parameter Tuning
● Acceptable variation in model performance
Learning Curve
● Overfitting when training size is under 45%
● Partition size: 50% - 50%
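The 50% - 50% partition and the cross-validation folds can be sketched with the standard library alone. The function names, the fixed random seed, and the choice of three folds in the example are assumptions; the slide does not say how many folds were used.

```python
import random


def partition(rows, train_fraction=0.5, seed=0):
    """Shuffle rows and split them into (train, test) parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded for repeatability
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def k_fold_indices(n_rows, k):
    """Yield (train_indices, validation_indices) for k contiguous folds."""
    sizes = [n_rows // k + (1 if i < n_rows % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        stop = start + size
        validation = list(range(start, stop))
        train = [i for i in range(n_rows) if i < start or i >= stop]
        yield train, validation
        start = stop


train, test = partition(range(10))
folds = list(k_fold_indices(10, 3))
print(len(train), len(test), [len(v) for _, v in folds])
```

Each row lands in exactly one validation fold, so averaging the per-fold scores shows how much model performance varies across splits.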
PERFORMANCE VALIDATION
Model Fusion
● Random Forest (RF), Gradient Boosted Trees (GBT) and Logistic Regression are the top 3 models.
Majority Voting
● Leverage the information gleaned from different methods.
● Minimize the flaws in each model.
● Increase stability and help maintain accuracy.
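Majority voting over the three models can be sketched as below. The model keys and the example predictions are made up for illustration, and the sketch votes on hard win/lose labels; how the team combined predicted probabilities is not stated on the slide.

```python
from collections import Counter


def majority_vote(votes):
    """Return the label chosen by most models for one game."""
    label, _count = Counter(votes).most_common(1)[0]
    return label


def fuse(model_predictions):
    """Combine per-model label lists into one list by majority vote.

    model_predictions: dict mapping model name -> list of 0/1 labels,
    one label per game, all lists in the same game order.
    """
    per_game_votes = zip(*model_predictions.values())
    return [majority_vote(votes) for votes in per_game_votes]


fused = fuse({
    "rf": [1, 0, 1],
    "gbt": [1, 1, 0],
    "logreg": [0, 0, 1],
})
print(fused)  # prints [1, 0, 1]
```

With three models and a binary outcome a tie is impossible, which is one practical reason to fuse an odd number of models.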
PREDICTION REVIEW
Predicted Probability Distribution for 2016 NCAA
Our model is more confident about “Gonna Win” teams while remaining ambiguous about “Gonna Lose” teams.
PREDICTION REVIEW
Predicted Round of 32 for 2016 NCAA
Our model correctly predicted 25 out of 32.
Accuracy: 78%
PREDICTION REVIEW
Predicted Sweet 16 for 2016 NCAA
Our model correctly predicted 12 out of 16.
Accuracy: 75%
PREDICTION REVIEW
Predicted Elite Eight for 2016 NCAA
Our model correctly predicted 6 out of 8.
Accuracy: 75%
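The accuracy figures quoted on the three prediction-review slides follow directly from the counts of correct picks. A short check, with a hypothetical helper name:

```python
def accuracy_percent(correct, total):
    """Share of correctly predicted teams, as a percentage."""
    return 100.0 * correct / total


rounds = {
    "Round of 32": (25, 32),
    "Sweet 16": (12, 16),
    "Elite Eight": (6, 8),
}
for name, (correct, total) in rounds.items():
    print(f"{name}: {correct}/{total} = {accuracy_percent(correct, total):.1f}%")
# prints:
# Round of 32: 25/32 = 78.1%
# Sweet 16: 12/16 = 75.0%
# Elite Eight: 6/8 = 75.0%
```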
INTERESTING ANALYSIS
Top Teams and Cinderella Teams
Top Eight Teams from 2002 to 2015
Detailed performance of the eight top teams in each season?
INTERESTING ANALYSIS
Eight Top Teams
UNC, Michigan St., Connecticut, Kansas, Kentucky, Duke, Louisville, Florida
Championship Count:
1. Connecticut (3 times)
2. Duke; UNC; Florida (twice)
3. Kansas; Kentucky; Louisville (once)
Years Count:
1. Kansas (12 years)
2. Duke; UNC; Kentucky (11 years)
3. Florida; Michigan St. (10 years)
No Championship:
● Michigan St.
INTERESTING ANALYSIS
Top Teams and Cinderella Teams
Most Frequent “Cinderella” from 2002 to 2015
We define a Cinderella as the winning team in a game where it had the higher seed and the lower RPI.
Top Teams being Cinderella:
● Michigan St.
● Connecticut
● Kentucky
INTERESTING ANALYSIS
Cinderella Teams
Model Prediction for Cinderella
Our model accurately identified all Cinderella teams.
Mean Score: 80%
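The Cinderella definition used on these slides can be written as a small predicate. This sketch assumes "higher seed" means a numerically larger seed number (a weaker seeding) and that RPI is a rating value where higher is stronger; the class, field names, and sample teams are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TeamEntry:
    name: str
    seed: int   # tournament seed number; assumed 1 = strongest
    rpi: float  # assumed to be a rating value, higher = stronger


def is_cinderella(winner, loser):
    """True if the winner had the higher seed number and the lower RPI."""
    return winner.seed > loser.seed and winner.rpi < loser.rpi


underdog = TeamEntry("Team A", seed=12, rpi=0.55)
favorite = TeamEntry("Team B", seed=5, rpi=0.62)
print(is_cinderella(underdog, favorite))  # prints True
print(is_cinderella(favorite, underdog))  # prints False
```

If RPI were stored as a rank instead (1 = best), the second comparison would need to be flipped.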
CONCLUSION
Primary Factors for Win-Lose:
Self Attribute (importance descending)
● offensive efficiency
● defensive efficiency
● blocked shots
Opponent Attribute
● 2-point field goal shooting
● 3-point field goal shooting
Outer Factor
● distance
Useful Indicator
● RPI
● seed
Model Accuracy
On Training Dataset:
● Log loss: 0.46
● Accuracy: 81%
On 2016 Testing Dataset:
● Accuracy: 75%-78%
