Use data mining techniques (Decision Tree, Bootstrap Forest, Boosted Tree, Neural Network, Nominal Logistic Regression) to predict the winning probability for each team. Accuracy in 2015 (until 3/26) is 73%.
2. Introduction
❖ Background: The NCAA Men’s Basketball Tournament is a single-elimination tournament,
currently featuring 68 college teams.
❖ Objective: Create an effective model that examines factors contributing to a team’s
performance, based on data from 2001-2014.
❖ Result: The model indicates that box-score statistics have a large effect on a team’s
result in 2015, which helps to predict:
➢ Win/Lose
➢ Winning Probability
➢ Sweet Sixteen
3. Independent & Dependent Variables
❖ Independent Variables:
➢ Box Score: Assist, Steal, Block Shot, % 2/3 Point Field Goals, % Free Throws, Tempo
➢ Seed: Seed#, If this team is Top 5, If this team is 15/16
➢ Location: Latitude, Longitude, Distance Difference
❖ Dependent Variable: Win/Lose
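The slide lists Latitude, Longitude, and Distance Difference as location variables but does not say how the distance is computed. The following is a minimal sketch, assuming great-circle (haversine) distance and assuming that "Distance Difference" means a team's distance to the venue minus its opponent's distance. The function names and the coordinates in the usage lines are hypothetical placeholders, not values from the study.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def distance_difference(team, opponent, venue):
    """Assumed definition: team-to-venue distance minus opponent-to-venue distance.

    Each argument is a (latitude, longitude) tuple in degrees.
    A negative value means the team is closer to the venue than its opponent.
    """
    return haversine_km(*team, *venue) - haversine_km(*opponent, *venue)


# Hypothetical placeholder coordinates, for illustration only.
team_campus = (40.0, -83.0)
opponent_campus = (34.0, -118.0)
venue = (39.8, -86.2)
print(round(distance_difference(team_campus, opponent_campus, venue), 1))
```

Under this assumed sign convention, a negative difference marks the team that travels less.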
5. Distribution and Correlation
● Distribution Review: Most variables are normally distributed
● Scatter Matrix: Few variables have a linear correlation
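The linear correlation that a scatter matrix shows can be quantified with the Pearson coefficient. The sketch below is a plain-Python illustration, not the tool used for the slides. The function name `pearson_r` and the sample lists are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values for two box-score variables, for illustration only.
assists = [10, 12, 14, 16]
steals = [5, 6, 7, 8]
print(pearson_r(assists, steals))  # perfectly linear toy data, so r is 1 (up to rounding)
```

Values near 0 for most variable pairs would match the slide's observation that few variables are linearly correlated.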
6. Performance Validation
● 5 Models Performance (Training and Validation)
● ROC Curve for Validation
● Nominal Logistic Regression has the best performance. Accuracy: 72%
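An ROC curve is commonly summarized by the area under it (AUC), which equals the probability that a randomly chosen win receives a higher predicted score than a randomly chosen loss. The sketch below computes AUC by direct pairwise comparison. It illustrates the metric only and is not the procedure used for the slides. The function name `roc_auc` and the sample scores are hypothetical.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: iterable of 1 (win) / 0 (lose); scores: predicted win probabilities.
    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical validation outcomes and model scores, for illustration only.
outcomes = [0, 0, 1, 1]
predicted = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(outcomes, predicted))  # 0.75
```

The pairwise form is quadratic in the number of games, which is acceptable for tournament-sized validation sets.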
7. Prediction
● 2015 Forecast Result: 73% accuracy

Result   Lose   Win
Lose     6      6
Win      5      24
Total    11     30

● 2015 Forecast Top 16 teams
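The 73% figure can be checked against the Lose/Win table: the diagonal cells (6 and 24) are the agreeing outcomes and the four cells sum to 41 games. The sketch below assumes the diagonal holds the correct predictions. The slide does not say which axis is predicted and which is actual, and accuracy does not depend on that choice.

```python
# Rows and columns follow the slide's table order: Lose, Win.
confusion = [
    [6, 6],
    [5, 24],
]


def accuracy(matrix):
    """Share of cases on the diagonal (assumed to be the correct predictions)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    if total == 0:
        raise ValueError("empty confusion matrix")
    return correct / total


# Column totals reproduce the table's "Total" row: 11 and 30.
column_totals = [sum(row[j] for row in confusion) for j in range(2)]
print(column_totals)                  # [11, 30]
print(f"{accuracy(confusion):.1%}")   # 73.2%
```

The result, 30 of 41, rounds to the 73% reported for the 2015 forecast.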
8. Model Explanation
● Defensive efficiency, offensive efficiency, opponent’s blocked shots, and assists are the
most important attributes, based on their individual p-values
● According to our analysis, good offensive efficiency contributes more than defensive
efficiency to a team’s success
● The closer a team is to the stadium, the better it performs
9. Interesting Analysis
● Average score difference is narrowing down
● The score pattern for Top 5 Seeds is less volatile
than the one for bottom 2 seeds
● 9 out of 16 are predicted correctly
● Only Georgetown shows a declining pattern
of winning probability
10. Result and Conclusion
❖ Whether a team wins or loses is positively related to four
primary factors:
➢ offensive efficiency
➢ defensive efficiency
➢ blocked shots
➢ assists
❖ Accuracy: Our model is 72.19% accurate in predicting a
team’s result for 2015.