1. Kaggle Digits Analysis
Zachary Combs, Philip Remmele, M.S. Data Science Candidates
South Dakota State University
July 2, 2015
2. Introduction
In the following presentation we discuss our analysis of the Kaggle Digits data.
The Digits data set comprises a training set of 42,000 observations on 784
variables (not including the response), and a test set of 28,000 observations.
The variables contain pixel values of handwritten digits, with labels ranging from 0-9.
For more information on the Kaggle Digits data, visit:
https://www.kaggle.com/c/digit-recognizer.
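For reference, the data can be loaded in R along these lines (a minimal sketch; the file names are those of the Kaggle download, and recoding the labels to the word levels used in the confusion matrices later is assumed, since caret's classProbs option requires valid R names):
training <- read.csv("train.csv")  # 42,000 x 785: label + 784 pixel columns
test     <- read.csv("test.csv")   # 28,000 x 784: pixel columns only
training$label <- factor(training$label,
                         labels = c("zero","one","two","three","four",
                                    "five","six","seven","eight","nine"))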
3. Objective
Develop a classification model that is able to accurately classify digit labels in the
test set, where class labels are unknown.
4. Methods
Employed repeated 10-fold cross-validation to obtain stable estimates of
classification accuracy.
Iteratively tuned model parameters (e.g., number of components, decay factor)
to maximize accuracy.
Performed model comparison.
Selected the optimal model based on the accuracy measure.
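For the comparison step, one option (a sketch, not necessarily the exact approach used here) is caret's resamples(), which collects cross-validated accuracies from several fitted models:
# Sketch: assumes lda_Fit, qda_Fit, knn_Fit are caret train objects
# fitted with identical resampling indices (e.g., a shared seed)
library(caret)
resamps <- resamples(list(LDA = lda_Fit, QDA = qda_Fit, KNN = knn_Fit))
summary(resamps)  # accuracy and kappa distributions per model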
6. Data Exploration: Mean
[Figure: Train Data Mean Pixel Values; density of per-pixel means (x-axis: Mean, 0-150; y-axis: Density, 0-0.08).]
Table 1: Train Data Summary Statistics
Mean     Median
33.40891 7.2315
7. Data Exploration: Percent Unique
[Figure: Percent of Unique Pixel Values in Train Data; density (x-axis: Percent Unique, 0-80; y-axis: Density, 0-0.15).]
Table 2: Train Data Summary Statistics
Max Percentage Unique
60.95238
8. Data Exploration: Max
[Figure: Max Pixel Values in Training Data; density (x-axis: Max, 0-300; y-axis: Density, 0-0.06).]
Table 3: Train Data Summary Statistics
Maximum Pixel Value
255
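The statistics in Tables 1-3 can be computed along these lines (a sketch; the per-column percent-unique definition mirrors caret's nearZeroVar(), and the exact denominator behind Table 2 is an assumption):
pixel <- training[,-1]                  # drop the label column
pixel_means <- colMeans(pixel)          # per-pixel means (Table 1)
mean(pixel_means); median(pixel_means)
pct_unique <- apply(pixel, 2, function(x)
  100 * length(unique(x)) / length(x))  # percent unique per pixel (Table 2)
max(pct_unique)
max(pixel)                              # maximum pixel value (Table 3)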
10. PCA With Different Transformations
[Figure: Percent of Total Variance Explained vs. Number of Components (0-200), one curve per transform_Type: Dr. Saunder's Transform, Log Transformation, No Transform, Square Root.]
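A sketch of how such a comparison might be produced with prcomp (Dr. Saunder's Transform is not specified on the slides, so only the other three appear; log(x + 1) is assumed to handle zero-valued pixels):
cum_var <- function(p) cumsum(p$sdev^2) / sum(p$sdev^2)
pca_none <- prcomp(pixel)           # no transform
pca_sqrt <- prcomp(sqrt(pixel))     # square root
pca_log  <- prcomp(log(pixel + 1))  # log; +1 avoids log(0)
cum_var(pca_none)[100]              # variance explained by 100 components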
13. Shiny Applications: PCA Exploration
Shiny PCA 1
Shiny PCA 2
14. Data Partitioning
We created a 70/30 split of the data, based on the distribution of class labels, for
our training and validation sets.
training_index <- createDataPartition(y = training[,1],
p = .7,
list = FALSE)
training <- training[training_index,]
validation <- training[-training_index,]
The first 100 covariates (principal components) were kept because they explain
approximately 95% of the variation in the data, and for ease of presentation.
dim(training)
## [1] 29404 101
dim(validation)
## [1] 8821 101
17. Linear Discriminant Analysis
Discriminant Function
$$\delta_k(x) = x^{T}\Sigma^{-1}\mu_k - \tfrac{1}{2}\,\mu_k^{T}\Sigma^{-1}\mu_k + \log \pi_k$$
Estimating Class Probabilities
$$\Pr(Y = k \mid X = x) = \frac{\pi_k \, e^{\delta_k(x)}}{\sum_{l=1}^{K} \pi_l \, e^{\delta_l(x)}}$$
Assigning x to the class with the largest discriminant score $\delta_k(x)$ will result in the
highest probability for that classification. [James, 2013]
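To make the rule concrete, a minimal sketch (an illustration with assumed inputs, not code from the slides) that evaluates the discriminant for one class; mu_k, Sigma_inv, and pi_k stand for the estimated class mean, pooled inverse covariance, and class prior:
delta <- function(x, mu_k, Sigma_inv, pi_k) {
  drop(t(x) %*% Sigma_inv %*% mu_k) -
    0.5 * drop(t(mu_k) %*% Sigma_inv %*% mu_k) + log(pi_k)
}
# predicted class: the k maximizing delta(x, mu[[k]], Sigma_inv, pi[k])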
18. Model Fitting: LDA
ind <- seq(10, 100, 10)  # component counts to evaluate
lda_Ctrl <- trainControl(method = "repeatedcv", repeats = 3,
                         classProbs = TRUE,
                         summaryFunction = defaultSummary)
accuracy_measure_lda <- NULL
ptm <- proc.time()
for(i in 1:length(ind)){
  lda_Fit <- train(label ~ ., data = training[,1:(ind[i]+1)],
                   method = "lda",
                   metric = "Accuracy",
                   maximize = TRUE,
                   trControl = lda_Ctrl)
  accuracy_measure_lda[i] <- confusionMatrix(validation$label,
                                             predict(lda_Fit,
                                                     validation[,2:(ind[i]+1)]))$overall[1]
}
proc.time() - ptm
## user system elapsed
## 22.83 2.44 129.86
19. LDA Optimal Model: Number of Components vs. Model Accuracy
[Figure: LDA Accuracy vs. Number of Components; classification accuracy (0.78-0.88) over 25-100 components, plateauing at 0.876.]
20. LDA Optimal Model Summary Statistics
Table 5: Confusion Matrix (Columns: Predicted, Rows: Actual)
zero one two three four five six seven eight nine
zero 827 1 2 4 2 16 7 2 4 5
one 0 916 2 4 0 7 3 2 16 1
two 9 31 726 17 21 8 19 11 42 7
three 3 11 23 803 6 41 7 26 26 25
four 0 9 2 0 770 2 5 1 8 56
five 10 16 2 39 5 653 18 9 29 15
six 11 9 2 3 13 23 804 0 9 0
seven 2 26 9 4 16 4 0 791 3 76
eight 4 46 6 28 13 32 7 3 686 17
nine 8 5 1 16 28 1 1 29 5 748
Table 6: Overall Accuracy
Accuracy 0.8756377
AccuracyLower 0.8685703
AccuracyUpper 0.8824559
22. LDA Optimal Model Bar Plot
[Figure: LDA Optimal Model, Predicted vs. Actual Class Labels; paired bar counts (0-900) for each digit, zero through nine.]
24. LDA Summary Statistics on Manually Labeled Test Set
Table 7: Confusion Matrix (Columns: Predicted, Rows: Actual)
zero one two three four five six seven eight nine
zero 92 1 1 0 1 3 1 0 3 0
one 0 111 0 0 0 1 0 0 3 0
two 1 6 62 2 3 1 1 3 4 0
three 1 1 4 100 0 4 1 5 5 1
four 0 0 0 0 100 1 0 1 0 6
five 0 2 0 3 1 83 0 0 4 2
six 2 0 1 0 0 1 92 0 4 0
seven 0 1 1 0 1 0 0 91 1 6
eight 0 8 1 2 1 5 0 0 65 4
nine 1 0 0 1 4 0 0 1 1 80
Table 8: Overall Accuracy
Accuracy 0.8760000
AccuracyLower 0.8539602
AccuracyUpper 0.8957969
25. Quadratic Discriminant Analysis
Discriminant Function
$$\delta_k(x) = -\tfrac{1}{2}(x - \mu_k)^{T}\Sigma_k^{-1}(x - \mu_k) + \log \pi_k$$
Estimating Class Probabilities
$$\Pr(Y = k \mid X = x) = \frac{\pi_k f_k(x)}{\sum_{l=1}^{K} \pi_l f_l(x)}$$
When the $f_k(x)$ are Gaussian densities with a different covariance matrix $\Sigma_k$ for
each class, we obtain Quadratic Discriminant Analysis. [James, 2013]
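The analogous sketch for the quadratic discriminant, now with an assumed class-specific inverse covariance Sigma_k_inv:
delta_qda <- function(x, mu_k, Sigma_k_inv, pi_k) {
  -0.5 * drop(t(x - mu_k) %*% Sigma_k_inv %*% (x - mu_k)) + log(pi_k)
}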
26. Model Fitting: QDA
qda_Ctrl <- trainControl(method = "repeatedcv", repeats = 3,
                         classProbs = TRUE,
                         summaryFunction = defaultSummary)
accuracy_measure_qda <- NULL
ptm <- proc.time()
for(i in 1:length(ind)){
  qda_Fit <- train(label ~ ., data = training[,1:(ind[i]+1)],
                   method = "qda",
                   metric = "Accuracy",
                   maximize = TRUE,
                   trControl = qda_Ctrl)
  accuracy_measure_qda[i] <- confusionMatrix(validation$label,
                                             predict(qda_Fit,
                                                     validation[,2:(ind[i]+1)]))$overall[1]
}
proc.time() - ptm
## user system elapsed
## 20.89 2.16 66.20
27. QDA Optimal Model: Number of Components vs. Model Accuracy
[Figure: QDA Accuracy vs. Number of Components; classification accuracy (0.875-0.967) over 25-100 components, peaking at 0.967.]
28. QDA Optimal Model Summary Statistics
Table 9: Confusion Matrix (Columns: Predicted, Rows: Actual)
zero one two three four five six seven eight nine
zero 862 0 2 1 0 1 0 0 4 0
one 0 917 10 2 2 0 1 2 17 0
two 1 0 871 0 1 0 0 3 15 0
three 0 0 12 929 0 9 0 4 17 0
four 0 1 1 0 838 0 0 0 6 7
five 2 0 1 13 0 773 0 0 6 1
six 2 0 0 1 2 14 850 0 5 0
seven 3 4 15 3 3 3 0 874 11 15
eight 0 1 9 7 2 4 0 0 816 3
nine 1 0 5 12 5 1 0 9 9 800
Table 10: Overall Accuracy
Accuracy 0.9670105
AccuracyLower 0.9630690
AccuracyUpper 0.9706396
29. QDA Optimal Model Confusion Matrix Image
[Figure: QDA Optimal Model Confusion Matrix Image (rows: Actual, columns: Predicted); cell counts match Table 9, annotated with per-cell percentages, Count color scale 0-80.]
30. QDA Optimal Model Bar Plot
[Figure: QDA Optimal Model, Predicted vs. Actual Class Labels; paired bar counts (0-1000) for each digit.]
32. QDA Summary Statistics on Manually Labeled Test Set
Table 11: Confusion Matrix (Columns: Predicted, Rows: Actual)
zero one two three four five six seven eight nine
zero 99 0 0 0 0 1 0 0 1 1
one 0 111 1 0 0 0 0 0 3 0
two 0 0 79 1 1 0 1 1 0 0
three 0 0 1 117 0 0 0 0 4 0
four 0 0 0 0 107 0 0 1 0 0
five 0 0 0 1 0 93 0 0 1 0
six 0 0 0 0 0 1 98 0 1 0
seven 1 0 1 0 0 0 0 98 1 0
eight 0 0 0 0 1 1 0 0 84 0
nine 0 0 0 1 0 0 0 0 1 86
Table 12: Overall Accuracy
Accuracy 0.9720000
AccuracyLower 0.9597851
AccuracyUpper 0.9813153
33. K-Nearest Neighbor
KNN Algorithm
1. Each predictor in the training set represents a dimension in some space.
2. The value an observation has for each predictor gives that observation's
coordinates in this space.
3. The similarity between points is based on a distance metric (e.g., Euclidean
distance).
4. The class of an observation is predicted by taking the k closest data points to
that observation and assigning the observation to the majority class among those
neighbors.
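The slides do not show the KNN fitting code; a caret sketch consistent with the tuning grids on the next slides (k = 1-5 neighbors over 10-40 components) might look like:
knn_Ctrl <- trainControl(method = "repeatedcv", repeats = 3)
knn_Fit <- train(label ~ ., data = training[,1:41],  # e.g., first 40 components
                 method = "knn",
                 tuneGrid = data.frame(k = 1:5),     # neighbors to evaluate
                 trControl = knn_Ctrl)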
34. KNN Model Fitting and Parameter Tuning
[Figure: KNN Accuracy vs. Number of Neighbors (1-5); accuracy (0.80-1.00), one curve per number of components (10, 20, 30, 40).]
35. KNN: Number of Components vs. Accuracy
[Figure: KNN Classification Accuracy vs. Number of Components (10-40); accuracy (0.92-0.97), peaking at 0.972.]
37. KNN Optimal Model Summary Statistics
Table 13: Confusion Matrix (Columns: Predicted, Rows: Actual)
zero one two three four five six seven eight nine
zero 868 0 0 0 0 0 2 0 0 0
one 0 945 1 0 0 0 0 2 2 1
two 1 0 879 0 0 0 1 8 2 0
three 0 0 6 949 0 7 0 4 4 1
four 0 3 0 0 835 0 1 1 0 13
five 2 1 0 4 0 781 7 0 0 1
six 1 0 0 0 1 1 871 0 0 0
seven 0 9 5 1 1 0 0 909 0 6
eight 0 3 1 2 4 6 2 1 822 1
nine 0 0 2 7 4 1 1 4 1 822
Table 14: Overall Accuracy
Accuracy 0.9841288
AccuracyLower 0.9812982
AccuracyUpper 0.9866327
38. KNN Optimal Model Confusion Matrix Image
[Figure: KNN Optimal Model Confusion Matrix Image (rows: Actual, columns: Predicted); cell counts match Table 13, annotated with per-cell percentages, Count color scale 0-80.]
39. KNN Optimal Bar Plot
[Figure: KNN Optimal Model, Predicted vs. Actual Class Labels; paired bar counts (0-1000) for each digit.]
41. KNN Summary Statistics on Manually Labeled Test Set
Table 15: Confusion Matrix (Columns: Predicted, Rows: Actual)
zero one two three four five six seven eight nine
zero 101 0 0 0 0 0 0 0 0 1
one 0 115 0 0 0 0 0 0 0 0
two 0 0 81 0 1 0 0 1 0 0
three 0 0 2 116 0 1 0 1 2 0
four 0 0 0 0 105 0 0 0 0 3
five 0 0 0 0 0 95 0 0 0 0
six 0 1 0 0 0 2 97 0 0 0
seven 0 1 0 0 1 0 0 99 0 0
eight 0 1 0 0 0 1 0 0 82 2
nine 0 0 0 0 0 1 0 0 1 86
Table 16: Overall Accuracy
Accuracy 0.9770000
AccuracyLower 0.9656877
AccuracyUpper 0.9853654
42. Random Forest
"A random forest is a classifier consisting of a collection of tree-structured
classifiers {h(x, Θ_k), k = 1, ...} where the {Θ_k} are independent identically
distributed random vectors and each tree casts a unit vote for the most
popular class at input x." [Breiman, 2001]
43. RF Model Fitting: Recursive Feature Selection
subsets <- c(1:40, seq(45, 100, 5)) # vector of variable subset sizes
                                    # for recursive feature selection
ptm <- proc.time()                  # start timer for code execution
ctrl <- rfeControl(functions = rfFuncs, method = "repeatedcv",
                   number = 3, verbose = FALSE,
                   returnResamp = "all", allowParallel = FALSE)
rfProfile <- rfe(x = training[,-1],
                 y = as.factor(as.character(training$label)),
                 sizes = subsets, rfeControl = ctrl)
rf_opt <- rfProfile$optVariables    # names of the optimal variable subset
proc.time() - ptm
## user system elapsed
## 7426.48 64.87 7491.48
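The final fit on the selected variables is not shown on the slides; one plausible follow-up (a sketch) fits a random forest on rf_opt and scores the validation set:
library(randomForest)
rf_Fit <- randomForest(x = training[, rf_opt],
                       y = as.factor(as.character(training$label)))
rf_pred <- predict(rf_Fit, validation[, rf_opt])
confusionMatrix(validation$label, rf_pred)$overall[1]  # mirrors the LDA/QDA loop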
44. Random Forest: Accuracy vs. Number of Variables
[Figure: Random Forest Recursive Feature Selection; repeated cross-validation accuracy (0.4-1.0) vs. number of variables (0-100).]
45. Random Forest Optimal Model Summary Statistics
Table 17: Confusion Matrix (Columns: Predicted, Rows: Actual)
eight five four nine one seven six three two zero
eight 842 0 0 0 0 0 0 0 0 0
five 0 796 0 0 0 0 0 0 0 0
four 0 0 853 0 0 0 0 0 0 0
nine 0 0 0 842 0 0 0 0 0 0
one 0 0 0 0 951 0 0 0 0 0
seven 0 0 0 0 0 931 0 0 0 0
six 0 0 0 0 0 0 874 0 0 0
three 0 0 0 0 0 0 0 971 0 0
two 0 0 0 0 0 0 0 0 891 0
zero 0 0 0 0 0 0 0 0 0 870
Table 18: Overall Accuracy
Accuracy 1.0000000
AccuracyLower 0.9995819
AccuracyUpper 1.0000000
46. Random Forest Optimal: Confusion Matrix Image
[Figure: Random Forest Optimal Model Confusion Matrix Image (rows: Actual, columns: Predicted; class order eight, five, four, nine, one, seven, six, three, two, zero); cell counts match Table 17, annotated with per-cell percentages, Count color scale 0-80.]
47. Random Forest Bar Plot
[Figure: Random Forest, Actual vs. Predicted Class Labels; paired bar counts (0-1000) for each digit, in the order eight, five, four, nine, one, seven, six, three, two, zero.]
48. RF Summary Statistics on Manually Labeled Test Set
Table 19: Confusion Matrix (Columns: Predicted, Rows: Actual)
eight five four nine one seven six three two zero
eight 82 1 0 1 0 1 1 2 2 0
five 1 93 0 1 1 0 1 2 0 0
four 1 0 104 0 0 0 0 0 1 0
nine 0 0 1 84 0 0 0 1 0 0
one 2 0 0 0 114 0 0 0 0 0
seven 0 0 2 1 0 100 0 2 0 1
six 0 0 1 0 0 0 97 0 1 0
three 0 1 0 1 0 0 0 114 0 0
two 0 0 0 0 0 0 1 1 77 0
zero 0 0 0 0 0 0 0 0 2 101
Table 20: Overall Accuracy
Accuracy 0.9660000
AccuracyLower 0.9528106
AccuracyUpper 0.9763414
49. Conditional Inference Tree
General Recursive Partitioning Tree
1. Perform an exhaustive search over all possible splits.
2. Maximize an information measure of node impurity.
3. Select the covariate split that maximizes this measure.
CTREE
1. In each node, the partial hypothesis $H_0^j : D(Y \mid X_j) = D(Y)$ is tested against
the global null hypothesis $H_0 = \bigcap_{j=1}^{m} H_0^j$.
2. If the global hypothesis can be rejected, the association between Y and each of
the covariates $X_j$, $j = 1, \ldots, m$, is measured by p-values.
3. If we are unable to reject $H_0$ at the specified $\alpha$, recursion stops.
[Hothorn, 2006]
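No CTREE fitting code appears on the slides; with caret, method = "ctree" wraps party::ctree and tunes mincriterion = 1 - alpha. A minimal sketch (the component count is assumed):
ctree_Ctrl <- trainControl(method = "repeatedcv", repeats = 3)
ctree_Fit <- train(label ~ ., data = training[,1:21],  # e.g., first 20 components
                   method = "ctree",  # conditional inference tree via party
                   trControl = ctree_Ctrl)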
50. CTREE Model Fitting and Tuning
[Figure: CTREE Classification Accuracy vs. Number of Components (10-30); accuracy (0.805-0.830), peaking at 0.83.]
54. CTREE Optimal Bar Plot
[Figure: CTREE Optimal Model, Predicted vs. Actual Class Labels; paired bar counts (0-1000) for each digit.]
55. CTREE Optimal Model Confusion Matrix on Manually Labeled Test Set
Table 23: Confusion Matrix (Columns: Predicted, Rows: Actual)
zero one two three four five six seven eight nine
zero 93 0 1 3 0 1 2 0 1 1
one 0 110 0 0 3 0 1 0 1 0
two 1 0 74 2 1 0 2 1 2 0
three 2 0 4 96 0 7 0 3 9 1
four 0 0 2 1 89 1 1 2 0 12
five 1 0 0 2 2 77 3 1 6 3
six 0 1 3 0 0 2 90 0 4 0
seven 0 0 2 1 4 2 0 90 0 2
eight 0 2 4 1 1 3 1 1 70 3
nine 0 0 1 1 11 1 0 1 3 70
Table 24: Overall Accuracy
Accuracy 0.8590000
AccuracyLower 0.8358734
AccuracyUpper 0.8799885
56. Multinomial Logistic Regression
Class Probabilities
$$\Pr(Y = k \mid X = x) = \frac{e^{\beta_{0k} + \beta_{1k} X_1 + \cdots + \beta_{pk} X_p}}{\sum_{l=1}^{K} e^{\beta_{0l} + \beta_{1l} X_1 + \cdots + \beta_{pl} X_p}}$$
This is the logistic regression model generalized to problems with more than two
classes. [James, 2013]
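A fitting sketch via caret's "multinom" method, which calls nnet::multinom and tunes the weight-decay penalty (the decay factor mentioned under Methods); the component count here is assumed:
mlr_Ctrl <- trainControl(method = "repeatedcv", repeats = 3)
mlr_Fit <- train(label ~ ., data = training[,1:61],  # e.g., first 60 components
                 method = "multinom",  # nnet::multinom via caret
                 trControl = mlr_Ctrl,
                 trace = FALSE)        # suppress optimizer output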
57. MLR Model Fitting and Tuning
[Figure: Multinomial Logistic Model, Number of Components vs. Accuracy; classification accuracy (0.80-0.88) over 20-60 components.]
58. MLR Optimal Model Summary Statistics
Table 25: Confusion Matrix (Columns: Predicted, Rows: Actual)
zero one two three four five six seven eight nine
zero 802 0 5 8 0 43 6 0 2 4
one 0 900 16 6 0 14 4 2 9 0
two 25 19 674 28 34 7 54 15 31 4
three 11 12 27 730 5 90 8 12 60 16
four 5 8 3 4 672 9 22 9 7 114
five 27 19 9 68 14 585 14 15 31 14
six 16 20 29 7 12 31 748 3 6 2
seven 8 17 22 8 10 14 0 775 12 65
eight 6 31 39 68 6 48 6 5 608 25
nine 14 8 7 15 142 16 1 71 17 551
Table 26: Overall Accuracy
Accuracy 0.7986623
AccuracyLower 0.7901393
AccuracyUpper 0.8069875
60. MLR Optimal Bar Plot
[Figure: Multinomial Logistic Optimal Model, Predicted vs. Actual Class Labels; paired bar counts (0-1000) for each digit.]
61. MLR Optimal Model Confusion Matrix on Manually Labeled Test Set
Table 27: Confusion Matrix (Columns: Predicted, Rows: Actual)
zero one two three four five six seven eight nine
zero 93 0 0 0 1 4 3 1 0 0
one 0 109 2 0 0 1 1 1 1 0
two 1 1 74 3 2 0 1 1 0 0
three 1 0 0 108 0 4 1 3 0 5
four 0 0 0 0 104 0 0 1 0 3
five 2 1 0 3 4 81 1 0 2 1
six 0 0 1 0 0 1 97 1 0 0
seven 0 0 2 0 3 0 0 88 1 7
eight 0 1 0 2 2 11 0 0 62 8
nine 0 0 0 1 8 1 0 1 0 77
Table 28: Overall Accuracy
Accuracy 0.8930000
AccuracyLower 0.8721714
AccuracyUpper 0.9114796
72. Ensemble Predictions
Goal: Develop a method through which the class accuracy of each "optimized"
model can be employed in making class predictions.
Condition 1: Majority vote wins.
Condition 2: If each model predicts a different class label, go with the prediction
from the model that has the maximum accuracy for that class prediction.
Condition 3: If there is a two-way tie or split vote, go with the class label
that has the maximum mean accuracy among all models for that class.
Condition 4: If there is a three-way tie, go with the class label that has the
maximum mean accuracy among all models for that class.
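A sketch of this voting logic (an illustration with assumed inputs): preds is an n x M character matrix of per-model predictions, and class_acc is an M x K matrix of per-class accuracies with models in rows and class names on the columns:
ensemble_vote <- function(preds, class_acc) {
  apply(preds, 1, function(votes) {
    tab <- table(votes)
    top <- names(tab)[tab == max(tab)]
    if (length(top) == 1) return(top)  # Condition 1: majority wins
    if (max(tab) == 1) {               # Condition 2: every model differs
      acc <- mapply(function(m, cl) class_acc[m, cl], seq_along(votes), votes)
      return(votes[which.max(acc)])
    }
    # Conditions 3-4: break two- or three-way ties by the class with the
    # maximum mean accuracy across all models
    top[which.max(colMeans(class_acc)[top])]
  })
}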
74. Conclusion
1. KNN was the best-performing model, with a classification accuracy of 0.978.
2. Examine the effectiveness of support vector machine classifiers, as well as neural
network models.
3. We may also wish to examine the effectiveness of a hierarchical clustering
technique for dimension reduction and compare results with principal component
analysis.
4. Continue to explore the ensemble prediction method with a variety of logic rules.
75. Parallel Processing
[Figure: LDA and QDA, Parallel vs. Non-parallel Processing; elapsed time in seconds (50-100+) vs. number of components (25-100), one curve each for LDA, LDA 2 cores, QDA, QDA 2 cores.]
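The two-core timings were presumably obtained with a parallel backend; a sketch with doParallel (caret's train() picks up a registered backend automatically, since allowParallel = TRUE is the trainControl default):
library(doParallel)
cl <- makeCluster(2)  # two worker cores, as in the figure
registerDoParallel(cl)
lda_Fit <- train(label ~ ., data = training, method = "lda",
                 trControl = lda_Ctrl)  # resampling folds now run in parallel
stopCluster(cl)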
77. References
Breiman, L. (2001). "Random forests." Machine Learning 45(1): 5-32.
Hothorn, T., et al. (2006). "Unbiased recursive partitioning: A conditional inference
framework." Journal of Computational and Graphical Statistics 15(3): 651-674.
James, G., et al. (2013). An Introduction to Statistical Learning. Springer.
Kuhn, M. and Johnson, K. (2013). Applied Predictive Modeling. Springer.