Csrde discriminant analysis final

Using Discriminant Analysis
to Identify Students for a Corequisite
College Algebra Course
Presented by:
Dr. David G. Underwood
and
Dr. Susan J. Underwood

Arkansas Math Remediation Rate - Fall 2012 *
• Arkansas sets the remediation cutoff at an ACT Math score below 19
• 38.4% for all public colleges (two and four-year)
• 25.5% for the public, four-year institutions
• 38.3% for the public, four-year institutions, excluding selective schools **
*Reported by the Arkansas Department of Higher
Education (ADHE)
**Adjusted with “selective” schools (University of
Arkansas and Arkansas State University) removed

Context – Arkansas Tech University
First Generation College Students – Almost 60%
Some Type of Financial Aid – 95.4%
Pell Grant Eligible – 61%
Graduation Rate – 40.2%
Math Remediation Rate Fall 2012 - 40%

ATU’s Approach to Math Remediation
Two-step process
• MATH 0802 (Beginning Algebra) for students
scoring 16 or below on Math ACT
• MATH 0903 (Intermediate Algebra) for students
earning 17 or 18 on Math ACT
Very Traditional Approach
• Students attend a mathematics class
• Teachers provide lecture and homework
• Students take traditional exams throughout the
semester

Complete College America Challenge
Identify students scheduled for remediation
who could potentially be successful in college
algebra during their first semester if the necessary
skills are provided through a corequisite course.

Primary Question
Can a statistical model be developed, using
variables that most public 4-year institutions in
Arkansas will have in their database, that will
identify students scoring less than 19 on the ACT
Math section who are most likely to be successful
in College Algebra if additional assistance is
provided, at better than chance selection
accuracy?

Stipulations:
Data must be “ambient” – data that are likely to be
readily available to any state institution in
Arkansas.
The statistical methodology should be something
within the ability of most campuses to perform.
It should be relatively easy to interpret.

Discriminant Analysis
DA is used to classify cases into groups and to
decide how to assign new cases to groups.
The interpretation is similar to multiple regression.
The squared Canonical Correlation can be
interpreted similarly to R²: it indicates the
proportion of variance accounted for by the model.
The standardized Canonical Discriminant Function
Coefficients may be interpreted similarly to beta
weights in multiple regression.
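The two-group case described above can be sketched in a few lines. This is a minimal pure-Python illustration of Fisher's linear discriminant, assuming made-up data and hypothetical helper names (`fisher_lda`, `pooled_within_cov`, `solve`); it is not the SPSS procedure used in the study. It finds the weights that best separate the two groups, scores each case, and reports the squared canonical correlation as the share of score variance lying between groups. Packages such as SPSS typically also rescale the weights and add a constant, which this sketch does not do.

```python
from statistics import fmean


def mean_vector(rows):
    """Column means of a list of equal-length rows."""
    return [fmean(col) for col in zip(*rows)]


def pooled_within_cov(groups):
    """Pooled within-group covariance matrix (divisor: N minus number of groups)."""
    p = len(groups[0][0])
    s = [[0.0] * p for _ in range(p)]
    n_total = 0
    for rows in groups:
        m = mean_vector(rows)
        n_total += len(rows)
        for r in rows:
            d = [x - mu for x, mu in zip(r, m)]
            for i in range(p):
                for j in range(p):
                    s[i][j] += d[i] * d[j]
    df = n_total - len(groups)
    return [[v / df for v in row] for row in s]


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(n):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [x - f * y for x, y in zip(m[r], m[c])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fisher_lda(unsuccessful, successful):
    """Two-group linear discriminant: raw weights and squared canonical correlation."""
    s_w = pooled_within_cov([unsuccessful, successful])
    diff = [a - b for a, b in zip(mean_vector(successful), mean_vector(unsuccessful))]
    w = solve(s_w, diff)  # Fisher direction: S_w^-1 (mean_success - mean_unsuccessful)

    def score(row):
        return sum(wi * xi for wi, xi in zip(w, row))

    groups = [[score(r) for r in g] for g in (unsuccessful, successful)]
    all_scores = groups[0] + groups[1]
    grand = fmean(all_scores)
    ss_total = sum((d - grand) ** 2 for d in all_scores)
    ss_between = sum(len(g) * (fmean(g) - grand) ** 2 for g in groups)
    return w, ss_between / ss_total  # between / total SS of scores = r_c squared
```

With hypothetical rows of [HSGPA, ACT_Math], `fisher_lda([[2.4, 16], [2.6, 15], [2.9, 17], [2.5, 17]], [[3.2, 17], [3.5, 18], [3.0, 16], [3.4, 18]])` returns a positive GPA weight and an r² between 0 and 1.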

Data used were from the fall 2012 student body
and included all students who were taking
remedial mathematics for the first time during
the fall 2012 semester. Success is defined as
completing enough modules to enter college
algebra with a grade equivalent to an “A”, “B”, or
“C”.

Those classified as unsuccessful received a grade
lower than “C”, or a “W”.
The grade of “W” was included for two reasons:
1) those students did not successfully complete
the class, and 2) Analysis of Variance showed four
significant differences on the analysis variables
between students who received a grade of “W” and
those who received a failing grade, and in each
case the “W” group looked no better prepared: it
had lower mean high school GPA, class rank
percentile, and class size, and a higher (i.e.,
poorer) mean class rank.

Remedial Math (GradeW: 1.00 = W, .00 = F)
Variable                   GradeW   N     Mean     Sig
HS_GPA                     1.00     163   2.5847   .000
                           .00      609   2.8682
HS_CLASS_RANK_PERCENTILE   1.00     156   35.93    .001
                           .00      563   47.92
ACT_COMP                   1.00     150   18.22
                           .00      542   18.56
ACT_MATH                   1.00     150   17.05
                           .00      542   17.08
ACT_READ                   1.00     150   19.78
                           .00      542   20.08
ACT_SCI                    1.00     150   19.14
                           .00      542   19.53
HS_CLASS_SIZE              1.00     156   162.01   .045
                           .00      562   179.98
HS_CLASS_RANK              1.00     157   123.24   .036
                           .00      564   97.54
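The W-versus-F comparisons in the table come from a one-way Analysis of Variance. As a minimal sketch (pure Python, with a hypothetical function name `one_way_anova_f`), the F statistic is the between-group mean square divided by the within-group mean square. The significance values in the table additionally require the F distribution; if SciPy is available, `scipy.stats.f.sf(f, df1, df2)` gives that upper-tail probability.

```python
from statistics import fmean


def one_way_anova_f(*groups):
    """One-way ANOVA: return (F, df_between, df_within) for two or more groups."""
    all_values = [x for g in groups for x in g]
    grand = fmean(all_values)
    k, n = len(groups), len(all_values)
    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (fmean(g) - grand) ** 2 for g in groups)
    # Variation of individual values around their own group mean.
    ss_within = sum((x - fmean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within
```

For example, `one_way_anova_f([1, 2, 3], [4, 5, 6])` gives F = 13.5 with 1 and 4 degrees of freedom. With real data, each group would be, say, the HS_GPA values of the “W” and “F” students.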

The total number used in the analysis was 640.
The groups were almost evenly split with 318 in
the “unsuccessful” group and 322 in the
“successful” group.

Variables Included In Analysis
ACT Composite Score
ACT Math Score
ACT Science Score
ACT Reading Score
High School Grade Point Average
High School Class Rank
High School Class Size
High School Class Rank as a Percentile Score

Variables Found to Be Significant Predictors
ACT Comp Score
ACT Math Score
ACT Science Score
High School Grade Point Average
High School Class Rank
High School Class Rank as a Percentile

Decision to Use Stepwise
1) The original analysis using all variables was
found to violate the assumption of equality of
covariance matrices, although large group sizes
decrease the importance of the assumption being
met.

2) Several of the variables included in the full
model, i.e., Class Rank and Class Rank as a
Percentile, and ACT Math Score, ACT Science
Score and ACT Comp Score, etc., could be highly
correlated and therefore responsible for the
violation of the assumption of equality of
covariance matrices due to multicollinearity.

3) The stepwise function is designed to find the
best set of predictors from among a larger
number and use only those contributing a
significant amount of unique variance to the
model.
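Point 2) above, that several predictors may be highly correlated, can be checked directly before modeling. The sketch below computes Pearson correlations for every pair of predictor columns and flags pairs whose absolute correlation meets a threshold. The function names and the 0.8 threshold are hypothetical choices for illustration, not values from the study.

```python
from itertools import combinations
from math import sqrt
from statistics import fmean


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def flag_collinear(columns, threshold=0.8):
    """Return (name1, name2, r) for predictor pairs with |r| >= threshold.

    columns: dict mapping a predictor name to its list of values.
    threshold: hypothetical rule-of-thumb cutoff, not taken from the study.
    """
    flagged = []
    for a, b in combinations(columns, 2):
        r = pearson_r(columns[a], columns[b])
        if abs(r) >= threshold:
            flagged.append((a, b, round(r, 3)))
    return flagged
```

Class rank and class rank percentile, for instance, are nearly linear transformations of each other (larger rank number, lower percentile), so they would typically be flagged with r close to -1.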
The stepwise procedure was used with a probability
of F to enter of .05 and a probability of F to remove
of .10 to identify only those variables adding a
significant amount of explained variance to the model.
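The entry and removal rule can be sketched as a loop. This is a simplified illustration, not the SPSS implementation: it assumes a hypothetical callable `p_value(var, model)` that returns the p-value of the partial F test for `var` given the variables in `model`. The exact order of entry and removal checks, and the statistic behind the F test (SPSS discriminant analysis uses Wilks' lambda by default), vary between packages.

```python
def stepwise_select(candidates, p_value, p_enter=0.05, p_remove=0.10, max_steps=50):
    """Forward stepwise selection with removal, driven by partial-F p-values.

    p_value(var, model) is a hypothetical callable returning the p-value of
    the partial F test for `var` given the variables already in `model`.
    """
    model = []
    for _ in range(max_steps):  # cap guards against enter/remove cycling
        remaining = [v for v in candidates if v not in model]
        if not remaining:
            break
        # Entry: add the remaining variable with the smallest p-value,
        # but only if that p-value is below the entry criterion.
        best = min(remaining, key=lambda v: p_value(v, model))
        if p_value(best, model) >= p_enter:
            break
        model.append(best)
        # Removal: drop any entered variable that no longer contributes
        # significantly once the others are in the model.
        for v in list(model):
            others = [u for u in model if u != v]
            if p_value(v, others) > p_remove:
                model.remove(v)
    return model
```

Setting `p_remove` above `p_enter`, as in the study (.10 versus .05), makes it harder for a variable to be entered and then immediately dropped.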

The stepwise method identified three significant
predictors accounting for 22.8% of the variance.
Box’s M was not significant, indicating the
assumption of equality of covariance matrices was met.
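Assuming the 22.8% figure is the squared canonical correlation of the single discriminant function, the related summary statistics follow directly. This is a small sketch with a hypothetical function name; the derived values are arithmetic consequences of the reported percentage, not figures reported in the study. For two groups there is one function, so Wilks' lambda equals 1 - r², and the eigenvalue (between-group to within-group variance ratio of the scores) equals r² / (1 - r²).

```python
from math import sqrt


def summarize_two_group_function(r_squared):
    """Statistics implied by the squared canonical correlation of one function."""
    canonical_r = sqrt(r_squared)                # canonical correlation r_c
    wilks_lambda = 1.0 - r_squared               # unexplained share (one function)
    eigenvalue = r_squared / (1.0 - r_squared)   # between/within ratio of scores
    return canonical_r, wilks_lambda, eigenvalue
```

For r² = .228 this gives r_c ≈ .477, Wilks' lambda = .772, and an eigenvalue of about .295.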
The Structure Matrix correlations for these
predictors were High School Grade Point Average
(.964), ACT Math Score (.239), and ACT Reading
Score (.109).

Based on the Canonical Discriminant Function
Coefficients, the discriminant function, used to
compute a discriminant score, can be stated as:
D = (.223*ACT_Math) + (-.056*ACT_Reading)
+ (2.534*HSGPA) - 9.856
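Applying the function to a new student is a single weighted sum. The sketch below hard-codes the reported coefficients, reading the ACT Reading coefficient as -.056 (the sign is garbled in the source slide); the function name `remedial_score` and the example student are hypothetical.

```python
def remedial_score(act_math, act_reading, hs_gpa):
    """Discriminant score for a remedial student from the reported coefficients.

    Assumes the ACT Reading coefficient is -.056 (sign read from the slide).
    """
    return 0.223 * act_math - 0.056 * act_reading + 2.534 * hs_gpa - 9.856
```

A hypothetical student with ACT Math 17, ACT Reading 20, and a 3.5 HSGPA scores about 1.684, while the same student with a 2.5 HSGPA scores about -0.85, showing how heavily the score depends on high school GPA.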

The discriminant score is important because, although the
algorithm for computing the score was developed using this
group of students, it is also the “best guess” for classifying
students who might take this course in the future. The higher
the discriminant score, in a positive direction, the more likely
the student is to be successful. Conversely, the lower the
discriminant score, in a negative direction, the less likely a
student is to be successful. By knowing the likelihood of success
or failure in advance, and the numbers of students in each
category, one could decide which students are most likely to
benefit from a corequisite, or, whether to suggest additional
services such as tutoring, study groups, etc. to help with
successful completion.
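The link between a discriminant score and the likelihood of success can be made explicit. Under the usual linear discriminant assumptions (scores normal within each group with a pooled within-group variance of 1, as canonical discriminant scores are scaled), the posterior probability of success is a logistic function of the score. The group centroids below (0.54 and -0.55) are hypothetical values chosen for illustration; the study's actual centroids are not reported here.

```python
from math import exp, log


def posterior_success(d, centroid_success, centroid_unsuccessful, prior_success=0.5):
    """P(success | discriminant score d) for a two-group linear discriminant.

    Assumes d is normally distributed within each group with a pooled
    within-group variance of 1, and that the centroids are the group means of d.
    """
    midpoint = (centroid_success + centroid_unsuccessful) / 2
    log_odds = (centroid_success - centroid_unsuccessful) * (d - midpoint)
    log_odds += log(prior_success / (1 - prior_success))  # 0 when priors are equal
    return 1 / (1 + exp(-log_odds))
```

With equal priors, a score at the midpoint of the two centroids gives a probability of .5, and the probability rises steadily as the score increases; with the hypothetical centroids above, a score of +1.5 corresponds to roughly .84.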

The model exceeds the commonly accepted level of
providing at least a 25% improvement over chance
assignment. Summing the squared prior
probabilities (the proportional chance criterion)
gives a chance accuracy of 50%. Multiplying 50%
by 1.25 gives 62.5%, so an acceptable model should
classify at least 62.5% of cases correctly. The
cross-validated classification accuracy of 71.6%
is above this threshold.
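The benchmark arithmetic can be reproduced from the group sizes alone. This sketch (hypothetical function name `proportional_chance`) squares each group's share of the sample, sums them, and applies the 25% improvement factor.

```python
def proportional_chance(group_sizes, improvement=1.25):
    """Proportional chance criterion and the improvement-over-chance benchmark."""
    n = sum(group_sizes)
    chance = sum((g / n) ** 2 for g in group_sizes)  # sum of squared priors
    return chance, chance * improvement
```

For the remedial groups (318 and 322) this gives about .500 and a benchmark of .625, which the 71.6% cross-validated accuracy exceeds. For the direct-entry groups (608 and 1,266) it gives about .562 and a benchmark of about .702; the 70.25% quoted later comes from multiplying the rounded 56.2%, a negligible difference.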

In this instance the discriminant scores are
approximately normally distributed, centered at 0
and scaled to a (pooled within-group) standard
deviation of 1.

With these data, if a cutoff score of +1.5 or
greater is used, the model identifies 76 students.
Of those, 70 were actually successful, a
classification accuracy of 92.1%.
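Choosing a cutoff and checking how many selected students actually succeeded is a simple filter. This is a sketch with a hypothetical function name and made-up inputs; in practice `scores` would be each student's discriminant score and `outcomes` whether that student was successful.

```python
def select_above(scores, outcomes, cutoff=1.5):
    """Select students scoring at or above cutoff.

    Returns (number selected, share of selected students who succeeded),
    or (0, None) when nobody reaches the cutoff.
    """
    picked = [ok for s, ok in zip(scores, outcomes) if s >= cutoff]
    if not picked:
        return 0, None
    return len(picked), sum(picked) / len(picked)
```

Raising the cutoff selects fewer students but usually a higher share of successes; 70 successes among 76 selected students corresponds to the 92.1% reported above.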

A similar analysis was conducted for students
scoring 19 or above and entering directly into
college algebra.
The total number used in the analysis was 1,874.
The groups were unevenly split with 608 in the
“unsuccessful” group and 1,266 in the “successful”
group.

The same 8 variables were allowed to enter the
model and 7 were found to be significant
predictors when the full model was used.
The stepwise method identified only 2 significant
predictors accounting for 26.6% of the variance.
Box’s M was not significant, indicating the
assumption of equality of covariance matrices was met.

The Structure Matrix correlations for these
predictors were High School Grade Point Average
(.986) and High School Class Size (.034).
Based on the Canonical Discriminant Function
Coefficients, the discriminant function, used to
compute a discriminant score, can be stated as:
D = (2.85*HSGPA) + (.001*HS_Class_Size) - 8.042

The model exceeds the commonly accepted level
of providing at least a 25% improvement over
chance assignment. Summing the squared prior
probabilities provides a prior chance probability of
56.2%. Multiplying 56.2% by 1.25 provides a figure
of 70.25%. An acceptable model should be
equal to or greater than 70.25%. The
cross-validated classification accuracy of 76.5%
is above the commonly accepted threshold.

In this case, the rationale would be to select
students with negative discriminant scores, that
is, those least likely to be successful unless some
type of intervention is applied.
The same method (using the discriminant score)
can be used to determine the number of students
to be selected, as with the remedial students.
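For the direct-entry group, the selection runs in the opposite direction: flag the students with the lowest scores. This sketch hard-codes the reported function for college algebra students and returns students below a cutoff (0 by default), lowest scores first, so the list can be truncated to however many intervention slots are available. The function names and student IDs are hypothetical.

```python
def direct_entry_score(hs_gpa, hs_class_size):
    """Discriminant score for direct-entry students from the reported coefficients."""
    return 2.85 * hs_gpa + 0.001 * hs_class_size - 8.042


def flag_for_intervention(students, cutoff=0.0):
    """Return (id, score) for students scoring below cutoff, lowest scores first.

    students: iterable of (student_id, hs_gpa, hs_class_size) tuples.
    """
    scored = [(sid, direct_entry_score(gpa, size)) for sid, gpa, size in students]
    flagged = [pair for pair in scored if pair[1] < cutoff]
    return sorted(flagged, key=lambda pair: pair[1])
```

For example, a hypothetical student with a 3.0 HSGPA from a class of 200 scores about 0.708 and is not flagged, while one with a 2.5 HSGPA from a class of 150 scores about -0.767 and is.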

Conclusions
• Discriminant Analysis can be used to identify
students who are most likely to be successful or
unsuccessful depending on which students one
needs to identify.
• Classification accuracy is better than chance.
• The Discriminant Score can be used to determine
how many students will be selected.
• Students with “W” grades may have weaker values
on the predictive variables than students with “F” grades.

