2. Machine Learning
“Field of study that gives computers the ability to learn without being explicitly programmed.”
(Arthur Samuel, 1959)
“A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.”
(Tom Mitchell (1997), Machine Learning, McGraw Hill; web page: http://www.cs.cmu.edu/~tom/)
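Mitchell's definition can be made concrete with a toy example. The sketch below is hypothetical and not taken from the cited book. The task T is labelling a number in [0, 1) as "large" under an assumed true rule x >= 0.37. The experience E is an evenly spaced grid of labelled examples. The performance measure P is accuracy on a fixed test grid. `learn_threshold`, `label`, and the other names are illustrative helpers.

```python
# Toy illustration of Mitchell's T/E/P definition (all names and numbers hypothetical).
# T: label x in [0, 1) as "large" (1) or not (0).
# E: labelled examples (x, y).
# P: accuracy on a fixed test set.

TRUE_THRESHOLD = 0.37  # assumed ground-truth rule: y = 1 iff x >= 0.37


def label(x):
    """Ground-truth labelling used to generate both experience and test data."""
    return 1 if x >= TRUE_THRESHOLD else 0


def learn_threshold(examples):
    """Place the threshold midway between the largest negative and smallest positive example."""
    negatives = [x for x, y in examples if y == 0]
    positives = [x for x, y in examples if y == 1]
    lo = max(negatives) if negatives else 0.0
    hi = min(positives) if positives else 1.0
    return (lo + hi) / 2


def accuracy(threshold, test):
    """P: fraction of test points that the learned threshold classifies correctly."""
    correct = sum((1 if x >= threshold else 0) == y for x, y in test)
    return correct / len(test)


test_set = [(i / 1000, label(i / 1000)) for i in range(1000)]

for n in (2, 10, 100):
    # More experience E = a finer grid of n labelled examples.
    experience = [(k / n, label(k / n)) for k in range(n)]
    threshold = learn_threshold(experience)
    print(n, round(threshold, 3), accuracy(threshold, test_set))
```

As n grows from 2 to 100, the learned threshold moves toward 0.37 and test accuracy rises. That rise is the "improves with experience E" clause in action.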
3. Two Main Types of Algorithms
Supervised learning:
• What we most commonly use in educational research
• The data are labeled: we know both the inputs and the outputs
• We have an idea of the kinds of analysis we plan to run (e.g., linear regression)
Unsupervised learning:
• Used less often in educational research
• We try to find hidden structure in data that may not be labeled
• We have more of an intuition of what we are trying to find (e.g., k-means clustering)
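The contrast between the two families can be sketched in a few lines of plain Python. This is a toy illustration and not the analyses in this talk. Ordinary least squares is fitted to inputs paired with known outputs. A one-dimensional k-means with k = 2 receives only unlabelled values and has to discover the grouping itself. The data, the starting centers, and the function names `fit_line` and `kmeans_1d` are all hypothetical.

```python
def fit_line(xs, ys):
    """Supervised: ordinary least-squares fit of y = a + b*x to labelled pairs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def kmeans_1d(values, centers, iterations=20):
    """Unsupervised: Lloyd's k-means on unlabelled 1-D values, from given starting centers."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: attach each value to its nearest center.
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: move each center to the mean of its cluster (keep it if the cluster is empty).
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers, clusters


# Supervised: the outputs are known (here generated by y = 1 + 2x).
intercept, slope = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(intercept, slope)

# Unsupervised: only values, no labels; k-means proposes two groups.
centers, clusters = kmeans_1d([1.0, 1.2, 0.8, 5.0, 5.3, 4.7], centers=[0.0, 6.0])
print(centers, clusters)
```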
4. My Interest in Machine Learning
Q: Can we begin to build software programs that learn who we are and can then provide individual learning supports through the use of assessment and feedback?
5. Markov Models
“The future is independent of the past, given the present.” (the Markov property, named after Andrey Markov, 1856-1922)
Limitation: each state depends only on the immediately preceding state; earlier history is ignored
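The Markov property can be shown with a two-state chain. The states and transition probabilities below are hypothetical. The distribution over states at step t + 1 is computed from the distribution at step t alone, by multiplying it with the transition matrix. Nothing earlier in the history enters the calculation.

```python
# Hypothetical two-state Markov chain.
# TRANSITION[i][j] = P(next state = j | current state = i); each row sums to 1.
TRANSITION = [
    [0.9, 0.1],
    [0.5, 0.5],
]


def step(distribution, transition):
    """One Markov step: p_next[j] = sum_i p_current[i] * transition[i][j]."""
    n = len(transition)
    return [sum(distribution[i] * transition[i][j] for i in range(n)) for j in range(n)]


def propagate(start, transition, steps):
    """Apply `steps` Markov steps; each step uses only the most recent distribution."""
    history = [start]
    for _ in range(steps):
        history.append(step(history[-1], transition))
    return history


for t, dist in enumerate(propagate([1.0, 0.0], TRANSITION, 3)):
    print(t, [round(p, 4) for p in dist])
```

Starting surely in state 0, the probability of still being in state 0 is 0.9 after one step, 0.86 after two, and 0.844 after three.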
6. Hidden Markov Models
- Methods for finding hidden (latent) structure in a sequential data set
Ghahramani, Z. (2001). An introduction to hidden Markov models and Bayesian networks. International Journal of Pattern Recognition and Artificial Intelligence, 15(1), 9–42.
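The forward algorithm shows how a hidden Markov model scores a sequence. It computes the likelihood of an observed sequence by summing over all hidden state paths, one time step at a time. This is a generic textbook sketch and not depmixS4's implementation. The two hidden states, the binary pass/fail observations, and all probabilities are hypothetical. A brute-force enumeration is included only to check the result.

```python
from itertools import product

# Hypothetical 2-state HMM with binary observations (0 = fail, 1 = pass).
INITIAL = [0.6, 0.4]                 # P(first hidden state)
TRANSITION = [[0.8, 0.2],            # P(next hidden state | current hidden state)
              [0.3, 0.7]]
EMISSION = [[0.7, 0.3],              # P(observation | hidden state)
            [0.2, 0.8]]


def forward_likelihood(observations, initial, transition, emission):
    """Forward algorithm: alpha_t[j] = P(o_1..o_t, hidden state at t = j)."""
    n = len(initial)
    alpha = [initial[j] * emission[j][observations[0]] for j in range(n)]
    for obs in observations[1:]:
        alpha = [
            sum(alpha[i] * transition[i][j] for i in range(n)) * emission[j][obs]
            for j in range(n)
        ]
    return sum(alpha)


def brute_force_likelihood(observations, initial, transition, emission):
    """Same quantity by enumerating every hidden path (exponential cost; for checking only)."""
    n, total = len(initial), 0.0
    for path in product(range(n), repeat=len(observations)):
        p = initial[path[0]] * emission[path[0]][observations[0]]
        for t in range(1, len(observations)):
            p *= transition[path[t - 1]][path[t]] * emission[path[t]][observations[t]]
        total += p
    return total


seq = [0, 0, 1, 1, 1]
print(forward_likelihood(seq, INITIAL, TRANSITION, EMISSION))
print(brute_force_likelihood(seq, INITIAL, TRANSITION, EMISSION))
```

For T observations and N hidden states, the forward recursion costs on the order of T·N² operations. Enumerating paths costs N^T, which is why practical HMM fitting is built on this recursion.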
7. Piaget Developmental Data
• Visser, I., & Speekenbrink, M. (2010). depmixS4: An R package for hidden Markov models. Journal of Statistical Software, 36(7). http://dare.uva.nl/document/361939
• Data from: Jansen, B.R.J., & van der Maas, H.L.J. (2002). The development of children’s rule use on the balance scale task. Journal of Experimental Child Psychology, 81(4), 383–416.
• Siegler, R.S. (1981). Developmental sequences within and between concepts. Monographs of the Society for Research in Child Development, 46. SRCD.
8. depmixS4 – Balance Data
> library(depmixS4)
> data(balance)
- 779 participants
- Ages 5 to 19 years
- 4 distance items
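Each child contributes one pattern of answers on the 4 distance items, so the balance analysis can be read as a latent class (mixture) model. Each hidden class has its own probability of answering each item correctly, and a child's pattern is weighted by the class probabilities. The sketch below computes that pattern probability and the resulting class posterior. It assumes items are scored 0/1 and are independent within a class. The class labels and all numbers are hypothetical, not the fitted depmixS4 estimates.

```python
from itertools import product

# Hypothetical 3-class model for 4 binary items (1 = correct, 0 = incorrect).
CLASS_PROBS = [0.3, 0.5, 0.2]           # P(class); sums to 1
P_CORRECT = [
    [0.95, 0.95, 0.90, 0.95],           # class 1: mostly correct
    [0.05, 0.10, 0.05, 0.10],           # class 2: mostly incorrect
    [0.50, 0.50, 0.50, 0.50],           # class 3: guessing
]


def pattern_probability(pattern, class_probs, p_correct):
    """P(pattern) = sum_k P(class k) * prod_j P(response to item j | class k)."""
    total = 0.0
    for weight, probs in zip(class_probs, p_correct):
        likelihood = 1.0
        for x, p in zip(pattern, probs):
            likelihood *= p if x == 1 else 1.0 - p
        total += weight * likelihood
    return total


def posterior_classes(pattern, class_probs, p_correct):
    """Bayes' rule: P(class k | pattern), the basis for assigning a child to a class."""
    joint = [pattern_probability(pattern, [w], [probs]) for w, probs in zip(class_probs, p_correct)]
    evidence = sum(joint)
    return [j / evidence for j in joint]


print(posterior_classes((0, 0, 0, 0), CLASS_PROBS, P_CORRECT))
# Sanity check: the probabilities of all 2^4 = 16 response patterns sum to 1.
print(sum(pattern_probability(p, CLASS_PROBS, P_CORRECT) for p in product((0, 1), repeat=4)))
```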
9. CAPP - OSCE
• Clinical Assessment for Practice Program (CAPP)
• A program of the College of Physicians and Surgeons of Nova Scotia (CPSNS)
• Objective Structured Clinical Exam (OSCE)
• Multiple stations with sequences and competencies
10. CAPP OSCE Dataset
• 434 observations
• 31 participants
• 14 stations
• 9 measures of competency (coded pass/fail)
• 13 different case IDs
11. My Learning
• I conducted the balance data analysis first
• Then I began to examine the OSCE data
• The next slides compare the two as a preliminary analysis
12. Balance
• “Used Age as a covariate on class membership”
• 3-state model best
• Converged in 77 iterations
• logLik = -917.50
• AIC = 1867
• BIC = 1942
OSCE
• Used CASE ID as a covariate on class membership
• 2-state model best
• Converged in 55 iterations
• logLik = -1757.81
• AIC = 3555
• BIC = 3637
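The AIC and BIC values above follow the standard definitions: AIC = -2 logLik + 2k and BIC = -2 logLik + k ln(n). Here logLik is the log-likelihood, k is the number of free parameters, and n is the number of observations. Lower values indicate a better trade-off between fit and complexity. The slide does not report parameter counts. k = 16 for the balance model is an assumption inferred by working backwards from the reported values, for example 3 classes × 4 binary items = 12 response parameters plus 4 for the age-dependent class probabilities. With n = 779, that count reproduces both reported criteria.

```python
import math


def aic(log_lik, k):
    """Akaike information criterion: -2 * logLik + 2 * k."""
    return -2.0 * log_lik + 2.0 * k


def bic(log_lik, k, n):
    """Bayesian information criterion: -2 * logLik + k * ln(n)."""
    return -2.0 * log_lik + k * math.log(n)


# Balance model: logLik and n come from the slides; k = 16 is an ASSUMED parameter count.
balance_loglik, balance_k, balance_n = -917.50, 16, 779
print(round(aic(balance_loglik, balance_k)), round(bic(balance_loglik, balance_k, balance_n)))
# prints: 1867 1942
```

Information criteria are meant for comparing candidate models fitted to the same data, such as 2- versus 3-state models. That is typically how a "best" number of states is chosen.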
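13. Balance
• Prior (class membership) probabilities at zero values of the covariates, i.e., at age 0, below the observed range of 5 to 19 years
• 0.001, 0.988, 0.009 (states 1 to 3)
OSCE
• Prior (class membership) probabilities at zero values of the covariates, i.e., at CASE ID 0, outside the observed IDs 1 to 13
• 0.606, 0.394 (states 1 and 2)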
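With a covariate on class membership, the prior class probabilities are typically modelled as a multinomial logistic (softmax) regression. Each class has an intercept and a slope, and a reference class is fixed at zero. At covariate = 0 the slopes drop out, so the reported "probabilities at zero values of the covariates" are simply the softmax of the intercepts. The sketch assumes this baseline-category parameterisation. Its coefficients are hypothetical, not the fitted estimates.

```python
import math

# Hypothetical baseline-category coefficients for 3 classes; class 1 is the reference (zeros).
INTERCEPTS = [0.0, 7.0, 2.0]
SLOPES = [0.0, -0.6, -0.25]   # change in log-odds (vs. class 1) per unit of the covariate


def class_probabilities(covariate, intercepts, slopes):
    """Softmax of the linear predictors intercept_k + slope_k * covariate."""
    eta = [a + b * covariate for a, b in zip(intercepts, slopes)]
    top = max(eta)                        # subtract the max for numerical stability
    weights = [math.exp(e - top) for e in eta]
    total = sum(weights)
    return [w / total for w in weights]


for age in (0, 5, 10, 15, 19):
    print(age, [round(p, 3) for p in class_probabilities(age, INTERCEPTS, SLOPES)])
```

With these made-up coefficients, the probability of class 1 rises with the covariate while classes 2 and 3 shrink. Only the value at 0 corresponds to what the summary output reports.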
14-16. Balance vs. OSCE
[Figure: Prior probabilities by age, balance scale data (x-axis: age; y-axis: Pr, 0 to 1; Class 1 (correct), Class 2 (incorrect), Class 3 (guess))]
[Figure: Prior probabilities by CASE ID, OSCE scale data (x-axis: CASEID, 1 to 13; y-axis: Pr, 0 to 1)]
17. What's Next
• The OSCE data did not fit as well as the balance data; more years of data may help
• Learn more about HMMs and their potential application to performance assessments
• Experiment with different covariates in the datasets
18. Acknowledgements
• Acknowledge the use of code by Visser & Speekenbrink (2010) in the depmixS4 package
• Thank you to CAPP & Bruce Holmes for the use of the OSCE data
AIC: Akaike information criterion
BIC: Bayesian information criterion
Balance data
There are three classes; at age 5, 90% of children apply the wrong rule.
Note: different children are observed at each age, so the model does not track the same children over time.
OSCE data
There are two classes at station 1, and both start at roughly the same point. As participants progress from one station to the next, two groups begin to emerge, although the skills emphasized seem to differ from station to station. After station 7 the two groups are clear: some participants do better and others do less well at each station.