SlideShare une entreprise Scribd logo
1  sur  70
Core Methods in
Educational Data Mining
EDUC6191
Fall 2022
Questions about Basic HW 1?
Reminders
• You don’t have to do it perfectly, you just have
to do it
• You will NOT be penalized for using hints
(appropriately)
• If you run into trouble, feel free to email me
or, better yet, use the discussion forum
Let’s get warmed up
What is big data?
Some definitions
• “Big data” is data big enough that traditional
statistical significance testing becomes useless
• “Big data” is data too big to input into a
traditional relational database
• “Big data” is data too big to work with on a
single machine
A moving target
• 2004: I reported a data set with 31,450 data
points. People were impressed.
• 2014: A reviewer in an education journal
criticized me for referring to 817,485 data
points as “big data”.
Not just big but open
• More and more educational data sets can be
accessed by the public
• The EDM Society even runs an annual best
open data set competition
• Very different than how things used to be
Tycho Brahe
• Spent 24 years observing the sky from a
custom-built castle on the island of Hven
Johannes Kepler
• Had to take a job with Brahe to get Brahe’s
data
Johannes Kepler
• Had to take a job with Brahe to get Brahe’s
data
• Only got unrestricted access to data…
Johannes Kepler
• Had to take a job with Brahe to get Brahe’s
data
• Only got unrestricted access to data…
when Brahe died
Johannes Kepler
• Had to take a job with Brahe to get Brahe’s
data
• Only got unrestricted access to data…
when Brahe died
• and Kepler stole the data and
fled to Germany
What are the types of EDM method?
• According to Baker (any version)
• Top-level first
What type of method are each of these?
(According to Baker)
• Classification
• Regression
• Correlation Mining
• Factor Analysis
• Domain Structure Discovery
• Network Analysis
• Clustering
• Association rule mining
• Sequential pattern mining
• Latent Knowledge Estimation
• Causal data mining
Questions? Comments?
Today
• Prediction Modeling
• Classic Categories
– Classification
– Regression
– Density Estimation
What is a classifier?
What is a regressor?
Density Estimator
• Predicts a probability density function
• Not used much in education
• Used more in other domains
But things have moved really fast
recently…
• From predicting single values, to predicting
sequences, to predicting n-dimensional
matrices
Transformer/Foundation Models
• Who here can tell us what a transformer
model/foundation model is?
Transformer/Foundation Models
• Some examples
BERT, MathBERT, GPT-2, GPT-3, DALL-E 2,
StableDiffusion, …, …
Transformer/Foundation Models
• Can predict
– Words
– Sentences
– Pixels
– Computer program text
– Mathematical equations
– Anything?
Transformer/Foundation Models
• Can predict
– Words
– Sentences
– Pixels
– Computer program text
– Mathematical equations
– Anything?
• And, in a sudden light-switch transformation,
prediction becomes generation
GPT-3 Chatbot
(courtesy of Kelsey Piper)
• If I injured my hand and I couldn’t
afford to go to the doctor, would it
make sense for me to replace it with a
new hand I got at the grocery store?
• GPT-3: No, that would not make sense. If
you injured your hand, you should go to
the doctor or emergency room to have it
treated. You should not try to replace it
with a new hand from the grocery store.
AI Dungeon (GPT-2)
• You are a friendly blanket. You live in a nice warm house with some nice humans.
You like cuddling and keeping people warm. But you don’t want to be a "blanket
baby" anymore, so you have decided that it is time for you to leave this place.
…
> You look for a job as a blanket.
You find a job searching for blankets in the classifieds. You start reading the ads.
Most of them are for low paying menial jobs, but you find three that look interesting.
> You read the best ad.
You find the best job listed: WANTED: One blanket for sleeping on the cold ground
during the night. Must be able to fold yourself into one-third of the blanket's volume.
> You call the job ad.
…
A: The number you have called goes through to the "Blanket Museum“ where a
robotic voice informs you that the number you have called is not in their phone book.
AI Dungeon (GPT-2)
• …
> You go inside museum.
You walk into the museum and see a robotic voice greet you. "Hello,
and welcome to the Blanket Museum!" it says.
> You say "I want a job" A: You are asked some questions and then fitted for a
security uniform, just like in the movies. You are given a tour of the museum and the
exhibits.
> You put on security uniform.
> You fold yourself. You find that you can easily turn the security uniform into a small
blanket. You spend the rest of your life working at the museum. Every day you spend
your time looking after the exhibits and cleaning the building.
YOU HAVE DIED
AI Dungeon Harry Potter story
DALL-E 2
• "Teddy bears
working on new
AI research
underwater with
1990s
technology"
DALL-E 2
Still not perfect
(Randall Munro)
Questions? Comments?
The Elephant in the Room
• Why not just replace this class with a class about
foundation models?
• When they succeed, they succeed spectacularly
• But…
– They can’t do everything (actually, they don’t do most
any of the things this class will cover)
– Unless you’re using an existing tool, applying them to
a specific problem requires programming outside the
scope of this class (for now)
– When they fail, they fail spectacularly (as our
examples show)
We will discuss
these models…
• On December 1, when we discuss text mining
Questions? Comments?
Let’s look at a very simple regressor
• Numhints = 0.12*Pknow + 0.932*Time –
0.11*Totalactions
Skill pknow time totalactions numhints
COMPUTESLOPE 0.2 7 3 ?
Which of the variables has the largest
impact on numhints?
(Assume they are scaled the same)
However…
• These variables are unlikely to be scaled the
same!
• If Pknow is a probability
– From 0 to 1
• And time is a number of seconds to respond
– From 0 to infinity
• Then you can’t interpret the weights in a
straightforward fashion
• What could you do?
Let’s do another example
• Numhints = 0.12*Pknow + 0.932*Time –
0.11*Totalactions
Skill pknow time totalactions numhints
COMPUTESLOPE 0.2 2 35 ?
Is this plausible?
What might you want to do if you got
this result in a real system?
Interpreting Regression Models
• Let’s quickly review the example from the
video
Example of Caveat
• Let’s graph the relationship between number
of graduate students and number of papers
per year
Data
0
2
4
6
8
10
12
14
16
0 2 4 6 8 10 12 14 16
Papers
per
year
Number of graduate students
Model
• Number of papers =
4 +
2 * # of grad students
- 0.1 * (# of grad students)2
• But does that actually mean that
(# of grad students)2 is associated with less
publication?
• No!
Example of Caveat
• (# of grad students)2 is actually
positively correlated with
publications!
– r=0.46
0
2
4
6
8
10
12
14
16
0 2 4 6 8 10 12 14 16
Papers
per
year
Number of graduate students
Example of Caveat
• The relationship is only in the
negative direction when the
number of graduate students is
already in the model…
0
2
4
6
8
10
12
14
16
0 2 4 6 8 10 12 14 16
Papers
per
year
Number of graduate students
How would you deal with this?
• How can we interpret individual features in a
comprehensive model?
The videos discussed a range of
algorithms
• Linear Regression
• Regression Trees
• Logistic Regression (a classifier!)
• Decision Trees
• Decision Rules
• Instance-Based Classifiers
• Support Vector Machines
• Random Forest
• Neural Networks/Recurrent Neural Networks
– What Transformer/Foundation Models are built on
Questions or comments
about any of these?
Has anyone
• Used any classification algorithms outside the
set discussed/recommended in the videos?
• Say more?
A practical question
Should you
• Pick one algorithm that seems really
appropriate?
• Run every algorithm that will actually run for
your data?
• Something in between?
My typical lab practice
• Pick a small number of algorithms that
– Have worked on past similar problems
– Fit different kinds of patterns from each other
Is it really the algorithm?
• Or is it the data you put into it?
• We’ll come back to this in the Feature
Engineering lecture at the end of the month
Questions? Comments?
Coleman article
• Any questions or comments?
Did anyone read Hand article?
• Thoughts?
What is Hand’s main thesis?
What is Hand’s main thesis?
“A great many tools have been developed for supervised
classification, ranging from early methods such as linear
discriminant analysis through to modern developments such
as neural networks and support vector machines. A large
number of comparative studies have been conducted in
attempts to establish the relative superiority of these
methods. This paper argues that these comparisons often fail
to take into account important aspects of real problems, so
that the apparent superiority of more sophisticated methods
may be something of an illusion. In particular, simple methods
typically yield performance almost as good as more
sophisticated methods, to the extent that the difference in
performance may be swamped by other sources of
uncertainty that generally are not considered in the classical
supervised classification paradigm.”
I used to ask the class
if they agree with Hand…
I used to ask the class
if they agree with Hand…
• But
But we can still learn a lot from Hand
• Hand notes that for small data sets, simpler
algorithms often work just as well or better
• And in education, we often have small data
sets
• We haven’t yet figured out how to leverage
web-scale data for a lot of problems in
education
The problem of generalizability
(Hand)
• Data points trained on are not usually drawn
from the same distribution
• As the data points where the classifier will be
applied
• Can anyone think of examples of this in
education?
Another of Hand’s key arguments
• Data points trained on are often treated as
certainly true and objective
• But they are often arbitrary and uncertain
Another of Hand’s key arguments
• Data points trained on are often treated as
certainly true and objective
• But they are often arbitrary and uncertain
• Where might this apply in education?
We will discuss these issues further
• Over the next two weeks
• As we discuss topics like
– Over-fitting and generalizability (next week)
– Algorithmic bias (next week)
– Diagnostic metrics (in two weeks)
Any other questions or comments?
Next classes
• September 15
– Behavior and Affect Detection
– Basic: Classifier Due
• September 22
– Diagnostic Metrics
– Creative: Behavior Detection Due
• September 29 VIRTUAL
– Feature Engineering and Distillation
– Basic: Diagnostic Metrics Due
The End

Contenu connexe

Similaire à Core Methods In Educational Data Mining

Machine Learning Summary for Caltech2
Machine Learning Summary for Caltech2Machine Learning Summary for Caltech2
Machine Learning Summary for Caltech2
Lukas Mandrake
 
Introduction-To-AI(L-1).pdf
Introduction-To-AI(L-1).pdfIntroduction-To-AI(L-1).pdf
Introduction-To-AI(L-1).pdf
ssuser8324dd
 

Similaire à Core Methods In Educational Data Mining (20)

Machine Learning for Non-technical People
Machine Learning for Non-technical PeopleMachine Learning for Non-technical People
Machine Learning for Non-technical People
 
Deep learning with tensorflow
Deep learning with tensorflowDeep learning with tensorflow
Deep learning with tensorflow
 
Machine Learning Summary for Caltech2
Machine Learning Summary for Caltech2Machine Learning Summary for Caltech2
Machine Learning Summary for Caltech2
 
The zen of predictive modelling
The zen of predictive modellingThe zen of predictive modelling
The zen of predictive modelling
 
Clare Corthell: Learning Data Science Online
Clare Corthell: Learning Data Science OnlineClare Corthell: Learning Data Science Online
Clare Corthell: Learning Data Science Online
 
James Coplien - Trygve - October 17, 2016
James Coplien - Trygve - October 17, 2016James Coplien - Trygve - October 17, 2016
James Coplien - Trygve - October 17, 2016
 
HCI-Lecture-1
HCI-Lecture-1HCI-Lecture-1
HCI-Lecture-1
 
Artificail Intelligent lec-1
Artificail Intelligent lec-1Artificail Intelligent lec-1
Artificail Intelligent lec-1
 
Discussant EARLI sig 27
Discussant EARLI sig 27Discussant EARLI sig 27
Discussant EARLI sig 27
 
POWRR Tools: Lessons learned from an IMLS National Leadership Grant
POWRR Tools: Lessons learned from an IMLS National Leadership GrantPOWRR Tools: Lessons learned from an IMLS National Leadership Grant
POWRR Tools: Lessons learned from an IMLS National Leadership Grant
 
Agile Analysis 101: Agile Stats v Command & Control Maths
Agile Analysis 101: Agile Stats v Command & Control MathsAgile Analysis 101: Agile Stats v Command & Control Maths
Agile Analysis 101: Agile Stats v Command & Control Maths
 
Binary crosswords
Binary crosswordsBinary crosswords
Binary crosswords
 
Introduction-To-AI(L-1).pdf
Introduction-To-AI(L-1).pdfIntroduction-To-AI(L-1).pdf
Introduction-To-AI(L-1).pdf
 
Understanding Deep Learning Requires Rethinking Generalization
Understanding Deep Learning Requires Rethinking GeneralizationUnderstanding Deep Learning Requires Rethinking Generalization
Understanding Deep Learning Requires Rethinking Generalization
 
tensorflow.pptx
tensorflow.pptxtensorflow.pptx
tensorflow.pptx
 
sicsa-phd2016
sicsa-phd2016sicsa-phd2016
sicsa-phd2016
 
Artificial Intelligence by B. Ravikumar
Artificial Intelligence by B. RavikumarArtificial Intelligence by B. Ravikumar
Artificial Intelligence by B. Ravikumar
 
Lecture 1
Lecture 1Lecture 1
Lecture 1
 
lec1.ppt
lec1.pptlec1.ppt
lec1.ppt
 
"Startups, comment gérer une équipe de développeurs" par Laurent Cerveau
"Startups, comment gérer une équipe de développeurs" par Laurent Cerveau"Startups, comment gérer une équipe de développeurs" par Laurent Cerveau
"Startups, comment gérer une équipe de développeurs" par Laurent Cerveau
 

Dernier

Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 

Dernier (20)

Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelNavi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
A Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusA Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source Milvus
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 

Core Methods In Educational Data Mining

  • 1. Core Methods in Educational Data Mining EDUC6191 Fall 2022
  • 3. Reminders • You don’t have to do it perfectly, you just have to do it • You will NOT be penalized for using hints (appropriately) • If you run into trouble, feel free to email me or, better yet, use the discussion forum
  • 5. What is big data?
  • 6. Some definitions • “Big data” is data big enough that traditional statistical significance testing becomes useless • “Big data” is data too big to input into a traditional relational database • “Big data” is data too big to work with on a single machine
  • 7. A moving target • 2004: I reported a data set with 31,450 data points. People were impressed. • 2014: A reviewer in an education journal criticized me for referring to 817,485 data points as “big data”.
  • 8. Not just big but open • More and more educational data sets can be accessed by the public • The EDM Society even runs an annual best open data set competition • Very different than how things used to be
  • 9. Tycho Brahe • Spent 24 years observing the sky from a custom-built castle on the island of Hven
  • 10. Johannes Kepler • Had to take a job with Brahe to get Brahe’s data
  • 11. Johannes Kepler • Had to take a job with Brahe to get Brahe’s data • Only got unrestricted access to data…
  • 12. Johannes Kepler • Had to take a job with Brahe to get Brahe’s data • Only got unrestricted access to data… when Brahe died
  • 13. Johannes Kepler • Had to take a job with Brahe to get Brahe’s data • Only got unrestricted access to data… when Brahe died • and Kepler stole the data and fled to Germany
  • 14. What are the types of EDM method? • According to Baker (any version) • Top-level first
  • 15. What type of method are each of these? (According to Baker) • Classification • Regression • Correlation Mining • Factor Analysis • Domain Structure Discovery • Network Analysis • Clustering • Association rule mining • Sequential pattern mining • Latent Knowledge Estimation • Causal data mining
  • 17. Today • Prediction Modeling • Classic Categories – Classification – Regression – Density Estimation
  • 18. What is a classifier?
  • 19. What is a regressor?
  • 20. Density Estimator • Predicts a probability density function • Not used much in education • Used more in other domains
  • 21. But things have moved really fast recently… • From predicting single values, to predicting sequences, to predicting n-dimensional matrices
  • 22. Transformer/Foundation Models • Who here can tell us what a transformer model/foundation model is?
  • 23. Transformer/Foundation Models • Some examples BERT, MathBERT, GPT-2, GPT-3, DALL-E 2, StableDiffusion, …, …
  • 24. Transformer/Foundation Models • Can predict – Words – Sentences – Pixels – Computer program text – Mathematical equations – Anything?
  • 25. Transformer/Foundation Models • Can predict – Words – Sentences – Pixels – Computer program text – Mathematical equations – Anything? • And, in a sudden light-switch transformation, prediction becomes generation
  • 26. GPT-3 Chatbot (courtesy of Kelsey Piper) • If I injured my hand and I couldn’t afford to go to the doctor, would it make sense for me to replace it with a new hand I got at the grocery store? • GPT-3: No, that would not make sense. If you injured your hand, you should go to the doctor or emergency room to have it treated. You should not try to replace it with a new hand from the grocery store.
  • 27. AI Dungeon (GPT-2) • You are a friendly blanket. You live in a nice warm house with some nice humans. You like cuddling and keeping people warm. But you don’t want to be a "blanket baby" anymore, so you have decided that it is time for you to leave this place. … > You look for a job as a blanket. You find a job searching for blankets in the classifieds. You start reading the ads. Most of them are for low paying menial jobs, but you find three that look interesting. > You read the best ad. You find the best job listed: WANTED: One blanket for sleeping on the cold ground during the night. Must be able to fold yourself into one-third of the blanket's volume. > You call the job ad. … A: The number you have called goes through to the "Blanket Museum“ where a robotic voice informs you that the number you have called is not in their phone book.
  • 28. AI Dungeon (GPT-2) • … > You go inside museum. You walk into the museum and see a robotic voice greet you. "Hello, and welcome to the Blanket Museum!" it says. > You say "I want a job" A: You are asked some questions and then fitted for a security uniform, just like in the movies. You are given a tour of the museum and the exhibits. > You put on security uniform. > You fold yourself. You find that you can easily turn the security uniform into a small blanket. You spend the rest of your life working at the museum. Every day you spend your time looking after the exhibits and cleaning the building. YOU HAVE DIED
  • 29. AI Dungeon Harry Potter story
  • 30. DALL-E 2 • "Teddy bears working on new AI research underwater with 1990s technology"
  • 31. DALL-E 2 Still not perfect (Randall Munro)
  • 33. The Elephant in the Room • Why not just replace this class with a class about foundation models? • When they succeed, they succeed spectacularly • But… – They can’t do everything (actually, they don’t do most any of the things this class will cover) – Unless you’re using an existing tool, applying them to a specific problem requires programming outside the scope of this class (for now) – When they fail, they fail spectacularly (as our examples show)
  • 34. We will discuss these models… • On December 1, when we discuss text mining
  • 36. Let’s look at a very simple regressor • Numhints = 0.12*Pknow + 0.932*Time – 0.11*Totalactions Skill pknow time totalactions numhints COMPUTESLOPE 0.2 7 3 ?
  • 37. Which of the variables has the largest impact on numhints? (Assume they are scaled the same)
  • 38. However… • These variables are unlikely to be scaled the same! • If Pknow is a probability – From 0 to 1 • And time is a number of seconds to respond – From 0 to infinity • Then you can’t interpret the weights in a straightforward fashion • What could you do?
  • 39. Let’s do another example • Numhints = 0.12*Pknow + 0.932*Time – 0.11*Totalactions Skill pknow time totalactions numhints COMPUTESLOPE 0.2 2 35 ?
  • 41. What might you want to do if you got this result in a real system?
  • 42. Interpreting Regression Models • Let’s quickly review the example from the video
  • 43. Example of Caveat • Let’s graph the relationship between number of graduate students and number of papers per year
  • 44. Data 0 2 4 6 8 10 12 14 16 0 2 4 6 8 10 12 14 16 Papers per year Number of graduate students
  • 45. Model • Number of papers = 4 + 2 * # of grad students - 0.1 * (# of grad students)2 • But does that actually mean that (# of grad students)2 is associated with less publication? • No!
  • 46. Example of Caveat • (# of grad students)2 is actually positively correlated with publications! – r=0.46 0 2 4 6 8 10 12 14 16 0 2 4 6 8 10 12 14 16 Papers per year Number of graduate students
  • 47. Example of Caveat • The relationship is only in the negative direction when the number of graduate students is already in the model… 0 2 4 6 8 10 12 14 16 0 2 4 6 8 10 12 14 16 Papers per year Number of graduate students
  • 48. How would you deal with this? • How can we interpret individual features in a comprehensive model?
  • 49. The videos discussed a range of algorithms • Linear Regression • Regression Trees • Logistic Regression (a classifier!) • Decision Trees • Decision Rules • Instance-Based Classifiers • Support Vector Machines • Random Forest • Neural Networks/Recurrent Neural Networks – What Transformer/Foundation Models are built on
  • 51. Has anyone • Used any classification algorithms outside the set discussed/recommended in the videos? • Say more?
  • 53. Should you • Pick one algorithm that seems really appropriate? • Run every algorithm that will actually run for your data? • Something in between?
  • 54. My typical lab practice • Pick a small number of algorithms that – Have worked on past similar problems – Fit different kinds of patterns from each other
  • 55. Is it really the algorithm? • Or is it the data you put into it? • We’ll come back to this in the Feature Engineering lecture at the end of the month
  • 57. Coleman article • Any questions or comments?
  • 58. Did anyone read Hand article? • Thoughts?
  • 59. What is Hand’s main thesis?
  • 60. What is Hand’s main thesis? “A great many tools have been developed for supervised classification, ranging from early methods such as linear discriminant analysis through to modern developments such as neural networks and support vector machines. A large number of comparative studies have been conducted in attempts to establish the relative superiority of these methods. This paper argues that these comparisons often fail to take into account important aspects of real problems, so that the apparent superiority of more sophisticated methods may be something of an illusion. In particular, simple methods typically yield performance almost as good as more sophisticated methods, to the extent that the difference in performance may be swamped by other sources of uncertainty that generally are not considered in the classical supervised classification paradigm.”
  • 61. I used to ask the class if they agree with Hand…
  • 62. I used to ask the class if they agree with Hand… • But
  • 63. But we can still learn a lot from Hand • Hand notes that for small data sets, simpler algorithms often work just as well or better • And in education, we often have small data sets • We haven’t yet figured out how to leverage web-scale data for a lot of problems in education
  • 64. The problem of generalizability (Hand) • Data points trained on are not usually drawn from the same distribution • As the data points where the classifier will be applied • Can anyone think of examples of this in education?
  • 65. Another of Hand’s key arguments • Data points trained on are often treated as certainly true and objective • But they are often arbitrary and uncertain
  • 66. Another of Hand’s key arguments • Data points trained on are often treated as certainly true and objective • But they are often arbitrary and uncertain • Where might this apply in education?
  • 67. We will discuss these issues further • Over the next two weeks • As we discuss topics like – Over-fitting and generalizability (next week) – Algorithmic bias (next week) – Diagnostic metrics (in two weeks)
  • 68. Any other questions or comments?
  • 69. Next classes • September 15 – Behavior and Affect Detection – Basic: Classifier Due • September 22 – Diagnostic Metrics – Creative: Behavior Detection Due • September 29 VIRTUAL – Feature Engineering and Distillation – Basic: Diagnostic Metrics Due