Text Mining and
Educational Discourse
Dr. Chi-Un Lei, Dept. of Electrical and Electronic Eng.
LASI-HK 2014
(Adapted from LASI workshop 2014)
1
Words from the Speaker
“The key insight communicated through this
workshop is that …
If we can understand the connection between socio-psychological
processes and language by means of
the social signals encoded in them, we can
structure computational models of language
interactions more effectively.”
--- Carolyn Penstein Rosé
2
Outline
• Theoretical: Connection between discourse and learning
• From rich but implicit constructs to explicit features that capture the essence for machine learning
• Hands-on: Machine learning for text extraction and classification
3
Educational Discourse
[Figure: Automatic Analysis of Conversation → Conversational Interventions → Positive Learning Outcomes]
4
Souffle Framework
[Figure: contributing fields: Sociolinguistics; Discourse Analysis; Language and Identity; Language Use; Machine Learning; Multi-Level Modeling; Applied Statistics; Computational Models of Discourse Analysis]
5
[Figure: diagram labels: Person, Engagement and Authority (each shown twice); Transactive; Knowledge Integration]
• Analysis of discussions for learning
Transactivity
• Building on an idea expressed earlier in a conversation
• Using a reasoning statement
6
“We don't want tmax to be at 570, both for the material and [the environment].”
“Well, for power and efficiency, we want a high tmax, but environmentally, we want a lower one.”
7
8
System of Engagement
• Showing openness to the existence of other perspectives
• Examples
  - Nuclear is a good choice
  - I consider nuclear to be a good choice
  - There’s no denying that nuclear is a superior choice
  - Is nuclear a good choice?
9
10
11
What is machine learning?
• Automatically or semi-automatically
  - Inducing concepts (i.e., rules) from data
  - Finding patterns in data (for humans and computers)
  - Explaining data
  - Making predictions
12
[Figure: Data + Learning Algorithm → Model; New Data → Model (Classification Engine) → Prediction]
13
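The data, learning algorithm, model and prediction picture on this slide can be made concrete with a tiny classifier. The sketch below assumes multinomial Naive Bayes with add-one smoothing as the learning algorithm (the slides do not name one); `train`, `predict`, and the four labelled utterances are hypothetical illustrations, not LightSide's implementation.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep runs of letters/apostrophes (a simplifying assumption)
    return re.findall(r"[a-z']+", text.lower())


def train(examples):
    """Learning algorithm: turn labelled (text, label) pairs into a model."""
    label_counts = Counter(label for _, label in examples)
    word_counts = {label: Counter() for label in label_counts}
    for text, label in examples:
        word_counts[label].update(tokenize(text))
    vocab = set()
    for counts in word_counts.values():
        vocab.update(counts)  # iterating a Counter yields its keys (the words)
    return {"label_counts": label_counts, "word_counts": word_counts,
            "vocab": vocab, "n_docs": len(examples)}


def predict(model, text):
    """Classification engine: score new data against every label, keep the best."""
    vocab = model["vocab"]
    best_label, best_score = None, -math.inf
    for label, n in model["label_counts"].items():
        counts = model["word_counts"][label]
        total = sum(counts.values())
        score = math.log(n / model["n_docs"])  # log prior
        for word in tokenize(text):
            if word in vocab:  # words never seen in training are ignored
                score += math.log((counts[word] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy training data
training_data = [
    ("what do you think", "question"),
    ("why is that", "question"),
    ("i think it is nine", "statement"),
    ("the answer is nine", "statement"),
]
model = train(training_data)
print(predict(model, "why do you think"))  # question
```

Any other learner could be swapped in behind the same `train`/`predict` interface; the point is only that training and prediction are separate steps, as in the figure.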
Keep this picture in mind…
• Machine learning isn’t magic
• But it can be useful for identifying meaningful patterns in your data when used properly
• Proper use requires insight into your data
• Otherwise, GIGO (Garbage In, Garbage Out)
• Think like a computer!
14
Machine Learning for Text Mining
• Basic features: “Bag of Words”
  - Represent text as a vector where each position corresponds to a term
  - Vocabulary (vector positions, in order): Cheese, Cows, Eat, Hamsters, Make, Seeds
• Cows make cheese. (110010)
• Cheese make cows. (110010)
• Hamsters eat seeds. (001101)
15
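The vectors on this slide can be reproduced with a few lines of code. This is a minimal sketch assuming binary presence features (1 if the term occurs, 0 otherwise) and simple lowercase word splitting; `bag_of_words` is a hypothetical helper name. It also shows why “Cows make cheese.” and “Cheese make cows.” get the same vector: word order is discarded.

```python
import re


def bag_of_words(sentence, vocabulary):
    """Binary bag of words: position i is 1 if vocabulary[i] occurs in the sentence."""
    tokens = set(re.findall(r"[a-z]+", sentence.lower()))
    return [1 if term.lower() in tokens else 0 for term in vocabulary]


vocabulary = ["Cheese", "Cows", "Eat", "Hamsters", "Make", "Seeds"]
for s in ["Cows make cheese.", "Cheese make cows.", "Hamsters eat seeds."]:
    print(s, "".join(str(v) for v in bag_of_words(s, vocabulary)))
# Cows make cheese. 110010
# Cheese make cows. 110010
# Hamsters eat seeds. 001101
```

Counting occurrences instead of recording presence is a common variant; for short utterances like these the two give the same vectors.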
Basic Types of Features
• Unigram
  - Single words (e.g., prefer, sandwich)
• Bigram
  - Pairs of words next to each other (e.g., eat bread)
• Simple lexical patterns
  - e.g., “common denominator” versus “common multiple”
• Punctuation
  - “You think the answer is 9?” vs. “You think the answer is 9.”
16
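The feature types above can be combined into one feature set per utterance. A minimal sketch, assuming a simple regex tokenizer and hypothetical feature names (`word`, `word1_word2`, `PUNCT_?`, `PUNCT_.`); real toolkits use their own naming. The two “answer is 9” sentences share every word feature and differ only in the punctuation feature.

```python
import re


def extract_features(text):
    """Unigram, bigram and sentence-final punctuation features for one utterance."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    features = set(tokens)  # unigrams
    features.update(f"{a}_{b}" for a, b in zip(tokens, tokens[1:]))  # adjacent pairs
    stripped = text.rstrip()
    if stripped.endswith("?"):
        features.add("PUNCT_?")
    elif stripped.endswith("."):
        features.add("PUNCT_.")
    return features


q = extract_features("You think the answer is 9?")
s = extract_features("You think the answer is 9.")
print(sorted(q ^ s))  # ['PUNCT_.', 'PUNCT_?']
```

A lexical pattern such as “common denominator” surfaces here as the bigram feature `common_denominator`.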
Part of Speech (POS) Tagging
• POS bigrams capture syntactic or stylistic information
  - e.g., “the answer which is …” vs. “which is the answer”
• Pairs of POS (part-of-speech) tags next to each other
  - DT_NN: “Determiner”_“Noun, singular or mass”
  - NNP_NNP: “Proper noun, singular”_“Proper noun, singular”
• Examples
  - JJR: Adjective, comparative
17
http://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html
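Once each word has a Penn Treebank tag, POS bigram features are just adjacent tag pairs joined with an underscore, as in DT_NN above. In this sketch the tags are assigned by hand (an assumption; in practice a POS tagger produces them), and `pos_bigrams` is a hypothetical helper. The two phrases from the slide contain the same words but yield different tag sequences.

```python
def pos_bigrams(tagged_tokens):
    """Join each adjacent pair of POS tags into one feature string, e.g. DT_NN."""
    tags = [tag for _, tag in tagged_tokens]
    return [f"{a}_{b}" for a, b in zip(tags, tags[1:])]


# Hand-tagged input (assumed tags, Penn Treebank tag set)
phrase_a = [("the", "DT"), ("answer", "NN"), ("which", "WDT"), ("is", "VBZ")]
phrase_b = [("which", "WDT"), ("is", "VBZ"), ("the", "DT"), ("answer", "NN")]
print(pos_bigrams(phrase_a))  # ['DT_NN', 'NN_WDT', 'WDT_VBZ']
print(pos_bigrams(phrase_b))  # ['WDT_VBZ', 'VBZ_DT', 'DT_NN']
```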
Feature Space Customizations
• Machine learning algorithms look for features that are good predictors, NOT features that are necessarily meaningful
• Look for approximations
  - e.g., you don’t need a complete syntactic analysis to detect questions:
    - Look for question marks
    - Look for wh-terms that occur immediately before an auxiliary verb (combined features)
18
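The question-detection approximation above can be sketched without any parser: one feature for a question mark and one combined feature that fires when a wh-term is immediately followed by an auxiliary verb. The `WH_WORDS` and `AUXILIARIES` lists below are illustrative assumptions, not an exhaustive inventory.

```python
import re

# Illustrative word lists (assumed, not complete)
WH_WORDS = {"what", "why", "how", "where", "when", "who", "which"}
AUXILIARIES = {"is", "are", "was", "were", "do", "does", "did",
               "can", "could", "will", "would", "should"}


def question_features(text):
    """Cheap approximations to 'is this a question?' without syntactic analysis."""
    tokens = re.findall(r"[a-z']+", text.lower())
    has_qmark = "?" in text
    # Combined feature: a wh-term immediately before an auxiliary verb
    wh_aux = any(a in WH_WORDS and b in AUXILIARIES for a, b in zip(tokens, tokens[1:]))
    return {"has_question_mark": has_qmark,
            "wh_before_aux": wh_aux,
            "question_like": has_qmark or wh_aux}


print(question_features("Why is tmax so high"))
print(question_features("I know what it is."))
```

The second example shows why the adjacency condition matters: “what” appears, but it is not directly followed by an auxiliary, so the combined feature stays off.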
LightSide
• Easy UI
• Feature Extraction
• Model Building / Machine Learning
• Error Analysis
• Data Structuring
• Free/open-source for adoption and extension
19
Feature Extraction
20
Machine Learning and Evaluation
21
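A common way to evaluate a trained model is k-fold cross-validation: split the labelled data into k folds, train on k−1 of them, test on the held-out fold, and average the accuracy. The sketch below is a generic illustration, not LightSide's own evaluation code; `cross_validate` takes any train/predict pair, and the majority-class baseline and toy data are hypothetical.

```python
from collections import Counter


def cross_validate(examples, train_fn, predict_fn, k=10):
    """Mean accuracy over k folds; every (text, label) example is held out exactly once."""
    if len(examples) < k:
        raise ValueError("need at least k examples so that no fold is empty")
    folds = [examples[i::k] for i in range(k)]  # interleaved split, no shuffling
    accuracies = []
    for i, held_out in enumerate(folds):
        training = [ex for j, fold in enumerate(folds) if j != i for ex in fold]
        model = train_fn(training)
        correct = sum(predict_fn(model, text) == label for text, label in held_out)
        accuracies.append(correct / len(held_out))
    return sum(accuracies) / k


# Hypothetical baseline learner: always predict the most frequent training label
def train_majority(examples):
    return Counter(label for _, label in examples).most_common(1)[0][0]


def predict_majority(model, text):
    return model


data = [(f"s{i}", "statement") for i in range(7)] + [(f"q{i}", "question") for i in range(3)]
print(cross_validate(data, train_majority, predict_majority, k=5))  # 0.7
```

A majority-class baseline like this is a useful sanity check: a text classifier is only adding value if it beats it.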
Recap …
“The key insight communicated through this
workshop is that …
If we can understand the connection between socio-psychological
processes and language by means of
the social signals encoded in them, we can
structure computational models of language
interactions more effectively.”
--- Carolyn Penstein Rosé
22
Examples of Part of Speech Tagging
1. CC Coordinating conjunction
2. CD Cardinal number
3. DT Determiner
4. EX Existential there
5. FW Foreign word
6. IN Preposition or subordinating conjunction
7. JJ Adjective
8. JJR Adjective, comparative
9. JJS Adjective, superlative
10. LS List item marker
11. MD Modal
12. NN Noun, singular or mass
13. NNS Noun, plural
14. NNP Proper noun, singular
15. NNPS Proper noun, plural
16. PDT Predeterminer
17. POS Possessive ending
18. PRP Personal pronoun
19. PRP$ Possessive pronoun
20. RB Adverb
21. RBR Adverb, comparative
22. RBS Adverb, superlative
23
http://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html
Examples of Part of Speech Tagging
23. RP Particle
24. SYM Symbol
25. TO to
26. UH Interjection
27. VB Verb, base form
28. VBD Verb, past tense
29. VBG Verb, gerund or present participle
30. VBN Verb, past participle
31. VBP Verb, non-3rd person singular present
32. VBZ Verb, 3rd person singular present
33. WDT Wh-determiner
34. WP Wh-pronoun
35. WP$ Possessive wh-pronoun
36. WRB Wh-adverb
http://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html
Findings
• Transactivity (Berkowitz & Gibbs, 1983)
  - Moderating effect on learning (Joshi & Rosé, 2007; Russell, 2005; Kruger & Tomasello, 1986; Teasley, 1995)
  - Moderating effect on knowledge sharing in working groups (Gweon et al., 2011)
• Engagement (Martin & White, 2005)
  - Correlational analysis: strong correlation between displayed openness of group members and articulation of reasoning (R = .72) (Dyke et al., in press)
  - Intervention study: causal effect on propensity to articulate ideas in group chats (effect size .6 standard deviations) (Kumar et al., 2011)
  - Mediating effect of idea contribution on learning in scientific inquiry (Wang et al., 2011)
25

Speaker Notes

  1. Don’t forget that you need to prune