Towards Automatic Analysis of Online Discussions among Hong Kong Students

Xiao Hu
University of Hong Kong
CITE Research Symposium 2013
May 12, 2013

Outline
 Goals and Purposes
 Data Mining and Applications to Online Discussions
 Classification
 Association Rule Mining
 Findings
 More questions to answer
 Bridging research and teaching

Goals and Purposes
 Online discussions are widely used in education
 Effective for communication and collaboration
 Need tools to monitor online discussions
 Data mining may help (semi-)automatically identify
various patterns in online discussions, for example:
 Threads that need interventions
 Outcome predictions
 Role identification (e.g., question raiser, answer
provider, etc.)
 Network analysis of student groups
 Assessment of discussion quality
 .....

This Study
 How effective is it to mine online discussions
of HK students?
 A case study on
 1,965 discussion posts
 on the subject of global warming
 collected from five primary or secondary schools in
Hong Kong between 2006 and 2009
 383 discussion threads involving 1 to 21
participants
 Two commonly used Data Mining techniques
 Classification
 Association rule mining

What is Data Mining?
 To identify patterns (or to show that none exist) in a
dataset
 DM is NOT querying databases
 Where you know what you are looking for
 E.g., total sales in the past three years
 DM is NOT statistical testing
 Where you know the hypotheses
 E.g. H0: the means of two groups are equal
 DM is discovery-based
 Find out unknown patterns, generate hypotheses
 DM is iterative
 exhaustively explore very large data sets

Data Mining – Classification
 Functionality: to assign one of a number of class
labels to each instance of your data
 Examples of classification tasks:
 Predicting tumor cells as benign or malignant
 Classifying credit card transactions as legitimate or
fraudulent
 Categorizing news stories as finance, weather,
entertainment, sports, etc
 Categorizing library materials by catalogs
 Predicting whether a post in an online forum will get
replies or not

How Does Classification Work?
 Given a collection of data (training set)
 Each instance contains a set of attributes; one of the
attributes is the class label.
 Find (calculate) a model for the class label as a
function of the values of other attributes
 Goal: previously unseen data can then be fed to
the model and the model assigns a class label
as accurately as possible
 Performance measure: accuracy
 How many instances are correctly classified

An Illustrative Example (1)
Training Data
NAME  RANK            YEARS  TENURED
Mike  Assistant Prof  3      no
Mary  Assistant Prof  7      yes
Bill  Professor       2      yes
Jim   Associate Prof  7      yes
Dave  Assistant Prof  6      no
Anne  Associate Prof  3      no
Classification Algorithms produce the Classifier (Model):
IF rank = ‘professor’
OR years > 6
THEN tenured = ‘yes’

An Illustrative Example (2)
Classification Algorithms produce the Classifier (Model):
IF rank = ‘professor’
OR years > 6
THEN tenured = ‘yes’
Unseen Data: (Jeff, Professor, 4)
Tenured?
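
The two example slides can be traced in a few lines of code. The following is a minimal sketch, not the tool used in the study: it hard-codes the rule shown on the slide (rather than learning it), applies it to the six training rows, and then to the unseen instance. The function names `classify` and `accuracy` are illustrative assumptions. Note that the accuracy here is measured on the training data itself, which is more optimistic than accuracy on unseen data.

```python
# Training data from the slide: (name, rank, years, tenured)
TRAINING = [
    ("Mike", "Assistant Prof", 3, "no"),
    ("Mary", "Assistant Prof", 7, "yes"),
    ("Bill", "Professor", 2, "yes"),
    ("Jim", "Associate Prof", 7, "yes"),
    ("Dave", "Assistant Prof", 6, "no"),
    ("Anne", "Associate Prof", 3, "no"),
]


def classify(rank, years):
    """Rule from the slide: IF rank = 'professor' OR years > 6 THEN 'yes'."""
    return "yes" if rank.lower() == "professor" or years > 6 else "no"


def accuracy(rows):
    """Fraction of rows whose predicted label equals the recorded label."""
    correct = sum(
        1 for _, rank, years, label in rows if classify(rank, years) == label
    )
    return correct / len(rows)


train_accuracy = accuracy(TRAINING)        # the rule fits all six rows
jeff_prediction = classify("Professor", 4)  # unseen instance (Jeff, Professor, 4)
print(train_accuracy, jeff_prediction)
```

Under this rule Jeff is predicted as tenured because his rank is Professor, even though his years (4) do not exceed 6.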

Classifying Online Discussions (1)
 Task1: threads with one vs. many participants
 To predict whether a post belongs to a thread
involving only one participant or a thread involving
many (> 14) participants
 Attributes used to build the classification model
 Words in the posts: individual words (unigrams) and
pairs of consecutive words (bigrams)
 Classification algorithm: Naive Bayesian
 Empirically effective in text categorization
 Performance: 79.07%
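
To make the setup above concrete, here is a minimal sketch of unigram and bigram features fed to a Naive Bayes classifier. It is an illustration under stated assumptions, not the study's implementation: the slides do not name the tool or its settings, so whitespace tokenization, the multinomial model, and add-one smoothing are all assumptions, and the four toy posts are invented for the example rather than taken from the dataset.

```python
import math
from collections import Counter, defaultdict


def ngram_features(text):
    """Return the unigrams and bigrams of a post (lowercased, split on spaces)."""
    tokens = text.lower().split()
    bigrams = [" ".join(pair) for pair in zip(tokens, tokens[1:])]
    return tokens + bigrams


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing (assumed variant)."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        self.vocabulary = set()
        for text, label in zip(texts, labels):
            feats = ngram_features(text)
            self.feature_counts[label].update(feats)
            self.vocabulary.update(feats)
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + vocab_size
            score = math.log(n_docs / total_docs)  # log prior
            for feat in ngram_features(text):
                if feat in self.vocabulary:  # skip features never seen in training
                    score += math.log((counts[feat] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy posts; "one" / "many" stand for the two thread categories.
texts = [
    "carbon dioxide raises the temperature",
    "power plants release carbon dioxide",
    "i agree with you",
    "yes i think so too",
]
labels = ["one", "one", "many", "many"]
model = NaiveBayes().fit(texts, labels)
print(model.predict("i agree"), model.predict("carbon dioxide in the air"))
```

Working in log space avoids numeric underflow when many small probabilities are multiplied, and smoothing keeps a single unseen word from forcing a class probability to zero.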

Classifying Online Discussions (2)
 Task2: initial posts with vs. without replies
 To predict whether an initial post is likely to get
replies or not
 Attributes used to build the classification model
 Words in the posts: individual words (unigrams) and
pairs of consecutive words (bigrams)
 Classification algorithm: Naive Bayesian
 Empirically effective in text categorization
 Performance: 64%
Need to look deeper: mine patterns in each
category

Data Mining – Association Rules
 Functionality: to find associative relations
between patterns frequently occurring in your
data
 {Pattern A} => {Pattern B} with certain probability
 Examples of association rule mining tasks:
 Basket (shopping cart) analysis: customers buying
product A often also buy product B
 Medical diagnosis: a patient with symptoms A is
likely to have disease B
 Protein sequences: the appearance of amino acid
A indicates a greater chance of also having amino
acid C
 Online discussions: a post with word or phrase A is
likely to be in class B

Mining Association Rules from Online Discussions (1)
 Task 1: Words and phrases strongly associated
with threads with one or many participants
Rank  One participant  Many participants
1     dioxide          i agree
2     carbon dioxide   agree
3     carbon           i
4     temperature      greenhouse gases
5     global warming   i think
6     global           think
7     warming          yes
8     power            carbon dioxide
9     air              global warming
10    water            yeah
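
A ranking like the one above can be produced by treating each word or phrase as the left-hand side of a rule {term} => {class} and sorting the rules. The slides do not say how the ranking was computed, so the following is a hypothetical sketch: it ranks terms by rule confidence, breaks ties by how often the term occurs, and ignores terms seen in fewer than `min_count` posts. The posts and the names `terms` and `rank_terms` are invented for the example.

```python
from collections import Counter


def terms(text):
    """Set of unigrams and bigrams occurring in a post."""
    tokens = text.lower().split()
    return set(tokens) | {" ".join(pair) for pair in zip(tokens, tokens[1:])}


def rank_terms(posts, target, min_count=2):
    """Rank terms by confidence of the rule {term} => {target}.

    posts: list of (text, label) pairs. Returns (term, confidence) pairs.
    """
    term_total = Counter()
    term_in_target = Counter()
    for text, label in posts:
        for term in terms(text):
            term_total[term] += 1
            if label == target:
                term_in_target[term] += 1
    scored = [
        (term_in_target[t] / term_total[t], term_total[t], t)
        for t in term_total
        if term_total[t] >= min_count and term_in_target[t] > 0
    ]
    # Highest confidence first, then most frequent, then alphabetical.
    scored.sort(key=lambda s: (-s[0], -s[1], s[2]))
    return [(t, conf) for conf, _, t in scored]


# Invented toy posts; labels stand for the two thread categories.
posts = [
    ("i agree with you", "many"),
    ("i agree", "many"),
    ("i think carbon dioxide is bad", "many"),
    ("carbon dioxide traps heat", "one"),
    ("carbon dioxide warms the air", "one"),
]

ranked_many = rank_terms(posts, "many")
ranked_one = rank_terms(posts, "one")
print(ranked_many[:3], ranked_one[:1])
```

The minimum-count filter plays the role of a support threshold: without it, any term that appears once would reach a confidence of 1.0 and crowd out more reliable terms.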

Mining Association Rules from Online Discussions (2)
 Task 2: Words and phrases strongly associated
with initial posts with or without replies
Rank  Has no reply       Has replies
1     global warming     protect
2     earth’s            melt
3     global             world
4     warming            warming
5     earth              sea
6     s                  i
7     greenhouse         ice
8     effect             rise
9     gases              global warming
10    greenhouse effect  global

Findings and future work
 Data mining techniques were able to find patterns
from online discussions among Hong Kong
students
 It was feasible to distinguish threads and posts in
contrasting categories
 The same techniques can be applied to distinguish:
 Shallow and deep discussions (depth of threads)
 Confusion level of posts (need annotations on
training data)
 Speech acts of posts (need annotations on training
data)
 Emotions in the posts (need annotations on training
data)

Integrating Research and Teaching
 Both data mining techniques are discussed and
practiced in the Data Mining course in the
Bachelor of Science in Information Management
(BSIM 0018)
 The tool used in this project is also taught in the
course
 Projects like this can be students’ course projects.

Thank you!
Questions, comments, and suggestions are
appreciated!
Xiao Hu: xiaoxhu@hku.hk
