A Question of Complexity - Measuring the Maturity of Online Enquiry Communities

GRÉGOIRE BUREL (1) AND YULAN HE (2)
(1) Knowledge Media Institute, The Open University, Milton Keynes, UK.
(2) School of Engineering & Applied Science, Aston University, UK.
HT2013
Paris, France. 2013

OUTLINE
- Question Complexity and Community Maturity
  - Enquiry Communities
  - Server Fault
  - Needs and Motivations
  - Contributions
- Hypotheses and Validation
  - Two Definitions
  - Five Hypotheses
  - Validation
- Computing and Mapping Features
  - Predictors
  - Feature Computation: Users, Content and Threads
- Measuring Content Complexity and Community Maturity
  - Prediction Results
  - Feature Ranking
  - Community Maturity
- Future Work
- Conclusion

ENQUIRY COMMUNITIES
“Enquiry Communities are communities
composed of askers and answerers looking
for solutions to particular issues.”

- Server Fault (SF), http://serverfault.com:
  - A web-based IT enquiry community specialised in server-related issues.
  - Factual questions rather than conversational questions.
- Dataset (data up to April 2011):
  - 71,962 questions
  - 162,401 answers
  - 51,727 users
  - 4,999 topics (tags)

ENQUIRY COMMUNITIES
- Enquiry Community Needs (Rowe et al., 2011; Burel et al., 2012):
  - Community managers:
    - Make sure that the community is “happy” (questions are solved).
    - Make sure that the community becomes more knowledgeable over time (users gain expertise and experience).
    - Identify and implement features that help users reach their goals.
  - Askers:
    - Get answers to a particular issue.
    - Make sure that a community can fulfil their needs before asking a question.
  - Answerers:
    - Find which questions they can answer.
    - Find questions that are challenging.

ISSUES AND MOTIVATION
- Enquiry Community Needs:
  - Questions have uneven complexity:
    - It is difficult to identify how hard particular questions are and who can answer them.
  - Communities have different answering abilities:
    - Some communities can answer simple questions about a topic, while other communities can also answer complex questions.
    - How do we determine whether a community is able to answer complex questions?
  - Some communities are more knowledgeable and experienced than others:
    - How do we measure experience and expertise?
  - Features can support the identification of mature communities and complex content, but which ones?
    - What features help to measure community maturity and content complexity?

IDENTIFYING COMPLEX QUESTIONS AND MATURE
COMMUNITIES
How do user, content, thread and platform features affect content complexity identification? How can we measure maturity based on content complexity?
1. Identifying Complex Questions:
  - Helping answerers to find relevant and challenging questions.
2. Analysis of Complexity Predictors:
  - Helping community managers to identify important complexity factors.
3. Measuring Community Maturity:
  - Helping users to decide whether their question will be answered.
  - Helping community managers to understand the abilities of their community.

CONTRIBUTIONS
How do user, content, thread and platform features affect content complexity? How can we use content complexity for measuring the maturity of communities?
- Introduce a definition of question complexity and validate the hypothesis that question complexity increases with askers’ community involvement.
- Study the influence of features relating to askers, answerers, questions and answers on question complexity prediction.
- Introduce the concept of community maturity, a measure of community knowledge and specialisation.
- Investigate the evolution of community maturity in Server Fault and demonstrate that community maturity is influenced by topical dynamics.

LITERATURE
- No empirical study of the relation between content complexity and community involvement.
- No free-form model of content complexity. Existing models are typically very domain dependent (Wu, 2009; Bachrach et al., 2012).
- Community health metrics (Welinder et al., 2010; Toral et al., 2009; Rowe et al., 2011) tend to neglect skill building as a key health indicator, despite the importance of this factor in user participation (Pal et al., 2012; Nam et al., 2009).

QUESTION COMPLEXITY AND MATURITY
- Definition 1 (Question Complexity):
  - Question complexity is a value representing the difficulty and level of expertise required for answering a question.
- Definition 2 (Community Maturity):
  - Community maturity is a value representing the level of knowledge and specialisation achieved by a community. A more mature community focuses on more complex questions, whereas a less mature community has simpler and less focused questions.

- Hypothesis 1 (Temporality):
  - For a given user, question complexity increases as a function of time and participation. The longer a user is actively involved in a community, the more complex her questions are.
- Hypothesis 2 (Enquiry):
  - For a given user, question complexity increases with the number of questions asked. The more questions a user asks, the more likely her questions are to become more complex.
- Hypothesis 3 (Commitment):
  - For a given user, question complexity increases with her activity levels. The more frequently a user is involved in a community, the more complex her questions are.
- Hypothesis 4 (Accomplishment):
  - For a given user, question complexity increases with the number of questions for which she has previously found answers. The more answers a user finds to her questions, the more likely she is to improve her knowledge and skills and thus ask more complex questions in the future.
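- Hypothesis 5 (Focus):
  - The more a user focuses her questions on particular topics, the more likely her questions are to be complex.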

[Figure labels: Participation, Complexity]

HYPOTHESES VALIDATION
- Methodology:
  1. Select 510 question pairs based on the previous hypotheses:
    - Questions from early and late user contributions.
  2. Annotate the question pairs by selecting which question of each pair is the more complex:
    - Due to low inter-annotator agreement (for 3 annotators, κ = 0.146), we focus on pairs that have more than 75% agreement (220 pairs, 440 questions).
  3. Calculate the statistical significance of each hypothesis:
    - Focus on Hypothesis 1: Temporality.
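
The slides report κ for three annotators without naming the statistic. The sketch below assumes Fleiss' kappa, a common choice for more than two raters; the function name and the toy ratings are illustrative and not taken from the study. It also shows how unanimous pairs could be kept: if all three annotators label every pair, "more than 75% agreement" can only mean 3 out of 3, since 2 out of 3 is about 67%.

```python
def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of per-item category counts.

    Each row holds, for one item, how many raters chose each category.
    Every row must sum to the same number of raters (assumed here).
    """
    n_items = len(ratings)
    n_raters = sum(ratings[0])
    n_cats = len(ratings[0])
    # Share of all assignments that went to each category.
    p_cat = [
        sum(row[j] for row in ratings) / (n_items * n_raters)
        for j in range(n_cats)
    ]
    # Observed agreement for each item.
    p_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in ratings
    ]
    p_observed = sum(p_item) / n_items
    p_expected = sum(p * p for p in p_cat)
    return (p_observed - p_expected) / (1 - p_expected)


def unanimous_items(ratings):
    """Indices of items on which every rater chose the same category."""
    return [i for i, row in enumerate(ratings) if max(row) == sum(row)]


# Hypothetical data: each row is one question pair, columns count votes for
# "first question is more complex" and "second question is more complex".
toy_ratings = [[3, 0], [0, 3], [2, 1], [1, 2]]
kappa = fleiss_kappa(toy_ratings)
kept = unanimous_items(toy_ratings)
```
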
- Results (Hypothesis 1):

FEATURES
1. User Features (Askers and Answerers):
  – Represent the characteristics and reputation of askers and answerers (e.g. reputation, number of best answers, normalised topic entropy…).
2. Questions and Answers Features:
  – Question and answer features (e.g. readability, ratings, number of views…).
  – Represent the relations between answers within a particular thread (e.g. topic reputation, elapsed days…).
  – Content-based features (e.g. term entropy, readability…).

FEATURES
Type: Features
Askers: Community Age (Experience), Community Age Difference, Number of Questions (Enquiry), Number of Answers, Asking Rate (Asker Commitment), Answering Rate, Ratio of Successfully-Answered Questions (Accomplishment), Ratio of Questions Successfully Answered by Others, Normalised Question Topic Entropy (Focus), Normalised Answer Topic Entropy, Average Number of Replies per Question, Average Number of Question Views, Z-score, Reputation.
Answerers: Askers features + mean and standard deviation forms.
Questions: Number of Views, Number of Words, Readability with Gunning Fog, Readability with Flesch-Kincaid Grade, Existing Value, Status, Number of Answers, Favourites, Score, Informativeness, Cumulative Term Entropy.
Answers: Questions features + mean and standard deviation forms + Elapsed Days, Elapsed Days First, Elapsed Days Last, Number of Comments Mean, Score.
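
As an illustration of the Focus feature listed above, here is a minimal sketch of a normalised topic entropy. The exact formula is not given in the slides; this version assumes Shannon entropy over the tags of a user's questions, divided by the logarithm of the number of distinct tags that user has used, so that 0 means fully focused and 1 means evenly spread. The function name is hypothetical.

```python
import math
from collections import Counter


def normalised_topic_entropy(tags):
    """Entropy of a user's tag distribution, scaled to the range [0, 1].

    `tags` is a flat list with one entry per tag occurrence across the
    user's questions. The normalisation by log(number of distinct tags)
    is an assumption, not the authors' stated definition.
    """
    counts = Counter(tags)
    total = sum(counts.values())
    if len(counts) <= 1:
        # No tags, or a single topic: the user is maximally focused.
        return 0.0
    entropy = -sum(
        (c / total) * math.log(c / total) for c in counts.values()
    )
    return entropy / math.log(len(counts))


# Hypothetical users: one asks only about linux, one splits attention.
focused_user = normalised_topic_entropy(["linux"] * 4)
mixed_user = normalised_topic_entropy(["linux", "linux", "linux", "apache"])
```

Under this assumed definition, a lower value corresponds to a more focused asker.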

QUESTION COMPLEXITY PREDICTION
- Experimental Setting:
  1. Split the annotated questions into complex and non-complex questions (440 questions).
  2. Compute features.
  3. Train a Logistic Regression classifier and validate the results using 10-fold cross-validation.
  4. Compute Precision (P), Recall (R), F-measure (F1) and the area under the Receiver Operating Characteristic (ROC) curve for different feature groups.
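
The evaluation measures in step 4 can be sketched in plain Python as follows. This is not the authors' tooling: the function names are illustrative, the metrics are computed for the "complex" class only (the slides do not say whether averages over both classes were reported), and the fold assignment is a simple round-robin split rather than a stratified or shuffled one.

```python
def precision_recall_f1(y_true, y_pred):
    """Precision, recall and F1 for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return precision, recall, f1


def roc_auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    Equals the probability that a random positive example is scored
    higher than a random negative one (ties count as one half).
    Assumes at least one positive and one negative example.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


def k_fold_indices(n_examples, k=10):
    """Round-robin assignment of example indices to k test folds."""
    return [list(range(i, n_examples, k)) for i in range(k)]


# 440 annotated questions split into 10 test folds of 44 questions each.
folds = k_fold_indices(440, k=10)
```
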

COMPLEXITY PREDICTION RESULTS

- Question Complexity Identification (F1 0.60):
  – Baseline Models:
    - Asker’s age in a community correlates better with complexity than question length.
    - Question length is not correlated with question complexity.
  – Feature Type Models and Complete Model:
    - Asker and answerer features perform best: question complexity is mostly related to asker features.
    - The full model performs better than the feature type models.

FEATURES RANKING
- Feature Ranking:
  1. For each feature, Information Gain Ratio (IGR), Correlation Feature Selection (CFS) and F1 Feature Drop (FD) are computed.
  2. The features are then sorted by their respective importance.
  3. The best features are then selected for computing a new question complexity model by accounting for the best F1.
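
To make the first ranking criterion concrete, the sketch below computes an information gain ratio for a single discrete feature, using the standard definition (information gain divided by the entropy of the feature's own value distribution). Continuous features such as community age would need to be discretised first, which is not shown. The function names and toy data are illustrative only.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(values)
    return -sum(
        (c / total) * math.log2(c / total) for c in Counter(values).values()
    )


def information_gain_ratio(feature, labels):
    """Information gain of `feature` about `labels`, divided by the
    entropy of the feature itself (the split information)."""
    total = len(labels)
    conditional = 0.0
    for value in set(feature):
        subset = [l for f, l in zip(feature, labels) if f == value]
        conditional += len(subset) / total * entropy(subset)
    split_info = entropy(feature)
    if split_info == 0:
        # A constant feature carries no information.
        return 0.0
    return (entropy(labels) - conditional) / split_info


# Hypothetical toy data: a binned asker age and a complex/non-complex label.
asker_age_bin = ["old", "old", "new", "new"]
is_complex = [1, 1, 0, 0]
igr = information_gain_ratio(asker_age_bin, is_complex)
```
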

FEATURES RANKING RESULTS

- Feature Impact Comparison:
  – Asker’s community age and topical focus are the most important features.
  – User features are the most significant (73.3% of the top ten features).
  – Answer features are ranked low.
  – Focused users are more likely to ask complex questions.
  – Questions with low existing value (Pal et al., 2010) are more likely to be complex. This complements findings on the question selection behaviour of experts (Pal et al., 2010).

BEST MODEL RESULTS
- Best Model (F1 0.64):
  – The best model is obtained when using CFS. The selected features are:
    1. Asker’s question topical focus.
    2. Asker’s ratio of successfully-answered questions.
    3. Asker’s community age.
    4. Question’s existing value (Pal et al., 2010).
    5. Question’s views.

COMMUNITY MATURITY
- Maturity Measure:
- Experimental Setting:
  1. Calculate question complexity based on the proportion of complex questions asked per month.
  2. Compute maturity on different user sets depending on their age in the community.
  3. Compute maturity for the most discussed topics (tags) and for users that have been active for more than a day.
  4. Observe the evolution of maturity for the most discussed topics and the different user groups.
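
The maturity formula itself did not survive in this transcript, so the following is only a sketch of step 1 under a stated assumption: maturity for a month is taken to be the fraction of that month's questions that the complexity model labels as complex. The function name and the input format are hypothetical.

```python
from collections import defaultdict


def monthly_complex_ratio(questions):
    """Fraction of complex questions per month.

    `questions` is an iterable of (month, is_complex) pairs, where month
    is a sortable string such as "2010-11" and is_complex is a boolean
    prediction from a complexity classifier. Using this ratio as the
    maturity value is an assumption made for illustration.
    """
    totals = defaultdict(int)
    complex_counts = defaultdict(int)
    for month, is_complex in questions:
        totals[month] += 1
        if is_complex:
            complex_counts[month] += 1
    return {m: complex_counts[m] / totals[m] for m in sorted(totals)}


# Hypothetical predictions for three questions over two months.
sample = [("2010-01", True), ("2010-01", False), ("2010-02", True)]
ratios = monthly_complex_ratio(sample)
```

Steps 2 and 3 would apply the same computation to subsets of the questions, filtered by asker age group or by tag.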

COMMUNITY MATURITY RESULTS
[Figure panels: Users; Topics/Communities]

- User Evolution:
  - Maturity increases over time.
  - The maturity drop can be explained by the drop in average community age at the end of 2010 (229 to 185 days).
  - Committed users are more likely to become more mature (0.64 > 0.4).
- Community Evolution and Topics:
  - Maturity increases over time.
  - Different topics have different growth rates. For example:
    - Linux: slow but sustained → Linux users become more knowledgeable over time.
    - Windows-server-2008: initially high, then low → users migrating to Windows-server-2008-r2.

FUTURE WORK
- Perform similar analysis on other Enquiry Communities:
  - Confirm our results on additional datasets.
- Derive a complexity metric that can be applied to any online community based on the 5 factors of complexity:
  - Create a measure that does not require annotations.

CONCLUSION
- We showed that current health measures do not help in identifying communities that become more topic-proficient over time.
- We introduced the concepts of question complexity and community maturity and provided a complexity model (F1 ≈ 0.65) and a maturity measure.
- We showed that question complexity depends on user activity and commitment as well as other factors (hypothesis testing).
- We found that question complexity depends on five key factors: 1) asker’s question topical focus; 2) asker’s ratio of successfully-answered questions; 3) asker’s community age; 4) question’s existing value (Pal et al., 2010); and 5) question’s views.
- We showed that SF is a mature community and that maturity has topical dynamics.

QUESTIONS?
Web: http://evhart.online.fr
Email: g.burel@open.ac.uk
Twitter: @evhart
@www

REFERENCES
- Rowe, M., Alani, H., Angeletou, S., and Burel, G. Report on social, technical and corporate needs in online communities. Tech. Rep. 3.1, ROBUST, 2011.
- Burel, G., He, Y., and Alani, H. Automatic identification of best answers in online enquiry communities. In Proceedings of ESWC 2012 (Heraklion, Greece, 2012).
- Wu, M. The community health index. In Proceedings of the 4th International Conference on Persuasive Technology (New York, NY, USA, 2009), Persuasive ’09, ACM, pp. 24:1–24:2.
- Bachrach, Y., Graepel, T., Minka, T., and Guiver, J. How to grade a test without knowing the answers - a Bayesian graphical model for adaptive crowdsourcing and aptitude testing. arXiv preprint arXiv:1206.6386 (2012).
- Welinder, P., Branson, S., Belongie, S., and Perona, P. The multidimensional wisdom of crowds. In Proc. of NIPS (2010), pp. 2424–2432.
- Toral, S. L., Martínez-Torres, M. R., Barrero, F., and Cortés, F. An empirical study of the driving forces behind online communities. Internet Research 19, 4 (2009), 378–392.
- Pal, A., Chang, S., and Konstan, J. Evolution of experts in question answering communities. In Proceedings of the International AAAI Conference on Weblogs and Social Media (2012), pp. 274–281.
- Nam, K., Ackerman, M., and Adamic, L. Questions in, knowledge in?: a study of Naver’s question answering community. In Proceedings of the 27th International Conference on Human Factors in Computing Systems (2009), pp. 779–788.
