Wikidata is a collaborative knowledge graph edited by both humans and bots. Research found that a mix of human and bot edits, along with diversity in editor tenure and interests, led to higher quality items. Most external references in Wikidata were found to be relevant and from authoritative sources like governments and academics. Neural networks can generate multilingual summaries for Wikidata items that match Wikipedia style and are useful for editors in underserved language editions.
2. OVERVIEW
Wikidata is a critical AI asset in many domains
Recent Wikimedia project (2012), edited collaboratively
Our research assesses the quality of Wikidata and the link between community processes and quality
5. THE KNOWLEDGE GRAPH
STATEMENTS, ITEMS, PROPERTIES
Item identifiers start with a Q; property identifiers start with a P
[Diagram: the statement London (Q84) →(head of government, P6)→ Sadiq Khan (Q334155)]
6. THE KNOWLEDGE GRAPH
ITEMS CAN BE CLASSES, ENTITIES, VALUES
[Diagram: items can be entities (London Q84, Sadiq Khan Q334155, Amsterdam Q727, Ada Lovelace Q7259, United Kingdom Q145), classes (city Q515), or values (male Q6581097, Labour party Q59360), linked by properties such as head of government (P6)]
7. THE KNOWLEDGE GRAPH
ADDING CONTEXT TO STATEMENTS
Statements may include context (retrieving one via the API is sketched below)
Qualifiers (optional)
References (required)
Two types of references:
Internal, linking to another item
External, linking to a webpage
[Diagram: London (Q84) →(head of government, P6)→ Sadiq Khan (Q334155), with qualifier "9 May 2016" and reference https://www.london.gov.uk/...]
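To make the structure concrete, here is a minimal Python sketch of fetching this statement, with its qualifiers and references, through the Wikidata API; the JSON paths assume the standard wbgetclaims response shape, and error handling is omitted.

import requests

# Fetch London's "head of government" claims from the Wikidata API
resp = requests.get("https://www.wikidata.org/w/api.php", params={
    "action": "wbgetclaims",
    "entity": "Q84",       # London
    "property": "P6",      # head of government
    "format": "json",
})
for claim in resp.json()["claims"]["P6"]:
    value = claim["mainsnak"]["datavalue"]["value"]["id"]  # e.g. Q334155 (Sadiq Khan)
    qualifiers = claim.get("qualifiers", {})   # optional context, e.g. a start date
    references = claim.get("references", [])   # provenance, e.g. the URL above
    print(value, sorted(qualifiers), len(references))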
8. THE KNOWLEDGE GRAPH
CO-EDITED BY BOTS AND HUMANS
Human editors can register or work anonymously
Bots created by community for routine tasks
18k active human users, 200+ bots
9. OUR WORK
Effects of editing behaviour and community make-up on the knowledge graph
Content quality as a function of its provenance
Tools to improve content diversity
10. THE RIGHT MIX OF USERS
Piscopo, A., Phethean, C., & Simperl, E. (2017). What Makes a Good Collaborative Knowledge Graph: Group Composition and Quality in Wikidata. International Conference on Social Informatics, 305-322, Springer.
11. BACKGROUND
Wikidata editors have varied tenure and interests
Group composition impacts outcomes
Diversity can have multiple effects
Moderate tenure diversity increases outcome quality
Interest diversity leads to increased group productivity
Chen, J., Ren, Y., Riedl, J.: The effects of diversity on group productivity and member withdrawal in online volunteer groups. In: Proceedings of the 28th International Conference on Human Factors in Computing Systems - CHI '10. p. 821. ACM Press, New York, USA (2010)
12. OUR STUDY
Analysed the edit history of items
Corpus of 5k items whose quality has been manually assessed (5 levels)*
Edit history analysis focused on community make-up
Community is defined as the set of editors of an item
Considered features from the group diversity literature and Wikidata-specific aspects
*https://www.wikidata.org/wiki/Wikidata:Item_quality
14. DATA AND METHODS
Ordinal regression analysis; four models were trained (setup sketched below)
Dependent variable: quality label of the 5k labelled Wikidata items
Independent variables:
Proportion of bot edits
Bot/human edit ratio
Proportion of anonymous edits
Tenure diversity: coefficient of variation of editors' tenure
Interest diversity: based on the user editing matrix
Control variables: group size, item age
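As an illustration, a minimal sketch of this setup with statsmodels; the DataFrame and its column names are hypothetical stand-ins for the study's features, not the actual analysis code.

import pandas as pd
from statsmodels.miscmodels.ordinal_model import OrderedModel

df = pd.read_csv("labelled_items.csv")  # hypothetical export of the 5k items

# Dependent variable: ordered quality labels, E (lowest) to A (highest)
df["quality"] = pd.Categorical(df["quality"],
                               categories=["E", "D", "C", "B", "A"],
                               ordered=True)

# Independent and control variables (illustrative column names)
features = ["prop_bot_edits", "bot_human_ratio", "prop_anon_edits",
            "tenure_diversity_cv", "interest_diversity",
            "group_size", "item_age_days"]

model = OrderedModel(df["quality"], df[features], distr="logit")
result = model.fit(method="bfgs", disp=False)
print(result.summary())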
18. LIMITATIONS AND FUTURE WORK
Did not consider evolution of quality over time
Sample not representative of Wikidata overall (most items are C quality or lower)
Other group features (e.g., coordination) not considered
No distinction between editing activities (e.g., schema vs instances, topics, etc.)
Different metrics of interest (topics, type of activity)
19. THE CONTENT IS AS GOOD AS ITS REFERENCES
Piscopo, A., Kaffee, L. A., Phethean, C., & Simperl, E. (2017). Provenance Information in a Collaborative Knowledge Graph: an Evaluation of Wikidata External References. International Semantic Web Conference, 542-558, Springer.
20. PROVENANCE IN WIKIDATA
Statements may include context
Qualifiers (optional)
References (required)
Two types of references:
Internal, linking to another item
External, linking to a webpage
[Diagram, repeated from earlier: London (Q84) →(head of government, P6)→ Sadiq Khan (Q334155), with qualifier "9 May 2016" and reference https://www.london.gov.uk/...]
21. THE ROLE OF PROVENANCE
Wikidata aims to become a hub of references
Provenance increases trust in Wikidata
Lack of provenance hinders content reuse
Quality of references is so far unknown
Hartig, O. (2009). Provenance Information in the Web of Data. LDOW, 538.
22. OUR STUDY
Approach to evaluate the quality of external references in Wikidata
Quality is defined by the Wikidata verifiability policy:
Relevant: supports the statement it is attached to
Authoritative: trustworthy, up-to-date, and free of bias for supporting a particular statement
Large-scale (the whole of Wikidata)
Bot vs. human-contributed references
23. RESEARCH QUESTIONS
RQ1 Are Wikidata external references relevant?
RQ2 Are Wikidata external references authoritative? I.e., do they match the author and publisher types from the Wikidata policy?
RQ3 Can we automatically detect non-relevant and non-authoritative references?
24. METHODS
TWO-STAGE MIXED APPROACH
1. Microtask crowdsourcing (RQ1, RQ2)
Evaluate relevance & authoritativeness of a reference sample
Create training set for machine learning model
2. Machine learning (RQ3)
Large-scale reference quality prediction
25. STAGE 1: MICROTASK CROWDSOURCING
3 tasks on Crowdflower
5 workers/task, majority voting (aggregation sketched below)
Test questions to select workers
Feature | Microtask | Description
Relevance | T1 | Does the reference support the statement?
Authoritativeness | T2 | Choose author type from list
Authoritativeness | T3.A | Choose publisher type from list
Authoritativeness | T3.B | Verify publisher type, then choose sub-type from list
(RQ1, RQ2)
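The per-task aggregation is plain majority voting over the five judgements collected for each unit; a minimal sketch with made-up data:

from collections import Counter

# Five worker judgements per reference (illustrative data)
judgements = {
    "ref-1": ["relevant", "relevant", "not relevant", "relevant", "relevant"],
    "ref-2": ["not relevant", "not relevant", "relevant", "not relevant", "not relevant"],
}
labels = {ref: Counter(votes).most_common(1)[0][0]
          for ref, votes in judgements.items()}
print(labels)  # {'ref-1': 'relevant', 'ref-2': 'not relevant'}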
26. STAGE 2: MACHINE LEARNING
Compared three algorithms: Naïve Bayes, Random Forest, SVM (comparison sketched below)
Features based on [Lehmann et al., 2012; Potthast et al., 2008]
Baseline: item label matching (relevance); deprecated domains list (authoritativeness)
(RQ3)
Features: URL; reference uses; source HTTP code; subject, property, and object parent classes; statement item vector; statement object vector; author type; author activity; author activity on references
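A minimal scikit-learn sketch of the comparison; the synthetic data stands in for the extracted reference features, and the hyperparameters are illustrative rather than those used in the paper.

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score, matthews_corrcoef
from sklearn.model_selection import cross_val_predict
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

# Stand-in for the extracted reference features and crowdsourced labels
X, y = make_classification(n_samples=1000, n_features=11, random_state=0)

models = {
    "Naive Bayes": GaussianNB(),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "SVM": SVC(kernel="rbf", random_state=0),
}
for name, clf in models.items():
    y_pred = cross_val_predict(clf, X, y, cv=10)  # cross-validated predictions
    print(f"{name}: F1={f1_score(y, y_pred):.2f}, "
          f"MCC={matthews_corrcoef(y, y_pred):.2f}")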
27. DATA
1.6M external references (6% of all references)
1.4M from two sources (protein knowledge bases)
83,215 English-language references
Sample of 2,586 (99% confidence, 2.5% margin of error; see the calculation below)
885 assessed automatically, e.g., broken links or CSV files
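As a sanity check, the sample size roughly follows from the standard finite-population formula; the numbers below are a back-of-the-envelope reconstruction, and the small gap to 2,586 is likely down to rounding choices.

import math

N = 83_215   # English-language references
z = 2.576    # z-score for 99% confidence
e = 0.025    # 2.5% margin of error
p = 0.5      # most conservative proportion assumption

n0 = z**2 * p * (1 - p) / e**2   # infinite-population sample size
n = n0 / (1 + (n0 - 1) / N)      # finite-population correction
print(math.ceil(n))              # 2573, close to the reported 2,586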
28. RESULTS: CROWDSOURCING
CROWDSOURCING WORKS
Trusted workers: >80% accuracy
95% of responses from T3.A confirmed in T3.B
Task | No. of microtasks | Total workers | Trusted workers | Workers' accuracy | Fleiss' κ
T1 | 1,701 references | 457 | 218 | 75% | 0.335
T2 | 1,178 links | 749 | 322 | 75% | 0.534
T3.A | 335 web domains | 322 | 60 | 66% | 0.435
T3.B | 335 web domains | 239 | 116 | 68% | 0.391
29. RESULTS: CROWDSOURCING
MAJORITY OF REFERENCES ARE HIGH QUALITY
2586 references evaluated
Found 1674 valid references from 345 domains
Broken URLs deemed not relevant and not authoritative
(RQ1, RQ2)
31. RESULTS: CROWDSOURCING
DATA FROM GOVERNMENT AND ACADEMIC SOURCES
Most common author type (T2)
Organisation (78%)
Most common publisher types (T3)
Governmental agencies (37%)
Academic organisations (24%)
(RQ2)
32. RESULTS: MACHINE LEARNING
RANDOM FORESTS PERFORM BEST
Task | Model | F1 | MCC
Relevance | Baseline | 0.84 | 0.68
Relevance | Naïve Bayes | 0.90 | 0.86
Relevance | Random Forest | 0.92 | 0.89
Relevance | SVM | 0.91 | 0.87
Authoritativeness | Baseline | 0.53 | 0.16
Authoritativeness | Naïve Bayes | 0.86 | 0.78
Authoritativeness | Random Forest | 0.89 | 0.83
Authoritativeness | SVM | 0.89 | 0.79
(RQ3)
33. LESSONS LEARNED
Crowdsourcing+ML works!
Many external sources are high quality
Bad references are mainly non-working links; continuous monitoring is required
Lack of diversity in bot-added sources
Humans and bots are good at different things
34. LIMITATIONS AND FUTURE WORK
Studies with non-English sources
Did not consider internal references
Deployment in Wikidata, including its effects on editing behaviour
35. FROM NEURAL NETWORKS TO A MULTILINGUAL WIKIPEDIA
Kaffee, L., Elsahar, H., Vougiouklis, P., Gravier, C., Laforest, F., Hare, J., & Simperl, E. (2018). Mind the (Language) Gap: Generation of Multilingual Wikipedia Summaries from Wikidata for ArticlePlaceholders. European Semantic Web Conference, to appear. Springer.
36. BACKGROUND
Wikipedia is available in 287 languages, but content is unevenly distributed
Wikidata is cross-lingual
ArticlePlaceholders display Wikidata triples as stubs for articles in underserved Wikipedias
Currently deployed in 11 Wikipedias
37. OUR STUDY
Enrich ArticlePlaceholders with textual summaries generated from Wikidata triples
Train a neural network to generate one-sentence summaries resembling the opening paragraph of a Wikipedia article
Test the approach on two languages, Esperanto and Arabic, with readers and editors of those Wikipedias
38. RESEARCH QUESTIONS
RQ1 Can we automatically generate summaries that match the quality and feel of Wikipedia in different languages?
RQ2 Are summaries useful for the communities editing underserved Wikipedias?
39. APPROACH
NEURAL NETWORK TRAINED ON WIKIDATA/WIKIPEDIA
A feed-forward architecture encodes triples from the ArticlePlaceholder into a vector of fixed dimensionality
An RNN-based decoder generates text summaries, one token at a time
Optimisations for different entity verbalisations, rare entities, etc. (a minimal sketch follows)
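A minimal PyTorch sketch of that encoder/decoder shape; the dimensions, mean-pooling over triples, and toy vocabularies are illustrative choices, and the optimisations for entity verbalisation and rare entities are omitted.

import torch
import torch.nn as nn

class TripleEncoder(nn.Module):
    """Feed-forward encoder: embeds (subject, property, object) IDs and
    projects their concatenation into a fixed-size vector."""
    def __init__(self, vocab_size, emb_dim=64, hid_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.proj = nn.Linear(3 * emb_dim, hid_dim)

    def forward(self, triples):        # triples: (batch, n_triples, 3)
        e = self.embed(triples)        # (batch, n_triples, 3, emb_dim)
        e = e.flatten(start_dim=2)     # concatenate s, p, o embeddings
        h = torch.tanh(self.proj(e))   # (batch, n_triples, hid_dim)
        return h.mean(dim=1)           # pool the triples into one vector

class SummaryDecoder(nn.Module):
    """RNN (GRU) decoder: generates the summary one token at a time,
    conditioned on the encoded triples as its initial hidden state."""
    def __init__(self, vocab_size, emb_dim=64, hid_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.gru = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, tokens, h0):     # tokens: (batch, seq_len)
        e = self.embed(tokens)
        o, _ = self.gru(e, h0.unsqueeze(0))
        return self.out(o)             # next-token logits per position

# Toy forward pass: 2 items, 4 triples each, 10-token summaries
enc, dec = TripleEncoder(vocab_size=1000), SummaryDecoder(vocab_size=5000)
triples = torch.randint(0, 1000, (2, 4, 3))
tokens = torch.randint(0, 5000, (2, 10))
logits = dec(tokens, enc(triples))
print(logits.shape)                    # torch.Size([2, 10, 5000])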
40. EVALUATION
AUTOMATIC EVALUATION
Trained on a corpus of Wikipedia sentences and corresponding Wikidata triples (205k Arabic; 102k Esperanto)
Tested against three baselines: machine translation (MT) and two template-retrieval variants (TR, TRext)
Evaluated with standard metrics: BLEU, METEOR, ROUGE-L (scoring sketched below)
(RQ1)
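A minimal NLTK sketch of the BLEU part of the evaluation, on made-up tokenised sentences; METEOR and ROUGE-L are computed analogously with their own tooling.

from nltk.translate.bleu_score import SmoothingFunction, corpus_bleu

# One generated summary against its Wikipedia reference (illustrative)
references = [[["sadiq", "khan", "is", "the", "mayor", "of", "london"]]]
hypotheses = [["sadiq", "khan", "is", "mayor", "of", "london"]]

smooth = SmoothingFunction().method1  # avoids zero scores on short texts
print(corpus_bleu(references, hypotheses, smoothing_function=smooth))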
41. EVALUATION
USER STUDIES
Two 15-day online surveys with readers and editors of the Arabic and Esperanto Wikipedias
Readers survey
60 articles (30 ours, 15 news items, 15 Wikipedia summaries from the training corpus)
Fluency: Is the text understandable and grammatically correct?
Appropriateness: Does the summary 'feel' like a Wikipedia article?
Editors survey
30 automatically generated summaries
Editors were asked to edit the article starting from our summary (2-3 sentences)
Measured the extent to which the summary was reused (Greedy String Tiling, GST, metric; sketched below)
(RQ1, RQ2)
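A compact, unoptimised sketch of the Greedy String Tiling idea behind the reuse measurement: repeatedly find the longest common unmarked token runs above a minimum length, mark them as tiles, and report the share of the edited text covered by tiles. This is a simplified reconstruction, not the paper's implementation.

def gst_coverage(a, b, min_match=3):
    """Fraction of tokens in `a` covered by tiles shared with `b`."""
    marked_a, marked_b = [False] * len(a), [False] * len(b)
    while True:
        best, max_len = [], min_match - 1
        for i in range(len(a)):
            for j in range(len(b)):
                k = 0  # extend the match while tokens agree and are unmarked
                while (i + k < len(a) and j + k < len(b)
                       and a[i + k] == b[j + k]
                       and not marked_a[i + k] and not marked_b[j + k]):
                    k += 1
                if k > max_len:
                    best, max_len = [(i, j, k)], k
                elif k == max_len:
                    best.append((i, j, k))
        if max_len < min_match:
            break
        for i, j, k in best:  # mark non-overlapping maximal matches as tiles
            if not any(marked_a[i:i + k]) and not any(marked_b[j:j + k]):
                for t in range(k):
                    marked_a[i + t] = marked_b[j + t] = True
    return sum(marked_a) / len(a) if a else 0.0

# Toy example: edited article text vs. generated summary
article = "sadiq khan is the mayor of london since 2016".split()
summary = "sadiq khan is the mayor of london".split()
print(round(gst_coverage(article, summary), 2))  # 0.78 (7 of 9 tokens)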
44. LIMITATIONS AND FUTURE WORK
No easy way to test whether summaries would indeed lead to more participation on underserved Wikipedias
Wikidata itself needs more multilingual labels
Ongoing Wikipedia study: opportunistically ask editors of Wikipedia articles to add missing labels of relevant Wikidata items and properties
46. SUMMARY OF FINDINGS
Collaboration between humans and bots is important
Tools are needed to identify tasks for bots and to continuously study their effects on outcomes and community
Quality is a complex concept; we studied only a subset of aspects
References are high quality, though biases exist in the choice of sources
Automatically created content is useful to editors of underserved Wikipedias