Enhancing diversity in the content creation process
                   at Wikipedia
                           WP5
        Mathias Schindler (WIKI), Delia Rusu (JSI),
                   Fabian Flöck (KIT)
                     Project review Y1
              Luxembourg, 1st of December, 2011
Agenda

1.   Overview
2.   Problem scenarios
3.   Solution including reuse of R&D results
4.   Mockup demo
5.   Outlook




 01/12/2011               www.render-project.eu   2


   Overview




Overview

o The goal of Wikimedia’s case study is to support Wikipedia
  editors in maintaining and improving the site, and to support
  readers in understanding the quality and biases of a given
  article.

o We are creating tools and extensions to support editors in
  the management, understanding, and decision-making
  about complex and heated controversies on Wikipedia.

o We want Wikipedia to offer high-quality articles on both
  highly visible and more obscure topics.

Diversity challenge (1) – authorship

o Not everyone contributes to Wikipedia
o The typical author is predominantly:
   • Male
   • Academic background
   • 25-50 years of age
   • Developed countries
o Bias is a vicious circle
o Wikimedia Foundation targets for 2015 include:
"Support healthy diversity in the editing community by doubling the
percentage of female editors to 25 percent and increasing the percentage
of Global South editors to 37 percent"

Diversity challenge (2) – lemma selection

o Defining the scope of Wikipedia comes down to editorial
  decisions, such as whether to have an article for every episode,
  character and location of "The Simpsons" or a single article
  on the entire TV series.
o These decisions will influence:
   • the audience composition
   • the authorship recruitment
   • the internal and external perception of the project




Diversity challenge (3) – data inconsistency

o Variations in the number of "people of Muslim faith"

    Language  Article                                  Figure (billions)
    EN        Islam                                    over 1.5
    EN        Major religious groups                   1.3–1.65
    EN        Claims to be fastest-growing religion    1.57
    HE        Islam                                    1.4
    LB        Islam                                    1.1–1.5
    ID        Islam                                    1.25–1.4

o Variations in borders drawn on maps in various Wikipedia
  language editions (Kashmir, Sea of Japan)

Diversity challenge (4) – editor behaviour



o Certain editing behaviours can lead to biased articles
   • e.g. a dominant editor group in an article that wins an
     edit war, 'pushing out' minority views
o Newcomers and 'outsiders' to an article can encounter
  problems adding content, especially major changes

(Figure: History flow project, IBM 2005)

Work done in the first year

o Definition of use case scenarios
o Collection of existing approaches in quantitative metrics for
  quality assessment
o Collection of bias-inducing editor behavior patterns and
  development of methods to detect them
o Definition of metrics to evaluate the development of
  Wikipedia article quality
o Engagement with the Wikipedian and Wikimedian
  community to explain the scope and the public benefit of the
  RENDER project
o Participation in Wikimedia community events to outline the
  current state of the RENDER project and to invite feedback
  from scientifically minded authors


   Problem scenarios




Problem scenarios

Main Goal:
    • Improvement of the quality, the value and the
      trustworthiness of Wikipedia by supporting Wikipedia
      users (readers and editors)


Use Case Scenarios:
    • UC1: Display warnings to the reader when detecting bias
    • UC2: Notify authors that an article needs to be updated
    • UC3: Lower the barrier for readers to extend and/or
           correct articles


UC1: Bias detection and notification



o A regular visitor to Wikipedia opts into a tool that will display
  warnings whenever an article is shown with detected bias.

o The user is given a summary of the detected bias and
  detailed information on how the bias warning was triggered.

o The user is now in a position to engage in article editing to
  fix or amend the article in order to improve its quality and
  remove the biased parts.




UC2: Notification framework



o A retired professor of linguistics with advanced Wikipedia
  expertise and good standing as an author has committed
  herself to maintaining an entire topic area on Wikipedia.

o Using a dedicated tool, she is given a list of articles that
  show signs of being outdated, incomplete or biased.

o The professor is now able to maximise the impact of her work
  by focussing on the most deserving articles in her field of
  expertise.

UC3: Lowering the barrier



o A student has long been a passive reader of Wikipedia
  articles in the course of his studies.

o Dedicated tools now show him which facts are missing
  from an article and offer him resources that contain the
  missing information.

o The user now has a clear path from passive reading to
  active and productive participation.



   Overview of solutions including
   reuse of R&D results



Overview of solutions including reuse of R&D results




Metrics and validation (1)



(Diagram: evaluation of results from Enrycher, behavioral analysis
and others)

o Two approaches to using Wikipedia's assessment expertise:
       • assessment survey with Wikipedia users
       • analysis of templates and of the results of the WMF
         Article Feedback Tool

Metrics and validation (2) - Content

o Analysis of Wikipedia’s content development and quality

    • Fact coverage/ completeness:
           – Article length (number of words) compared to articles in other
             language versions
           – Number of articles which
               · have greater fact coverage than other language versions
               · lack facts found in at least one external source, such as a
                 news article
    • Timeliness
    • Objectivity
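
The first fact-coverage metric amounts to a word-count comparison across language versions. A minimal sketch, assuming each version has already been reduced to plain text (as in the import pipeline) and that splitting on whitespace is an adequate word count; that assumption fails for languages written without spaces, such as Japanese or Chinese. The function name `relative_lengths` and the sample texts are hypothetical.

```python
def relative_lengths(versions):
    """Return each language version's word count relative to the longest version.

    `versions` maps a language code to the plain text of the same article.
    A value of 1.0 marks the version(s) with the largest coverage by this proxy.
    """
    counts = {lang: len(text.split()) for lang, text in versions.items()}
    longest = max(counts.values(), default=0) or 1  # avoid dividing by zero
    return {lang: n / longest for lang, n in counts.items()}


ratios = relative_lengths({
    "en": "Kalmar is a city in Småland in Sweden",
    "de": "Kalmar ist eine Stadt",
    "sv": "Kalmar är en tätort i Kalmar kommun",
})
print(ratios)  # {'en': 1.0, 'de': 0.5, 'sv': 0.875}
```

Length is only a rough proxy for the number of facts; the second metric on the slide compares against external sources instead.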



Metrics and validation (2) - Content

o Analysis of Wikipedia’s content development and quality

    • Fact coverage/ completeness
    • Timeliness:
           – Number of edits per day compared to the average
           – Number of articles which are
               · not reverted during the last day/week
               · without reverts but with heavy editing in at least five language versions
               · outdated compared to the publishing dates of external sources (at
                 least five days older)
    • Objectivity
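
Two of the timeliness metrics reduce to date arithmetic. A sketch, assuming "the average" means the article's own mean number of edits per calendar day over its recorded history (the slide does not say which average is meant) and that revision dates and external publication dates are available as `datetime.date` values. Both function names are hypothetical.

```python
from collections import Counter
from datetime import date, timedelta


def edit_rate_ratio(edit_dates, day):
    """Edits on `day` divided by the article's mean number of edits per calendar day.

    The mean is taken over every day from the first to the last recorded edit,
    so days without any edit count as zero.
    """
    per_day = Counter(edit_dates)
    span_days = (max(edit_dates) - min(edit_dates)).days + 1
    mean_per_day = len(edit_dates) / span_days
    return per_day[day] / mean_per_day


def is_outdated(last_edit, newest_source, min_gap_days=5):
    """True if the newest external source is at least `min_gap_days` newer than the last edit."""
    return newest_source - last_edit >= timedelta(days=min_gap_days)


# Hypothetical history: 6 edits spread over 10 days, 4 of them on the last day.
edits = [date(2011, 11, 1), date(2011, 11, 5)] + [date(2011, 11, 10)] * 4
print(round(edit_rate_ratio(edits, date(2011, 11, 10)), 2))  # 6.67
print(is_outdated(date(2011, 11, 20), date(2011, 11, 25)))    # True
```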


Metrics and validation (2) - Content

o Analysis of Wikipedia’s content development and quality

    • Fact coverage/ completeness:
    • Timeliness
    • Objectivity:
           – Number of articles:
               · containing subjective words or expressions
               · identified as opinionated by JSI's algorithms
               · classified as opinionated because they contain biased references
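
The first objectivity metric, articles containing subjective words or expressions, could be approximated with a lexicon lookup. This sketch is not JSI's algorithm; the six-term lexicon and the function name `subjective_terms` are hypothetical placeholders for a real, language-specific subjectivity lexicon.

```python
import re

# Hypothetical mini-lexicon; a real subjectivity lexicon is far larger and language-specific.
SUBJECTIVE_TERMS = {"clearly", "obviously", "unfortunately", "notorious", "brilliant", "so-called"}


def subjective_terms(text, lexicon=SUBJECTIVE_TERMS):
    """Return (share of words that are subjective terms, list of matched terms)."""
    words = re.findall(r"[\w-]+", text.lower())
    hits = [w for w in words if w in lexicon]
    share = len(hits) / len(words) if words else 0.0
    return share, hits


share, hits = subjective_terms("The so-called reform was clearly a brilliant success.")
print(share, hits)  # 0.375 ['so-called', 'clearly', 'brilliant']
```

An article would then count toward the metric when `share` exceeds some threshold; choosing that threshold is left open here.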




Metrics and validation (3) - Editor behavior

o Analysis of article-based editor behavior patterns and their
  development

    • Existence of editing patterns indicating bias:
           – Measured based on
               · Correlations between the chance of an edit getting reverted and
                 editor, edit and article features
               · Social editor network metrics such as centrality, clustering, density, etc.
               · Specific combinations of detected behavioral mechanisms
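
The first measure, the correlation between the chance of an edit being reverted and a feature of the edit, its editor or its article, can be estimated with a point-biserial correlation, which is Pearson's r computed against a 0/1 variable. A sketch; the chosen feature (the editor's number of prior edits) and the data are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient; with a 0/1 `ys` this is the point-biserial correlation."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)  # undefined if either variable is constant


# Hypothetical edits: the editor's prior edit count and whether the edit was reverted (1) or not (0).
prior_edits = [0, 2, 5, 50, 120, 300]
reverted = [1, 1, 1, 0, 0, 0]
r = pearson(prior_edits, reverted)
print(f"{r:.2f}")  # negative: in this toy data, experienced editors are reverted less often
```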




Overview of solutions including reuse of R&D results




Diversity mining services – import pipeline



    Example of the import pipeline procedure:

                API request: "Kalmar"

Diversity mining services – import pipeline


    '''Kalmar''' is a [[cities of Sweden|city]] in [[Småland]] in the south-east
    of [[Sweden]], situated by the [[Baltic Sea]]. It had 62,767 inhabitants
    in 2010<ref name="scb" /> and is the seat of [[Kalmar Municipality]]. It
    is also the capital of [[Kalmar County]], which comprises 12
    municipalities with a total of 233,776 inhabitants (2006).
    ...




Diversity mining services – import pipeline


     Kalmar is a city in Småland in the south-east of Sweden, situated by
     the Baltic Sea. It had 62,767 inhabitants in 2010 and is the seat of
     Kalmar Municipality. It is also the capital of Kalmar County, which
     comprises 12 municipalities with a total of 233,776 inhabitants (2006).
     ....
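
The step from the wikitext slide to the plain-text slide strips the markup. A minimal regex-based sketch that reproduces this example, assuming only bold/italic quotes, internal links (with or without a label) and `<ref>` tags occur. It is not the RENDER pipeline: real articles (templates, nested links, tables, files) need a proper wikitext parser. `strip_wikitext` is a hypothetical name.

```python
import re


def strip_wikitext(text):
    """Reduce a small subset of wikitext to plain text."""
    text = re.sub(r"<ref[^>]*/>", "", text)                            # self-closing references
    text = re.sub(r"<ref[^>]*>.*?</ref>", "", text, flags=re.DOTALL)   # references with content
    text = re.sub(r"\[\[(?:[^|\]]*\|)?([^\]]+)\]\]", r"\1", text)      # [[target|label]] -> label
    text = re.sub(r"'{2,}", "", text)                                  # ''italic'' / '''bold''' quotes
    return text


wikitext = ("'''Kalmar''' is a [[cities of Sweden|city]] in [[Småland]] in the south-east "
            "of [[Sweden]], situated by the [[Baltic Sea]]. It had 62,767 inhabitants "
            "in 2010<ref name=\"scb\" /> and is the seat of [[Kalmar Municipality]].")
print(strip_wikitext(wikitext))
```

Self-closing references are removed first; otherwise the second pattern could treat `<ref name="scb" />` as an opening tag and swallow text up to a later `</ref>`.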




Diversity mining services – import pipeline
     ...
     <rdf:Description rdf:about="urn:document-3cbc5995-1679-4dad-9d2c-87bffb9bb69f">
     <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Document"/>
     <dmoz:topic rdf:resource="http://www.dmoz.org/Top/Regional/Europe/Sweden/Kalmar_County/Localities/Kalmar"/>
     <dmoz:topic rdf:resource="http://www.dmoz.org/Top/Reference/Museums/Transportation/Maritime/Europe/Sweden"/>
     <dmoz:topic rdf:resource="http://www.dmoz.org/Top/Regional/Europe/Sweden/Maps_and_Views"/>
     ...
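
The DMOZ categories attached to the document appear to be what the later topic and keyword charts aggregate. A sketch of that extraction with Python's standard `xml.etree.ElementTree`, under two assumptions: the `dmoz:` prefix is bound to a placeholder namespace URI (the excerpt does not show the binding), and a chart "topic" is the first three levels of a DMOZ path while "keywords" are its individual levels. `DMOZ_NS` and `dmoz_topics` are hypothetical.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DMOZ_NS = "http://example.org/dmoz#"  # hypothetical placeholder: the real binding is not shown

SAMPLE = f"""<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:dmoz="{DMOZ_NS}">
  <rdf:Description rdf:about="urn:document-3cbc5995-1679-4dad-9d2c-87bffb9bb69f">
    <dmoz:topic rdf:resource="http://www.dmoz.org/Top/Regional/Europe/Sweden/Kalmar_County/Localities/Kalmar"/>
    <dmoz:topic rdf:resource="http://www.dmoz.org/Top/Reference/Museums/Transportation/Maritime/Europe/Sweden"/>
    <dmoz:topic rdf:resource="http://www.dmoz.org/Top/Regional/Europe/Sweden/Maps_and_Views"/>
  </rdf:Description>
</rdf:RDF>"""


def dmoz_topics(rdf_xml, depth=3):
    """Return (topic labels truncated to `depth` levels, set of individual category keywords)."""
    root = ET.fromstring(rdf_xml)
    topics, keywords = [], set()
    for element in root.iter(f"{{{DMOZ_NS}}}topic"):
        url = element.get(f"{{{RDF_NS}}}resource")
        path = url.split("/Top/", 1)[1].split("/")  # e.g. ['Regional', 'Europe', 'Sweden', ...]
        topics.append(" ".join(path[:depth]))
        keywords.update(path)
    return topics, keywords


topics, keywords = dmoz_topics(SAMPLE)
print(topics)  # ['Regional Europe Sweden', 'Reference Museums Transportation', 'Regional Europe Sweden']
```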




Overview of solutions including reuse of R&D results






   Opinionated Wikipedia Articles

     JSI


Opinionated Wikipedia articles

o Analyzing opinionated Wikipedia articles
   • Articles on which the neutrality template was set
   • 2 versions of each article (old and new) in the dataset
   • Condition: the neutrality template was not set again within
     7 days of its deletion
   • ~2 GB XML file with opinionated / non-opinionated articles

    • 20,630 opinionated articles
    • 18,719,338 articles in total
    (these counts exclude Wikipedia infrastructure articles)

Opinionated Wikipedia articles

    (Bar chart "Wikipedia Neutrality Articles": number of articles with
    neutrality tag changed, by number of neutrality tag changes)

    Neutrality tag changes    Articles
    38                               1
    20–34                           25
    10–20                          219
    4–10                         2,540
    2                           17,845
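
The bucket counts in the chart can be checked against the totals given for the dataset: they add up exactly to the 20,630 opinionated articles, about 0.11% of all articles. The arithmetic, with the bucket labels copied from the chart:

```python
# Articles per number of neutrality tag changes, as read from the bar chart.
articles_per_bucket = {"38": 1, "20-34": 25, "10-20": 219, "4-10": 2540, "2": 17845}
total_articles = 18_719_338

opinionated = sum(articles_per_bucket.values())
print(opinionated)                                      # 20630
print(f"{opinionated / total_articles:.3%}")            # 0.110%
print(f"{articles_per_bucket['2'] / opinionated:.1%}")  # 86.5%
```

Most tagged articles (86.5%) thus saw only two tag changes, while a tail of 26 articles had 20 or more.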
Top 20 Articles by Number of Neutrality Tag Changes

    September 11 attacks - 38              Muhammad - 24
    George W. Bush - 34                    Jesus - 24
    Intelligent design - 32                Israel - 24
    Global warming - 32                    Arab-Israeli conflict - 24
    Race and intelligence - 28             Zionism - 22
    Circumcision - 28                      Srebrenica massacre - 22
    Armenian Genocide - 28                 Islamophobia - 22
    Macedonians (ethnic group) - 26        Homeopathy - 22
    Iraq War - 26                          Holocaust denial - 22
    The Holocaust - 24                     Evolution - 22

Learning opinionated Wikipedia articles

(Diagram: the Wikipedia opinionated dataset feeds the learning features:
opinionated topics (DMOZ), article reference changes, entities, links to
other articles, part-of-speech tags, …)

Learning opinionated Wikipedia articles

(Diagram: Wikipedia opinionated dataset → opinionated topics (DMOZ))

o ~10,000 opinionated articles
o Applied Enrycher services:
   • POS tagging
   • DMOZ
      – Topics
      – Keywords

Opinionated topics

(Pie chart "Article Changes for Top Topics"*; legend: 'Regional North_America United_States',
'Regional Europe United_Kingdom', 'Society Religion_and_Spirituality Christianity',
'Reference Education Colleges_and_Universities', 'Regional Asia India', 'Society', 'Arts',
'Regional North_America Canada', 'Society Religion_and_Spirituality Islam',
'Society Issues Warfare_and_Conflict', 'Society History By_Time_Period')

      * Keywords for which the number of article changes is greater than 500

Opinionated keywords

(Pie chart "Article Changes for Top Keywords"*; legend: 'Regional', 'Society', 'North_America',
'United_States', 'Europe', 'Society_and_Culture', 'Arts', 'Religion_and_Spirituality', 'Science',
'United_Kingdom', 'Business', 'Computers', 'History', 'Asia', 'Education', 'Issues', 'Government')

     * Keywords for which the number of article changes is greater than 500

Examples of changes within a topic

o keyword-based changes:
   • article - Megleno-Romanians
   • topics - Regional, Europe, Romania,
     Society_and_Culture, Organizations
   • […] Vlahi is a disputed exonym […]
o reference additions:
   • article - Soviet occupation of Romania
   • topics - Regional, Europe, Romania
   • […] Sergiu Verona, "Military Occupation and Diplomacy:
     Soviet Troops in Romania, 1944-1958", Duke University
     Press […]

Learning opinionated Wikipedia articles

o Next steps in learning opinionated articles:
   • extract the remaining learning features: entities, article
     references, article links, etc.
o Dealing with scale:
   • process the whole of Wikipedia
      – ~30 TB of data
   • extract the Wikipedia social network
   • obtain a static and dynamic Wikipedia profile for each
     contributor/community



   Detecting bias-inducing editor behavior
   to generate warnings

   KIT

Bias detection via editing behavior in Wikipedia



o Relevant use case:
   • UC1: Displaying warnings when detecting patterns of
     bias

o Intention:
   • Help in understanding
     and remedying the behavioral
     causes of bias

o Needed:
   • Understanding and prediction of the socio-technical
     mechanisms leading to bias
Bias warning examples

Examples of behavioural-pattern-based warnings presented to users:


•Concentration: 98% of the article was written by 3% of its active editors.
The resulting concentration coefficient is 9 out of 10; the usual coefficient
for similar articles is 5 out of 10. Find out what this means and what you
can do to help.


•Homogeneity: We detected a highly fragmented editor structure, with 80% of
non-vandal edits being reverts and 3 major editor camps. Click here for an
explanation and visualization. Find out what this means and what you can
do to help.
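
The slide does not define how the concentration coefficient or the revert share are computed. One plausible sketch, assuming per-editor counts of authored words and per-edit revert/vandalism flags are available: the share of words written by the top editors, a Gini coefficient scaled to 0–10 as a hypothetical stand-in for the "concentration coefficient", and the share of reverts among non-vandal edits. All function names and the sample data are hypothetical.

```python
def top_share(words_by_editor, fraction):
    """Share of all words written by the most productive `fraction` of editors."""
    counts = sorted(words_by_editor.values(), reverse=True)
    k = max(1, round(len(counts) * fraction))
    return sum(counts[:k]) / sum(counts)


def gini(values):
    """Gini coefficient: 0 = perfectly even authorship, approaching 1 = a single author."""
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


def revert_share(edits):
    """Fraction of non-vandal edits that are reverts; `edits` holds (is_revert, is_vandalism) pairs."""
    relevant = [is_revert for is_revert, is_vandalism in edits if not is_vandalism]
    return sum(relevant) / len(relevant)


# Hypothetical article: 3 of 100 editors wrote 980 of 1,000 words.
words = {f"core{i}": 980 / 3 for i in range(3)}
words.update({f"other{i}": 20 / 97 for i in range(97)})
print(round(top_share(words, 0.03), 2))     # 0.98
print(round(10 * gini(words.values()), 1))  # hypothetical 0-10 concentration scale
```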




Identify crucial editing patterns

o Research review
   • Identified existing patterns of socio-technical
     mechanisms potentially influencing bias and diversity:
       – Social proof and consolidation
       – Ownership behavior
       – Opinion camps and editor drop-out
       – Lack of boldness and useful conflicts
       – etc.

     See the paper "Towards a diversity-minded Wikipedia" and an
     extended literature survey (to be published).

Identify crucial editing patterns

Example pattern: opinion camp drop-out
Extract useful data to enable modeling



o Define metrics to find patterns in the data typical for the
  mechanisms
   • For example: A dense core group of editors in an article’s
     social network structure
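
One way to make "a dense core group of editors" measurable is a k-core of the article's editor network (editors linked, for example, when one reverted the other) together with the density of that core. A sketch in plain Python; the edge definition, the choice of k and the sample network are hypothetical, and the project's actual network metrics may differ.

```python
def density(nodes, edges):
    """Share of possible undirected links among `nodes` that are present in `edges`."""
    nodes = set(nodes)
    links = {frozenset(e) for e in edges if set(e) <= nodes}
    n = len(nodes)
    return 0.0 if n < 2 else 2 * len(links) / (n * (n - 1))


def k_core(edges, k):
    """Editors left after repeatedly removing everyone with fewer than k neighbours."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    while True:
        weak = [node for node, nbrs in adjacency.items() if len(nbrs) < k]
        if not weak:
            return set(adjacency)
        for node in weak:
            for neighbour in adjacency.pop(node):
                adjacency[neighbour].discard(node)


# Hypothetical network: four editors who all interact, plus three peripheral editors.
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("b", "d"), ("c", "d"),
         ("e", "a"), ("f", "b"), ("g", "e")]
core = k_core(edges, 3)
print(sorted(core), density(core, edges))  # ['a', 'b', 'c', 'd'] 1.0
```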

    How do we obtain the data needed for these metrics?

Reverts as the basis for accurately modeling user behavior in Wikipedia

o Most telling: editors' actions that relate to each other
  • Reverting is undoing: contradicting actions perceived as
    wrong
  • Inferences are possible without knowing the meaning of the content
           Example:
                Edit No.   Content      Added/deleted content
                1          "zzz"
                2          "zzz yyy"    +"yyy"
                3          "zzz"        -"yyy"

→ Foundation for the search for and analysis of most of the
  recurring editing patterns that are typical for biased articles
State-of-the-art revert detection

Simple identity revert method using MD5 hashes

   Edit  Article content                 Words deleted/added   MD5 hash      Detected identical and
   No.                                   by edit               (simplified)  reverted revisions
   1     Zero                            (ignored here)        Hash1         Identical to revision 5
   2     Zero Apple Banana               +"Apple" +"Banana"    Hash2         Reverted by revision 5
   3     Zero Apple Banana Coconut Date  +"Coconut" +"Date"    Hash3         Reverted by revision 5
   4     Zero Coconut Date               -"Apple" -"Banana"    Hash4         Reverted by revision 5
   5     Zero                            -"Coconut" -"Date"    Hash1         Identical to revision 1
                                                                             → revert of revisions 2, 3, 4
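
The identity method in the table fits in a few lines: hash every revision's full text and, whenever a hash reappears, treat every revision in between as reverted. A sketch using Python's `hashlib.md5` on the example's revision texts; the function name is hypothetical.

```python
import hashlib


def identity_reverts(revisions):
    """Map each reverting revision number to the revision numbers it is assumed to revert."""
    last_seen = {}  # MD5 digest of the full text -> latest revision number with that text
    reverts = {}
    for rev_no, text in enumerate(revisions, start=1):
        digest = hashlib.md5(text.encode("utf-8")).hexdigest()
        earlier = last_seen.get(digest)
        if earlier is not None and earlier < rev_no - 1:
            # Identical to an older revision: everything in between counts as reverted.
            reverts[rev_no] = list(range(earlier + 1, rev_no))
        last_seen[digest] = rev_no
    return reverts


revisions = ["Zero", "Zero Apple Banana", "Zero Apple Banana Coconut Date",
             "Zero Coconut Date", "Zero"]
print(identity_reverts(revisions))  # {5: [2, 3, 4]}
```

Note that revision 2 is attributed to revision 5 even though revision 4 already removed its words.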

Deficiencies of the state-of-the-art

o Partial reverts exist
o Reverts do not always produce duplicate revisions
o In the example above, revision 4 already removes everything
  revision 2 added, yet the identity method attributes the revert
  of revision 2 to revision 5

An improved revert detection method



o A revert is defined by Wikipedia as an action of an editor
  “undoing the effects of one or more edits” and “(m)ore
  broadly, reverting may also refer to any action that in whole
  or in part reverses the actions of other editors.”

o A clear definition, taking into account the Wikipedia definition,
  known intentional behavior and the available data:

         An edit A is reverted if all of the actions of that edit are
         completely undone in one subsequent edit B. Edit B has
         then reverted edit A.

Improved method - implementation

No.  Revision content                Words deleted/added   MD5 hash      Content list   Content list   Detected
                                     by edit               (simplified)  (rev. Nos.)    differences    reverts
1    Zero                            (ignored here)        Hash1         1              +1
2    Zero Apple Banana               +"Apple" +"Banana"    Hash2         1;2            +2             Reverted by 4
3    Zero Apple Banana Coconut Date  +"Coconut" +"Date"    Hash3         1;2;3          +3             Reverted by 5
4    Zero Coconut Date               -"Apple" -"Banana"    Hash4         1;3            -2             Reverting 2
5    Zero                            -"Coconut" -"Date"    Hash1         1              -3             Reverting 3
6    Zero Fig                        +"Fig"                Hash5         1;6            +6             Reverted by 8
7    Zero Fig Grape                  +"Grape"              Hash6         1;6;7          +7             Reverted by 8
8    Zero Huckleberry                -"Fig" -"Grape"       Hash7         1;8            -6; -7         Reverting 6, 7
                                     +"Huckleberry"
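
A simplified sketch of the content-list idea, assuming word-level tracking in which every word is tagged with the revision that added it, matched between consecutive revisions with `difflib.SequenceMatcher`. It only detects reverts of edits that added content; undoing a deletion by re-adding the deleted words, which the definition also covers, is left out. Revision 7 is taken as "Zero Fig Grape", which is what its content list 1;6;7 implies. The function name is hypothetical and this is not the KIT implementation.

```python
from collections import Counter
from difflib import SequenceMatcher


def content_list_reverts(revisions):
    """Map each reverting revision number to the revisions whose added words it fully removed."""
    previous = []     # (word, origin revision) pairs of the previous revision
    added_count = {}  # origin revision -> number of words it added
    reverts = {}
    for rev_no, text in enumerate(revisions, start=1):
        words = text.split()
        matcher = SequenceMatcher(a=[w for w, _ in previous], b=words, autojunk=False)
        current = []
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag == "equal":
                current.extend(previous[i1:i2])                    # unchanged words keep their origin
            elif tag in ("insert", "replace"):
                current.extend((w, rev_no) for w in words[j1:j2])  # new words belong to this revision
        added = sum(origin == rev_no for _, origin in current)
        if added:
            added_count[rev_no] = added
        before = Counter(origin for _, origin in previous)        # content list before this edit
        after = Counter(origin for _, origin in current)          # content list after this edit
        # Reverted only if all of that revision's words were still present and this edit removed them all.
        undone = sorted(o for o, n in before.items() if after[o] == 0 and n == added_count.get(o))
        if undone:
            reverts[rev_no] = undone
        previous = current
    return reverts


revisions = ["Zero", "Zero Apple Banana", "Zero Apple Banana Coconut Date", "Zero Coconut Date",
             "Zero", "Zero Fig", "Zero Fig Grape", "Zero Huckleberry"]
print(content_list_reverts(revisions))  # {4: [2], 5: [3], 8: [6, 7]}
```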
Improved method - results

o Survey evaluation: accuracy is much higher for the new method
   • Significantly fewer false positives
   • Can accurately distinguish between full and partial
     reverts

o 12% more reverts detected with the new method than with the
  identity method
   • Up to 50% more in short articles

o First revert detection method evaluated to work according to the
  Wikipedia definition and to editors' idea of a revert → better
  reflects actual behavior and relations → key to precisely
  modeling the social editing dynamics


   Mockup demo




Mock-ups – tools for the case study



Tools to support readers, editors and administrators:

o Quality overview of Wikipedia articles for readers
o Generation of working lists
  • for Wikipedia editors, concerning problems in the content
  • for Wikipedia administrators, concerning editor behaviour
    and interaction

Mock-ups (1) - QAO




Mock-ups (2) - working lists






   Outlook




Outlook (1)



o Testing and evaluation of R&D results for Wikipedia
o Development of prototypes for user-supporting tools based on
  these results
o Evaluation and testing of these prototypes with Wikipedia
  users, starting with the German community
o Collecting feedback and building up guidelines

Outlook (2)

Roadmap to develop supporting tools for Wikipedia users:


    M 18     • Development of prototypes
    M 20     • Testing the prototypes with a small user group,
               such as a German WikiProject
    M 22     • Adjusting and finalising
    M 24 +   • Applying to other language versions

Questions & comments
       Thanks

Wiki case study - Review year 1

  • 1. Enhancing diversity in the content creation process at Wikipedia WP5 Mathias Schindler (WIKI), Delia Rusu (JSI), Fabian Flöck (KIT) Project review Y1 Luxembourg, 1st of December, 2011
  • 2. Agenda 1. Overview 2. Problem scenarios 3. Solution including reuse of R&D results 4. Mockup demo 5. Outlook 01/12/2011 www.render-project.eu 2
  • 3. Overview Problem scenarios Solution including reuse of R&D results Mockup demo Outlook Overview 01/12/2011 www.render-project.eu 3
  • 4. Overview o The goal of Wikimedia’s case study is to support Wikipedia editors in maintaining and improving the site, and to support readers in understanding the quality and biases of a given article. o We are creating tools and extensions to support editors in the management, understanding, and decision-making about complex and heated controversies on Wikipedia. o We want Wikipedia to offer high quality articles on both highly visible and as well as on more obscure topics. 01/12/2011 www.render-project.eu 4
  • 5. Diversity challenge (1) – authorship o Not everyone contributes to Wikipedia o Standard authorship has a strong spot at • Male • Academic background • 25-50 years of age • Developed countries o Bias is a vicious circle o Wikimedia Foundation targets for 2015 include: „ Support healthy diversity in the editing community by doubling the percentage of female editors to 25 percent and increasing the percentage of Global South editors to 37 percent“ 01/12/2011 www.render-project.eu 5
  • 6. Diversity challenge (2) – lemma selection o Defining the scope of Wikipedia boils down to editorial decisions such as: An article for every episode and character and location of "The Simpsons" vs. a single article on the entire TV series. o These decisions will influence: • the audience composition • the authorship recruitment • the internal and external perception of the project 01/12/2011 www.render-project.eu 6
  • 7. Diversity challenge (3) – data inconsistency o Variations in the number of „people of Muslim faith“ Lang Lemma Figure (in bn) EN Islam Over 1.5 EN Major religious groups 1.3 – 1.65 EN Claims to be fastest-growing religion 1.57 HE Islam 1.4 LB Islam 1.1 – 1.5 ID Islam 1.25 – 1.4 o Variations in borders drawn on maps in various Wikipedia language editions (Kashmir, Sea of Japan) 01/12/2011 www.render-project.eu 7
  • 8. Diversity challenge (4) – editor behaviour o Certain editing behaviours can lead to biased articles • e.g. a dominant editor group in an article that wins an edit war, 'pushing out' minority views o Newcomers and 'outsiders' to an article can encounter problems adding content, especially History flow project, IBM 2005 mayor changes 01/12/2011 www.render-project.eu 8
  • 9. Work done in the first year o Definition of use case scenarios o Collection of existing approaches in quantitative metrics for quality assessment o Collection of bias-inducing editor behavior patterns and development of methods to the detect them o Metric definition in order to evaluate the development of Wikipedia article quality o Engagement with the Wikipedian and Wikimedian community to explain the scope and the public benefit of the RENDER project o Participation in Wikimedia community events to outline the current state of the RENDER project and to invite feedback from scientifically minded authors 01/12/2011 www.render-project.eu 9
  • 10. Overview Problem scenarios Solution including reuse of R&D results Mockup demo Outlook Problem scenarios 01/12/2011 www.render-project.eu 10
  • 11. Problem scenarios Main Goal: • Improvement of the quality, the value and the trustworthiness of Wikipedia by supporting Wikipedia users (readers and editors) Use Case Scenarios: • UC1: Display warnings to the reader when detecting bias • UC2: Notify authors that an article needs to be updated • UC3: Lower the barrier for readers to extend and/or correct articles 01/12/2011 www.render-project.eu 11
  • 12. UC1: Bias detection and notification o A regular visitor to Wikipedia opts into a tool that will display warnings whenever an article is shown with detected bias. o The user is given a summary of the detected bias and detailed information on how the bias warning was triggered. o The user is now in a position to engage in article editing to fix or amend the article in order to improve its quality and remove the biased parts. 01/12/2011 www.render-project.eu 12
  • 13. UC2: Notification framework o A retired professor of linguistics with advanced Wikipedia expertise and good standing as an author has committed herself into maintaining the entire topic in Wikipedia. o Using a dedicated tool, she is given a list of articles that show signs of being outdated, incomplete or biased. o The professor is now able to maximise impact of her work, focussing on the most deserving articles in her field of expertise. 01/12/2011 www.render-project.eu 13
  • 14. UC3: Lowering the barrier o A student has a long time history of passively reading Wikipedia articles in the course of his studies. o Dedicated tools are now providing him with information to understand which facts are missing in the article and are offering him resources that contain the missing information o The user is now given a clear path to turn passive involvement into active and productive participation 01/12/2011 www.render-project.eu 14
  • 15. Overview Problem scenarios Solution including reuse of R&D results Mockup demo Outlook Overview of solutions including reuse of R&D results 01/12/2011 www.render-project.eu 15
  • 16. Overview of solutions including reuse of R&D results 01/12/2011 www.render-project.eu 16
  • 17. Metrics and validation (1) Evaluation of results from Enrycher, behavioral analysis and others 2 approaches to use Wikipedia’s assessment expertise: • assessment survey with Wikipedia users • analysis of templates and the results of WMF Article Feedback Tool 01/12/2011 www.render-project.eu 17
  • 18. Metrics and validation (2) - Content o Analysis of Wikipedia’s content development and quality • Fact coverage/ completeness:  Article length (number of words) compared to articles in other language versions  Number of articles which  have a bigger fact coverage compared to other language versions  have a lack of facts compared to at least one external source like a news article • Timeliness • Objectivity 01/12/2011 www.render-project.eu 18
  • 19. Metrics and validation (2) - Content o Analysis of Wikipedia’s content development and quality • Fact coverage/ completeness • Timeliness:  Number of edits per day compared to the average  Number of articles which are  not reverted during the last day/week  without reverts but high editing in at least five language versions  out‐dated compared to publishing dates of external sources (at least five days older) • Objectivity 01/12/2011 www.render-project.eu 19
  • 20. Metrics and validation (2) - Content o Analysis of Wikipedia’s content development and quality • Fact coverage/ completeness: • Timeliness • Objectivity:  Number of articles:  containing subjective words or expressions  identified as opinionated by JSI’s algorithms  classified as opinionated by containing biased references 01/12/2011 www.render-project.eu 20
  • 21. Metrics and validation (3) - Editor behavior o Analysis of article-based editor behavior patterns and their development • Existence of editing patterns indicating bias:  Measured based on  Correlations of the chance of an edit getting reverted with editor, edit and article features  Social editor network metrics like centrality, clustering, density, etc.  Specific combination of behavioral mechanisms detected 01/12/2011 www.render-project.eu 21
  • 22. Overview of solutions including reuse of R&D results 01/12/2011 www.render-project.eu 22
  • 23. Diversity mining services – import pipeline Example for the import pipeline procedure: API request: “Kalmar” 01/12/2011 www.render-project.eu 23
  • 24. Diversity mining services – import pipeline '''Kalmar''' is a [[cities of Sweden|city]] in [[Småland]] in the south-east of [[Sweden]], situated by the [[Baltic Sea]]. It had 62,767 inhabitants in 2010<ref name="scb" /> and is the seat of [[Kalmar Municipality]]. It is also the capital of [[Kalmar County]], which comprises 12 municipalities with a total of 233,776 inhabitants (2006). ... 01/12/2011 www.render-project.eu 24
  • 25. Diversity mining services – import pipeline Kalmar is a city in Småland in the south-east of Sweden, situated by the Baltic Sea. It had 62,767 inhabitants in 2010 and is the seat of Kalmar Municipality. It is also the capital of Kalmar County, which comprises 12 municipalities with a total of 233,776 inhabitants (2006). .... 01/12/2011 www.render-project.eu 25
  • 26. Diversity mining services – import pipeline ... <rdf:Description rdf:about="urn:document-3cbc5995-1679-4dad-9d2c-87bffb9bb69f"> <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Document"/> <dmoz:topic rdf:resource="http://www.dmoz.org/Top/Regional/Europe/Sweden/Kalmar_Co unty/Localities/Kalmar"/> <dmoz:topic rdf:resource="http://www.dmoz.org/Top/Reference/Museums/Transportation/ Maritime/Europe/Sweden"/> <dmoz:topic rdf:resource="http://www.dmoz.org/Top/Regional/Europe/Sweden/Maps_and _Views"/> ... 01/12/2011 www.render-project.eu 26
  • 27. Diversity mining services – import pipeline ... <rdf:Description rdf:about="urn:document-3cbc5995-1679-4dad-9d2c-87bffb9bb69f"> <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Document"/> <dmoz:topic rdf:resource="http://www.dmoz.org/Top/Regional/Europe/Sweden/Kalmar_Co unty/Localities/Kalmar"/> <dmoz:topic rdf:resource="http://www.dmoz.org/Top/Reference/Museums/Transportation/ Maritime/Europe/Sweden"/> <dmoz:topic rdf:resource="http://www.dmoz.org/Top/Regional/Europe/Sweden/Maps_and _Views"/> ... 01/12/2011 www.render-project.eu 27
  • 28. Overview of solutions including reuse of R&D results
  • 29. Opinionated Wikipedia articles (JSI)
  • 30. Opinionated Wikipedia articles o Analysis of opinionated Wikipedia articles, i.e. articles on which the neutrality template was set • Two versions of each article (old and new) in the dataset • Condition: the neutrality template was not re-added for 7 days after its deletion • ~2 GB XML file with opinionated and non-opinionated articles • 20,630 opinionated articles • 18,719,338 articles in total (both counts exclude Wikipedia infrastructure articles)
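The 7-day condition can be made concrete with a small filter. In this sketch, which is an assumption rather than the project's code, each article history is reduced to chronologically sorted `(timestamp, has_neutrality_tag)` pairs. A removal of the tag counts only if the tag does not reappear within seven days. The name `stable_removals` is hypothetical.

```python
from datetime import datetime, timedelta


def stable_removals(revisions, window=timedelta(days=7)):
    """Return the times at which the neutrality tag was removed and not re-added within `window`.

    revisions: chronologically sorted list of (timestamp, has_neutrality_tag) pairs.
    """
    removals = []
    for i in range(1, len(revisions)):
        removed_at, has_tag = revisions[i]
        if not revisions[i - 1][1] or has_tag:
            continue  # not a removal: the tag was absent before or is still present
        readded = False
        for ts, tag in revisions[i + 1:]:
            if ts - removed_at >= window:
                break  # later revisions fall outside the observation window
            if tag:
                readded = True
                break
        if not readded:
            removals.append(removed_at)
    return removals


if __name__ == "__main__":
    history = [(datetime(2011, 1, 1), True), (datetime(2011, 1, 2), False)]
    print(stable_removals(history))
```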
  • 31. Opinionated Wikipedia articles [Chart: “Wikipedia Neutrality Articles”; y-axis: number of articles with the neutrality tag changed; x-axis: number of neutrality tag changes. Bars: 2 changes: 17,845 articles; 4–10: 2,540; 10–20: 219; 20–34: 25; 38: 1 (together the 20,630 opinionated articles)]
  • 32. Opinionated Wikipedia articles: top 20 articles based on the number of neutrality tag changes
    o 38: September 11 attacks
    o 34: George W. Bush
    o 32: Intelligent design, Global warming
    o 28: Race and intelligence, Circumcision, Armenian Genocide
    o 26: Macedonians (ethnic group), Iraq War
    o 24: The Holocaust, Muhammad, Jesus, Israel, Arab-Israeli conflict
    o 22: Zionism, Srebrenica massacre, Islamophobia, Homeopathy, Holocaust denial, Evolution
  • 33. Learning opinionated Wikipedia articles [Diagram: learning features derived from the Wikipedia Opinionated Dataset: article topics (DMOZ), reference changes, entities, links to other articles, part-of-speech tags, …]
  • 34. Learning opinionated Wikipedia articles o ~10,000 opinionated articles o Applied Enrycher services: • POS tagging • DMOZ, yielding topics and keywords [Diagram: Wikipedia Opinionated Dataset → Opinionated Topics (DMOZ)]
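The chart labels on slides 35 and 36 suggest a reading of how topics and keywords are derived. A DMOZ topic URL appears to become a topic label of up to three category levels (e.g. 'Regional North_America United_States') and several single-category keywords (e.g. 'Regional', 'North_America'). This sketch encodes that reading. The three-level cut-off is inferred from the labels, not stated on the slides, and `topic_and_keywords` is a hypothetical name.

```python
DMOZ_PREFIX = "http://www.dmoz.org/Top/"


def topic_and_keywords(topic_url: str, depth: int = 3):
    """Return (topic label, keywords) for a DMOZ topic URL.

    The topic label joins the first `depth` categories with spaces (assumed cut-off);
    the keywords are all individual categories on the path.
    """
    if not topic_url.startswith(DMOZ_PREFIX):
        raise ValueError(f"not a DMOZ topic URL: {topic_url}")
    categories = [c for c in topic_url[len(DMOZ_PREFIX):].split("/") if c]
    return " ".join(categories[:depth]), categories


if __name__ == "__main__":
    print(topic_and_keywords("http://www.dmoz.org/Top/Regional/Europe/Sweden/Maps_and_Views"))
```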
  • 35. Opinionated topics [Pie chart: “Article Changes for Top Topics”; legend: 'Regional North_America United_States', 'Regional Europe United_Kingdom', 'Society Religion_and_Spirituality Christianity', 'Reference Education Colleges_and_Universities', 'Regional Asia India', 'Society', 'Arts', 'Regional North_America Canada', 'Society Religion_and_Spirituality Islam', 'Society Issues Warfare_and_Conflict', 'Society History By_Time_Period'; largest slices 20%, 13%, 10% and 6%, remaining slices 1–4% each] * Keywords for which the number of article changes is greater than 500
  • 36. Opinionated keywords [Pie chart: “Article Changes for Top Keywords”; legend: 'Regional', 'Society', 'North_America', 'United_States', 'Europe', 'Society_and_Culture', 'Arts', 'Religion_and_Spirituality', 'Science', 'United_Kingdom', 'Business', 'Computers', 'History', 'Asia', 'Education', 'Issues', 'Government'; largest slices 15%, 11%, 7% and 6%, remaining slices 2–5% each] * Keywords for which the number of article changes is greater than 500
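One way to read the two pie charts is as change counts aggregated per topic or keyword, filtered by the footnote's threshold of more than 500 changes. The following minimal sketch implements that reading. It assumes each article carries a list of labels and a total change count (both inputs are hypothetical), and it computes each slice as a share of the retained total, which is also an assumption.

```python
from collections import Counter


def top_labels(article_labels, article_changes, min_changes=500):
    """Aggregate change counts per label and keep labels above the threshold.

    article_labels: {article: [topic or keyword labels]}
    article_changes: {article: number of changes}
    Returns {label: (changes, share_of_retained_total)}, largest first.
    """
    totals = Counter()
    for article, labels in article_labels.items():
        for label in set(labels):  # count each label at most once per article
            totals[label] += article_changes.get(article, 0)
    kept = [(label, n) for label, n in totals.most_common() if n > min_changes]
    total = sum(n for _, n in kept)
    return {label: (n, n / total) for label, n in kept}


if __name__ == "__main__":
    labels = {"A": ["Regional", "Europe"], "B": ["Regional", "Society"]}
    print(top_labels(labels, {"A": 400, "B": 300}, min_changes=250))
```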
  • 37. Examples of changes within a topic o Keyword-based changes: • article: Megleno-Romanians • topics: Regional, Europe, Romania, Society_and_Culture, Organizations • […] Vlahi is a disputed exonym […] o Reference additions: • article: Soviet occupation of Romania • topics: Regional, Europe, Romania • […] Sergiu Verona, "Military Occupation and Diplomacy: Soviet Troops in Romania, 1944-1958", Duke University Press […]
  • 38. Learning opinionated Wikipedia articles o Next steps in learning opinionated articles: • extract the remaining learning features: entities, article references, article links, etc. o Dealing with scale: • process all Wikipedia articles → ~30 TB of data • extract the Wikipedia social network • obtain a static and dynamic Wikipedia profile for each contributor/community
  • 39. Detecting bias-inducing editor behavior to generate warnings (KIT)
  • 40. Bias detection via editing behavior in Wikipedia o Relevant use case: • UC1: displaying warnings when patterns of bias are detected o Goal: • help in understanding and remedying the behavioral causes of bias o Needed: • understanding and prediction of the socio-technical mechanisms that lead to bias
  • 41. Bias warning examples: examples of behavioural-pattern-based warnings presented to users: • Concentration: 98% of the article was written by 3% of the active editors in the article. The resulting concentration coefficient is 9 out of 10; the usual coefficient for similar articles is 5 out of 10. Find out what this means and what you can do to help. • Homogeneity: We detected a highly fragmented editor structure, with 80% of non-vandal edits being reverts and 3 major editor camps. Click here for an explanation and visualization. Find out what this means and what you can do to help.
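The slides do not define the concentration coefficient. One plausible reading, sketched here as an assumption, combines two standard measures computed over per-editor contribution sizes (e.g. surviving words per editor). The first is the share written by the most active fraction of editors. The second is a Gini coefficient rescaled to the 0–10 range used in the warning text. The function names and the rescaling are hypothetical.

```python
import math


def top_share(contributions, editor_fraction):
    """Share of the article written by the most active `editor_fraction` of editors."""
    ranked = sorted(contributions, reverse=True)
    k = max(1, math.ceil(len(ranked) * editor_fraction))
    return sum(ranked[:k]) / sum(ranked)


def gini(contributions):
    """Gini coefficient: 0 = perfectly even, (n - 1) / n = a single editor wrote everything."""
    xs = sorted(contributions)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        raise ValueError("need at least one non-zero contribution")
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


def concentration_score(contributions):
    """Hypothetical mapping of the Gini coefficient onto the 0-10 scale of the warning text."""
    return round(10 * gini(contributions))


if __name__ == "__main__":
    sizes = [970] + [1] * 30  # one dominant editor among 31
    print(top_share(sizes, 0.03), concentration_score(sizes))
```

A warning like the one on the slide would then compare an article's score with the typical score of similar articles.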
  • 42. Identify crucial editing patterns o Research review • Identified existing patterns of socio-technical mechanisms that potentially influence bias and diversity: – social proof and consolidation – ownership behavior – opinion camps and editor drop-out – lack of boldness and of useful conflicts – etc. → see the paper “Towards a diversity-minded Wikipedia” and an extended literature survey to be published
  • 43. Identify crucial editing patterns Example patterns: Opinion Camp drop-out
  • 44. Extract useful data to enable modeling o Define metrics that find patterns in the data typical of these mechanisms • For example: a dense core group of editors in an article’s social network structure o How can we obtain the data these metrics require?
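A “dense core group of editors” can be operationalized in several ways. One standard graph measure is the k-core: the largest subgraph in which every editor is connected to at least k others. This sketch assumes an undirected editor interaction network (for instance, an edge whenever one editor reverts another). That modeling choice is not specified on the slide, and `k_core` is a hypothetical helper.

```python
from collections import defaultdict


def k_core(edges, k):
    """Return the nodes of the k-core of an undirected graph given as (u, v) edge pairs."""
    neighbours = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            neighbours[u].add(v)
            neighbours[v].add(u)
    alive = set(neighbours)
    changed = True
    while changed:  # repeatedly peel off nodes with fewer than k remaining neighbours
        changed = False
        for node in list(alive):
            if len(neighbours[node] & alive) < k:
                alive.discard(node)
                changed = True
    return alive


if __name__ == "__main__":
    interactions = [("A", "B"), ("B", "C"), ("C", "A"), ("A", "D")]
    print(sorted(k_core(interactions, 2)))
```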
  • 45. Reverts as the basis for accurately modeling user behavior in Wikipedia o Most telling: editors’ actions that relate to each other • Reverting is undoing: it contradicts actions the reverting editor perceives as false • Inferences are possible without knowing the meaning of the content. Example:
      Edit No. | Content   | Added/deleted content
      1        | “zzz”     |
      2        | “zzz yyy” | +“yyy”
      3        | “zzz”     | -“yyy”
    → Foundation for the search for, and analysis of, most of the recurring editing patterns that are typical of biased articles
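Inferring reverts without understanding the content only requires the word-level actions of each edit. This minimal bag-of-words sketch ignores word order, which is a simplification, and reproduces the zzz/yyy example: edit 2 adds “yyy” and edit 3 deletes it. The function name `edit_actions` is hypothetical.

```python
from collections import Counter


def edit_actions(old_text: str, new_text: str) -> Counter:
    """Return an edit's word-level actions as a Counter of ('+', word) and ('-', word) pairs.

    Word order is ignored (bag-of-words simplification).
    """
    old, new = Counter(old_text.split()), Counter(new_text.split())
    actions = Counter()
    for word, n in (new - old).items():   # words that appear more often after the edit
        actions[("+", word)] = n
    for word, n in (old - new).items():   # words that appear less often after the edit
        actions[("-", word)] = n
    return actions


if __name__ == "__main__":
    print(edit_actions("zzz", "zzz yyy"))   # edit 2
    print(edit_actions("zzz yyy", "zzz"))   # edit 3
```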
  • 46. State-of-the-art revert detection: simple identity revert method using MD5 hashes
      Edit No. | Article content                | Words deleted/added by edit (actions taken) | MD5 hash (simplified) | Detected identical and reverted revisions
      1        | Zero                           | (ignored for this example)                  | Hash1                 | Like revision 5
      2        | Zero Apple Banana              | +“Apple” +“Banana”                          | Hash2                 | Reverted by revision 5
      3        | Zero Apple Banana Coconut Date | +“Coconut” +“Date”                          | Hash3                 | Reverted by revision 5
      4        | Zero Coconut Date              | -“Apple” -“Banana”                          | Hash4                 | Reverted by revision 5
      5        | Zero                           | -“Coconut” -“Date”                          | Hash1                 | Like revision 1 → revert of revisions 2, 3, 4
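The identity method on this slide can be sketched in a few lines with Python's `hashlib`. Every revision is hashed, and when a hash recurs, all revisions in between are treated as reverted. This is a simplified sketch with a hypothetical function name. Real tools may additionally restrict how far back they search.

```python
import hashlib


def identity_reverts(revisions):
    """Identity revert detection via MD5 hashes.

    revisions: chronological list of revision texts, numbered from 1.
    Returns {reverting_revision: [reverted_revisions]}.
    """
    last_seen = {}  # hash -> most recent revision number with that content
    reverts = {}
    for num, text in enumerate(revisions, start=1):
        digest = hashlib.md5(text.encode("utf-8")).hexdigest()
        earlier = last_seen.get(digest)
        if earlier is not None and earlier < num - 1:
            # Content is identical to an earlier revision: everything in between is reverted
            reverts[num] = list(range(earlier + 1, num))
        last_seen[digest] = num
    return reverts


if __name__ == "__main__":
    history = ["Zero", "Zero Apple Banana", "Zero Apple Banana Coconut Date",
               "Zero Coconut Date", "Zero"]
    print(identity_reverts(history))
```

On the slide's example this reports revision 5 as reverting revisions 2, 3 and 4, which is exactly the result the table shows.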
  • 47. Deficiencies of the state of the art o Partial reverts exist o Reverts do not always produce duplicate revisions. Example (same revisions as on slide 46): revision 4 completely undoes revision 2 without recreating an earlier revision, so the identity method misses this revert and instead attributes the reverts of revisions 2, 3 and 4 to revision 5.
  • 48. An improved revert detection method o Wikipedia defines a revert as an action of an editor “undoing the effects of one or more edits”, adding that “(m)ore broadly, reverting may also refer to any action that in whole or in part reverses the actions of other editors.” o A clear definition, taking into account the Wikipedia definition, known intentional behavior and the available data: an edit A is reverted if all of its actions are completely undone in one subsequent edit B. Edit B has then reverted edit A.
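Read literally, the definition gives a simple test on the action sets of two edits. B reverts A when B contains the inverse of every action of A: each word A added is deleted by B, and each word A deleted is re-added by B. This sketch uses a hypothetical name, `undoes`, and checks only this condition. It assumes the caller already knows that B comes after A. Actions are Counters of `(op, word)` pairs.

```python
from collections import Counter


def undoes(actions_a: Counter, actions_b: Counter) -> bool:
    """True if edit B contains the inverse of every action of edit A.

    Actions are Counters of (op, word) pairs with op '+' (added) or '-' (deleted).
    """
    if not actions_a:
        return False  # an edit without actions cannot be reverted
    inverse = Counter({("-" if op == "+" else "+", word): n
                       for (op, word), n in actions_a.items()})
    # B may do more than undo A, but it must undo all of A's actions
    return all(actions_b[key] >= n for key, n in inverse.items())


if __name__ == "__main__":
    edit_2 = Counter({("+", "Apple"): 1, ("+", "Banana"): 1})
    edit_4 = Counter({("-", "Apple"): 1, ("-", "Banana"): 1})
    print(undoes(edit_2, edit_4))
```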
  • 49. Improved method – implementation
      Edit No. | Revision content               | Words deleted/added by edit (actions taken) | MD5 hash (simplified) | Content list (contains revision No.) | Content list differences | Detected reverts
      1        | Zero                           | (ignored for this example)                  | Hash1                 | 1                                    | +1                       |
      2        | Zero Apple Banana              | +“Apple” +“Banana”                          | Hash2                 | 1;2                                  | +2                       | Reverted by 4
      3        | Zero Apple Banana Coconut Date | +“Coconut” +“Date”                          | Hash3                 | 1;2;3                                | +3                       | Reverted by 5
      4        | Zero Coconut Date              | -“Apple” -“Banana”                          | Hash4                 | 1;3                                  | -2                       | Reverting 2
      5        | Zero                           | -“Coconut” -“Date”                          | Hash1                 | 1                                    | -3                       | Reverting 3
      6        | Zero Fig                       | +“Fig”                                      | Hash5                 | 1;6                                  | +6                       | Reverted by 8
      7        | Zero Fig Grape                 | +“Grape”                                    | Hash6                 | 1;6;7                                | +7                       | Reverted by 8
      8        | Zero Huckleberry               | -“Fig” -“Grape” +“Huckleberry”              | Hash7                 | 1;8                                  | -6; -7                   | Reverting 6, 7
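The content-list idea can be sketched by attributing every word to the revision that added it, using `difflib.SequenceMatcher`. Revision A is then reported as reverted by B when all of A's words are still present just before B and none remain after it. This is an illustrative reconstruction, not the project's implementation. Like the example table, it covers only the undoing of additions. The function name `detect_reverts` is hypothetical.

```python
from collections import Counter
from difflib import SequenceMatcher


def detect_reverts(revisions):
    """Detect full reverts of added content.

    revisions: chronological list of revision texts, numbered from 1.
    Returns {reverting_revision: [reverted_revisions]}.
    """
    prev_tokens, prev_owners = [], []
    added = {}      # revision number -> how many words that revision added
    reverts = {}
    for num, text in enumerate(revisions, start=1):
        tokens = text.split()
        owners = []  # owners[i] = revision that added tokens[i] (the "content list")
        matcher = SequenceMatcher(a=prev_tokens, b=tokens, autojunk=False)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag == "equal":
                owners.extend(prev_owners[i1:i2])    # surviving words keep their author
            else:
                owners.extend([num] * (j2 - j1))     # inserted/replacement words belong to this revision
        added[num] = owners.count(num)
        before, after = Counter(prev_owners), Counter(owners)
        # A is reverted here if all of A's words survived until now and none survive this edit
        undone = sorted(rev for rev, n in before.items()
                        if n == added[rev] and after[rev] == 0)
        if undone:
            reverts[num] = undone
        prev_tokens, prev_owners = tokens, owners
    return reverts


if __name__ == "__main__":
    history = ["Zero", "Zero Apple Banana", "Zero Apple Banana Coconut Date",
               "Zero Coconut Date", "Zero", "Zero Fig", "Zero Fig Grape", "Zero Huckleberry"]
    print(detect_reverts(history))
```

On the eight revisions of the table, this yields revision 4 reverting 2, revision 5 reverting 3, and revision 8 reverting 6 and 7, matching the “Detected reverts” column.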
  • 50. Improved method – results o Survey evaluation: accuracy is much higher for the new method • significantly fewer false positives • can accurately distinguish between full and partial reverts o 12% more reverts detected with the new method than with identity reverts • up to 50% more in short articles o First revert detection method evaluated to work according to both the Wikipedia definition and editors’ idea of a revert → better reflects actual behavior and relations → key to precisely modeling the social editing dynamics
  • 51. Mockup demo
  • 52. Mock-ups – tools for the case study: tools to support readers, editors and administrators o Quality overview of Wikipedia articles for readers o Generation of working lists • for Wikipedia editors, concerning problems with the content • for Wikipedia administrators, concerning editor behaviour and interaction
  • 53. Mock-ups (1) - QAO
  • 54. Mock-ups (2) - working lists
  • 55. Outlook
  • 56. Outlook (1) o Testing and evaluation of R&D results for Wikipedia o Development of prototypes of user-supporting tools based on these results o Evaluation and testing of these prototypes with Wikipedia users, starting with the German community o Collecting feedback and building up guidelines
  • 57. Outlook (2) Roadmap for developing supporting tools for Wikipedia users: • M18: development of prototypes • M20: testing the prototypes with a small user group, such as a German WikiProject • M22: adjusting and finalising • M24+: applying the tools to other language versions