Crowdsourcing Linked Data management


  2. HUMAN COMPUTATION
     Outsourcing to humans the tasks that machines find difficult to solve (accuracy, efficiency, costs)
  3. SEMANTIC TECHNOLOGIES ARE ALL ABOUT AUTOMATION
     …but many tasks rely on human input
     • Modeling a domain
     • Integrating data sources originating from different contexts
     • Producing semantic markup for various types of digital artifacts
     • ...
     1st PRELIDA workshop
  4. DIMENSIONS OF HUMAN COMPUTATION SYSTEMS
     • What: tasks that require basic human skills
     • How: distribution, coordination, aggregation
     • Quality: closed vs. open answers, ground truth, quantitative vs. qualitative, who is the evaluator?
     • Optimize: incentives, reduce problem size, task assignment
     7/18/2013, 1st PRELIDA workshop
  5. GAMES WITH A PURPOSE (GWAP)
     Human computation disguised as casual games
     Tasks are divided into parallelizable atomic units (challenges) solved (consensually) by players
     Game models
     • Single vs. multi-player
     • Selection agreement vs. input agreement vs. inversion-problem games
  6. MICROTASK CROWDSOURCING
     Similar types of tasks, but a different incentive model (monetary reward, PPP)
     Successfully applied to transcription, classification, content generation, data collection, image tagging, website feedback, usability tests…
  7. THE SAME, BUT DIFFERENT
     • Tasks leveraging common human skills, appealing to large audiences
     • Selection of domain and task more constrained in games to create typical UX
     • Tasks decomposed into smaller units of work to be solved independently
     • Complex workflows
     • Creating a casual game experience vs. patterns in microtasks
     • Quality assurance
     • Synchronous interaction in games
     • Levels of difficulty and near-real-time feedback in games
     • Many methods applied in both cases (redundancy, votes, statistical techniques)
     • Different set of incentives and motivators
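The redundancy-and-votes approach to quality assurance named on this slide can be sketched as a simple majority vote over repeated answers to the same task. This is a minimal illustration, not the method of any particular platform: the function name `aggregate_by_majority`, the `(task_id, worker_id, label)` tuple layout, and the `min_agreement` threshold are all assumptions made for the example.

```python
from collections import Counter, defaultdict


def aggregate_by_majority(answers, min_agreement=0.5):
    """Aggregate redundant crowd answers per task by majority vote.

    answers: iterable of (task_id, worker_id, label) tuples (hypothetical layout).
    Returns {task_id: winning label, or None if agreement is too low}.
    """
    labels_by_task = defaultdict(list)
    for task_id, _worker_id, label in answers:
        labels_by_task[task_id].append(label)

    aggregated = {}
    for task_id, labels in labels_by_task.items():
        label, votes = Counter(labels).most_common(1)[0]
        # Accept a label only if strictly more than min_agreement of workers chose it.
        aggregated[task_id] = label if votes / len(labels) > min_agreement else None
    return aggregated


answers = [
    ("t1", "w1", "yes"), ("t1", "w2", "yes"), ("t1", "w3", "no"),
    ("t2", "w1", "yes"), ("t2", "w2", "no"),
]
result = aggregate_by_majority(answers)
```

Here `t1` resolves to `"yes"` (two of three workers agree), while the tie on `t2` yields `None`, which a workflow could route to additional workers.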
  8. HYBRID SYSTEMS
     • Physical world (people and devices)
     • Design and composition
     • Participation and data supply
     • Model of social interaction
     • Virtual world (network of social interactions)
     Dave Robertson
  9. EXAMPLE: HYBRID DATA INTEGRATION
       paper             conf
       Data integration  VLDB-01
       Data mining       SIGMOD-02

       title         author  email
       OLAP          Mike    mike@a
       Social media  Jane    jane@b
     Generate plausible matches
     – paper = title, paper = author, paper = email, paper = venue
     – conf = title, conf = author, conf = email, conf = venue
     Ask users to verify
       paper             conf
       Data integration  VLDB-01
       Data mining       SIGMOD-02

       title         author  email   venue
       OLAP          Mike    mike@a  ICDE-02
       Social media  Jane    jane@b  PODS-05
     Does attribute paper match attribute author? (Yes / No / Not sure)
     [McCann, Shen, Doan, ICDE 2008]
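The two steps on this slide (generate plausible matches, then ask users to verify) can be sketched as follows. This is an illustrative reading of the slide, not the implementation from the cited paper: the function names and the answer encoding (`"yes"`, `"no"`, `"not sure"`) are assumptions, and the candidate generator naively pairs every source attribute with every target attribute.

```python
from itertools import product


def plausible_matches(source_attrs, target_attrs):
    """Naively pair every source attribute with every target attribute."""
    return list(product(source_attrs, target_attrs))


def verification_question(source_attr, target_attr):
    """Phrase one candidate match as a yes/no question for a user."""
    return f"Does attribute {source_attr} match attribute {target_attr}?"


def confirmed_matches(candidates, answers):
    """Keep only the candidates a user confirmed with 'yes' (hypothetical encoding)."""
    return [pair for pair in candidates if answers.get(pair) == "yes"]


source = ["paper", "conf"]
target = ["title", "author", "email", "venue"]

candidates = plausible_matches(source, target)
questions = [verification_question(s, t) for s, t in candidates]

# Made-up user answers, for illustration only.
answers = {
    ("paper", "title"): "yes",
    ("paper", "author"): "no",
    ("conf", "venue"): "yes",
    ("conf", "email"): "not sure",
}
matches = confirmed_matches(candidates, answers)
```

With two source attributes and four target attributes this produces the eight candidate pairs listed on the slide; only the pairs answered with "yes" survive as matches.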
  11. WHAT IS DIFFERENT ABOUT SEMANTIC SYSTEMS?
      Semantic Web tools vs. applications
      • Intelligent (specialized) Web sites (portals) with improved (local) search based on vocabularies and ontologies
      • X2X integration (often combined with Web services)
      • Knowledge representation, communication and exchange
  12. TASKS NAMED IN METHODOLOGIES ARE TOO HIGH-LEVEL
      Crowdsource very specific tasks that are (highly) divisible
      • Labeling (in different languages)
      • Finding relationships
      • Populating the ontology
      • Aligning and interlinking
      • Ontology-based annotation
      • Validating the results of automatic methods
      • …
      Think about the context of the application (social structure) and about how to hide tasks behind existing practices and tools
      Tutorial@ESWC2013
  13. TASTE IT! TRY IT!
      • Restaurant review Android app developed in the Insemtives project
      • Uses DBpedia concepts to generate structured reviews
      • Uses mechanism design/gamification to configure incentives
      • User study: 2274 reviews by 180 reviewers referring to 900 restaurants, using 5667 DBpedia concepts
      https://play.google.com/store/apps/details?id=insemtives.android&hl=en
      [Chart: number of reviews, number of semantic annotations (type of cuisine), and number of semantic annotations (dishes) for CAFE, FASTFOOD, PUB, RESTAURANT]
  14. LODREFINE
      http://research.zemanta.com/crowds-to-the-rescue/
  15. DBPEDIA CURATION
      http://aksw.org/Projects/TripleCheckMate.html
  16. CROWDMAP
      Experiments using MTurk, CrowdFlower and established benchmarks
      Enhancing the results of automatic techniques
      Fast, accurate, cost-effective
      [Sarasua, Simperl, Noy, ISWC2012]

        Test case            PRECISION  RECALL
        CartP 301-304        0.53       1.0
        100R50P Edas-Iasted  0.8        0.42
        100R50P Ekaw-Iasted  1.0        0.7
        100R50P Cmt-Ekaw     1.0        0.75
        100R50P ConfOf-Ekaw  0.93       0.65
        Imp 301-304          0.73       1.0
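For readers unfamiliar with the two measures in the table, the sketch below shows how precision and recall are conventionally computed for an ontology alignment against a reference alignment. It uses the standard definitions and is not taken from the CrowdMap evaluation code; the mappings are made up for illustration and do not correspond to any row above.

```python
def precision_recall(found, reference):
    """Standard precision and recall of a set of mappings against a reference set."""
    found, reference = set(found), set(reference)
    correct = found & reference
    precision = len(correct) / len(found) if found else 0.0
    recall = len(correct) / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical mappings between two ontologies "a" and "b".
reference = {
    ("a:Paper", "b:Article"),
    ("a:Author", "b:Writer"),
    ("a:Review", "b:Assessment"),
    ("a:Chair", "b:SessionChair"),
    ("a:Topic", "b:Subject"),
}
found = {
    ("a:Paper", "b:Article"),
    ("a:Author", "b:Writer"),
    ("a:Review", "b:Assessment"),
    ("a:Chair", "b:Subject"),  # incorrect mapping
}

precision, recall = precision_recall(found, reference)
```

Three of the four proposed mappings are correct (precision 0.75), and three of the five reference mappings were found (recall 0.6).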
  17. ONTOLOGY POPULATION
  18. LINKED DATA CURATION
  19. PROBLEMS AND CHALLENGES
      • What is feasible and how can tasks be optimally translated into microtasks?
        • Examples: data quality assessment for technical and contextual features; subjective vs. objective tasks (also in modeling); open-ended questions
      • What to show to users
        • Natural language descriptions of Linked Data/SPARQL
        • How much context
        • What form of rendering
        • How about links?
      • How to combine with automatic tools
        • Which results to validate
        • Low precision (no fun for gamers...)
        • Low recall (vs. all possible questions)
      • How to embed it into an existing application
        • Tasks are fine-grained and perceived as an additional burden on top of the actual functionality
      • What to do with the resulting data?
        • Integration into existing practices
        • Vocabularies!
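Two of the questions above, which automatic results to validate and how to describe Linked Data in natural language, can be made concrete with a small sketch. Everything here is an assumption for illustration: the confidence band (`low`, `high`), the question template, and the crude label extraction from URIs are hypothetical choices, not a recommendation from the slides.

```python
def local_name(uri):
    """Derive a rough human-readable label from the last segment of a URI."""
    tail = uri.rstrip("/#")
    for sep in ("#", "/"):
        if sep in tail:
            tail = tail.rsplit(sep, 1)[1]
    return tail.replace("_", " ")


def triple_to_question(subject, predicate, obj):
    """Render one triple as a yes/no question (hypothetical template)."""
    return (
        f"Is it correct that '{local_name(subject)}' "
        f"has {local_name(predicate)} '{local_name(obj)}'?"
    )


def select_for_validation(results, low=0.4, high=0.8):
    """Pick triples whose tool confidence is neither clearly bad nor clearly good.

    results: list of (triple, confidence) pairs; the band limits are assumptions.
    """
    return [triple for triple, confidence in results if low <= confidence < high]


# Made-up tool output with made-up confidence scores.
results = [
    (("http://dbpedia.org/resource/Berlin",
      "http://dbpedia.org/ontology/country",
      "http://dbpedia.org/resource/Germany"), 0.65),
    (("http://dbpedia.org/resource/Paris",
      "http://dbpedia.org/ontology/country",
      "http://dbpedia.org/resource/France"), 0.95),
]

to_validate = select_for_validation(results)
questions = [triple_to_question(*triple) for triple in to_validate]
```

Only the mid-confidence triple is sent to the crowd here. The band trades crowd cost against coverage, and the URI-derived labels stand in for proper `rdfs:label` lookups.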