
#CrowdTruth: Biomedical Data Mining, Modeling & Semantic Integration (BDM2I 2015) @ISWC2015

Achieving Expert-Level Annotation Quality with CrowdTruth: The Case of Medical Relation Extraction. Anca Dumitrache, Lora Aroyo and Chris Welty. http://ceur-ws.org/Vol-1428/

  1. Achieving Expert-Level Annotation Quality with the Crowd: The Case of Medical Relation Extraction. Anca Dumitrache, Lora Aroyo, Chris Welty. Biomedical Data Mining, Modeling & Semantic Integration @ ISWC2015. http://CrowdTruth.org #CrowdTruth @anouk_anca @laroyo @cawelty #BDM2I
  2. CrowdTruth: • Annotator disagreement is signal, not noise. • It is indicative of the variation in human semantic interpretation of signs. • It can indicate ambiguity, vagueness, similarity, over-generality, etc., as well as quality. http://CrowdTruth.org
  3. CrowdTruth for medical relation extraction. • Goals: collect a relation extraction gold standard; improve the performance of a relation extraction classifier. • Approach: crowdsource 900 medical sentences; measure disagreement with the CrowdTruth metrics; train & evaluate the classifier with the CrowdTruth score. http://CrowdTruth.org
  4. RelEx task in CrowdFlower. Patients with ACUTE FEVER and nausea could be suffering from INFLUENZA AH1N1. Is ACUTE FEVER – related to → INFLUENZA AH1N1? http://CrowdTruth.org
  5. Worker Vector (figure: one worker's binary vector over the candidate relations for a sentence). http://CrowdTruth.org
  6. Sentence Vector (figure: the sum of all worker vectors for a sentence). Vector construction and scoring are sketched after the slide list. http://CrowdTruth.org
  7. Annotation Quality of Expert vs. Crowd Annotations (figure: crowd F1 0.907, p = 0.007, vs. expert F1 0.844). http://CrowdTruth.org
  8. Annotation Quality of Expert vs. Crowd Annotations (figure: crowd F1 0.907, p = 0.007, vs. expert F1 0.844). In the 0.6-0.8 threshold range the crowd significantly out-performs the expert, with a maximum F1 of 0.907 at the 0.7 threshold. A threshold-sweep sketch follows the slide list. http://CrowdTruth.org
  9. RelEx CAUSE Classifier F1 for Crowd vs. Expert Annotations (figure: crowd 0.642, p = 0.016, vs. expert 0.638). http://CrowdTruth.org
  10. RelEx CAUSE Classifier F1 for Crowd vs. Expert Annotations (figure: crowd 0.642, p = 0.016, vs. expert 0.638). The crowd provides training data that is at least as good as, if not better than, the experts'. http://CrowdTruth.org
  11. Learning Curves (figure; crowd with pos./neg. threshold at 0.5). http://CrowdTruth.org
  12. Learning Curves (figure; crowd with pos./neg. threshold at 0.5). Above 400 sentences the crowd is consistently above the baseline and the single annotator; above 600 sentences the crowd out-performs the experts. A learning-curve sketch follows the slide list. http://CrowdTruth.org
  13. Learning Curves Extended (figure; crowd with pos./neg. threshold at 0.5). http://CrowdTruth.org
  14. Learning Curves Extended (figure; crowd with pos./neg. threshold at 0.5). The crowd consistently performs better than the baseline. http://CrowdTruth.org
  15. Number of Workers: Impact on the Sentence-Relation Score (figure). http://CrowdTruth.org
  16. Number of Workers: Impact on Annotation Quality (figure). Only 54 sentences had 15 or more workers. A worker-subsampling sketch follows the slide list. http://CrowdTruth.org
  17. Experts vs. Crowd in Human Annotation: Overall Comparison. • 91% of expert annotations are covered by the crowd. • Expert annotators reach agreement on only 30% of annotations. • The most popular crowd vote covers 95% of that expert agreement. http://CrowdTruth.org
  18. Expert vs. Crowd in Human Annotation: Cost Comparison. CrowdTruth: F1 0.642, $0.66 per sentence. Expert Annotator: F1 0.638, $2.00 per sentence. Single Annotator: F1 0.492, $0.08 per sentence. http://CrowdTruth.org
  19. The experiments showed that: • the crowd performs just as well as medical experts; • the crowd is also cheaper; • the crowd is always available; • using only a few annotators for ground truth is faulty; • at least 10 workers per sentence are needed for the highest-quality annotations; • CrowdTruth is a solution to a clinical NLP challenge: the lack of ground truth for training & benchmarking. http://CrowdTruth.org
  20. #CrowdTruth @anouk_anca @laroyo @cawelty #BDM2I #ISWC2015 CrowdTruth.org http://data.CrowdTruth.org/medical-relex
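
For slides 5 and 6: each worker's judgment on a sentence is recorded as a binary vector over the candidate relations, and the sentence vector is the component-wise sum of all worker vectors for that sentence. A minimal sketch of these definitions, assuming the cosine-based sentence-relation score from the CrowdTruth metrics; this is not the authors' code, and the relation names are illustrative.

```python
import math

# Candidate relations shown to the crowd (illustrative names, not the exact task set).
RELATIONS = ["cause", "treat", "prevent", "symptom", "diagnose", "side_effect", "other", "none"]

def worker_vector(selected):
    """Binary vector over RELATIONS: 1 for each relation this worker selected."""
    return [1 if rel in selected else 0 for rel in RELATIONS]

def sentence_vector(worker_vectors):
    """Component-wise sum of all worker vectors collected for one sentence."""
    return [sum(component) for component in zip(*worker_vectors)]

def sentence_relation_score(sent_vec, relation):
    """Cosine similarity between the sentence vector and the relation's unit vector."""
    norm = math.sqrt(sum(v * v for v in sent_vec))
    return sent_vec[RELATIONS.index(relation)] / norm if norm else 0.0

# Example: three workers judge the ACUTE FEVER / INFLUENZA AH1N1 sentence from slide 4.
judgments = [{"cause"}, {"cause", "symptom"}, {"symptom"}]
sent = sentence_vector([worker_vector(j) for j in judgments])
print(sent)                                               # [2, 0, 0, 2, 0, 0, 0, 0]
print(round(sentence_relation_score(sent, "cause"), 3))   # 0.707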
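
For slides 7 and 8: the annotation-quality comparison binarizes the crowd's sentence-relation scores at a threshold and computes F1 against a reference annotation, sweeping the threshold to find the best operating point (maximum F1 of 0.907 at 0.7 on the slide). A hypothetical sketch of such a sweep; the score and reference lists below are placeholders, not the paper's data.

```python
def f1_score(pred, gold):
    """Plain F1 over binary labels."""
    tp = sum(1 for p, g in zip(pred, gold) if p and g)
    fp = sum(1 for p, g in zip(pred, gold) if p and not g)
    fn = sum(1 for p, g in zip(pred, gold) if not p and g)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

def sweep_thresholds(scores, reference, thresholds):
    """F1 of the thresholded crowd scores against the reference, per threshold."""
    return {t: f1_score([s >= t for s in scores], reference) for t in thresholds}

# Placeholder data standing in for crowd sentence-relation scores and reference labels.
scores = [0.92, 0.15, 0.71, 0.45, 0.88, 0.30]
reference = [1, 0, 1, 0, 1, 1]
print(sweep_thresholds(scores, reference, [round(0.1 * t, 1) for t in range(1, 10)]))
```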
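
For slides 11-14: the learning curves turn the crowd scores into training labels with a positive/negative threshold of 0.5 and retrain the relation classifier on increasingly large training sets. A sketch of that bookkeeping only; the classifier is a stand-in callable here, not the CAUSE classifier from the paper.

```python
def crowd_training_labels(scores, threshold=0.5):
    """Positive/negative training labels from crowd sentence-relation scores
    (the pos./neg. threshold of 0.5 from the slides)."""
    return [1 if s >= threshold else 0 for s in scores]

def learning_curve(sentences, labels, evaluate, step=100):
    """Classifier F1 when trained on the first n sentences, for growing n.
    `evaluate` is any callable that trains on (sentences, labels) and returns a test F1."""
    return [(n, evaluate(sentences[:n], labels[:n]))
            for n in range(step, len(sentences) + 1, step)]

# Toy usage with a dummy evaluator; a real run would train the relation classifier here.
toy_sentences = [f"sentence {i}" for i in range(900)]
toy_scores = [(i % 10) / 10 for i in range(900)]
labels = crowd_training_labels(toy_scores)
print(learning_curve(toy_sentences, labels, evaluate=lambda X, y: sum(y) / len(y))[:3])
```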
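
For slides 15 and 16: the effect of the number of workers can be probed by repeatedly subsampling n workers' judgments per sentence and checking when the sentence-relation score stabilises, which is how one arrives at recommendations like the 10-plus workers per sentence mentioned on slide 19. A hypothetical subsampling sketch with toy data; it is not the analysis code used in the paper.

```python
import math
import random

def relation_score(worker_vectors, relation_index):
    """Sentence-relation score: cosine between the summed worker vectors and the relation's unit vector."""
    sent_vec = [sum(component) for component in zip(*worker_vectors)]
    norm = math.sqrt(sum(v * v for v in sent_vec))
    return sent_vec[relation_index] / norm if norm else 0.0

def score_with_n_workers(worker_vectors, n, relation_index, trials=50):
    """Average score over random subsamples of n workers, to see when it stabilises."""
    samples = (random.sample(worker_vectors, n) for _ in range(trials))
    return sum(relation_score(s, relation_index) for s in samples) / trials

# Toy data: 15 workers judging one sentence over 3 candidate relations.
random.seed(0)
workers = [[random.randint(0, 1) for _ in range(3)] for _ in range(15)]
for n in (3, 5, 10, 15):
    print(n, round(score_with_n_workers(workers, n, relation_index=0), 3))
```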
