[Freedman+ EMNLP11] Extreme Extraction – Machine Reading in a Week

23 Dec 2011
Nakatani Shuyo @ Cybozu labs, Inc
twitter: @shuyo
Abstract
• Target:
  – Rapid construction of a concept and relation
    extraction system
• Method:
  – Extend an existing ACE system to new relations
  – in a short time with minimal training data
     • in a week (<50 person-hours) with <20 example pairs
  – Evaluate on a question answering task
Phases
1. Ontology and resources
2. Extending system for new ontology
3. Extracting relations
4. Evaluation
1. Ontology and resources
• possibleTreatment( Substance, Condition )
   – SSRIs(S) are effective treatments for depression(C)
• expectedDateOnMarket( Substance , Date )
   – More drugs for type 2(S) expected on market soon(D)
• responsibleForTreatment( Substance, Agent )
   – Officials(A) Responsible for Treatment of War Dead(S)
• studiesDisease( Agent, Condition ) (presenter's note: not sure)
   – cancer(C) researcher Dr. Henri Joyeux(A)
• hasSideEffect( Substance, Condition )
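
The relation signatures above can be written down as a small typed ontology. This is only a sketch: the relation names and argument types come from the slide, while the Mention and RelationInstance classes and the type check are hypothetical and are not taken from [Freedman+ 11].

```python
from dataclasses import dataclass

# Relation name -> (type of argument 1, type of argument 2), as listed on the slide.
ONTOLOGY = {
    "possibleTreatment": ("Substance", "Condition"),
    "expectedDateOnMarket": ("Substance", "Date"),
    "responsibleForTreatment": ("Substance", "Agent"),
    "studiesDisease": ("Agent", "Condition"),
    "hasSideEffect": ("Substance", "Condition"),
}


@dataclass(frozen=True)
class Mention:
    """A text span with its class label (hypothetical helper type)."""
    text: str
    type: str


@dataclass(frozen=True)
class RelationInstance:
    """One extracted relation; rejects arguments of the wrong type."""
    name: str
    args: tuple

    def __post_init__(self):
        expected = ONTOLOGY[self.name]
        actual = tuple(m.type for m in self.args)
        if actual != expected:
            raise TypeError(f"{self.name} expects {expected}, got {actual}")


# "SSRIs(S) are effective treatments for depression(C)"
example = RelationInstance(
    "possibleTreatment",
    (Mention("SSRIs", "Substance"), Mention("depression", "Condition")),
)
```

Checking argument types at construction time is one simple way to keep badly typed pairs out of the extracted relations.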
2. Extending system for new ontology
• Add new relation/class detectors to “our” (the authors')
  extraction system for the ACE task
  – Details of the system are not clear...
     • Class detectors with unsupervised word clustering
     • Bootstrap relation learner with a template and seeds
     • Pattern learning for relation extraction

• Annotate words for 4 classes
• Coreference
Bootstrap relation learner
• DAP (Double-Anchored Pattern) (Kozareva+ 08)
  – Web search with a query based on “<CLASS>
    such as <SEED> and *”
  – Add the words found at the position “*” in the snippets
    to the class members as new seeds
  – Repeat “the bootstrapping loop” while seeds are
    available
Relation detection with DAP
• CLASS = disease / SEED = cold
• Web search = “disease such as cold and”
  – disease such as cold and flu (9). ...
  – disease such as cold and heat, external ...
  – disease such as cold and pneumonia. ...
  – disease (such as cold and hot diseases), ...
  – disease such as cold and flu viruses. ...
  – disease such as cold and food poisoning. ...
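
The DAP loop can be sketched on the snippets listed above. This is a minimal illustration, not the implementation of Kozareva+ 08 or Freedman+ 11: the function names are hypothetical, offline_search stands in for a real web search, and only the single word right after "and" is taken as a candidate.

```python
import re

# Snippets copied from the slide above.
SNIPPETS = [
    "disease such as cold and flu (9). ...",
    "disease such as cold and heat, external ...",
    "disease such as cold and pneumonia. ...",
    "disease (such as cold and hot diseases), ...",
    "disease such as cold and flu viruses. ...",
    "disease such as cold and food poisoning. ...",
]


def dap_query(cls, seed):
    """Build the doubly-anchored query '<CLASS> such as <SEED> and'."""
    return f"{cls} such as {seed} and"


def extract_candidates(cls, seed, snippets):
    """Return the word at the '*' position of each matching snippet."""
    pattern = re.compile(
        rf"\b{re.escape(cls)} such as {re.escape(seed)} and ([a-z]+)",
        re.IGNORECASE,
    )
    found = []
    for snippet in snippets:
        match = pattern.search(snippet)
        if match:
            found.append(match.group(1).lower())
    return found


def bootstrap(cls, seeds, search, max_rounds=10):
    """Query with each seed in turn and add newly found words as seeds."""
    members = set(seeds)
    queue = list(seeds)
    rounds = 0
    while queue and rounds < max_rounds:
        seed = queue.pop(0)
        rounds += 1
        for cand in extract_candidates(cls, seed, search(dap_query(cls, seed))):
            if cand not in members:
                members.add(cand)
                queue.append(cand)
    return members


def offline_search(query):
    """Stand-in for a web search: only knows the slide's one query."""
    return SNIPPETS if query == "disease such as cold and" else []


print(extract_candidates("disease", "cold", SNIPPETS))
print(sorted(bootstrap("disease", ["cold"], offline_search)))
```

On these snippets the naive pattern also picks up "heat" and "food", which shows why candidates from a single pattern need further filtering before they are trusted as class members.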
Four classes to annotate
• Substance-Name
  – medicine name
• Substance-Description
  – e.g. “new drugs”
• Condition-Name
  – name of disease
• Condition-Description
  – e.g. “the illness”
Annotation
• Name tagging with active learning (Miller+ 04)
  – Unsupervised word clustering on a binary tree
    (Brown+ 90)
  – Tagging with clustering information
     • Averaged Perceptron (Collins 02)
  – Request annotation for sentences selected by
    “confidence score”
     • score = (highest perceptron score) - (second one) !?
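
The confidence score can be sketched as a margin between the two best-scoring candidates, with the lowest-margin sentences sent to the annotator first. This is an assumption-laden illustration: the function names are hypothetical, the scores below are made up, and treating each score as the perceptron score of one candidate tagging of the sentence is an interpretation of the slide, not a confirmed detail.

```python
def confidence(scores):
    """Margin between the highest and the second-highest score."""
    top, second = sorted(scores.values(), reverse=True)[:2]
    return top - second


def select_for_annotation(scored_sentences, k=1):
    """Return the k sentences the tagger is least sure about."""
    ranked = sorted(scored_sentences, key=lambda item: confidence(item[1]))
    return [sentence for sentence, _ in ranked[:k]]


# Illustrative scores only; they do not come from the paper.
pool = [
    ("SSRIs are effective treatments for depression",
     {"tagging_a": 9.0, "tagging_b": 3.0}),
    ("More drugs for type 2 expected on market soon",
     {"tagging_a": 5.0, "tagging_b": 4.5}),
]

print(select_for_annotation(pool, k=1))
```

A small margin means the two best candidates are nearly tied, so a human label for that sentence is likely to be more informative than one for a sentence with a clear winner.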
Results of Class Detection
[Table from [Freedman+ 11]; presenter's note: what is GS (Gold Standard)?]
• substances & conditions
   – -Name / -Description respectively
• without/with lists of known substances and conditions
Coreference
• It took the most time (20 of 43 hours)
• But the details are not clear...
  – domain independent heuristics
  – appositive linking
3. Extracting relations
• Learned Patterns vs. Handwritten Patterns
[Figures from [Freedman+ 11]]
4. Evaluation
• Question Answering with extracted
  information
• Query examples
  – Find possible treatments for diabetes
  – What is the expected date on market for Abilify?
Answer Example
• ACME produces a wide range of drugs
  including treatments for malaria and
  athlete's foot
  – responsibleForTreatment(drugs, ACME)
  – possibleTreatment(drugs, malaria)
  – possibleTreatment(drugs, athlete's foot)
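
Question answering over the extracted relations can be sketched as lookups on (relation, arg1, arg2) tuples. The facts are the three from the answer example above; the query helpers are hypothetical and are not the evaluation system of [Freedman+ 11].

```python
# Facts from the answer example, as (relation, arg1, arg2).
FACTS = [
    ("responsibleForTreatment", "drugs", "ACME"),
    ("possibleTreatment", "drugs", "malaria"),
    ("possibleTreatment", "drugs", "athlete's foot"),
]


def query(facts, relation, arg1=None, arg2=None):
    """Return (arg1, arg2) pairs of a relation; None matches anything."""
    return [
        (a1, a2)
        for rel, a1, a2 in facts
        if rel == relation and arg1 in (None, a1) and arg2 in (None, a2)
    ]


def treatments_for(facts, condition):
    """'Find possible treatments for <condition>'."""
    pairs = query(facts, "possibleTreatment", arg2=condition)
    return sorted({substance for substance, _ in pairs})


def agents_treating(facts, condition):
    """Two-step query: who is responsible for a treatment of <condition>?"""
    agents = set()
    for substance in treatments_for(facts, condition):
        pairs = query(facts, "responsibleForTreatment", arg1=substance)
        agents.update(agent for _, agent in pairs)
    return sorted(agents)


print(treatments_for(FACTS, "malaria"))
print(agents_treating(FACTS, "malaria"))
```

The second helper chains two relations through the shared Substance argument, which is the kind of join a query over several relations needs.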
[Figure from [Freedman+ 11]]
• useful = answering a complex query
When non-useful answers are removed
[Figure from [Freedman+ 11]]
• annotator’s recall (A)
• combining both (C)
• using only handwritten rules (H, HW)
• using only learned patterns (L)
[Figure from [Freedman+ 11]]
Discussion
[Figure from [Freedman+ 11]]
Conclusions
• The combined system can achieve an
  F1 of 0.51 in a new domain in a week.
• It requires very little training data.
• The learning algorithms are still not
  competitive with handwritten patterns.
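
For reference, F1 is the harmonic mean of precision and recall. The sketch below computes it on toy relation sets; the data is invented for illustration and is unrelated to the 0.51 reported above, and taking the union of handwritten and learned outputs as the "combined" system is an assumption, not the paper's stated method.

```python
def precision_recall_f1(predicted, gold):
    """Precision, recall and F1 of a predicted set against a gold set."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Toy data: four gold relations "a".."d"; "x" is a wrong extraction.
gold = {"a", "b", "c", "d"}
handwritten = {"a", "b"}
learned = {"b", "c", "x"}
combined = handwritten | learned  # assumption: combination = union

for name, output in [("H", handwritten), ("L", learned), ("C", combined)]:
    p, r, f1 = precision_recall_f1(output, gold)
    print(f"{name}: P={p:.2f} R={r:.2f} F1={f1:.2f}")
```

In this toy case the union raises recall at some cost in precision, which is one way a combined system can beat either component on F1.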
References
• [Freedman+ 11] Extreme Extraction – Machine
  Reading in a Week
• [Kozareva+ 08] Semantic Class Learning from the
  Web with Hyponym Pattern Linkage Graphs
• [Miller+ 04] Name Tagging with Word Clusters and
  Discriminative Training
   – [Brown+ 90] Class-Based n-gram Models of Natural
     Language
   – [Collins 02] Discriminative Training Methods for Hidden
     Markov Models: Theory and Experiments with Perceptron
     Algorithms
