The DERI Reading Group
Ontology-based information extraction:
An Overview & Survey
(2010, Wimalasuriya and Dou)
Tobias Wunner, UNLP Group
Copyright 2010 Digital Enterprise Research Institute. All rights reserved. Paul Buitelaar
Definition - Motivation
a) Create content for the Semantic Web
• convert existing websites into ontologies
b) Improve the quality of existing ontologies
• Test criterion: OBIE task
• OBIE good => ontology good
Overview
• Access to information…
Ontology-based Information Extraction (OBIE):
“A system that processes unstructured or semi-structured natural language text guided by an ontology and presents the output in an ontology.”
Overview
• ESWC dogfood: OBIE-related topics
Problem – two scenarios
1. Text only:
• Extract conceptualization and instances
• [Diagram: from text T, 1. conceptualization (County; building with café and football table, is-a Building), 2. instances (Galway, DERI building)]
• conceptualization can be too specific / generic
• wrong conceptualization
2. Domain ontology & text:
• extract instances only
• [Diagram: from text T, conceptualization by domain ontology (Building located in City), 2. instances (Galway, DERI building)]
• less generic but semantically more stable
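The second scenario can be sketched in a few lines of Python. This is only an illustration under assumptions: the `DomainOntology` class, its method names, and the reading of the diagram as "DERI building located in Galway" are hypothetical and not taken from the survey. The point it shows is that the conceptualization is fixed in advance, so the extractor can only add instances and facts that fit it.

```python
from dataclasses import dataclass, field


@dataclass
class DomainOntology:
    """Fixed conceptualization: classes and typed relations (hypothetical sketch)."""
    classes: set
    relations: dict  # relation name -> (domain class, range class)
    instances: dict = field(default_factory=dict)  # instance name -> class
    facts: list = field(default_factory=list)      # (subject, relation, object)

    def add_instance(self, name, cls):
        # Reject anything the domain ontology does not define.
        if cls not in self.classes:
            raise ValueError(f"unknown class: {cls}")
        self.instances[name] = cls

    def add_fact(self, subj, rel, obj):
        # A fact is accepted only if both arguments have the declared types.
        domain, range_ = self.relations[rel]
        if self.instances.get(subj) != domain or self.instances.get(obj) != range_:
            raise ValueError(f"type mismatch for relation {rel}")
        self.facts.append((subj, rel, obj))


onto = DomainOntology(
    classes={"City", "Building"},
    relations={"located_in": ("Building", "City")},
)
onto.add_instance("Galway", "City")
onto.add_instance("DERI building", "Building")
onto.add_fact("DERI building", "located_in", "Galway")
```

A class the ontology does not contain, such as "County", is rejected instead of being added as a new and possibly wrong concept.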
Definition – key characteristics
a) Process unstructured / semi-structured text
b) “guided” by an ontology
c) Present output in ontology
[Diagram: Text Source, Information Extractor, Ontology; the extractor is guided by the ontology]
Definition – ontology learning or population?
• Ontology population ⊂ OBIE
• “OBIE is Open information extraction” (Etzioni)
• alternative: semantics given by ontology!
• extractors can be inside / outside ontology
Methods
 Information extractors
1. Linguistic rules
2. Gazetteer lists
3. Classification (classical / structure-aware)
4. Partial parse trees
5. Structured data analyzers
6. Web querying
Linguistic Rules - Methods
• Regular expressions
  • <COMPANY> .* revenue <NUMBER> <CURRENCY>
    “Tesco’s revenue in 2009 was 3.4 billion GBP.”
• Extraction ontologies
  • combination of ontology and lexicon (Mädche, Embley, Buitelaar)
• manual construction
• High precision
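The revenue pattern above can be written as an ordinary regular expression. The following is a minimal sketch, assuming small hand-made company and currency lists stand in for the <COMPANY> and <CURRENCY> slots; the list contents, the `extract_revenue` name, and the optional million/billion scale word are assumptions added for illustration.

```python
import re

# Hypothetical lexicon entries; a real system would take these from the ontology.
COMPANIES = ["Tesco", "SAP", "Siemens"]
CURRENCIES = ["GBP", "EUR", "USD"]

# <COMPANY> .* revenue <NUMBER> <CURRENCY>, with an optional scale word.
REVENUE_RULE = re.compile(
    r"(?P<company>" + "|".join(map(re.escape, COMPANIES)) + r")"
    r".*?\brevenue\b.*?"
    r"(?P<amount>\d+(?:\.\d+)?)\s*"
    r"(?P<scale>million|billion)?\s*"
    r"(?P<currency>" + "|".join(map(re.escape, CURRENCIES)) + r")\b"
)


def extract_revenue(sentence):
    """Return the slots filled by the rule, or None if the rule does not fire."""
    m = REVENUE_RULE.search(sentence)
    if m is None:
        return None
    return {
        "company": m.group("company"),
        "amount": float(m.group("amount")),
        "scale": m.group("scale"),
        "currency": m.group("currency"),
    }


result = extract_revenue("Tesco’s revenue in 2009 was 3.4 billion GBP.")
```

The year 2009 is skipped because no currency follows it, which illustrates why such hand-written rules tend to be precise but need manual tuning.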
Gazetteer Methods
2. Gazetteer lists
• Phrases / words instead of patterns
• Named-Entity Recognition
• Requirements:
  1) Specify what is being extracted
  2) Specify sources and avoid manual creation
[Diagram: gazetteer list "industry" taken from the Semantic Web: Software, Energy, Supermarket, …]
Example texts:
The software giant SAP…
Tesco a UK supermarket …
Siemens energy revenue…
… wind energy company Vestas
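A gazetteer lookup of this kind could look as follows. This is a hedged sketch: the `GAZETTEERS` dictionary, the `annotate` function, and the choice of case-insensitive whole-word matching are assumptions for illustration, with the three industry terms copied from the slide.

```python
import re

# One gazetteer list per ontology concept; the entries here are from the slide example.
GAZETTEERS = {"Industry": ["software", "energy", "supermarket"]}


def build_matcher(terms):
    # Longest terms first, so multi-word phrases would win over their prefixes.
    ordered = sorted(terms, key=len, reverse=True)
    pattern = r"\b(?:" + "|".join(re.escape(t) for t in ordered) + r")\b"
    return re.compile(pattern, re.IGNORECASE)


MATCHERS = {label: build_matcher(terms) for label, terms in GAZETTEERS.items()}


def annotate(text):
    """Return (matched term, concept label, character offset) for each hit."""
    hits = []
    for label, matcher in MATCHERS.items():
        for m in matcher.finditer(text):
            hits.append((m.group(0).lower(), label, m.start()))
    return sorted(hits, key=lambda h: h[2])


examples = [
    "The software giant SAP",
    "Tesco a UK supermarket",
    "Siemens energy revenue",
    "wind energy company Vestas",
]
annotations = {text: annotate(text) for text in examples}
```

Unlike the rule-based extractor, nothing here is a pattern over sentence structure; quality depends entirely on where the word lists come from, which is why the slide asks for a specified source.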
Classification Methods
3. Classification techniques
• Break down the IE task into a set of binary tasks
[Diagram: features (pos, semTag) feed a Classifier that decides among classes c1, c2, .., cn]
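The decomposition into binary tasks can be illustrated without committing to a particular learner. The sketch below only shows the data transformation (often called one-vs-rest); the function name, the feature values, and the class labels are made-up examples, not data from the survey.

```python
def to_binary_tasks(examples, classes):
    """Turn one multi-class labelling task into one yes/no task per class.

    examples: list of (feature dict, class label) pairs
    classes:  the classes c1..cn offered by the ontology
    """
    return {
        cls: [(features, label == cls) for features, label in examples]
        for cls in classes
    }


# Hypothetical training examples using the two feature types from the slide.
examples = [
    ({"pos": "NNP", "semTag": "location"}, "City"),
    ({"pos": "NNP", "semTag": "location"}, "Country"),
    ({"pos": "NNP", "semTag": "organization"}, "Energy"),
]
tasks = to_binary_tasks(examples, ["City", "Country", "Energy"])
```

Each entry of `tasks` can then be handed to any binary classifier; at prediction time the class whose classifier is most confident is chosen.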
Classification Methods
• Classical
  • misclassification does not consider structure! (equal cost 1/6)
[Diagram: taxonomy with Location (Country, City) and Industry (SW, Energy); instances Galway, Germany, DERI, Siemens, GE, Ireland, Munich, CITEC, Tesco, Claddagh]
Classification Methods
• Structure aware
  • Classifier should consider taxonomy structure!
  • W1,6 = 3
[Diagram: same taxonomy and instances as on the previous slide]
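One common way to make a misclassification cost depend on the taxonomy is to use the path length between the true and the predicted class. The sketch below assumes this tree-distance cost and a hypothetical root node "Thing" above Location and Industry; it does not reproduce the class numbering or the weight shown on the slide, which cannot be recovered from the transcript.

```python
# Child -> parent links for the taxonomy in the diagram; "Thing" is an assumed root.
PARENT = {
    "Location": "Thing",
    "Industry": "Thing",
    "Country": "Location",
    "City": "Location",
    "SW": "Industry",
    "Energy": "Industry",
}


def path_to_root(cls):
    """Return the list [cls, parent, ..., root]."""
    path = [cls]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def tree_distance(a, b):
    """Number of edges between two classes in the taxonomy."""
    path_a, path_b = path_to_root(a), path_to_root(b)
    for steps_a, node in enumerate(path_a):
        if node in path_b:
            # First shared ancestor: add the steps needed from each side.
            return steps_a + path_b.index(node)
    raise ValueError("classes are not in the same taxonomy")


cost_sibling = tree_distance("City", "Country")  # both under Location
cost_distant = tree_distance("City", "Energy")   # different branches
```

With a flat cost, confusing City with Country is as bad as confusing City with Energy; with the tree distance, the first error costs 2 and the second costs 4.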
Other methods
4. Partial parse trees
• TACITUS, SMES, LTAG
5. Analyze structured data
• Wikipedia infoboxes
6. Web querying
• C-PANKOW
• “Towards the self-annotating web”
Technologies used in implementation
• Shallow NLP (GATE, SProUT, Stanford NLP)
  • POS tagging, sentence splitting, regular expressions
• Semantic lexicons (WordNet, GermaNet)
  • synonyms, meronyms, hypernyms
• Semantic annotation (OCAT, iDocument, PIMO)
Missing
• Terminological tools (UMLS, bio terminologies)
• Thesauri, translation memory
Data sets & evaluation
Data sets (corpora)
1) Message Understanding Conference (MUC-7)
2) Automatic Content Extraction (ACE)
• => more on classical IR, IE, NLP tracks
• => no data set with given semantics (ontology)
Evaluation
• Precision & recall
• Only used for population task
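For the population task, precision and recall compare the extracted instance assignments with a gold standard. A minimal sketch, assuming both are given as sets of (instance, class) pairs; the example pairs and the added F1 score are illustrative and are not results from the survey.

```python
def precision_recall_f1(extracted, gold):
    """Precision, recall and F1 over sets of (instance, class) pairs."""
    extracted, gold = set(extracted), set(gold)
    true_positives = len(extracted & gold)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical gold standard and system output.
gold = {
    ("Galway", "City"),
    ("DERI building", "Building"),
    ("Munich", "City"),
    ("Ireland", "Country"),
}
extracted = {
    ("Galway", "City"),
    ("DERI building", "Building"),
    ("Tesco", "City"),  # wrong assignment
}
precision, recall, f1 = precision_recall_f1(extracted, gold)
```

Here two of the three extracted pairs are correct (precision 2/3) and two of the four gold pairs are found (recall 1/2).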
Recent Open IE argument
• Con: Weikum, From Information to Knowledge: Harvesting Entities and Relationships from Web Sources
  • Disambiguation
  • NL relations are not well defined (well defined arguments)
• Pro: Weld, Using Wikipedia to Bootstrap Open IE
  • Relation targeted: learn extractor per relation -> lower recall
  • Structural targeted: general extraction engine -> lower precision
Conclusion and Outlook
• No established / agreed methods yet
• Is OBIE also ontology learning?
• Data sets
• Methods for best extractors
• Semantic Web contribution?
  • e.g. gazetteers from DBpedia
• Cross-lingual OBIE -> CLOBIE
References
[1] Wimalasuriya, Dou, Ontology-based Information Extraction: An Introduction and Survey of Current Approaches, Journal of Information Science, June 2010
[2] Buitelaar et al., Towards Linguistically Grounded Ontologies, ESWC, Springer, 200
[3] Weikum et al., From Information to Knowledge: Harvesting Entities and Relationships from Web Sources, Principles of Database Systems (PODS), 2010
[4] Weld et al., Using Wikipedia to Bootstrap Open Information Extraction, SIGMOD Record, 2008
