SOFIE - A Unified Approach To Ontology-Based Information Extraction Using Reasoning
  Tobias Wunner, Unit for Natural Language Processing (UNLP), firstname.lastname@deri.org
  Wednesday, 22nd June 2011, DERI Reading Group
Based On
  "SOFIE: A Self-Organizing Framework for Information Extraction"
  Authors: Fabian Suchanek, Mauro Sozio, Gerhard Weikum
  Published: World Wide Web Conference (WWW), Madrid, 2009
Overview
  - Introduction
  - SOFIE Model + Rules
  - Excursion: Satisfiability
  - SOFIE Approach
  - Evaluation experiments
  - Conclusion
Motivation
  - Classical IE on text: pattern-based, 80%
  - Semi-structured approach: Wikipedia infoboxes, 95%
  - Idea of the paper: combine text (hypotheses) with an ontology (trusted facts)
Example
  YAGO ontology:
    familyName(AlbertEinstein, Einstein)
    bornIn(AlbertEinstein, Germany)
  Document1:
    "Einstein attended secondary school in Germany."
  New knowledge:
    attendedSchoolIn(AlbertEinstein, Germany)
General Idea
  - Express extraction patterns as facts
      patternOcc("X went to school in Y", Einstein, Switzerland)
  - Rules to understand the usage of terms
      patternOcc(Pattern, X, Y) and R(X, Y) ⇒ express(Pattern, R)
  - Add restrictions
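The rule on this slide can be read as a join between pattern occurrences and known facts. Below is a minimal Python sketch of that reading, assuming facts are stored as plain tuples. The names `pattern_occurrences`, `known_facts` and `derive_express` are illustrative and do not come from SOFIE, and the trusted fact is invented for the demo.

```python
# Pattern occurrences as (pattern, X, Y). This one is taken from the slide.
pattern_occurrences = [
    ("X went to school in Y", "Einstein", "Switzerland"),
]

# Trusted facts as (relation, X, Y). This fact is an assumption made for
# the demo, it is not listed on the slide.
known_facts = {
    ("attendedSchoolIn", "Einstein", "Switzerland"),
}


def derive_express(occurrences, facts):
    """Apply patternOcc(P, X, Y) and R(X, Y) => express(P, R)."""
    hypotheses = set()
    for pattern, x, y in occurrences:
        for relation, fact_x, fact_y in facts:
            # The pattern connects the same pair that relation R connects.
            if (fact_x, fact_y) == (x, y):
                hypotheses.add((pattern, relation))
    return hypotheses


express = derive_express(pattern_occurrences, known_facts)
```

In this toy setup `express` holds the single hypothesis that the pattern "X went to school in Y" expresses `attendedSchoolIn`.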
Contribution
  Unified approach to
  - Pattern matching
  - Word sense disambiguation
  - Reasoning
  Large scale, on unstructured data
Pattern extraction with WICs (Word in Context)
  Extract patterns based on 'interesting' entities.
  Documents:
    "Einstein was born at Ulm in Württemberg, Germany, on March 18, 1879. When Albert was around four, his father gave him a magnetic compass. When Albert became older, he went to a school in Switzerland. After he graduated, he got a job in the patent office there…"
  Knowledge base:
    patternOcc("Einstein was born in Ulm", Einstein@D1, Ulm@D1) [1]
    patternOcc("Ulm is in Württemberg, Germany", Ulm@D1, Germany@D1) [1]
    patternOcc("Albert .. Switzerland", Albert@D1, Switzerland@D1) [1]
Grounding
  Test rules. How? Find an instance which satisfies the formulae.
  Rules with variables:
    bornIn(X, Ulm) ⇒ ¬bornIn(X, Timbuktu)
    studiedIn(X, Ulm)
  Grounded instances:
    bornIn(Einstein, Ulm) ⇒ ¬bornIn(Einstein, Timbuktu)
    studiedIn(Einstein, Ulm)
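Grounding replaces every variable in a rule by a concrete constant. The following Python sketch shows one simple way to enumerate the grounded instances, assuming a rule is a list of atoms and variables are given by name. The helper `ground` and the second constant `Albert` are assumptions for illustration only.

```python
from itertools import product


def ground(rule_atoms, variables, constants):
    """Yield one grounded copy of the rule per assignment of constants."""
    for values in product(constants, repeat=len(variables)):
        binding = dict(zip(variables, values))
        # Replace variables by their bound constant, keep constants as-is.
        yield [
            (predicate, tuple(binding.get(arg, arg) for arg in args))
            for predicate, args in rule_atoms
        ]


# bornIn(X, Ulm) => not bornIn(X, Timbuktu), written as premise and conclusion.
rule = [("bornIn", ("X", "Ulm")), ("not bornIn", ("X", "Timbuktu"))]

# "Albert" is a hypothetical second constant, added to show the enumeration.
grounded = list(ground(rule, ["X"], ["Einstein", "Albert"]))
```

With n variables and m constants this produces m**n grounded rules, which is why the number of propositional clauses grows quickly with the size of the knowledge base.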
Rules (Hypotheses)
  - Disambiguation
      disambiguatesAs(Albert@D, AlbertEinstein) [?]
  - Expresses a new fact
      expresses(P, livedIn(Einstein, Switzerland)) [?]
  - New facts
      CityIn(Ulm, Germany) [?]
New fact rule ...with disambiguation
  "Pattern P expresses relation R when the disambiguated WICs of an occurrence of P stand in relation R"
  patternOcc(P, WX, WY) and disambiguatesAs(WX, X) and disambiguatesAs(WY, Y) and R(X, Y) ⇒ express(P, R)
Restrictions: Disambiguation
  The disambiguation prior should influence the choice of disambiguation.
    disambPrior(W, X, N) ⇒ disambiguatedAs(W, X)
  N: any disambiguation function, for example
    N = |words(D1) ∩ rel(AlbertEinstein)| / |words(D1)|
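The overlap ratio on this slide is straightforward to compute. Here is a small Python sketch, assuming `words(D)` is the set of words in the document and `rel(X)` is a set of words related to entity X in the ontology. The function name `disamb_prior` and both related-word sets are made up for the example.

```python
def disamb_prior(document_words, entity_related_words):
    """Compute |words(D) ∩ rel(X)| / |words(D)| over sets of words."""
    words = set(document_words)
    if not words:
        return 0.0
    return len(words & set(entity_related_words)) / len(words)


# Lowercased sentence from the example document.
doc = "when albert became older he went to a school in switzerland".split()

# Hypothetical related-word sets, not taken from YAGO.
rel_albert_einstein = {"switzerland", "school", "physics", "ulm"}
rel_hermann_einstein = {"ulm", "munich", "father"}

prior_albert = disamb_prior(doc, rel_albert_einstein)
prior_hermann = disamb_prior(doc, rel_hermann_einstein)
```

With these toy sets the document shares two of its eleven distinct words with `rel_albert_einstein` and none with `rel_hermann_einstein`, so the prior favours AlbertEinstein as the reading of Albert@D1.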
Restrictions: Functional restrictions
  R(X, Y) and type(R, function) and different(Y, Z) ⇒ ¬R(X, Z)
  "Albert@D1 born in?"
  Albert@D1 ≠ Albert@D2
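A functional relation allows at most one object per subject, so two candidate facts with the same subject and different objects cannot both hold. The sketch below checks this over a list of candidate facts. The helper `functional_violations` and the toy facts are assumptions for illustration. In SOFIE the restriction is a weighted rule handled by the reasoner, not a filter like this one.

```python
def functional_violations(facts, functional_relations):
    """Return pairs R(X, Y), R(X, Z) with Y != Z for functional R."""
    first_object = {}
    violations = []
    for relation, x, y in facts:
        if relation not in functional_relations:
            continue  # non-functional relations may have many objects
        key = (relation, x)
        if key in first_object and first_object[key] != y:
            violations.append(((relation, x, first_object[key]), (relation, x, y)))
        first_object.setdefault(key, y)
    return violations


# Toy candidate facts; bornIn is treated as functional, livedIn is not.
facts = [
    ("bornIn", "AlbertEinstein", "Ulm"),
    ("bornIn", "AlbertEinstein", "Timbuktu"),
    ("livedIn", "AlbertEinstein", "Ulm"),
    ("livedIn", "AlbertEinstein", "Switzerland"),
]
conflicts = functional_violations(facts, {"bornIn"})
```

Only the two `bornIn` facts are reported as a conflict. The two `livedIn` facts can both hold.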
SOFIE Rules
  Framework to test the hypotheses.
  Question: "How to satisfy all of them?"
  Rules:
    disambPrior(Albert@D1, AlbertEinstein, 10) ⇒ disambiguatesAs(Albert@D1, AlbertEinstein)
    disambPrior(Albert@D1, HermannEinstein, 3) ⇒ disambiguatesAs(Albert@D1, HermannEinstein)
    patternOcc(P, X, Y) and R(X, Y) ⇒ express(P, R)
  Trusted facts:
    Country(Germany)
    livedIn(AlbertEinstein, Ulm)
    …
SAT / MAX SAT
  SAT (Satisfiability): prove that a formula can be TRUE
    F = (X or Y or Z) and (¬X or Y or Z) and (¬X or ¬Y or ¬Z)
    G = (X or Y) and (¬X or ¬Y) and (X)
    The truth table of F has 2^3 rows.
  Complexity classes
    P: good, example N^k
    NP: bad, c^N
      e.g. naive algorithm for 100 variables: 2^100 rows × 10^-10 s per row = 4 × 10^12 years
    Not always: 3SAT in (4/3)^N
  SAT solver
  MAX SAT
  Details: Schöning 2010
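The naive algorithm mentioned on this slide enumerates the whole truth table. The Python sketch below does this for the two formulas F and G, assuming a clause is a list of (variable, polarity) literals. The function name `satisfying_assignments` is illustrative.

```python
from itertools import product


def satisfying_assignments(clauses, variables):
    """Enumerate all 2**n truth-table rows and keep the satisfying ones.

    A literal is (variable, polarity); polarity False means negated.
    """
    models = []
    for values in product([False, True], repeat=len(variables)):
        row = dict(zip(variables, values))
        # A CNF formula holds if every clause has at least one true literal.
        if all(any(row[var] == pol for var, pol in clause) for clause in clauses):
            models.append(row)
    return models


# F = (X or Y or Z) and (not X or Y or Z) and (not X or not Y or not Z)
F = [
    [("X", True), ("Y", True), ("Z", True)],
    [("X", False), ("Y", True), ("Z", True)],
    [("X", False), ("Y", False), ("Z", False)],
]
# G = (X or Y) and (not X or not Y) and (X)
G = [
    [("X", True), ("Y", True)],
    [("X", False), ("Y", False)],
    [("X", True)],
]

models_F = satisfying_assignments(F, ["X", "Y", "Z"])
models_G = satisfying_assignments(G, ["X", "Y"])
```

Both formulas are satisfiable: F has five models among its eight rows, and G has exactly one, X = true and Y = false. The loop visits 2**n rows, which is what makes this approach hopeless for 100 variables.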
Weighted MAX SAT in SOFIE
  ...back to SOFIE: this is MAX SAT, but with weights.
  Rules:
    disambPrior(Albert@D1, AlbertEinstein, 10) ⇒ disambiguatesAs(Albert@D1, AlbertEinstein)
    disambPrior(Albert@D1, HermannEinstein, 3) ⇒ disambiguatesAs(Albert@D1, HermannEinstein)
    patternOcc(P, X, Y) and R(X, Y) ⇒ express(P, R)
  Trusted facts:
    Country(Germany)
    livedIn(AlbertEinstein, Ulm)
    …
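In weighted MAX SAT every clause carries a weight, and the goal is an assignment that maximizes the total weight of the satisfied clauses. The sketch below solves a tiny instance exhaustively. It assumes the two disambiguation priors (10 and 3) become weighted unit clauses, and it adds a hypothetical clause of weight 100 saying that Albert@D1 cannot be both persons. That clause and its weight are assumptions made for the example, not values from the slides.

```python
from itertools import product


def weighted_max_sat(clauses, variables):
    """Exhaustive weighted MAX SAT: return (best_weight, best_assignment)."""
    best_weight, best_row = -1, None
    for values in product([False, True], repeat=len(variables)):
        row = dict(zip(variables, values))
        weight = sum(
            w for literals, w in clauses
            if any(row[var] == pol for var, pol in literals)
        )
        if weight > best_weight:
            best_weight, best_row = weight, row
    return best_weight, best_row


# a = disambiguatesAs(Albert@D1, AlbertEinstein)
# h = disambiguatesAs(Albert@D1, HermannEinstein)
clauses = [
    ([("a", True)], 10),                 # prior for AlbertEinstein
    ([("h", True)], 3),                  # prior for HermannEinstein
    ([("a", False), ("h", False)], 100), # assumed: not both readings at once
]
best_weight, best = weighted_max_sat(clauses, ["a", "h"])
```

The optimum satisfies the exclusion clause and the stronger prior, for a total weight of 110, with `a` true and `h` false. Exhaustive search is only feasible for toy inputs like this one.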
Weighted MAX SAT in SOFIE
  - Weighted MAX SAT is NP-hard
  - Finding the optimal solution is impractical, so only approximation algorithms are used
  - SAT solver: Johnson's algorithm, approximation guarantee 2/3
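For orientation, here is a sketch of the textbook greedy formulation of Johnson's algorithm. It is not code from the slides or from SOFIE. Each clause starts with a modified weight w / 2^|clause|, variables are fixed one at a time to the polarity with the larger modified weight, and clauses that lose a literal have their modified weight doubled. The function names and the toy clause set (including the weight-100 exclusion clause) are assumptions for the example.

```python
def johnson(clauses, variables):
    """Greedy weighted MAX SAT approximation (textbook Johnson's algorithm)."""
    # Each entry: [remaining literals, modified weight w / 2**len(clause)].
    active = [[set(literals), w / 2 ** len(literals)] for literals, w in clauses]
    assignment = {}
    for var in variables:
        pos = sum(mw for lits, mw in active if (var, True) in lits)
        neg = sum(mw for lits, mw in active if (var, False) in lits)
        value = pos >= neg
        assignment[var] = value
        satisfied, falsified = (var, value), (var, not value)
        remaining = []
        for lits, mw in active:
            if satisfied in lits:
                continue                   # clause satisfied, drop it
            if falsified in lits:
                lits = lits - {falsified}  # literal is false, shrink clause
                mw *= 2
                if not lits:
                    continue               # clause cannot be satisfied anymore
            remaining.append([lits, mw])
        active = remaining
    return assignment


def satisfied_weight(clauses, assignment):
    """Total weight of the clauses satisfied by the assignment."""
    return sum(
        w for literals, w in clauses
        if any(assignment[var] == pol for var, pol in literals)
    )


# Toy instance: two weighted priors plus an assumed exclusion clause.
clauses = [
    ([("a", True)], 10),
    ([("h", True)], 3),
    ([("a", False), ("h", False)], 100),
]
approx = johnson(clauses, ["a", "h"])
approx_weight = satisfied_weight(clauses, approx)
```

On this instance the greedy choice sets `a` to false and `h` to true and reaches weight 103, while the optimum is 110 (`a` true, `h` false). The result stays within the 2/3 guarantee but illustrates that the algorithm is an approximation.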
Weighted MAX SAT in SOFIE
  Functional MAX SAT
  - Specialized reasoning (support for functional properties)
  - Approximation guarantee 1/2
  - Propagates dominating unit clauses
  - Considers only unit clauses
  Weighted clauses:
    A v B [w1]
    A v B [w2]
    B v C [w3]
    C [w4]
  Example of a dominating unit clause:
    ¬A v B [10]
    ¬A [10]
    A [30]
    A = true, because 30 > 10 + 10
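The dominance test in the example can be sketched in a few lines of Python. The sketch assumes that the two weight-10 clauses contain ¬A, which is what the comparison 30 > 10 + 10 implies, and it covers only the detection step, not the propagation or the rest of the Functional MAX SAT algorithm. The function name `dominating_unit_clauses` is illustrative.

```python
def dominating_unit_clauses(clauses):
    """Return literals of unit clauses that outweigh all opposing clauses.

    A unit clause dominates if its weight is larger than the summed weight
    of every clause that contains the negation of its literal.
    """
    dominating = []
    for literals, weight in clauses:
        if len(literals) != 1:
            continue
        var, pol = literals[0]
        opposing = sum(w for lits, w in clauses if (var, not pol) in lits)
        if weight > opposing:
            dominating.append((var, pol))
    return dominating


# Example from the slide, with the negations assumed as described above.
clauses = [
    ([("A", False), ("B", True)], 10),  # not A or B
    ([("A", False)], 10),               # not A
    ([("A", True)], 30),                # A
]
forced = dominating_unit_clauses(clauses)
```

Here `forced` contains only the literal A, since 30 > 10 + 10, while the unit clause ¬A with weight 10 does not dominate the opposing weight of 30. Setting A to true loses at most 20 in weight and gains 30.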
Controlled experiment
  - Corpus from Wikipedia infoboxes
  - 100 articles
  - Semantics are known!
Controlled experiment
  - Large-scale: corpus from Wikipedia articles
  - 2000 articles
  - 13 frequent relations from YAGO
  - Parsing = 87 min, Reasoning = 77 min
Unstructured text sources
  - 150 newspaper articles
  - Relation under test: headquarterOf (functional relation)
  - YAGO (modified with relation seeds)
  - Parsing 87 min, Weighted MAX SAT 77 min
  - Disambiguated entries (provenance) could be manually assessed
Unstructured text sources
  - Large-scale: 10 biographies for each of 400 US senators
  - 5 relationships
  - Disambiguation was not ideal for YAGO (13 James Watson)
  - Parsing 7 h, Weighted MAX SAT 9 h
  - Results: 4 good, 1 bad (misleading patterns)
MAX SAT can't do OWL per se (Open World Assumption)
  - Reformulate OWL in propositional logic:
      OWL → FOL → Skolem Normal Form → Propositional Logic
  - Might find OWL-inconsistent ontologies due to the Open World Assumption
  Example: define a student as a subclass of "attends some course"
    ⇒ ∀x ∃y: attends(x, y), Course(y) → Student(x)
    ⇒ ∀x: attends(x, k), Course(k) → Student(x); ∃k
    ⇒ ¬attends(xi, ki) or ¬Course(ki) or Student(xi); k = x1 .. xn
  Inferred ontology:
    { Student(alex), Student(bob), Student subClassOf attends some Course, attends(alex, SemanticWeb) }
  Details: JMC 2010
Conclusions
  - Ontology-based IE (OBIE) reformulated as a weighted MAX SAT problem
  - Approximation algorithm with guarantee 1/2
  - Works and scales (large corpus + YAGO)
Limitations
  - Specialized approximation algorithm: accounts for SOFIE rules, NOT OWL
  - MAX SAT restrictions ∈ Propositional Logic, ∉ First-Order Logic
  - Ontology population approach (can't infer new relations)
References
  - F. Suchanek et al., "SOFIE: A Self-Organizing Framework for Information Extraction", Proceedings of the 18th International Conference on World Wide Web (WWW '09)
  - John McCrae, "Automatic Extraction of Logically Consistent Ontologies from Text", PhD thesis, National Institute of Informatics, Japan, 2009
  - Uwe Schöning, "Das SAT-Problem", Informatik Spektrum 33(5): 479-483, 2010
  - F. Suchanek, "Automated Construction and Growth of a Large Ontology", PhD thesis, Saarland University, Saarbrücken, Germany, 2009

SOFIE - A Unified Approach To Ontology-Based Information Extraction Using Reasonig

  • 1. SOFIE - A Unified Approach To Ontology-Based Information Extraction Using Reasonig Tobias Wunner Unit for Natural Language Processing (UNLP) firstname.lastname@deri.org Wednesday,22nd June, 2011 DERI, Reading Group 1
  • 2. Based On: “SOFIE: A Self-Organizing Framework for Information Extraction” Authors: Fabian Suchanek, Mauro Sozio, Gerhard Weikum Published: World Wide Web Conference (WWW) Madrid, 2009 2
  • 3. Overview Introduction SOFIE Model + Rules Excursion: Satisfiability SOFIE Approach Evaluation experiments Conclusion 3
  • 4. Motivation Classical IE on text pattern-based  80pc Semistructural approach Wikipedia infoboxes 95% Idea of Paper: combine use text (hypotheses) + ontology (trusted facts) 4
  • 5. Example 5 Document1 YAGO ontology familyName(AlbertEinstein, Einstein) bornIn(AlbertEinstein, Germany) attendedSchoolIn( AlbertEinstein, Germany) Einstein attended secondary school in Germany. New Knowledge
  • 6. General Idea Express extraction patterns as fact Rules to understand usage of terms Add restrictions 6 patternOcc(“X went to school in Y”,Einstein, Switzerland) patternOcc(Pattern,X,Y) and R(X,Y) ⇒ express(Pattern,R)
  • 7. Contribution Unified approach to Pattern matching Word Sense Disambiguation Reasoning Large Scale On Unstructured Data 7
  • 8. Pattern extraction with WICs Extract patterns based on ‘interesting’ entities 8 Documents Einstein was born at Ulm in Württemberg, Germany, on March 18, 1879. When Albert was around four, his father gave him a magnetic compass. When Albert became older, he went to a school in Switzerland. After he graduated, he got a job in the patent office there… Knowledge Base patternOcc(“Einstein was born in Ulm”,Einstein@D1, Ulm@D1) [1] patternOcc(“Ulm is in Württemberg, Germany”,Ulm@D1, Germany@D1) [1] patternOcc(“Albert .. Switzerland”,Albert@D1, Switzerland@D1) [1] WICs (Word in Context)
  • 9. Grounding Test Rules How? find an instance which satisfies the formulae 9 bornIn(Einstein,Ulm) ⇒ ¬bornIn(Einstein,Timbuktu) studiedIn(Einstein,Ulm) bornIn(X,Ulm) ⇒ ¬bornIn(X,Timbuktu) studiedIn(X,Ulm)
  • 10. Rules (Hypotheses) Disambiguation disambiguatesAs(Albert@D,AlberEinstein)[?] Expresses a new fact expresses(P, livedIn(Einstein,Switzerland) )[?] New facts CityIn(Ulm,Germany)[?] 10
  • 11. New fact rule ...with disambiguation 11 “Pattern P expresses Relation R when analysis of WICs are disambiguated” patternOcc( P, WX, WY ) and disambiguatesAs(WX, X) and disambiguatesAs(WY, Y) and R(X,Y) ⇒ express( P, R )
  • 12. Restrictions Disambiguation disambiguation prior should influence choice of disambiguation 12 N - any disamb. function disambPrior( W, X, N ) ⇒ disambiguatedAs( W, X ) | words(D1) ∩ rel(AlbertEinstein)| | words(D1) |
  • 13. Restrictions: functional restrictions. R(X, Y) and type(R, function) and different(Y, Z) ⇒ ¬R(X, Z). Example question: “Albert@D1 born in?” Note that Albert@D1 ≠ Albert@D2.
  • 14. SOFIE Rules. A framework to test the hypotheses. Question: “How to satisfy them all?” Input: rules + trusted facts. Rules: disambPrior(Albert@D1, AlbertEinstein, 10) ⇒ disambiguatesAs(Albert@D1, AlbertEinstein); disambPrior(Albert@D1, HermannEinstein, 3) ⇒ disambiguatesAs(Albert@D1, HermannEinstein); patternOcc(P, X, Y) and R(X, Y) ⇒ express(P, R). Trusted facts: Country(Germany); livedIn(AlbertEinstein, Ulm); …
  • 15. SAT / MAX SAT. SAT (satisfiability): prove that a formula can be TRUE. Examples: F = (X or Y or Z) and (¬X or Y or Z) and (¬X or ¬Y or ¬Z), whose truth table has 2^3 rows; G = (X or Y) and (¬X or ¬Y) and (X). Complexity classes: P → good, e.g. N^k; NP → bad, c^N. For example, a naive algorithm for 100 variables needs 2^100 rows × 10^-10 s per row ≈ 4 × 10^12 years. Not always that bad: 3SAT in (4/3)^N. SAT solver. Details: Schöning 2010.
  • 16. SAT / MAX SAT. Same content as slide 15, with one addition: MAX SAT, which asks for an assignment that satisfies as many clauses as possible.
  • 17. Weighted MAX SAT in SOFIE. Back to SOFIE: this is MAX SAT, but with weights. Input: rules + trusted facts. Rules: disambPrior(Albert@D1, AlbertEinstein, 10) ⇒ disambiguatesAs(Albert@D1, AlbertEinstein); disambPrior(Albert@D1, HermannEinstein, 3) ⇒ disambiguatesAs(Albert@D1, HermannEinstein); patternOcc(P, X, Y) and R(X, Y) ⇒ express(P, R). Trusted facts: Country(Germany); livedIn(AlbertEinstein, Ulm); …
  • 18. Weighted MAX SAT in SOFIE. Weighted MAX SAT is NP-hard: finding the optimal solution is impractical, so only approximation algorithms are an option. SAT solver: Johnson’s algorithm, approximation guarantee 2/3.
  • 19. Weighted MAX SAT in SOFIE: Functional MAX SAT. Specialized reasoning (support for functional properties). Approximation guarantee 1/2. Propagates dominating unit clauses. Considers only unit clauses. Weighted clauses: A ∨ B [w1]; A ∨ B [w2]; B ∨ C [w3]; C [w4]. Example: A ∨ B [10]; A [10]; A [30] → A = true, since 30 > 10 + 10.
  • 20. Controlled experiment. Corpus from Wikipedia infoboxes: 100 articles. The semantics are known!
  • 21. Controlled experiment, large-scale. Corpus from Wikipedia articles: 2000 articles, 13 frequent relations from YAGO. Parsing = 87 min, reasoning = 77 min.
  • 22. Unstructured text sources. 150 newspaper articles. Relation under test: headquarterOf (a functional relation). YAGO (modified with relation seeds). Parsing 87 min, weighted MAX SAT 77 min. Disambiguated entries (provenance) could be manually assessed.
  • 23. Unstructured text sources, large-scale. 10 biographies for each of 400 US senators; 5 relationships. Disambiguation was not ideal for YAGO (13 entities named James Watson). Parsing 7 h, weighted MAX SAT 9 h. Results: 4 relations good, 1 bad (misleading patterns).
  • 24. MAX SAT can’t do OWL per se (open-world assumption). Reformulate OWL in propositional logic: OWL → FOL → Skolem normal form → propositional logic. Might find OWL-inconsistent ontologies due to the open-world assumption. Example: define a student as a subclass of “attends some course” ⇒ ∀ x, ∃ y: attends(x, y), Course(y) -> Student(x) ⇒ ∀ x: attends(x, k), Course(k) -> Student(x); ∃ k ⇒ ¬attends(xi, ki) or ¬Course(ki) or Student(xi); k = x1 .. xn. Inferred ontology: { Student(alex), Student(bob), Student subClassOf attends some Course, attends(alex, SemanticWeb) }. Details: JMC 2010.
  • 25. Conclusions. Ontology-based IE (OBIE) reformulated as a weighted MAX SAT problem. Approximation algorithm with guarantee 1/2. Works and scales (large corpus + YAGO).
  • 26. Limitations. Specialized approximation algorithm: accounts for SOFIE rules, NOT OWL. MAX SAT restrictions: ∈ propositional logic, ∉ first-order logic. Ontology population approach (can’t infer new relations).
  • 27. References. F. Suchanek et al., SOFIE: A Self-Organizing Framework for Information Extraction, Proceedings of the 18th International Conference on World Wide Web (WWW '09); John McCrae, Automatic Extraction of Logically Consistent Ontologies from Text, PhD thesis, National Institute of Informatics, Japan, 2009; Uwe Schöning, Das SAT-Problem, Informatik Spektrum 33(5): 479-483, 2010; F. Suchanek, Automated Construction and Growth of a Large Ontology, PhD thesis, Saarland University, Saarbrücken, Germany, 2009.
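The naive truth-table approach from slide 15 can be sketched in a few lines of Python. This is only an illustration, not the solver used in SOFIE: the clause encoding (lists of `(variable, polarity)` pairs) and the helper names are assumptions made for this sketch. The cost estimate assumes 10^-10 seconds per truth-table row, which is the unit that reproduces the slide's figure of roughly 4 × 10^12 years.

```python
from itertools import product

# A clause is a list of (variable, polarity) literals; polarity False means "negated".
# F and G are the two formulas from slide 15.
F = [
    [("X", True), ("Y", True), ("Z", True)],
    [("X", False), ("Y", True), ("Z", True)],
    [("X", False), ("Y", False), ("Z", False)],
]
G = [
    [("X", True), ("Y", True)],
    [("X", False), ("Y", False)],
    [("X", True)],
]


def satisfying_assignments(cnf):
    """Enumerate the whole truth table (2^n rows) and keep the rows that satisfy cnf."""
    names = sorted({var for clause in cnf for var, _ in clause})
    models = []
    for values in product([False, True], repeat=len(names)):
        assignment = dict(zip(names, values))
        # A CNF formula holds if every clause has at least one true literal.
        if all(any(assignment[v] == pol for v, pol in clause) for clause in cnf):
            models.append(assignment)
    return models


f_models = satisfying_assignments(F)
g_models = satisfying_assignments(G)

# Cost of the naive approach for 100 variables, assuming 1e-10 seconds per row.
SECONDS_PER_YEAR = 365.25 * 24 * 3600
naive_years = 2**100 * 1e-10 / SECONDS_PER_YEAR
```

Both formulas are satisfiable, so `f_models` and `g_models` are non-empty, while `naive_years` shows why exhaustive enumeration stops being an option long before 100 variables.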
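Slide 17 describes SOFIE's problem as MAX SAT with weights. As a rough illustration, the two disambiguation hypotheses for Albert@D1 (priors 10 and 3) can be written as weighted unit clauses and solved exactly by brute force. The variable names `is_albert` and `is_hermann`, the extra clause stating that a word has at most one disambiguation, and its weight of 100 are assumptions made for this sketch and do not come from the slides. SOFIE itself uses an approximation algorithm, not exhaustive search.

```python
from itertools import product


def clause_satisfied(clause, assignment):
    """A clause (list of (variable, polarity) literals) holds if any literal is true."""
    return any(assignment[var] == polarity for var, polarity in clause)


def weighted_max_sat(weighted_clauses):
    """Exhaustively find the assignment maximising the total weight of satisfied clauses."""
    names = sorted({var for clause, _ in weighted_clauses for var, _ in clause})
    best_weight, best_assignment = -1, None
    for values in product([False, True], repeat=len(names)):
        assignment = dict(zip(names, values))
        weight = sum(w for clause, w in weighted_clauses
                     if clause_satisfied(clause, assignment))
        if weight > best_weight:
            best_weight, best_assignment = weight, assignment
    return best_weight, best_assignment


# Hypothetical grounding of the slide's disambiguation rules.
clauses = [
    ([("is_albert", True)], 10),    # disambiguatesAs(Albert@D1, AlbertEinstein), prior 10
    ([("is_hermann", True)], 3),    # disambiguatesAs(Albert@D1, HermannEinstein), prior 3
    # Assumed restriction: Albert@D1 refers to at most one entity (heavy weight).
    ([("is_albert", False), ("is_hermann", False)], 100),
]

best_weight, best_assignment = weighted_max_sat(clauses)
```

With these weights the higher prior wins: the best assignment accepts the AlbertEinstein reading and rejects the HermannEinstein one.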
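The "propagates dominating unit clauses" step from slide 19 might look roughly like the sketch below. It is a simplified reading of the slide, not the full Functional MAX SAT algorithm: a unit clause is treated as dominating when its weight strictly exceeds the total weight of the clauses containing the opposite literal. The transcript shows no negation marks in the slide's example, so the polarities used here (¬A ∨ B [10], ¬A [10], A [30]) are an assumption chosen to match the slide's inequality 30 > 10 + 10.

```python
def propagate_dominating_units(weighted_clauses):
    """Repeatedly fix variables whose unit clause outweighs all opposing clauses.

    Each clause is a collection of (variable, polarity) literals with a weight.
    Returns the partial assignment and the clauses that remain undecided.
    """
    clauses = [(frozenset(c), w) for c, w in weighted_clauses]
    assignment = {}
    while True:
        dominating = None
        for clause, weight in clauses:
            if len(clause) != 1:
                continue
            (var, polarity), = clause
            # Total weight of clauses that contain the opposite literal.
            opposing = sum(w for c, w in clauses if (var, not polarity) in c)
            if weight > opposing:
                dominating = (var, polarity)
                break
        if dominating is None:
            return assignment, clauses
        var, polarity = dominating
        assignment[var] = polarity
        simplified = []
        for clause, weight in clauses:
            if (var, polarity) in clause:
                continue  # clause is satisfied, drop it
            reduced = clause - {(var, not polarity)}
            if reduced:
                simplified.append((reduced, weight))
            # an empty reduced clause is falsified and dropped as well
        clauses = simplified


# Assumed polarities for the slide's example (see the note above).
example = [
    ([("A", False), ("B", True)], 10),  # ¬A ∨ B [10]
    ([("A", False)], 10),               # ¬A [10]
    ([("A", True)], 30),                # A [30]
]
assignment, remaining = propagate_dominating_units(example)
```

Here A is set to true because 30 > 10 + 10. The clause ¬A ∨ B then shrinks to the unit clause B, which has no opposition and is propagated in turn.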