NEXT: Background
SCID: Semantic Co-reference Inaccuracy Detection - [ INTRODUCTION ]
Filtering Inaccurate Entity Co-references on the Linked Open Data
John Cuzzola, Jelena Jovanovic, Ebrahim Bagheri
bagheri@ryerson.ca
DEXA 2015
The Linked-Open-Data (LOD)
cloud represents hundreds of
available datasets throughout
the Web.
❖ 570 datasets and 2909 linkage
relationships between the datasets [1].
[1] http://data.dws.informatik.uni-mannheim.de/lodcloud/2014/
NEXT: How are datasets linked?
SCID: Semantic Co-reference Inaccuracy Detection - [ BACKGROUND ]
To use data from multiple ontologies within the LOD,
"equivalence" relationships between concepts are necessary
(i.e., the "edges" or linkages of the LOD must be defined).
Equivalence relationships
between DBPedia and
Freebase?
NEXT: The sameAs predicate
SCID: Semantic Co-reference Inaccuracy Detection - [ BACKGROUND ]
The equivalence relationship is often expressed via the predicate owl:sameAs:
http://rdf.freebase.com/ns/en.dog <owl:sameAs> http://dbpedia.org/resource/Dog
NEXT: sameAs linkage mistakes
SCID: Semantic Co-reference Inaccuracy Detection - [ BACKGROUND ]
http://rdf.freebase.com/ns/en.bitch <owl:sameAs> http://dbpedia.org/resource/Dog
X NOT the same!
ns:common.topic.description:
"Bitch, literally meaning a female
dog, is a common slang term in the
English language, especially used
as a denigrating term applied to a
person, commonly a woman."
dbo:abstract:
The domestic dog (Canis lupus
familiaris) is a usually furry,
carnivorous member of the canidae
family.
The Problem: There are many incorrect
LOD linkages using owl:sameAs.
The Effect: Incorrect (embarrassing)
assertions by reasoners that use the LOD.
Example:
(from http://www.sameas.org)
NEXT: SCID
SCID: Semantic Co-reference Inaccuracy Detection - [ PROBLEM / MOTIVATION ]
SCID: Semantic Co-reference Inaccuracy Detection
❖ A method of natural language
analysis for detecting incorrect
owl:sameAs assertions.
1. Construct a baseline comparison vector vb(x,Sx).
2. For each resource (1,2,...) claiming to be the “same”,
construct vectors v1(x1,Sx), v2(x2,Sx) …
3. Compare individual distances from v1(x1,Sx),
v2(x2,Sx) … to baseline vb(x,Sx)
4. Disregard those v1(x1,Sx), v2(x2,Sx) … that are
outside some threshold distance δ.
NEXT: The core functions of SCID.
UPCOMING: How are vb(x,Sx) and v1(x1,Sx), v2(x2,Sx) … made?
SCID: Semantic Co-reference Inaccuracy Detection - [ CONTRIBUTION ]
SCID: Semantic Co-reference Inaccuracy Detection - [ METHOD ]
SCID depends on two key functions:
1. A category distribution function: ρ(t,S).
Given some natural language text (t) and a set of "suitable" subject categories (S) for t,
compute a distribution vector of how t relates to each subject category of S.
2. A category selection function: S(uri).
Given a resource (uri), return a "suitable" set of subject categories (S) that can be used
in ρ(t,S).
NEXT: The category distribution function.
UPCOMING: The category selection function.
SCID: Semantic Co-reference Inaccuracy Detection - [ METHOD ]
The category distribution function:
ρ(t,S) .
Ex: Given input text (t) as shown and three
DBpedia subject categories of S=[Fruit, Oranges,
Color] ρ(t,S) produces output:
ρ(t,[Fruit, Oranges, Color]) = v1(x1,S)
= [ Fruit: 0.27, Oranges: 0.50, Color: 0.22 ]
NEXT: The category selection function
UPCOMING: How is baseline vector vb(x,Sx) computed and compared to v1(x1,S)?
● Computes Rx,k defined as the importance of word x to category k for every word in t.
○ uses 5 features: (1) count of x in k, (2) count of x across all k, (3) count of concepts where
word x appears, (4) ratio of x in k to vocabulary of all k, (5) average word frequency of x per
resource in k.
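The slide does not say how the five features are combined into Rx,k, so the following is only a minimal sketch of the shape of ρ(t,S). It assumes, purely for illustration, that Rx,k is approximated by feature (1) alone (the count of word x in category k), sums that score over the words of t, and normalizes so the vector sums to 1. The function name `category_distribution` and the word counts are hypothetical, not DBpedia statistics.

```python
from collections import Counter


def category_distribution(text, category_word_counts):
    """Toy stand-in for rho(t, S): one normalized score per category."""
    words = text.lower().split()
    scores = {}
    for category, counts in category_word_counts.items():
        # Assumption: R(x, k) is approximated by feature (1) only,
        # i.e. the count of word x in category k.
        scores[category] = sum(counts.get(word, 0) for word in words)
    total = sum(scores.values())
    if total == 0:
        # No evidence for any category: fall back to a uniform distribution.
        return {category: 1.0 / len(scores) for category in scores}
    return {category: score / total for category, score in scores.items()}


# Hypothetical per-category word counts (illustrative only).
toy_counts = {
    "Fruit": Counter({"fruit": 3, "sweet": 1, "orange": 1}),
    "Oranges": Counter({"orange": 4, "citrus": 2, "fruit": 1}),
    "Color": Counter({"orange": 1, "colour": 3}),
}

dist = category_distribution("the orange is a citrus fruit", toy_counts)
```

With these toy counts the Oranges category receives the largest share, which mirrors the qualitative behaviour of the slide's example without reproducing its numbers.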
SCID: Semantic Co-reference Inaccuracy Detection - [ METHOD ]
The category selection function: Suri .
⇶ DBpedia contains 656,000+ category:subjects.
How do we select a few suitable for ρ(t,S)?
1. Begin with a candidate resource (uri):
http://dbpedia.org/resource/Orange_(fruit)
2. Find a DBpedia disambiguation page:
http://dbpedia.org/resource/Orange_(disambiguation)
3. Combine (union of) the subject categories for each of these resources.
Suri = [ category: { Optical_Spectrum, Oranges, Citrus_hybrids, Tropical_agriculture,
American_punk_rock, Rock_music, Hellcat_Records } ]
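The three selection steps can be sketched as a set union. This is an illustration under assumptions: the lookups are plain in-memory dictionaries standing in for DBpedia queries, the names `select_categories`, `disambiguation_page_of`, `listed_on`, and `subjects` are hypothetical, and only the categories named on the slide are included (the elided ones are left out).

```python
def select_categories(uri, disambiguation_page_of, listed_on, subjects):
    """Union the subject categories of every resource on the URI's disambiguation page."""
    page = disambiguation_page_of.get(uri)
    # Resources sharing the disambiguation page, plus the candidate itself.
    resources = set(listed_on.get(page, ())) | {uri}
    selected = set()
    for resource in resources:
        selected |= subjects.get(resource, set())
    return selected


# Toy data mirroring the Orange example (stand-in for DBpedia lookups).
disambiguation_page_of = {"dbr:Orange_(fruit)": "dbr:Orange_(disambiguation)"}
listed_on = {
    "dbr:Orange_(disambiguation)": [
        "dbr:Orange_(colour)",
        "dbr:Orange_(fruit)",
        "dbr:Orange_(band)",
    ]
}
subjects = {
    "dbr:Orange_(colour)": {"Optical_Spectrum"},
    "dbr:Orange_(fruit)": {"Oranges", "Citrus_hybrids", "Tropical_agriculture"},
    "dbr:Orange_(band)": {"American_punk_rock", "Rock_music", "Hellcat_Records"},
}

s_uri = select_categories(
    "dbr:Orange_(fruit)", disambiguation_page_of, listed_on, subjects
)
```

Using a set keeps the union free of duplicates when two resources share a category.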
NEXT: How do we compute v1(x1,S) for sameAs inaccuracy filtering?
UPCOMING: How is baseline vector vb(x,Sx) computed and compared to v1(x1,S)?
Resources on the disambiguation page and their subject categories:
● dbr:Orange_(colour): category:Optical_Spectrum
● dbr:Orange_(fruit): category:Oranges, category:Citrus_hybrids, category:Tropical_agriculture, ...
● dbr:Orange_(band): category:American_punk_rock, category:Rock_music, category:Hellcat_Records
SCID: Semantic Co-reference Inaccuracy Detection - [ METHOD ]
NEXT: How is baseline vector vb(x,Sx) computed and compared to v1(x1,S)?
UPCOMING: Experimental results
http://www.sameas.org
● dbr:Port
● www.w3.org:synset-seaport-noun-1
● rdf.freebase.com:en.port
● sw.opencyc.org:Seaport
● rdf.freebase.com:River_port
● dbr:Bad_conduct
● rdf.freebase.com:en.military_discharge
● dbr:IVDP
● rdf.freebase.com:en.port_wine
How do we compute v1(t1,S), v2(t2,S), .. for sameAs inaccuracy filtering?
1. Start with a group of resources that are identified as sameAs:
Ex: http://dbpedia.org/resource/Port (dbr:Port)
2. Collect subject categories Sdbr:Port using category selection function.
3. For each of the sameAs resources, collect natural language text (t)
describing the resource. Collect (t) using DBpedia rdfs:comment,
Freebase ns:common.topic.description, and w3.org WordNet wn20schema:gloss.
4. Compute vectors v1(t1,Sdbr:Port), v2(t2,Sdbr:Port), ..., where t1 = rdfs:comment of
dbr:Port, t2 = ns:common.topic.description of rdf.freebase.com:River_port, …
using the category distribution function ρ(t,Sdbr:Port).
We now have the individual vectors v1(t1,Sdbr:Port), v2(t2,Sdbr:Port), ... and only need
the base vector vb(tb,S) for comparison.
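Step 3 amounts to choosing a description predicate per data source. The sketch below shows one way to express that choice, assuming the source can be recognized from the URI host; the table mirrors the three predicates named above, while the function names and the `example.org` placeholder URI are hypothetical. Actual retrieval of the text (for instance over SPARQL) is not shown.

```python
from urllib.parse import urlparse

# Host -> predicate holding the descriptive text, as listed in step 3.
DESCRIPTION_PREDICATE = {
    "dbpedia.org": "rdfs:comment",
    "rdf.freebase.com": "ns:common.topic.description",
    "www.w3.org": "wn20schema:gloss",
}


def description_predicate(uri):
    """Return the text predicate for a resource URI, or None if the source is not covered."""
    host = urlparse(uri).netloc.lower()
    return DESCRIPTION_PREDICATE.get(host)


def plan_text_lookups(same_as_group):
    """Pair each URI with the predicate to query; skip URIs with no known predicate."""
    plan = {}
    for uri in same_as_group:
        predicate = description_predicate(uri)
        if predicate is not None:
            plan[uri] = predicate
    return plan


group = [
    "http://dbpedia.org/resource/Port",
    "http://rdf.freebase.com/ns/en.port",
    "http://example.org/resource/Port",  # placeholder for a source with no listed predicate
]
plan = plan_text_lookups(group)
```

Resources from sources without a listed text predicate are skipped here; how SCID treats such resources is not stated on the slide.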
SCID: Semantic Co-reference Inaccuracy Detection - [ METHOD ]
NEXT: Experimental results
UPCOMING: Conclusion
How is baseline vector vb(x,Sx) computed and compared to v1(x1,S)?
1. Retrieve subject:categories of candidate resource from DBpedia
Ex: http://dbpedia.org/resource/Port (dbr:Port)
2. Find (all) other resources that use the categories of the candidate resource. Concatenate
rdfs:comment from all these resources (t).
3. Compute vb(t,Sdbr:Port) using category distribution function ρ(t,Sdbr:port).
● We now have the base vector vb(t,Sdbr:Port), which can be compared to the
individual sameAs vectors v1(x1,Sdbr:Port), v2(x2,Sdbr:Port), ...
● We use the Pearson Correlation Coefficient (PCC) to compare vectors.
● Remove vectors whose PCC is less than the threshold δ.
Subject categories of dbr:Port (from http://www.dbpedia.org): category:Nautical_terms, category:Ports_and_harbours
SCID: Semantic Co-reference Inaccuracy Detection - [ EXPERIMENTATION ]
NEXT: Experimental results continued.
UPCOMING: Conclusion
● We examined 7,690 resources obtained from the www.sameAs.org database, covering five topics:
○ Animal, City, Person, Color, and Miscellaneous.
● We performed some data cleansing on these resources.
○ Removal of duplicate resources (i.e., aliases/redirects), broken links, and redundant resources (e.g., dbpedialite
is a subset of DBpedia).
● After cleansing, 411 unique resources remained, of which 251 were identified as errors by a human oracle.
○ e.g., http://dbpedia.org/resource/Dog is not the same as http://rdf.freebase.com/ns/en.bitch
SCID: Semantic Co-reference Inaccuracy Detection - [ EXPERIMENTATION ]
● We computed an individual vector v(t,S) for each of the 411 resources, along with its
associated baseline comparison vector.
● We computed the Pearson Correlation between each individual vector and its baseline.
● We removed identity links based on thresholds ranging from 0.0 to 0.90 and
calculated the F-score for each threshold used.
○ The original 411 resources contained 160 correct / 251 incorrect sameAs
links (0.560 F-score).
○ Thresholds (δ) of 0.50 and 0.60 gave the best F-score.
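The 0.560 starting figure is consistent with an F1 score in which all 411 links are treated as kept: precision is 160/411, recall is 1, and F1 = 320/571 ≈ 0.560. That reading is an inference from the numbers, not something the slide states. The sketch below reproduces the arithmetic and shows how an F-score could be computed at a given threshold; the scored links are made up for illustration.

```python
def f1_score(tp, fp, fn):
    """F1 from true positives, false positives, and false negatives."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# All 411 links kept: 160 correct (tp), 251 incorrect (fp), none dropped (fn).
baseline_f1 = f1_score(tp=160, fp=251, fn=0)


def f1_at_threshold(scored_links, delta):
    """F1 when links with PCC below delta are removed.

    scored_links: iterable of (pcc, is_correct) pairs.
    """
    tp = sum(1 for pcc, ok in scored_links if pcc >= delta and ok)
    fp = sum(1 for pcc, ok in scored_links if pcc >= delta and not ok)
    fn = sum(1 for pcc, ok in scored_links if pcc < delta and ok)
    return f1_score(tp, fp, fn)


# Hypothetical (PCC, oracle label) pairs, not the experimental data.
toy_links = [
    (0.90, True), (0.70, True), (0.55, True),
    (0.45, False), (0.20, False), (-0.10, False),
]
sweep = {delta: f1_at_threshold(toy_links, delta) for delta in (0.0, 0.5, 0.9)}
```

In this toy sweep the middle threshold separates right from wrong links cleanly, while a very high threshold starts discarding correct links and lowers recall.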
NEXT: Experimental results continued.
UPCOMING: Conclusion
SCID: Semantic Co-reference Inaccuracy Detection - [ EXPERIMENTATION ]
[Figure: scatter plot of F-score versus Pearson Correlation Coefficient for oracle-identified right (blue) and
wrong (red) identity links, with the threshold δ marked.]
NEXT: Conclusion
SCID: Semantic Co-reference Inaccuracy Detection - [ CONCLUSION ]
-- END --
● In this presentation:
○ We introduced SCID, a technique for discovering inaccuracies in identity link assertions
(owl:sameAs).
○ Experimental results indicate that SCID can identify incorrect identity link assertions and improve
the precision of an identity database (http://www.sameas.org).
● In the future:
○ Experimentation with identity links other than owl:sameAs (e.g., skos:closeMatch,
skos:exactMatch, owl:equivalentClass).
○ Experimentation with vector comparison methods other than Pearson Correlation (e.g., cosine
similarity, Euclidean distance, Spearman rank coefficient).
