Emir Muñoz 
Fujitsu (Ireland) Limited 
National University of Ireland Galway 
LD4IE 2014 @ ISWC, Riva del Garda, Trentino, Italy. Oct 20th, 2014 
http://bit.ly/1xYTR6Z 
(@emir_munoz)
2
<subject, predicate, object>
Domain(predicate): ??
Range(predicate): ??
3
Let’s run the following SPARQL query over the endpoint…
select distinct ?obj where
{?sub <http://dbpedia.org/property/isbn> ?obj}
The endpoint response is a table with the values for the isbn property:
0 71090 6176526 2 2.7073 140043853 1107020697 2940013968264
0978-02-02+02:00
http://dbpedia.org/resource/N/a
"?"@en
"ISBN 0-312-85182-0"@en
"See text"@en
"various"@en
"ISBN 978-0-465-02656-2, ISBN 0-14-017997-6"@en
"ISBN 0-553-07875-5 & ISBN 0-553-56166-9"@en
"The Claiming of Sleeping Beauty: ISBN 0-452-26656-4"@en
"-2.0"^^<http://dbpedia.org/datatype/second>
"TBA"@en
"not available"@en
"[[#Bibliography"@en
And some more ...
So, what is the correct range for the isbn property?
4
LOV Statistics (by July 7th, 2014): 
446 vocabularies 
10 classes and 20 properties on average
5 
The range of isbn is http://schema.org/Text
…but still, is it what I’m looking for? What is the syntax?
6
Etymology 
apo- + apsis 
Noun 
apoapsis (plural apoapsides) 
(astronomy) The point of a body's elliptical orbit about the system's centre of mass where the distance between the body and the centre of mass is at its maximum. 
Property: apoapsis 
[http://en.wiktionary.org/wiki/apoapsis] 
Earth 
Satellite 
dbr:17049_Miron dbo:apoapsis 4.01288e+11 
7
8 
https://github.com/dbpedia/extraction-framework/blob/master/core/src/main/scala/org/dbpedia/extraction/ontology/OntologyDatatypes.scala
<subject, predicate, object> 
1488-07-28+02:00 | "September 2012"@en | "--08-26+02:00"^^<http://www.w3.org/2001/XMLSchema#gMonthDay>
1982-05-23+02:00 | "August 2012"@en | "--01-24+02:00"^^<http://www.w3.org/2001/XMLSchema#gMonthDay>
2007-04-11+02:00 | "July 2009"@en | "--06-11+02:00"^^<http://www.w3.org/2001/XMLSchema#gMonthDay>
Lerman et al. (JAIR 2003)
First column: [NUM-NUM-NUM+NUM:NUM] (plain literal)
Second column: [ALPHA<space>NUM] (plain literal + lang)
Third column: [--NUM-NUM+NUM:NUM] (typed literal)
<http://dbpedia.org/property/date> 
9
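The column patterns on this slide can be approximated in a few lines of Python. This is a minimal sketch, not the implementation from the talk or its repository: it assumes a value's pattern is obtained by replacing each run of digits with NUM and each run of letters with ALPHA while keeping every other character (including spaces) verbatim. The name `shape` is hypothetical.

```python
import re

# One alternative per token kind: digit run, letter run, or any other single character.
_TOKEN = re.compile(r"(?P<NUM>\d+)|(?P<ALPHA>[^\W\d_]+)|(?P<OTHER>.)", re.DOTALL)


def shape(value: str) -> str:
    """Coarse syntactic shape of a literal's lexical form (hypothetical helper)."""
    parts = []
    for m in _TOKEN.finditer(value):
        # lastgroup is the name of the alternative that matched this token;
        # "other" characters (punctuation, signs, spaces) are kept verbatim.
        parts.append(m.group() if m.lastgroup == "OTHER" else m.lastgroup)
    return "".join(parts)


if __name__ == "__main__":
    for v in ("1488-07-28+02:00", "September 2012", "--08-26+02:00"):
        print(v, "->", "[" + shape(v) + "]")
```

Applied to one value from each column, this yields [NUM-NUM-NUM+NUM:NUM], [ALPHA NUM] and [--NUM-NUM+NUM:NUM], the three column patterns listed on the slide, with the space kept literally where the slide writes <space>.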
Let be the set of content patterns. 
Lerman et al. (JAIR 2003) 
More specific categories 
For the input set: 
That generates the following patterns: 
Values are decomposed into tokens, and each token is represented by a syntactic class.
10
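A token can be described at several levels of generality, in the spirit of the hierarchy of syntactic classes from Lerman et al. The sketch below is illustrative only: the exact hierarchy, the digit-count thresholds for SMALL/MEDIUM/LARGE numbers, and the names `tokenize`, `token_classes` and `generalise` are our assumptions; only the class names are borrowed from the patterns reported in this talk.

```python
import re

# Tokens: runs of digits, runs of letters, or any other single non-space character.
_TOKEN = re.compile(r"\d+|[^\W\d_]+|\S")


def tokenize(value: str) -> list[str]:
    return _TOKEN.findall(value)


def token_classes(token: str) -> list[str]:
    """Classes of one token, from most general to most specific (the token itself).

    The hierarchy and the digit-count thresholds are illustrative assumptions.
    """
    if token.isdecimal():
        if len(token) <= 2:
            size = "SMALL_NUMBER"
        elif len(token) <= 4:
            size = "MEDIUM_NUMBER"
        else:
            size = "LARGE_NUMBER"
        return ["ALPHANUMERIC", "NUMBER", size, token]
    if token.isalpha():
        classes = ["ALPHANUMERIC", "ALPHA"]
        if token.isupper():
            classes.append("ALL_UPPERCASE")
        elif token.islower():
            classes.append("ALL_LOWERCASE")
        return classes + [token]
    if token.isalnum():  # e.g. numeric symbols such as "½"
        return ["ALPHANUMERIC", token]
    return ["PUNCTUATION", token]


def generalise(value: str, level: int) -> str:
    """Describe each token at depth `level` (or by the token itself if its hierarchy is shallower)."""
    out = []
    for tok in tokenize(value):
        classes = token_classes(tok)
        out.append(classes[min(level, len(classes) - 1)])
    return " ".join(out)
```

Levels trade generality for precision: at level 1 "September 2012" becomes ALPHA NUMBER, at level 2 it becomes September MEDIUM_NUMBER. Unlike the patterns in the talk, this tokenizer splits "--" into two separate "-" tokens.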
2.4 billion RDF triples 
53,230 properties 
Version 3.9 
Split 
Method 
19.25% plain literals 
18.02% typed literals 
62.73% without lang or datatype (xsd:string) 
11
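A split like the one on this slide (19.25% + 18.02% + 62.73% = 100%) can be computed by classifying the object term of each triple. This is a minimal sketch, not the method used for DBpedia 3.9: it assumes N-Triples input, the names `object_kind` and `literal_split` are hypothetical, and reading the slide's "plain literals" as the language-tagged bucket is our assumption.

```python
from collections import Counter


def object_kind(term: str) -> str:
    """Classify an N-Triples object term: IRI, blank node, or one of three literal kinds."""
    term = term.strip()
    if term.startswith("<"):
        return "iri"
    if term.startswith("_:"):
        return "bnode"
    if not term.startswith('"'):
        raise ValueError(f"not an N-Triples object term: {term!r}")
    # The closing quote is the last '"': language tags and datatype IRIs cannot contain one.
    suffix = term[term.rfind('"') + 1:]
    if suffix.startswith("@"):
        return "lang"    # e.g. "September 2012"@en
    if suffix.startswith("^^"):
        return "typed"   # e.g. "--08-26+02:00"^^<...#gMonthDay>
    return "simple"      # no tag or datatype, i.e. xsd:string in RDF 1.1


def literal_split(terms) -> dict:
    """Percentage of each literal kind among the literal objects in `terms`."""
    counts = Counter(object_kind(t) for t in terms)
    kinds = ("lang", "typed", "simple")
    total = sum(counts[k] for k in kinds)
    if total == 0:
        return {}
    return {k: round(100 * counts[k] / total, 2) for k in kinds}
```

IRIs and blank nodes are counted by `object_kind` but excluded from the percentages, so the three buckets always add up to (roughly, after rounding) 100%.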
For the apoapsis example, we extracted one pattern:
http://dbpedia.org/ontology/apoapsis LARGE/FLOAT_NUMBER 1.0
We also found some other related properties:
http://dbpedia.org/ontology/Planet/apoapsis LARGE/FLOAT_NUMBER 1.0
http://dbpedia.org/ontology/Spacecraft/apoapsis LARGE/FLOAT_NUMBER 1.0
http://dbpedia.org/property/apoapsis NUMBER 0.9230769230769231
http://dbpedia.org/property/apoapsis LARGE/FLOAT_NUMBER 0.75213675
For the date example, we extracted 7 patterns:
http://dbpedia.org/property/date -- SMALL_NUMBER - SMALL_NUMBER 0.2
http://dbpedia.org/property/date ALPHANUMERIC MEDIUM_NUMBER 0.166
http://dbpedia.org/property/date ALPHANUMERIC 2012 0.032
http://dbpedia.org/property/date ALPHANUMERIC.ALPHANUMERIC 0.012
And more …
12
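The number after each pattern looks like the share of the property's values that fit that pattern; this reading is our assumption and is not stated on the slide. Under it, scores can be computed as in the sketch below, which uses a deliberately coarse pattern (digit runs as NUMBER, letter runs as ALPHA) rather than the talk's class hierarchy. `pattern` and `pattern_support` are hypothetical names.

```python
import re
from collections import Counter

_TOKEN = re.compile(r"\d+|[^\W\d_]+|\S")


def pattern(value: str) -> str:
    """Hypothetical coarse pattern: digit runs -> NUMBER, letter runs -> ALPHA, other chars kept."""
    return " ".join("NUMBER" if t.isdecimal() else "ALPHA" if t.isalpha() else t
                    for t in _TOKEN.findall(value))


def pattern_support(values_by_property):
    """(property, pattern, support) rows; support = share of the property's values with that pattern."""
    rows = []
    for prop, values in values_by_property.items():
        counts = Counter(pattern(v) for v in values)
        for pat, n in counts.most_common():
            rows.append((prop, pat, n / len(values)))
    return rows
```

A property whose values all share one pattern gets a single row with support 1.0, as dbo:apoapsis does on this slide; a messier property such as dbp:date spreads its support over many patterns.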
The user has this value: “2014-10-20”. 
What property can he use? 
dbp:dateCreated, dbp:dateOfProduction, dbp:dateOpened, dbp:dateSigned, dbp:dateOfPremiere, dbp:date, among others. 
What is the property dbp:admCtrOf used for? 
"town of republic significance of Meleuz"@en (http://dbpedia.org/resource/Meleuz) 
"town of oblast significance of Oktyabrsk"@en (http://dbpedia.org/resource/Oktyabrsk) 
"town of republic significance of Sortavala"@en (http://dbpedia.org/resource/Sortavala) 
It is used to declare Administrative Control Of.
13
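Searching for properties by value can be framed as an inverted index from patterns to the properties whose values exhibit them. The sketch below assumes a coarse digit/letter pattern rather than the talk's pattern language; the helper names and the toy values are made up for illustration.

```python
import re
from collections import defaultdict

_TOKEN = re.compile(r"\d+|[^\W\d_]+|\S")


def pattern(value: str) -> str:
    """Hypothetical coarse pattern: digit runs -> NUMBER, letter runs -> ALPHA, other chars kept."""
    return " ".join("NUMBER" if t.isdecimal() else "ALPHA" if t.isalpha() else t
                    for t in _TOKEN.findall(value))


def build_index(values_by_property):
    """Map each pattern to the set of properties having at least one value with it."""
    index = defaultdict(set)
    for prop, values in values_by_property.items():
        for v in values:
            index[pattern(v)].add(prop)
    return index


def candidate_properties(index, value):
    """Properties whose known values share the pattern of `value` (sorted for stable output)."""
    return sorted(index.get(pattern(value), set()))


if __name__ == "__main__":
    toy = {  # made-up toy values, not DBpedia data
        "dbp:dateOpened": ["1998-06-01"],
        "dbp:dateSigned": ["2001-11-30"],
        "dbo:apoapsis": ["4.01288e+11"],
    }
    print(candidate_properties(build_index(toy), "2014-10-20"))
```

With real data the candidate list would be long, so a practical version would also rank candidates, for instance by pattern support; that ranking is left out here.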
Check for atypical values (outliers) 
Take a close look at the most (in)frequent patterns
Possible errors during automatic extraction 
For the dbp:isbn property we can find the following values: 
"summer or autumn 380"@en 
"Late November"@en 
"Fall 1040"@en 
680 
"December, 67 BC"@en 
"April-July 1799"@en 
http://dbpedia.org/resource/New_Year's_Day 
http://dbpedia.org/resource/Second_Intermediate_Period_of_Egypt 
"New moon day of Kartika, celebrations begin two days prior and end two days after that date"@en 
Are they … or … values?
14
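One way to surface atypical values is to flag those whose pattern covers only a small share of a property's values. The sketch below is an illustration under stated assumptions: the coarse digit/letter pattern, the `min_support` threshold and the helper names are hypothetical, and a rare pattern only marks a value for review, not as an error.

```python
import re
from collections import Counter

_TOKEN = re.compile(r"\d+|[^\W\d_]+|\S")


def pattern(value: str) -> str:
    """Hypothetical coarse pattern: digit runs -> NUMBER, letter runs -> ALPHA, other chars kept."""
    return " ".join("NUMBER" if t.isdecimal() else "ALPHA" if t.isalpha() else t
                    for t in _TOKEN.findall(value))


def rare_values(values, min_support=0.05):
    """Values whose pattern covers less than `min_support` of all values (candidate outliers)."""
    patterns = [pattern(v) for v in values]
    counts = Counter(patterns)
    total = len(values)
    return [v for v, p in zip(values, patterns) if counts[p] / total < min_support]
```

Values such as "Late November" (pattern ALPHA ALPHA) would stand out among mostly NUMBER - NUMBER - NUMBER values, provided their pattern is rare enough relative to the chosen threshold.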
E-mail: user1@domain.com 
Given name: John 
Surname: Snow 
Birthday: 1986-02-14 
A vCard, which may be annotated with the hCard microformat
LD4IE Challenge 2014 
We can use our database to extract and validate the email:
vcard:email mailto : ALPHA PUNCTUATION ALL_LOWERCASE . ALL_LOWERCASE 0.82
vcard:email mailto : ALPHA PUNCTUATION ALL_LOWERCASE . com 0.69
vcard:email mailto : ALPHA @ ALPHANUMERIC . ALL_LOWERCASE 0.54
vcard:email mailto : ALPHA @ ALPHANUMERIC . com 0.46
vcard:email mailto : ALL_UPPERCASE ****@ ALL_LOWERCASE . ALL_LOWERCASE 0.36
vcard:bday NUMBER - SMALL_NUMBER - SMALL_NUMBER 0.5 
vcard:bday MEDIUM_NUMBER - SMALL_NUMBER - SMALL_NUMBER 0.5 
…also the birthday 
15
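Validation can be sketched as matching a candidate value, token by token, against the learned patterns and keeping the best score. This is a minimal illustration, not the talk's matcher: it assumes full-length matching, alphanumeric runs as single tokens, and hypothetical class predicates and number-size thresholds. The pattern strings and scores in the test are copied from this slide.

```python
import re

# Runs of letters/digits, or any other single non-space character.
_TOKEN = re.compile(r"[^\W_]+|\S")

# Hypothetical class predicates; the size thresholds are illustrative assumptions.
CLASSES = {
    "ALPHA": str.isalpha,
    "ALPHANUMERIC": str.isalnum,
    "ALL_LOWERCASE": lambda t: t.isalpha() and t.islower(),
    "ALL_UPPERCASE": lambda t: t.isalpha() and t.isupper(),
    "NUMBER": str.isdecimal,
    "SMALL_NUMBER": lambda t: t.isdecimal() and len(t) <= 2,
    "MEDIUM_NUMBER": lambda t: t.isdecimal() and 3 <= len(t) <= 4,
    "PUNCTUATION": lambda t: not t.isalnum(),
}


def matches(pattern: str, value: str) -> bool:
    """True if each token of `value` fits the class (or equals the literal) at the same position."""
    expected = pattern.split()
    tokens = _TOKEN.findall(value)
    if len(expected) != len(tokens):
        return False
    return all(CLASSES[p](t) if p in CLASSES else p == t
               for p, t in zip(expected, tokens))


def best_support(value: str, patterns) -> float:
    """Score of the best pattern that `value` matches, or 0.0 if none does."""
    return max((s for p, s in patterns if matches(p, value)), default=0.0)
```

For example, "mailto:john@domain.com" matches both "mailto : ALPHA PUNCTUATION ALL_LOWERCASE . ALL_LOWERCASE" (0.82) and "mailto : ALPHA @ ALPHANUMERIC . com" (0.46), so its best score is 0.82, while a birthday such as "1986-02-14" fits "MEDIUM_NUMBER - SMALL_NUMBER - SMALL_NUMBER".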
Extraction of lexico-syntactic patterns from LD datasets 
Different use cases: 
Search for properties 
Validation of values 
Information extraction based on patterns 
Future work: 
Consistency analysis of knowledge bases
Extension of patterns to cover other knowledge bases 
Among others 
16 
500,000 content patterns
http://emunoz.org 
@emir_munoz 
Emir.Munoz@ie.fujitsu.com
https://github.com/emir-munoz/ld-patterns/

Learning Content Patterns from Linked Data
