Query modeling and information retrieval within
the Web of Data
Cristian LAI
clai@crs4.it

CRS4

September 6, 2012

Outline

- Motivation
- Unstructured Data
- Structured Data
- Query building
- Applications
- Conclusion

Context
Semantic Web

http://www.w3.org/2006/Talks/1023-sb-W3CTechSemWeb/

Motivation
Search on the Web

http://www.slideshare.net/novaspivack/web-evolution-nova-spivack-twine

Wikipedia

- Started in 2001.
- Is a multilingual, web-based, free-content encyclopedia project based on an openly editable model.
- Is the 5th most visited site on the web and served 454 million unique visitors monthly as of March 2011.
- Has fewer than 100 employees.
- Wikipedia holds an annual fundraiser instead of accepting advertising. You may have seen "A personal appeal from Wikipedia founder Jimmy Wales" if you’ve used the online encyclopedia during the last weeks of 2011. Google co-founder Sergey Brin and his wife, Anne Wojcicki, have given a $500,000 grant to help Wikipedia fund its $28.3 million annual budget.

Wikipedia

- Pros:
  - Is a highly efficient not-for-profit organization.
  - Is the finest example of truly collaboratively created content: >19M articles, >270 languages, >82k active contributors.
  - Covers many topics and domains; articles are the result of community consensus.
- Cons:
  - Contains many inconsistencies. Disclaimer: "Wikipedia cannot guarantee the validity of the information found here."
  - Is not very well integrated with other data sources.
  - Queries and search are not facilitated, due to the lack of structured representation.

Issues

- Unstructured data, keyword-based search.
- Simple questions are hard to answer:
  - People who were born in Rome before 1900.
  - Italian musicians with English and French descriptions.
  - The official websites of companies with more than 500 employees.
- The information required to answer these questions is contained in Wikipedia.
- Transforming Wikipedia into a knowledge base:
  - To reveal the structure and semantics of Wikipedia content.
  - The DBpedia project.

Structure in Wikipedia

- Wikipedia articles consist mostly of free text, but also contain different types of structured information, such as infobox templates, categorisation information, images, geo-coordinates, and links to external Web pages.
- Title
- Abstract
- Infobox template
- Geo-coordinates
- Categories
- Images
- Links
  - other language versions
  - other Wikipedia pages
  - redirects
  - disambiguation

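As a rough illustration of how infobox fields can become structured facts, the sketch below parses a simplified infobox snippet into (subject, property, value) tuples. The sample wikitext, the function name `infobox_to_triples`, and the field names are assumptions made for this example. It is a toy, not the DBpedia extraction framework, which deals with far messier input.

```python
import re

# Hypothetical, simplified infobox wikitext (real infoboxes are messier).
SAMPLE = """{{Infobox settlement
| name = Cagliari
| country = Italy
| postal_code = 09100
}}"""


def infobox_to_triples(subject, wikitext):
    """Turn '| key = value' lines into (subject, key, value) tuples."""
    triples = []
    for line in wikitext.splitlines():
        match = re.match(r"^\|\s*([^=|]+?)\s*=\s*(.*?)\s*$", line)
        # Skip lines that are not fields, and fields with an empty value.
        if match and match.group(2):
            triples.append((subject, match.group(1), match.group(2)))
    return triples


triples = infobox_to_triples("Cagliari", SAMPLE)
for triple in triples:
    print(triple)
```

Each printed tuple has the same shape as an RDF statement, which is the representation the following slides build on.
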
Structured Information in Wikipedia

[Figures not preserved in this transcript.]

RDF representation
Knowledge Base

dbp:Cagliari rdf:type dbp:City
dbp:Cagliari dbp:Title "Cagliari"
dbp:Cagliari dbp:Country dbp:Italy
dbp:Cagliari dbp:postalCode 09100
dbp:Cagliari geo:lat "39.246387"^^xsd:float
dbp:Cagliari geo:long "9.057500"^^xsd:float
dbp:Cagliari rdf:type yago:MediterraneanPortCitiesAndTownsInItaly
...

- An environment for collecting and structuring data.
- Well-defined structure of classification.

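To make the knowledge-base idea concrete, here is a minimal sketch that holds the Cagliari statements as plain Python tuples and groups them by predicate. The helper name `describe` is an assumption for this example, prefixed names are kept as ordinary strings, and no RDF library is involved.

```python
from collections import defaultdict

# Statements from the slide, written as (subject, predicate, object) tuples.
# Typed literals are simplified to Python values for this sketch.
TRIPLES = [
    ("dbp:Cagliari", "rdf:type", "dbp:City"),
    ("dbp:Cagliari", "dbp:Title", "Cagliari"),
    ("dbp:Cagliari", "dbp:Country", "dbp:Italy"),
    ("dbp:Cagliari", "dbp:postalCode", "09100"),
    ("dbp:Cagliari", "geo:lat", 39.246387),
    ("dbp:Cagliari", "geo:long", 9.057500),
    ("dbp:Cagliari", "rdf:type", "yago:MediterraneanPortCitiesAndTownsInItaly"),
]


def describe(subject, triples):
    """Group all objects of one subject by predicate."""
    description = defaultdict(list)
    for s, p, o in triples:
        if s == subject:
            description[p].append(o)
    return dict(description)


cagliari = describe("dbp:Cagliari", TRIPLES)
print(cagliari["rdf:type"])
print(cagliari["geo:lat"], cagliari["geo:long"])
```

A predicate such as `rdf:type` can appear more than once for the same subject, which is why each predicate maps to a list rather than a single value.
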
RDF

- Triples: (subject, predicate, object)
- Subject and object
  - are either both URIs, each identifying a resource, or a URI and a string literal, respectively.
- Predicate
  - specifies how the subject and object are related, and is also represented by a URI.
- For example:
  - A knows B
  - C isAuthorOf D
  - Two resources linked in this fashion can be drawn from different data sets on the Web, allowing data in one data source to be linked to that in another, thereby creating a Web of Data.

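The linking idea can be sketched as follows: two separate data sets refer to the same resource by the same URI, so their triples can be joined. The `ex` namespace, both data sets, and the helper `expand` are hypothetical placeholders; `foaf:knows` and `foaf:based_near` are FOAF vocabulary terms, and the DBpedia resource namespace is the one used in the query slides.

```python
# Namespace prefixes; the "ex" namespace is a made-up placeholder.
PREFIXES = {
    "foaf": "http://xmlns.com/foaf/0.1/",
    "dbr": "http://dbpedia.org/resource/",
    "ex": "http://example.org/people/",
}


def expand(name):
    """Expand a prefixed name such as 'foaf:knows' into a full URI."""
    prefix, _, local = name.partition(":")
    return PREFIXES[prefix] + local


# Data set 1 (hypothetical): A knows B.
dataset_1 = [(expand("ex:A"), expand("foaf:knows"), expand("ex:B"))]
# Data set 2 (hypothetical): B is based near Cagliari, a DBpedia resource.
dataset_2 = [(expand("ex:B"), expand("foaf:based_near"), expand("dbr:Cagliari"))]

# Both sets name B with the same URI, so the object of one triple
# can be followed as the subject of another.
linked = [
    (s1, o2)
    for s1, _, o1 in dataset_1
    for s2, _, o2 in dataset_2
    if o1 == s2
]
print(linked)
```

The join works only because both sides agree on the URI for B; shared identifiers are what turn isolated data sets into a Web of Data.
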
DBpedia

- Started in 2007.
- Is the result of a community effort to extract structured information from Wikipedia.
- Makes Wikipedia data available as RDF.
- Results: the DBpedia data set
  - describes 3.64 million "things" with over half a billion "facts" (July 2011), including 364k persons, 462k places, 99k music albums, 54k films, and 148k organisations;
  - extraction in 97 different languages;
  - 672M RDF triples.
- It is maintained by Universität Leipzig, Freie Universität Berlin, and OpenLink Software, Inc.
- See http://wiki.dbpedia.org/Team

Nucleus of the Web of Data

- Within the W3C Linking Open Data (LOD) community effort.
- Tim Berners-Lee’s Linked Data principles:
  - URI
  - HTTP
  - RDF, SPARQL
  - Interlinking among data providers
- An increasing number of data providers have started to publish and interlink data on the Web.
- The resulting Web of Data comprises several billion RDF triples and covers domains such as geographic information, people, companies, online communities, films, music, books and scientific publications.

LOD Datasets

[Figures not preserved in this transcript.]

SPARQL Query Language

- RDF is a directed, labeled graph data format for representing information (also on the Web).
- SPARQL is a language for querying RDF graphs by specifying templates against which to compare graph components. Data that matches or satisfies a template is returned from the query.
- A triple template contains variables that represent triple components (e.g., ?s, ?p, or ?o within a triple).
- Example:
  - ?person ex:age "20"^^xsd:integer .
  - Identifies a list of triple subjects that have an ex:age property of "20". Analogous to asking "Who has age 20?".
  - The SPARQL query engine will return a list of the subject components of the triples that satisfy the query through value substitution.

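Template matching with value substitution can be sketched in a few lines. This is an illustrative toy over in-memory tuples, not how a real SPARQL engine is implemented; the `match` helper and the `ex:` data are assumptions for this example.

```python
def match(pattern, triple):
    """Match one triple pattern against one triple.

    Terms starting with '?' are variables. Returns a dict of variable
    bindings, or None when the triple does not fit the pattern.
    """
    bindings = {}
    for term, value in zip(pattern, triple):
        if isinstance(term, str) and term.startswith("?"):
            # A variable binds on first use and must agree afterwards.
            if bindings.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return bindings


# Hypothetical data; the "ex:" names are placeholders.
DATA = [
    ("ex:alice", "ex:age", 20),
    ("ex:bob", "ex:age", 35),
    ("ex:carol", "ex:age", 20),
]

# "Who has age 20?"
pattern = ("?person", "ex:age", 20)
people = [b["?person"] for t in DATA if (b := match(pattern, t)) is not None]
print(people)
```

With this data the pattern binds `?person` to `ex:alice` and `ex:carol`, mirroring the "Who has age 20?" reading of the template.
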
SPARQL Queries

General form:

SELECT variables_list
FROM <RDF_source_URL>
WHERE {
  triple_pattern_1 .
  ...
  triple_pattern_n .
}

Example query:

SELECT ?person
FROM <http://ex.com>
WHERE {
  ?person ex:age "20"^^xsd:integer .
}

Result:

?person
------------------
_p1
_p2
...

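A WHERE clause with several triple patterns behaves like a conjunction: a solution must satisfy every pattern with consistent variable bindings. The sketch below shows that join over in-memory tuples. The `match` and `select` helpers and the `ex:` data are assumptions for illustration, and FROM, FILTER, and all other SPARQL features are deliberately left out.

```python
def match(pattern, triple, bindings):
    """Extend bindings so that pattern fits triple, or return None."""
    result = dict(bindings)
    for term, value in zip(pattern, triple):
        if isinstance(term, str) and term.startswith("?"):
            if result.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return result


def select(variables, patterns, triples):
    """Evaluate a list of triple patterns as a conjunctive query."""
    solutions = [{}]  # start with one empty solution
    for pattern in patterns:
        # Keep every way of extending an existing solution with this pattern.
        solutions = [
            extended
            for solution in solutions
            for triple in triples
            if (extended := match(pattern, triple, solution)) is not None
        ]
    return [tuple(s[v] for v in variables) for s in solutions]


# Hypothetical data; the "ex:" names are placeholders.
DATA = [
    ("ex:alice", "ex:age", 20),
    ("ex:alice", "ex:livesIn", "ex:Cagliari"),
    ("ex:bob", "ex:age", 20),
    ("ex:bob", "ex:livesIn", "ex:Rome"),
]

rows = select(
    ["?person"],
    [("?person", "ex:age", 20), ("?person", "ex:livesIn", "ex:Cagliari")],
    DATA,
)
print(rows)
```

Only `ex:alice` satisfies both patterns, so the result has a single row. Real engines reach the same answers with indexes and query planning rather than nested loops.
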
The DBpedia SPARQL endpoint

- All data sets are available for queries via the DBpedia SPARQL endpoint (http://dbpedia.org/sparql).
- Querying the data set:
  - ...
  - Abstracts of movies starring Tom Cruise, released before 1999.
  - The official websites of companies with more than 50,000 employees.
  - Cities with more than 2 million inhabitants.
  - ...

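Querying an endpoint over HTTP can be sketched with the standard library alone. The example below only prepares a GET request in the style of the SPARQL protocol (query text in a `query` parameter, result format requested through the `Accept` header) and does not send it. The sample query, its LIMIT, and the `build_request` helper are assumptions for illustration.

```python
from urllib.parse import urlencode, urlsplit, parse_qs
from urllib.request import Request

ENDPOINT = "http://dbpedia.org/sparql"  # endpoint URL given in the slides

# Hypothetical sample query, chosen only to have something to send.
QUERY = """
SELECT ?city WHERE {
  ?city a <http://dbpedia.org/ontology/City> .
} LIMIT 5
"""


def build_request(endpoint, query):
    """Prepare (but do not send) a SPARQL-protocol GET request."""
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for the standard SPARQL JSON results format.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_request(ENDPOINT, QUERY)
print(request.full_url)
# To execute it, one would pass `request` to urllib.request.urlopen.
```

Keeping request construction separate from sending makes the encoding step easy to inspect and test without network access.
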
Abstracts of movies starring Tom Cruise, released before 1999
SPARQL

# The rdf:, rdfs:, xsd: and dbpedia2: prefixes are assumed to be predefined by the endpoint.
SELECT ?subject ?label ?released ?abstract WHERE {
  ?subject rdf:type <http://dbpedia.org/ontology/Film> .
  ?subject dbpedia2:starring <http://dbpedia.org/resource/Tom_Cruise> .
  ?subject rdfs:comment ?abstract .
  ?subject rdfs:label ?label .
  FILTER(lang(?abstract) = "en" && lang(?label) = "en")
  ?subject <http://dbpedia.org/ontology/releaseDate> ?released .
  # "Before 1999" means strictly earlier than 1 January 1999.
  FILTER(xsd:date(?released) < "1999-01-01"^^xsd:date)
} ORDER BY ?released

Linked Data Search Engines and Indexes

- A number of search engines have been developed that crawl Linked Data from the Web by following RDF links, and provide query capabilities over aggregated data.
  (Tom Heath and Christian Bizer (2011). Linked Data: Evolving the Web into a Global Data Space (1st edition). Synthesis Lectures on the Semantic Web: Theory and Technology, 1:1, 1-136. Morgan & Claypool.)
- Google, Bing and Yahoo! have agreed to create and support a common vocabulary for structured data markup on web pages.
- Facebook has started to support RDF and Linked Data URIs and now provides access to parts of its user data via a Linked Data API.

Google rich snippets

[Figure not preserved in this transcript.]

Twitter, #annotations

- Twitter API based client
- Lookup annotations
- Resource #dbpedia:Cagliari

Question answering

- Resource Cagliari
- Template
- RDF/XML

Conclusion

- Data on the Web is a major challenge; technologies are needed to use it, interact with it, and integrate it.
- Semantic Web technologies (RDF, SPARQL, etc.) can play a major role in publishing and using data on the Web.
- Users can benefit greatly from the wide world of structured content.
- Content providers joining the Linking Open Data project are helping to create more meaningful navigation paths, not only within websites but across the whole Web.

Q&A
