SEMANTIC WEB
UNDERSTANDING IN BRIEF
INTRODUCTION
WEB OF DOCUMENTS VS. WEB OF DATA
4/4/2016Ankur Biswas 2
A Walk Through Brief History of World
Wide Web
• 1969 – ARPANET (Advanced Research Projects Agency Network)
launched
• In 1980, Tim Berners-Lee built ENQUIRE, as a personal
database of people and software models, a way to
play with hypertext; each new page of information in
ENQUIRE had to be linked to an existing page.
• In 1990, Berners-Lee built all the tools necessary for a
working Web: HTTP 0.9, HTML, the first Web browser
(which was also a Web editor), the first HTTP server software (CERN
httpd), the first web server (http://info.cern.ch), and
the first Web pages, which described the project itself.
WWW's historical logo designed
by Robert Cailliau
The NeXTcube used by Tim
Berners-Lee at CERN became
the first Web server.
How big is the Web?
• As per http://www.worldwidewebsize.com/
the Indexed Web contains at least 4.84 billion
pages (Thursday, 25 February, 2016).
• Early estimates suggested that the deep web
is 400 to 550 times larger than the surface
web.
• Since information and sites are constantly
being added, the deep web can be assumed to
keep growing, at a rate that is hard to
quantify.
Understanding Information in the WWW
• What is important and how do you know?
• What is information, what is advertisement?
• What does information mean?
• How credible or trustworthy is the information?
• What is redundant?
Understanding the Importance of Meaning
• SEMANTICS: The part of linguistics concerned with the sense and meaning of
language or of symbols of language.
• It is the study of the interpretation of signs or symbols as used by agents or
communities within particular circumstances and contexts.
• Semantics asks how the sense and meaning of complex concepts can be
derived from simple concepts based on the rules of syntax.
• The semantics of a message depends on its context and pragmatics†.
†Pragmatics: the study of how context and the speaker's intention contribute to
meaning.
Understanding the Importance of Meaning
• SYNTAX: In grammar, syntax denotes the study of the principles
and processes by which sentences are constructed in a
particular language.
• In formal languages, syntax is just a set of rules by which
well-formed expressions can be created from a fundamental
set of symbols (alphabet).
• In computer science, syntax defines the normative structure
of data.
Understanding the Importance of Meaning
• CONTEXT: The context of an expression denotes its relationship with the
surrounding expressions (concepts) and further related elements.
• Context denotes all elements of any sort of communications that
define the interpretation of the communicated content e.g.
• General contexts: place, time, interrelation of action in message.
• Personal or Social contexts: relation between sender and receiver of a message.
• PRAGMATICS: It reflects the intention with which language is used to
communicate a message.
• In linguistics, pragmatics denotes the study of applying language in
different situations. It also denotes the intended purpose of the speaker.
• Pragmatics studies the ways in which context contributes to meaning.
The limits of web
• Traditional keyword-based search leads to many irrelevant results.
• Example: from the simple term "Jaguar" it is not clear whether the user means the car, the animal or
the OS (Mac OS X Jaguar).
• POLYSEMY: a single word or name has several different meanings, so a search for it
also returns results about the other meanings.
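The polysemy problem can be illustrated with a minimal sketch of a keyword search. The three documents below are invented for illustration; the point is only that plain string matching cannot tell the senses of "Jaguar" apart.

```python
# Toy document collection; the texts are invented for illustration only.
DOCS = {
    "doc1": "The jaguar is a big cat native to the Americas",
    "doc2": "Jaguar unveils a new car model",
    "doc3": "Mac OS X Jaguar is an operating system release",
}


def keyword_search(term, docs):
    """Return the ids of all documents containing the term, ignoring case."""
    term = term.lower()
    return sorted(doc_id for doc_id, text in docs.items() if term in text.lower())


# All three senses (animal, car, operating system) match the same keyword.
hits = keyword_search("jaguar", DOCS)
print(hits)
```

A user interested only in the animal still receives all three documents, because the search has no notion of meaning.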
Problem 1: Information Retrieval
• Jaguar (animal): Panthera onca
• Traditional keyword-based search doesn’t find all results.
• Synonyms & metaphors are not always handled properly, which leads to undesired
results.
Primary objects: documents
Degree of structure in data: fairly low
Implicit semantics of contents
Designed for: human consumption
[Figure: the Web of documents. HTML pages A to D and an API/XML source, connected by untyped links.]
Problem 2: Information Extraction
• Identifying contents written in other languages e.g. Japanese or
Bengali
• Pictures do not tell search engines what they
show.
• Example: Google uses the caption or name associated with a picture
as a reference keyword.
Problem 2: Information Extraction (Cont.)
[Figure: HTML pages A to D and an API/XML source, connected by untyped links and each referring to "things". Are two documents talking about the same “thing”?]
Problem 2: Information Extraction (Cont.)
• Heterogeneous distribution and order of information.
• The problem can only be solved correctly by a human agent.
• A software agent does not have sufficient
• knowledge of contexts,
• world knowledge, and
• experience
to solve the problem.
Hence it will not be able to solve the problem unless explicit
semantics are available.
Implicit knowledge is information that is not specified explicitly
but must be derived via logical deduction from the available information.
Problem 3: Maintenance
The more complex and voluminous a website is, the more complicated it is to
maintain its only weakly structured data.
Problems:
• Syntactic consistency error: you have linked your webpage to another
webpage with related content, but that webpage has since moved
elsewhere and your link still points to the old address
(HTTP 404 error: file/page not found).
• Semantic (link) consistency error: even more dangerous, the link still
resolves but the content at the hyperlinked destination has changed.
• Correctness: it is hard to maintain correctness over time in an automated
manner.
• Timeliness: tracking changes over time is hard.
Problem 4: Personalization
• Adaptation of the presented information content to personal
requirements:
Users normally password-protect their details, which makes such information
hard to access.
• Problems:
• Where do we get the required (personal) information from?
• Personalization vs. data security
INTRODUCTION TO
SEMANTIC WEB TECHNOLOGIES
THE VISION OF THE SEMANTIC WEB
The vision of the Semantic Web
Precondition:
• Content can be read and
interpreted correctly
(understood) by machines
Natural language Processing
• Technologies of Traditional
Information Retrieval (Search
Engines)
The Semantic Web concept was introduced in the late 1990s by
Tim Berners-Lee, the inventor of the World Wide Web.
Semantic Web
• Natural language web content will
be explicitly annotated with
semantic metadata
• Semantic metadata encode the
Meaning (Semantics) of the
content and can be read and
interpreted correctly by machines.
How Can we Achieve the Semantic Web? –
The Original Vision
• Instead of publishing information to be consumed by
humans, publish machine-processable data and metadata
using terms/languages that can be understood by machines.
• Build machines (agents) that will search for, query, integrate
etc. this data.
• Make sure all agents understand your terms/languages.
The Semantic Web and Linked Data Vision
Today
• The Semantic Web is a web of data. There is lots of data we all use
every day, and it is not part of the web.
• The Semantic Web is about two things:
• It is about common formats for integration and combination of data drawn from
diverse sources, whereas the original Web mainly concentrated on the
interchange of documents.
• It is also about language for recording how the data relates to real-world objects.
• That allows a person, or a machine, to start off in one database, and
then move through an unending set of databases which are connected
not by wires but by being about the same thing.
Semantic Web Technology Stack
• Most apps use only a subset of
the stack
• Querying allows fine-grained
data access
• Standardized information
exchange is a key
• Formats are necessary but not
too important
• The semantic web is based on
the web
Basic Layer of Semantic Web Technology
Stack
• The foundation of the stack is the World Wide Web, so we rely on all the technologies of the
World Wide Web.
• DBpedia is the semantic version of Wikipedia.
• Because Wikipedia uses templates, its data is already somewhat structured.
• DBpedia extracts data from Wikipedia infoboxes.
• DBpedia stores and publishes the result in RDF, a machine-readable format, and in a few other formats.
• It also hosts a community effort to define extractors for the data, which can be used
well beyond Wikipedia.
• It provides a number of services around the extracted data, such as DBpedia Mobile, a
SPARQL endpoint, a faceted browser, a number of mappings to external ontologies,
an ontology itself, etc.
Semantic Web Technologies
• A set of technologies and frameworks that enable
the Web of Data:
• Resource Description Framework (RDF)
• A variety of data interchange formats (e.g. RDF/XML, N3, Turtle, N-
Triples)
• Notations such as RDF Schema (RDFS) and the Web Ontology
Language (OWL)
• All are intended to provide a formal description of concepts, terms,
and relationships within a given knowledge domain
• The specialized query language SPARQL plays a role similar to SQL, but it works by
matching patterns against graphs rather than by querying tables
Application in Web of Data
• Linked Data
• Linked Open Data (LOD) denotes publicly available (RDF) data on the Web,
identified via URIs and accessible via HTTP.
Web of Data:
• >31 billion Facts
• >500 million Links
(Oct 2011)
What is so special about BBC Music Website?
• Information is dynamically aggregated from
external, publicly available data (Wikipedia,
MusicBrainz, …)
• No screen scraping
• No specialized API
• Data available as Linked Open Data.
• Data access via simple HTTP Request
• Data is always up to date without manual
interaction.
How to build such a site 1.
• Site editors roam the Web for new facts
• may discover further links while roaming
• They update the site manually
• And the site soon gets out of date
How to build such a site 2.
• Editors roam the Web for new data published on
Web sites
• “Scrape” the sites with a program to extract the
information
• i.e., write some code to incorporate the new data
• Easily get out of date again…
How to build such a site 3.
• Editors roam the Web for new data via API-s
• Understand those…
• input, output arguments, datatypes used, etc.
• Write some code to incorporate the new data
• Easily get out of date again…
The choice of the BBC
• Use external, public datasets
• Wikipedia, MusicBrainz, …
• They are available as data
• not API-s or hidden on a Web site
• data can be extracted using, e.g., HTTP requests or
standard queries
It's all documented
Search Engines – Document Retrieval
• General Problems:
• Correct interpretation of the query string
• The context of the user has to be considered somehow,
e.g. the query the user issued just before a specific query, or their
usual preferences
• Correct identification of entities
• Automatic disambiguation
• Usability
• personalization
Intelligent Agents in Semantic Web
[Figure: World Wide Web vs. Semantic Web. On the World Wide Web, the user reaches WWW documents through a presentation service (e.g. Firefox) and a retrieval service (e.g. Google). On the Semantic Web, the user works through a personal assistant that relies on intelligent infrastructure services on top of the WWW documents.]
3 Generations of Web Documents
• 1st Generation: Static Web Pages (HTML / CSS)
• 2nd Generation: Interactive Web Pages (JavaScript / Applets) and Dynamic Web Pages (database access, template-based generation)
• 3rd Generation: Virtual Web Pages (netbots, information extraction, presentation planning) and Adaptive Web Pages (user model, machine learning, online layout)
Toolbox for the Semantic Web
• Standardized languages to express the semantics of information content on the
web (XML/XSD, RDF(S), OWL, RIF)
• Tools to embed and extract semantic information in web pages (RDFa, GRDDL, …)
• Various fields of computer science:
• Artificial Intelligence
• Linguistics
• Cryptography
• Database
• Theoretical Computer Science
• Computer Architecture
• Software Engineering
• Systems Theory
• Computer Networks
Basic Architecture of Semantic Web - I
• Uniform: different types of resource identifiers are all
constructed according to a uniform schema.
• Resource: whatever may be identified by a URI.
• Identifier: distinguishes one resource from another.
Uniform Resource Identifier (URI)
• A Uniform Resource Identifier (URI) defines a simple and extensible
schema for world wide unique identification of abstract or physical
resources.
• Resources can be every object with a clear identity (according to the context of
the application)
• As e.g. webpages, books, locations, persons, relations among objects, abstract concepts,
etc.
• The concept of URI is already established in various domains, e.g.
• The Web: URL (Uniform Resource Locator), URN (Uniform Resource Name), PURL
(Persistent Uniform Resource Locator)
• Books & publications (ISBN, ISSN)
• Digital Object Identifier (DOI)
Uniform Resource Identifier (URI)
• URI Combines
• Address (Locator)
• Uniform Resource Locator (URL, RFC
1738)
• Denotes, where a resource can be
found in the web by stating its
primary access mechanism
• Might change during life time.
• Identity (Name)
• Uniform Resource Name (URN, RFC
2141)
• Persistent Identifier for a web
resource
• Remains unchanged during life cycle
• URI Generic Syntax
• Scheme: e.g. http, ftp, mailto
• Userinfo: e.g. username:password
• Host: e.g. domain name, IPv4/IPv6 address
• Port: e.g. :80 stands for the HTTP port
• Path: e.g. path in the file system of the WWW server
• Query: e.g. parameters to be passed over to applications
• Fragment: e.g. determines a specific fragment of a document
URI = scheme "://" [userinfo "@"] host [":" port] [path] ["?" query] ["#" fragment]
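The components of the generic syntax can be inspected with Python's standard `urllib.parse` module. This is only a sketch: the URI below is a made-up example on the reserved `example.org` domain, not one taken from the slides.

```python
from urllib.parse import urlsplit

# Hypothetical URI that exercises every component of the generic syntax.
uri = "http://user:secret@www.example.org:80/books/list?lang=en#top"

parts = urlsplit(uri)

print("scheme:  ", parts.scheme)
print("userinfo:", parts.username, parts.password)
print("host:    ", parts.hostname)
print("port:    ", parts.port)
print("path:    ", parts.path)
print("query:   ", parts.query)
print("fragment:", parts.fragment)
```

`urlsplit` returns the port as an integer and leaves the query string unparsed; `urllib.parse.parse_qs` can split it into parameters if needed.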
Data on the Web is not enough…
• We need a proper infrastructure for a real Web of
Data
• data is available on the Web
• accessible via standard Web technologies
• data are interlinked over the Web
• i.e., data can be integrated over the Web
• This is where Semantic Web technologies come in
• We will use a simplistic example to introduce the
main Semantic Web concepts
The rough structure of data integration
• Map the various data onto an abstract data
representation
• make the data independent of its internal
representation…
• Merge the resulting representations
• Start making queries on the whole!
• queries not possible on the individual data sets
We start with a book...
A simplified bookstore data
(dataset “A”)
ID                  Author  Title             Publisher  Year
ISBN 0-00-651409-X  id_xyz  The Glass Palace  id_qpr     2000

ID      Name           Homepage
id_xyz  Ghosh, Amitav  http://www.amitavghosh.com

ID      Publisher’s name  City
id_qpr  Harper Collins    London
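1st: we export our data as a set of relations
[Figure: dataset “A” as a graph. The book node http://…isbn/000651409X links via a:author to an author node with a:name "Ghosh, Amitav" and a:homepage http://www.amitavghosh.com; further edges give the title "The Glass Palace", the year 2000 and the publisher (Harper Collins, London).]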
Some notes on the exporting the data
• Relations form a graph
• the nodes refer to the “real” data or contain some literal
• how the graph is represented in machine is immaterial for now
• Data export does not necessarily mean physical conversion of the data
• relations can be generated on-the-fly at query time
• via SQL “bridges”
• scraping HTML pages
• extracting data from Excel sheets
• etc.
• One can export part of the data
Same book in French…
Another bookstore data
(dataset “F”)
    A                   B                      C           D
1   ID                  Titre                  Traducteur  Original
2   ISBN 2020386682     Le Palais des Miroirs  $A12$       ISBN 0-00-651409-X
3
4
5
6   ID                  Auteur
7   ISBN 0-00-651409-X  $A11$
8
9
10  Nom
11  Ghosh, Amitav
12  Besse, Christianne
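2nd: export your second set of data
[Figure: dataset “F” as a graph. The book node http://…isbn/2020386682, titled "Le palais des miroirs", links via f:traducteur to a node whose f:nom is "Besse, Christianne" and refers to the original http://…isbn/000651409X, which links via f:auteur to a node whose f:nom is "Ghosh, Amitav".]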
3rd: start merging your data
[Figure: the graphs of datasets “A” and “F” side by side. Both contain a node with the same URI, http://…isbn/000651409X.]
3rd: start merging your data
[Figure: the merged graph. The two graphs are joined at the shared node http://…isbn/000651409X, which now carries both the a: relations (such as a:author) and the f: relations (such as f:auteur), and is the target of f:original from http://…isbn/2020386682.]
Start making queries…
• User of data “F” can now ask queries like:
• “give me the title of the original”
• well, … « donnes-moi le titre de l’original »
• This information is not in the dataset “F”…
• …but can be retrieved by merging with dataset “A”!
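The merge step can be sketched with plain Python sets of (subject, predicate, object) tuples. This is an illustration under assumptions, not the tooling used in the slides: the property names `a:title` and `f:titre` and the blank-node labels are hypothetical (the slides only show `a:author`, `a:name`, `a:homepage`, `f:auteur`, `f:nom`, `f:traducteur` and `f:original`), and the URIs are kept elided exactly as they appear in the figures.

```python
BOOK_EN = "http://…isbn/000651409X"  # URI elided as in the slides
BOOK_FR = "http://…isbn/2020386682"

# Dataset "A"; a:title is a hypothetical property name.
dataset_a = {
    (BOOK_EN, "a:title", "The Glass Palace"),
    (BOOK_EN, "a:author", "_:author"),
    ("_:author", "a:name", "Ghosh, Amitav"),
    ("_:author", "a:homepage", "http://www.amitavghosh.com"),
}

# Dataset "F"; f:titre is a hypothetical property name.
dataset_f = {
    (BOOK_FR, "f:titre", "Le palais des miroirs"),
    (BOOK_FR, "f:original", BOOK_EN),
    (BOOK_EN, "f:auteur", "_:auteur"),
    ("_:auteur", "f:nom", "Ghosh, Amitav"),
}


def objects(graph, subject, predicate):
    """All objects o such that (subject, predicate, o) is in the graph."""
    return {o for s, p, o in graph if s == subject and p == predicate}


def original_titles(graph, book):
    """Follow f:original from the book, then read a:title of the original."""
    titles = set()
    for original in objects(graph, book, "f:original"):
        titles |= objects(graph, original, "a:title")
    return titles


# Merging is a set union: nodes with the same URI coincide automatically.
merged = dataset_a | dataset_f

print(original_titles(dataset_f, BOOK_FR))  # nothing: "F" alone has no title
print(original_titles(merged, BOOK_FR))     # answered after the merge
```

The only thing that makes the union meaningful is that both datasets use the same URI for the original book.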
However, more can be achieved…
• We “feel” that a:author and f:auteur should be the same
• But an automatic merge does not know that!
• Let us add some extra information to the merged data:
• a:author same as f:auteur
• both identify a “Person”
• a term that a community may have already defined:
• a “Person” is uniquely identified by his/her name and, say, homepage
• it can be used as a “category” for certain type of resources
3rd revisited: use the extra knowledge
[Figure: the merged graph with the extra statements applied. Person nodes are typed with r:type http://…foaf/Person, and the author node of http://…isbn/000651409X is reachable through both a:author and f:auteur and carries a:name and a:homepage. The translator node (f:nom "Besse, Christianne") stays attached to http://…isbn/2020386682 via f:traducteur, and f:original still links the translation to the original.]
Start making richer queries!
• User of dataset “F” can now query:
• “donnes-moi la page d’accueil de l’auteur de l’original”
• well… “give me the home page of the original’s ‘auteur’”
• The information is not in datasets “F” or “A”…
• …but was made available by:
• merging datasets “A” and datasets “F”
• adding three simple extra statements as an extra “glue”
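A minimal sketch of how one piece of "glue" changes the answer to the richer query. It models only the first extra statement (a:author same as f:auteur) by copying every triple of one property to the other; the Person typing and the identification of persons by name and homepage are left out. The blank-node labels and the helper names are hypothetical.

```python
BOOK_EN = "http://…isbn/000651409X"  # URI elided as in the slides
BOOK_FR = "http://…isbn/2020386682"

# Merged data from datasets "A" and "F" (only the triples needed here).
merged = {
    (BOOK_FR, "f:original", BOOK_EN),
    (BOOK_EN, "f:auteur", "_:auteur"),
    ("_:auteur", "f:nom", "Ghosh, Amitav"),
    (BOOK_EN, "a:author", "_:author"),
    ("_:author", "a:name", "Ghosh, Amitav"),
    ("_:author", "a:homepage", "http://www.amitavghosh.com"),
}


def objects(graph, subject, predicate):
    """All objects o such that (subject, predicate, o) is in the graph."""
    return {o for s, p, o in graph if s == subject and p == predicate}


def add_same_property(graph, prop_1, prop_2):
    """Return the graph plus a copy of each prop_1/prop_2 triple under the other name."""
    extra = set()
    for s, p, o in graph:
        if p == prop_1:
            extra.add((s, prop_2, o))
        elif p == prop_2:
            extra.add((s, prop_1, o))
    return graph | extra


def homepage_of_original_auteur(graph, book):
    """'Give me the home page of the original's auteur.'"""
    pages = set()
    for original in objects(graph, book, "f:original"):
        for auteur in objects(graph, original, "f:auteur"):
            pages |= objects(graph, auteur, "a:homepage")
    return pages


before = homepage_of_original_auteur(merged, BOOK_FR)
glued = add_same_property(merged, "a:author", "f:auteur")
after = homepage_of_original_auteur(glued, BOOK_FR)

print(before)  # empty: f:auteur leads to a node without a:homepage
print(after)   # the home page becomes reachable through f:auteur
```

In a real Semantic Web stack this kind of statement would be expressed in a schema or ontology language and applied by a reasoner rather than by hand-written copying.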
Combine with different datasets
• Using, e.g., the “Person”, the dataset can be combined with
other sources
• For example, data in Wikipedia can be extracted using
dedicated tools
• e.g., the “dbpedia” project can extract the “infobox” information
from Wikipedia already…
Merge with Wikipedia data
[Figure: the merged graph extended with DBpedia resources. The author node is connected via w:reference to http://dbpedia.org/../Amitav_Ghosh, which has a foaf:name, w:born_in http://dbpedia.org/../Kolkata (with w:lat and w:long) and w:author_of links to http://dbpedia.org/../The_Glass_Palace, http://dbpedia.org/../The_Hungry_Tide and http://dbpedia.org/../The_Calcutta_Chromosome; w:isbn connects the DBpedia book to the ISBN node.]
Search Engines – Fact Retrieval
Query String: International Space Station - 17th
March 2016
• What is the International Space Station?
• Is it orbiting on 17th March 2016?
• How to compute the position of the satellite on the
said date?
• External data to be considered:
• Constellation data
• Planet data
• Satellite data
RDF
RDF stands for
• Resource: pages, dogs, ideas...
everything that can have a URI
• Description: attributes, features, and
relations of the resources
• Framework: model, languages and
syntaxes for these descriptions
•RDF is a triple model i.e. every piece of
knowledge is broken down into
( subject , predicate , object )
RDF
• doc.html has for author Ankur
and has for theme Research
• doc.html has for author Ankur
doc.html has for theme Research
• ( doc.html , author , Ankur )
( doc.html , theme , Research )
( subject , predicate , object )
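The decomposition above maps directly onto tuples. The sketch below stores the two statements about doc.html as (subject, predicate, object) tuples and groups them by subject; the dictionary layout is just one convenient choice, not part of RDF.

```python
from collections import defaultdict

# Each statement is one (subject, predicate, object) triple.
triples = [
    ("doc.html", "author", "Ankur"),
    ("doc.html", "theme", "Research"),
]

# Group the outgoing (predicate, object) pairs by subject.
graph = defaultdict(list)
for subject, predicate, obj in triples:
    graph[subject].append((predicate, obj))

for predicate, obj in graph["doc.html"]:
    print("doc.html", predicate, obj)
```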
RDF is also a graph model to link the descriptions of resources
RDF triples can be seen as arcs
of a graph (vertex, edge, vertex)
[Figure: doc.html is linked to Ankur by an edge labelled author and to Research by an edge labelled theme.]
Resource Description Framework (RDF)
• Another triple model:
Subject       Predicate  Object
Renee Miller  Teaches    CSC433
Renee Miller  Lives in   Toronto
<URI>         <URI>      <URI> or “Literal”
<http://cs.toronto.edu/~miller> <http://xmlns.com/foaf/0.1/based_near> <http://dbpedia.org/resource/Toronto>
<http://cs.toronto.edu/~miller> <http://xmlns.com/foaf/0.1/based_near> "Toronto"
[Figure: bb:renee-j-miller has rdf:type foaf:Person, foaf:name "Renee J. Miller" and foaf:based_near dbpedia:Toronto. The prefix foaf stands for Friend of a Friend.]
A Simple RDF Example (in RDF/XML)
<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/"
         xmlns:bb="http://data.bibbase.org/ontology/">
  <rdf:Description rdf:about="http://.../author/renee-j-miller/">
    <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person"/>
    <foaf:name xml:lang="en">Renée J. Miller</foaf:name>
    <foaf:based_near
        rdf:resource="http://dbpedia.org/resource/Toronto"/>
  </rdf:Description>
</rdf:RDF>
A Simple RDF Example (in Turtle)
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix bb: <http://data.bibbase.org/ontology/> .
<http://data.bibbase.org/author/renee-j-miller/>
    rdf:type foaf:Person ;
    foaf:name "Renée J. Miller"@en ;
    foaf:based_near <http://dbpedia.org/resource/Toronto> .
A Simple RDF Example (in RDFa)
…
<p about="http://.../author/renee-j-miller">The author
“<span property="foaf:name" lang="en">Renée J. Miller</span>”
lives in the city
“<span rel="foaf:based_near"
resource="http://…/Toronto">Toronto</span>”
</p>
…
• SPARQL stands for “SPARQL Protocol
and RDF Query Language”.
• It is the standard query language for
RDF data proposed by the W3C.
• It is based on matching graph
patterns against RDF graphs.
• The simplest kind of graph pattern is
a triple pattern.
– A triple pattern is like an RDF
triple, but with the option of a
variable in the subject, predicate or
object positions.
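Triple-pattern matching can be sketched in a few lines of Python. This is a toy matcher over tuples, not a SPARQL engine: it has no parsing, no literal language tags and no optimisation, and the function names and the blank-node label `_:place` are hypothetical. Terms starting with `?` are treated as variables.

```python
def is_variable(term):
    """Variables are written with a leading question mark, as in SPARQL."""
    return term.startswith("?")


def match_pattern(pattern, triple, binding):
    """Extend the binding so that the pattern equals the triple, or return None."""
    extended = dict(binding)
    for pattern_term, triple_term in zip(pattern, triple):
        if is_variable(pattern_term):
            # A variable must keep the value it was bound to earlier.
            if extended.setdefault(pattern_term, triple_term) != triple_term:
                return None
        elif pattern_term != triple_term:
            return None
    return extended


def query(patterns, graph):
    """Match all patterns in turn; return every consistent variable binding."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in graph:
                extended = match_pattern(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


graph = [
    ("bb:renee-j-miller", "rdf:type", "foaf:Person"),
    ("bb:renee-j-miller", "foaf:name", "Renée J. Miller"),
    ("bb:renee-j-miller", "foaf:based_near", "_:place"),
    ("_:place", "foaf:name", "Toronto"),
]

# "Names of persons based near a place called Toronto."
patterns = [
    ("?x", "foaf:name", "?name"),
    ("?x", "rdf:type", "foaf:Person"),
    ("?x", "foaf:based_near", "?y"),
    ("?y", "foaf:name", "Toronto"),
]

names = [binding["?name"] for binding in query(patterns, graph)]
print(names)
```

The first pattern alone also matches the place node (its name is "Toronto"); the later patterns discard that binding because the place is not a foaf:Person.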
Example Dataset
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix bb: <http://data.bibbase.org/ontology/> .
<http://data.bibbase.org/author/renee-j-miller/>
    rdf:type foaf:Person ;
    foaf:name "Renée J. Miller"@en ;
    foaf:based_near [ rdf:type foaf:Place ;
                      foaf:name "Toronto" ] .
Example SPARQL Query
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name
WHERE { ?x foaf:name ?name .
        ?x rdf:type foaf:Person .
        ?x foaf:based_near ?y .
        ?y foaf:name "Toronto" .
}
• Result
?name
"Renée J. Miller"@en
SPARQL 1.0 allows
• Extraction of data as
• URIs, blank nodes, typed and untyped literals
• RDF subgraphs
• Exploration of data via queries for unknown relations
• Execution of complex join operations over heterogeneous databases in a
single query
• Transformation of RDF data from one vocabulary to another
• Construction of new RDF graphs based on RDF query subgraphs
SPARQL 1.1 (W3C Recommendation since March 2013) allows
• Additional query features
• Aggregate functions, subqueries, negation, project expressions, property paths
• Logical entailment for
• RDF, RDFS, OWL Direct and RDF-Based Semantics entailment, and RIF Core
entailment
• Update of RDF graphs as a full data manipulation language
• Discovery of information about the SPARQL service
• Federated queries distributed over different SPARQL endpoints
SPARQL usage in practice
• SPARQL is usually used over the network
• Separate documents define the protocol and the result format
• SPARQL Protocol for RDF with HTTP and SOAP bindings
• SPARQL results in XML or JSON formats
• Big datasets often offer “SPARQL endpoints” using this
protocol
• Typical example: SPARQL endpoint to DBpedia
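Using SPARQL over the network comes down to an ordinary HTTP request. The sketch below only builds the request URL for a query sent via HTTP GET, where the protocol passes the query text in a `query` parameter; it does not contact any server. The endpoint address and the query itself are assumptions chosen for illustration.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumed endpoint address for DBpedia; check the project's documentation.
ENDPOINT = "http://dbpedia.org/sparql"

# Illustrative query: labels of the DBpedia resource for Toronto.
QUERY = """PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?label
WHERE { <http://dbpedia.org/resource/Toronto> rdfs:label ?label }
LIMIT 5"""


def build_request_url(endpoint, query):
    """URL-encode the query text into the 'query' parameter of a GET request."""
    return endpoint + "?" + urlencode({"query": query})


url = build_request_url(ENDPOINT, QUERY)
print(url)

# A client would send this URL with an Accept header such as
# "application/sparql-results+json" to ask for JSON results.
```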
SPARQL as a unifying point
[Figure: applications send queries to a SPARQL processor. Behind it, SPARQL endpoints expose a triple store, a relational database (through an SQL⇔RDF bridge) and other databases, while RDF graphs are also obtained from HTML, XML/XHTML and unstructured text (via NLP techniques).]
Based on presentation by Ivan Herman, available at http://www.w3.org/2010/Talks/0622-SemTech-IH/
Other Semantic Web Technologies
• Web Ontology Language (OWL)
• A family of knowledge representation languages for authoring ontologies for
the Web
• RDF Schema (RDFS)
• RDF Vocabulary Description Language
• http://www.w3.org/TR/rdf-schema/
• How to use RDF to describe RDF vocabularies
• Other RDF Vocabularies
• Simple Knowledge Organization System (SKOS)
• Designed for representation of thesauri, classification schemes, taxonomies,
subject-heading systems, or any other type of structured controlled
vocabulary
• FOAF (Friend of a friend)
• A machine-readable ontology describing persons, their activities and their
relations to other people and objects
ONTOLOGIES
EXISTING OF BEING
Ontologies
• An ontology is a formal, explicit, shared specification of a
conceptualization of a domain (Gruber, 1993).
• Conceptualization: the objects, concepts, and other entities that are
assumed to exist in some area of interest and the relationships that
hold among them. A conceptualization is an abstract, simplified view
of the world that we wish to represent for some purpose.
• The term ontology is borrowed from Philosophy, where ontology is a
systematic account of existence (what things exist, how they can be
differentiated from each other etc.).
• Today the word ontology is a synonym for a shared knowledge base.
Ontologies – Components & Models
• Classes, Relations & Instances
• Classes represent concepts
• Classes are described by
attributes
• Attributes are name value pairs
Informal description:
The address contains the name, title and
place of address of a person
Semi-informal description:
Address
• First name <string>
• Family name <string>
• Street <string>
• PIN Code <int>
• City <string>
• …
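The Address class with its name-value attributes can be written down as a Python dataclass. This is a sketch of the class/attribute/instance idea only, not an ontology language; the field names follow the slide and the instance values are invented.

```python
from dataclasses import asdict, dataclass


@dataclass
class Address:
    """A class (concept) described by typed attributes."""
    first_name: str
    family_name: str
    street: str
    pin_code: int
    city: str


# An instance of the class; all values are hypothetical.
home = Address(
    first_name="Ada",
    family_name="Example",
    street="1 Sample Street",
    pin_code=700001,
    city="Kolkata",
)

# The attributes of an instance are name-value pairs.
for name, value in asdict(home).items():
    print(name, "=", value)
```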
Learning Ontologies
Very Large Ontologies
• Recently there has been a lot of work on developing very large
ontologies that capture various areas of human knowledge and
deploying this knowledge in applications such as search engines or
question answering.
• Example: Watson, IBM’s question answering system that beat humans
in the quiz show Jeopardy (http://www-03.ibm.com/innovation/us/watson/index.html).
5 ★ Open Data – by Tim Berners-Lee
• Tim Berners-Lee, the inventor of the Web and Linked Data initiator,
suggested a 5-star deployment scheme for Open Data. Here, we give
examples for each step of the stars and explain costs and benefits that
come along with it.
BY EXAMPLE …
★ make your stuff available on the Web (whatever format) under an open
license
★★ make it available as structured data (e.g., Excel instead of image scan of a
table)
★★★ make it available in a non-proprietary open format (e.g., CSV instead of
Excel)
★★★★ use URIs to denote things, so that people can point at your stuff
★★★★★ link your data to other data to provide context
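Because each star builds on the previous one, the scheme can be sketched as a simple cumulative check. The function name and its boolean parameters are hypothetical; deciding whether a real dataset meets a criterion still takes human judgment.

```python
def star_rating(open_license, structured, open_format, uses_uris, linked):
    """Count stars in order; a level only counts if all earlier levels hold."""
    stars = 0
    for criterion in (open_license, structured, open_format, uses_uris, linked):
        if not criterion:
            break
        stars += 1
    return stars


# An Excel sheet published under an open license: structured, but proprietary.
excel_sheet = star_rating(True, True, False, False, False)

# Linked RDF data with URIs under an open license.
linked_rdf = star_rating(True, True, True, True, True)

print(excel_sheet, linked_rdf)
```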
What are the costs & benefits of ★ Web
data?
• As a consumer …
• You can look at it.
• You can print it.
• You can store it locally (on your hard drive or on a USB stick).
• You can enter the data into any other system.
• You can change the data as you wish.
• You can share the data with anyone you like.
• As a publisher …
• It’s simple to publish.
• You do not have to explain repeatedly to others that they can use your data.
What are the costs & benefits of ★★ Web
data?
• As a consumer, you can do everything you can do with ★ Web
data and additionally:
• You can directly process it with proprietary software to aggregate it,
perform calculations, visualize it, etc.
• You can export it into another (structured) format.
• As a publisher …
• It’s still simple to publish.
What are the costs & benefits of ★★★ Web
data?
• As a consumer, you can do everything you can do
with ★★ Web data and additionally:
• You can manipulate the data in any way you like, without the need
to own any proprietary software package.
• As a publisher …
• You might need converters or plug-ins to export the data from the
proprietary format.
• It’s still rather simple to publish.
What are the costs & benefits of ★★★★ Web
data?
• As a consumer, you can do everything you can do with ★★★ Web data and additionally:
• You can link to it from any other place (on the Web or locally).
• You can bookmark it.
• You can reuse parts of the data.
• You may be able to reuse existing tools and libraries, even if they only understand parts of the pattern
the publisher used.
• Understanding the structure of an RDF “Graph” of data can be more effort than tabular (Excel/CSV) or
tree (XML/JSON) data.
• You can combine the data safely with other data. URIs are a global scheme, so if two things have the
same URI then it’s intentional, and if so that’s well on its way to being 5-star data!
• As a publisher …
• You have fine-granular control over the data items and can optimize their access (load balancing,
caching, etc.)
• Other data publishers can now link into your data, promoting it to 5 star!
• You typically invest some time slicing and dicing your data.
• You’ll need to assign URIs to data items and think about how to represent the data.
• You need to either find existing patterns to reuse or create your own.
What are the costs & benefits of ★★★★★ Web
data?
• As a consumer, you can do everything you can do with ★★★★ Web data and
additionally:
• You can discover more (related) data while consuming the data.
• You can directly learn about the data schema.
• You now have to deal with broken data links, just like 404 errors in web pages.
• Presenting data from an arbitrary link as fact is as risky as letting people include
content from any website in your pages. Caution, trust and common sense are all still
necessary.
• As a publisher …
• You make your data discoverable.
• You increase the value of your data.
• Your own organization will gain the same benefits from the links as the consumers.
• You’ll need to invest resources to link your data to other data on the Web.
• You may need to repair broken or incorrect links.
Applications
• Data integration (e.g., see project Optique http://www.optique-project.eu/)
• E-government (e.g., open data)
• E-commerce
• Tourism
• Medicine
• Biology
• Earth Observation (see the work of my group in projects TELEIOS
http://www.earthobservatory.eu/ and LEO
http://www.linkedeodata.eu/ ).
• …
References:
• Books:
• Antoniou, Grigoris, and Frank van Harmelen. A Semantic Web Primer. Cambridge, MA: The MIT Press, 2004.
• Segaran, Toby, Colin Evans, and Jamie Taylor. Programming the Semantic Web. O'Reilly Media, Inc., 2009.
• Davies, John, Dieter Fensel, and Frank van Harmelen. Towards the Semantic Web: Ontology-Driven
Knowledge Management. Chichester (2003).
• Scientific Papers:
• Maedche, Alexander. Ontology learning for the semantic web. Vol. 665. Springer Science & Business Media,
2012.
• Schmachtenberg, Max, Christian Bizer, and Heiko Paulheim. "Adoption of the linked data best practices in
different topical domains." The semantic web–ISWC 2014. Springer International Publishing, 2014. 245-260.
• Video Lectures & Slides
• Video lectures on Semantic Web by Dr. Harald Sack, Hasso Plattner Institute, University of Potsdam, Germany
• www.cs.toronto.edu/~oktie/slides/web-of-data-intro.pdf
• https://www.w3.org/2010/Talks/0622-SemTech-IH/
• Websites
• http://dbpedia.org/snorql/
• http://5stardata.info/en/
Thank You

An Introduction to Semantic Web Technology

  • 2. INTRODUCTION WEB OF DOCUMENTS VS. WEB OF DATA 4/4/2016Ankur Biswas 2
  • 3. A Walk Through Brief History of World Wide Web • 1969 – ARPANET (Advanced Research Project Agency) launched • In 1980, Tim Berners-Lee built ENQUIRE, as a personal database of people and software models, a way to play with hypertext; each new page of information in ENQUIRE had to be linked to an existing page. • In 1990, Berners-Lee built all the tools necessary for working Web: HTTP 0.9, HTML, First Web Browser (Web-Editor), the first HTTP server software (CERN httpd), the first web server (http://info.cern.ch), and the first Web pages that described the project itself. WWW's historical logo designed by Robert Cailliau The NeXTcube used by Tim Berners-Lee at CERN became the first Web server. 34/4/2016Ankur Biswas
  • 4. How big is web??? • As per http://www.worldwidewebsize.com/ the Indexed Web contains at least 4.84 billion pages (Thursday, 25 February, 2016). • Early estimates suggested that the deep web is 400 to 550 times larger than the surface web. • Since more information and sites are always being added, it can be assumed that the deep web is growing exponentially at a rate that cannot be quantified. 44/4/2016Ankur Biswas
  • 5. Understanding Information in the WWW • What is important and how do you know? • What is information, what is advertisement? • What does information mean? • How credible or trustworthy is the information? • What is redundant? 54/4/2016Ankur Biswas
  • 6. Understanding the Importance of Meaning • SEMANTICS: It is part of the linguistics focused on Sense & Meaning of language or symbols of language. • It is study of interpretation of sign or symbols as used by agents or communities within particular circumstances and contexts. • Semantics asks, how sense and meaning of complex concepts can be derived from simple concepts based on the rules of syntax. • The semantics of a message depends of its context and pragmatics†. †Dealing with things sensibly and realistically in a way that is based on practical rather than theoretical considerations. 64/4/2016Ankur Biswas
  • 7. • SYNTAX: In grammatics denotes the study of the principles and processes by which sentences are constructed in particular language. • In formal languages, syntax is just a set of rules, by which well formed expressions can be created from a fundamental set of symbols (alphabet). • In computer science, Syntax defines the normative structure of data. Understanding the Importance of Meaning 74/4/2016Ankur Biswas
  • 8. Understanding the Importance of Meaning • CONTEXT: It denotes the surrounding expressions (concepts) in an expressing represents its relationship with surrounding expressions (concepts) and further related elements. • Context denotes all elements of any sort of communications that define the interpretation of the communicated content e.g. • General contexts: place, time, interrelation of action in message. • Personal or Social contexts: relation between sender and receiver of a message. • PRAGMATICS: It reflects the intention by which the language is used to communicate a message. • In linguistic pragmatics denotes the study of applying language in different situations It also denotes the intended purpose of speaker. Pragmatics studies the ways in which context contributes to meaning 84/4/2016Ankur Biswas
  • 9. The limits of web • Traditional key based search leads to many irrelevant results. • Ex.- From a simple term Jaguar it is not clear if the user mean car or animal or OS(Mac OS X Jaguar) • POLYSEMY: If you get some result for your search and get some other result as well with different meaning having same or similar name. 94/4/2016Ankur Biswas
  • 10. Problem 1: Information Retrieval • Jaguar (animal) Panthera Onca • Traditional keyword-based search doesn’t find all results. • Synonyms & metaphors (Not always addressed properly which results undesired results) Primary objects: documents Degree of structure in data: fairly low Implicit semantics of contents Designed for: human consumption 4/4/2016Ankur Biswas 10 HTML HTML HTML API/ XML A B C D Untyped Links Untyped Links Untyped Links
  • 11. Problem 2: Information Extraction • Identifying contents written in other languages e.g. Japanese or Bengali • Pictures doesn’t give any information to search engines that what it shows. • Example – Google identifies the caption or name of the picture which is embedded in it and makes it a reference keyword. 4/4/2016Ankur Biswas 11
  • 12. Problem 2: Information Extraction (Cont.) 4/4/2016Ankur Biswas 12 HTML HTML HTML API/ XML A B C D Untyped Links Untyped Links Untyped Links Things Things Are two Documents talking about same “Thing”??? ? ? ? ? ? ? ?
  • 13. • Can only be solved, correctly by a human agent • Heterogeneous distribution and order of information. • Software agent does not have sufficient: • Knowledge of contexts • World knowledge and • Experience To solve problem Hence it will not be able to solve the problem without explicit semantic available. Implicit knowledge, i.e. information doesn’t have specified explicitly but must be derived via logical deductions from available information. 4/4/2016Ankur Biswas 13 Problem 2: Information Extraction (Cont.)
  • 14. The more complex and voluminous a website is , the more complicated is the maintenance of the only weakly structured data. Problems:  Syntactic consistency error: You have linked your webpage to another webpage having some related content but now the webpage has moved to some other place and the link to that address still exist.  Semantic (link) consistency error: This is even more dangerous where hyperlinked destinations is consistently changing.  Correctness: It is tough to maintain correctness over time in automated manner  Timeliness: Tracking the changes over time is really tough. Problem 3: Maintenance 4/4/2016Ankur Biswas 14 http 404 Error: File/Page not found
  • 15. Problem 4: Personalization • Adaption of the presented information content to personal requirements: User normally password protect their details and hence it becomes tough to access any such kind of information. • Problems: • From where do we get the required (personal) information? • Personalization vs Data Security 4/4/2016Ankur Biswas 15
  • 16. INTRODUCTION TO SEMANTIC WEB TECHNOLOGIES THE VISION OF THE SEMANTIC WEB 4/4/2016Ankur Biswas 16
  • 17. The vision of the Semantic Web 4/4/2016Ankur Biswas 17 Precondition: • Content can be read and interpreted correctly (understood) by machines Natural language Processing • Technologies of Traditional Information Retrieval (Search Engines) Semantic Web concept was first introduced in 1990’s by Tim Berners – Lee who is also one of the creator of internet. Semantic Web • Natural language web content will be explicitly annotated with semantic metadata • Semantic metadata encode the Meaning (Semantics) of the content and can be read and interpreted correctly by machines.
  • 18. How Can we Achieve the Semantic Web? – The Original Vision • Instead of publishing information to be consumed by humans, publish machine-processable data and metadata using terms/languages that can be understood by machines. • Build machines (agents) that will search for, query, integrate etc. this data. • Make sure all agents understand your terms/languages. 4/4/2016Ankur Biswas 18
  • 19. The Semantic Web and Linked Data Vision Today • The Semantic Web is a web of data. There is lots of data we all use every day, and it is not part of the web. • The Semantic Web is about two things: • It is about common formats for integration and combination of data drawn from diverse sources, where on the original Web mainly concentrated on the interchange of documents. • It is also about language for recording how the data relates to real world objects. • That allows a person, or a machine, to start off in one database, and then move through an unending set of databases which are connected not by wires but by being about the same thing. 4/4/2016Ankur Biswas 19
  • 20. Semantic Web Technology Stack • Most apps use only a subset of the stack • Querying allows fine-grained data access • Standardized information exchange is a key • Formats are necessary but not too important • The semantic web is based on the web 4/4/2016Ankur Biswas 20
  • 21. Basic Layer of Semantic Web Technology Stack • The foundation of the layer is World Wide Web. Hence we rely on all technologies in world wide web. • Semantic version of Wikipedia is DBpedia. • As Wikipedia is having template hence data is somewhat structured. • DBpedia extracts data from Wikipedia infoboxes. • DBpedia is having machine readable language  RDF • Dbpedia stores & publishes the result in RDF and a few other formats. • It also hosts a community effort to define extractors for the data, that can be used well beyond Wikipedia. • It provides a number of services around the extracted data, like DBpedia mobile, a SPARQL endpoint, a faceted browser, a number of mappings to external ontologies, an ontology itself, etc. 4/4/2016Ankur Biswas 21
  • 22. Semantic Web Technologies • A set of technologies and frameworks that enable the Web of Data: • Resource Description Framework (RDF) • A variety of data interchange formats (e.g. RDF/XML, N3, Turtle, N- Triples) • Notations such as RDF Schema (RDFS) and the Web Ontology Language (OWL) • All are intended to provide a formal description of concepts, terms, and relationships within a given knowledge domain • Specialized query language (SPARQL) is just like SQL but can be more complicated and may be based on graph extraction 4/4/2016Ankur Biswas 22
  • 23. Application in Web of Data • Linked Data • Linked Open Data (LOD) denote publicly available (RDF) Data in the web, identification via URI and accessible via HTTP. Linked data 4/4/2016Ankur Biswas 23 Web of Data: • >31 billion Facts • >500 million Links (Oct 2011)
  • 24. 4/4/2016Ankur Biswas 24 What is so special about BBC Music Website? • Information is dynamically aggregated from external, publicly available data (Wikipedia, Music Brainz,…) • No Screen Scrapping • No specialized API • Data available as Linked Open Data. • Data access via simple HTTP Request • Data is always up to date without manual interaction.
  • 25. How to build such a site 1. • Site editors roam the Web for new facts • may discover further links while roaming • They update the site manually • And the site gets soon out-of-date 4/4/2016Ankur Biswas 25
  • 26. How to build such a site 2. • Editors roam the Web for new data published on Web sites • “Scrape” the sites with a program to extract the information • i.e., write some code to incorporate the new data • Easily get out of date again… 4/4/2016Ankur Biswas 26
  • 27. How to build such a site 3. • Editors roam the Web for new data via API-s • Understand those… • input, output arguments, datatypes used, etc. • Write some code to incorporate the new data • Easily get out of date again… 4/4/2016Ankur Biswas 27
  • 28. The choice of the BBC • Use external, public datasets • Wikipedia, MusicBrainz, … • They are available as data • not API-s or hidden on a Web site • data can be extracted using, e.g., HTTP requests or standard queries 4/4/2016Ankur Biswas 28
  • 30. Search Engines – Document Retrieval • General Problems: • Correct interpretation of query string -> • Somehow the context of user has to be considered • e.g. what was the query of the user just before a specific query or their usual preferences etc. • Correct identification of entities • Automatic disambiguation • Usability • personalization 4/4/2016Ankur Biswas 30
  • 31. Intelligent Agents in Semantic Web • [Figure: two architectures compared. World Wide Web: the user reaches WWW documents through a presentation service (e.g. Firefox) and a retrieval service (e.g. Google). Semantic Web: the user works through a personal assistant that reaches WWW documents via intelligent infrastructure services.]
  • 32. 3 Generations of Web Documents • [Figure: three generations of web documents. 1st generation: static web pages (HTML/CSS). 2nd and 3rd generations: interactive web pages (JavaScript/applets), dynamic web pages, virtual web pages, and adaptive web pages; techniques shown include netbots, information extraction, presentation planning, database access, template-based generation, user models, machine learning, and online layout.]
  • 33. Toolbox for the Semantic Web • Standardized languages to express the semantics of information content on the Web (XML/XSD, RDF(S), OWL, RIF) • Tools for embedding and extracting semantic information on the Web (RDFa, GRDDL, …) • Various fields of computer science: • Artificial Intelligence • Linguistics • Cryptography • Databases • Theoretical Computer Science • Computer Architecture • Software Engineering • Systems Theory • Computer Networks
  • 34. Basic Architecture of Semantic Web - I • Uniform: different types of resource identifiers are all constructed according to a uniform schema • Resource: whatever may be identified by a URI • Identifier: distinguishes one resource from another
  • 35. Uniform Resource Identifier (URI) • A Uniform Resource Identifier (URI) defines a simple and extensible schema for worldwide unique identification of abstract or physical resources. • A resource can be any object with a clear identity (according to the context of the application) • e.g. webpages, books, locations, persons, relations among objects, abstract concepts, etc. • The concept of a URI is already established in various domains, e.g. • The Web: URL (Uniform Resource Locator), URN (Uniform Resource Name), PURL (Persistent Uniform Resource Locator) • Books & publications (ISBN, ISSN) • Digital Object Identifier (DOI)
  • 36. Uniform Resource Identifier (URI) • A URI combines • Address (Locator) • Uniform Resource Locator (URL, RFC 1738) • Denotes where a resource can be found on the Web by stating its primary access mechanism • Might change during its lifetime • Identity (Name) • Uniform Resource Name (URN, RFC 2141) • Persistent identifier for a Web resource • Remains unchanged during its life cycle • URI generic syntax: URI = scheme "://" [userinfo "@"] host [":" port] [path] ["?" query] ["#" fragment] • Scheme: e.g. http, ftp, mailto • Userinfo: e.g. username:password • Host: e.g. domain name, IPv4/IPv6 address • Port: e.g. :80 is the default HTTP port • Path: e.g. path in the file system of the WWW server • Query: e.g. parameters to be passed to applications • Fragment: e.g. identifies a specific fragment of a document
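
The generic syntax above can be checked against a concrete URI. The following is a minimal sketch using Python's standard `urllib.parse` module; the URI itself is a made-up example (an assumption, not taken from the slides) chosen so that every optional component is present.

```python
from urllib.parse import urlsplit

# Hypothetical URI that exercises every component of the generic syntax.
uri = "http://user@example.org:8080/books/glass-palace?lang=en#reviews"

parts = urlsplit(uri)

# Map the slide's component names onto the attributes urlsplit exposes.
components = {
    "scheme": parts.scheme,
    "userinfo": parts.username,
    "host": parts.hostname,
    "port": parts.port,
    "path": parts.path,
    "query": parts.query,
    "fragment": parts.fragment,
}

for name, value in components.items():
    print(f"{name:9} {value}")
```

Components that are absent from a URI come back as `None` (userinfo, port) or as an empty string (query, fragment), which mirrors the square brackets in the grammar.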
  • 37. Data on the Web is not enough… • We need a proper infrastructure for a real Web of Data • data is available on the Web • accessible via standard Web technologies • data are interlinked over the Web • i.e., data can be integrated over the Web • This is where Semantic Web technologies come in • We will use a simplistic example to introduce the main Semantic Web concepts
  • 38. The rough structure of data integration • Map the various data onto an abstract data representation • make the data independent of its internal representation… • Merge the resulting representations • Start making queries on the whole! • queries not possible on the individual data sets
  • 39. We start with a book...
  • 40. Simplified bookstore data (dataset “A”)
      Book table:      ID | Author | Title | Publisher | Year
                       ISBN 0-00-651409-X | id_xyz | The Glass Palace | id_qpr | 2000
      Author table:    ID | Name | Homepage
                       id_xyz | Ghosh, Amitav | http://www.amitavghosh.com
      Publisher table: ID | Publisher’s name | City
                       id_qpr | Harper Collins | London
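  • 41. 1st: we export our data as a set of relations • [Figure: graph rooted at the book URI http://…isbn/000651409X. Edges lead to the literals “The Glass Palace” and 2000, to a publisher node (“Harper Collins”, “London”), and via a:author to an author node with a:name “Ghosh, Amitav” and a:homepage http://www.amitavghosh.com.]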
  • 42. Some notes on exporting the data • Relations form a graph • the nodes refer to the “real” data or contain some literal • how the graph is represented in the machine is immaterial for now • Data export does not necessarily mean physical conversion of the data • relations can be generated on the fly at query time • via SQL “bridges” • scraping HTML pages • extracting data from Excel sheets • etc. • One can export only part of the data
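
The export step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the base URI `http://example.org/isbn/` is a placeholder for the elided one on the slides, and the predicate names `a:title`, `a:year`, `a:publisher`, `a:p_name` and `a:city` are hypothetical (the slides only show `a:author`, `a:name` and `a:homepage`). Triples are plain tuples; no RDF library is used.

```python
# Dataset "A" as three relational tables (values taken from the slide).
books = [
    {"isbn": "000651409X", "author": "id_xyz", "title": "The Glass Palace",
     "publisher": "id_qpr", "year": 2000},
]
authors = {
    "id_xyz": {"name": "Ghosh, Amitav", "homepage": "http://www.amitavghosh.com"},
}
publishers = {
    "id_qpr": {"name": "Harper Collins", "city": "London"},
}

# Assumption: placeholder base URI; the slides elide the real one.
ISBN_BASE = "http://example.org/isbn/"


def export_triples(books, authors, publishers):
    """Generate (subject, predicate, object) triples on the fly from the tables."""
    triples = set()
    for row in books:
        book = ISBN_BASE + row["isbn"]
        author_node = "_:" + row["author"]          # internal IDs become local nodes
        publisher_node = "_:" + row["publisher"]
        triples.add((book, "a:title", row["title"]))        # a:title is assumed
        triples.add((book, "a:year", row["year"]))          # a:year is assumed
        triples.add((book, "a:author", author_node))
        triples.add((book, "a:publisher", publisher_node))  # a:publisher is assumed
        author = authors[row["author"]]
        triples.add((author_node, "a:name", author["name"]))
        triples.add((author_node, "a:homepage", author["homepage"]))
        publisher = publishers[row["publisher"]]
        triples.add((publisher_node, "a:p_name", publisher["name"]))  # assumed
        triples.add((publisher_node, "a:city", publisher["city"]))    # assumed
    return triples


triples_a = export_triples(books, authors, publishers)
for triple in sorted(triples_a, key=str):
    print(triple)
```

Because the triples are computed from the tables when the function is called, nothing has to be converted or stored in advance, which is the point made about on-the-fly export.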
  • 43. Same book in French…
  • 44. Another bookstore data (dataset “F”), kept in a spreadsheet
         | A | B | C | D
      1  | ID | Titre | Traducteur | Original
      2  | ISBN 2020386682 | Le Palais des Miroirs | $A12$ | ISBN 0-00-651409-X
      6  | ID | Auteur
      7  | ISBN 0-00-651409-X | $A11$
      10 | Nom
      11 | Ghosh, Amitav
      12 | Besse, Christianne
      (rows 3 to 5 and 8 to 9 are empty; $A11$ and $A12$ are cell references to the names in rows 11 and 12)
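  • 45. 2nd: export your second set of data • [Figure: graph for dataset “F”. The French edition http://…isbn/2020386682 carries the title “Le palais des miroirs” and an f:traducteur edge to a node with f:nom “Besse, Christianne”; the original http://…isbn/000651409X has an f:auteur edge to a node with f:nom “Ghosh, Amitav”.]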
  • 46. 3rd: start merging your data • [Figure: the graphs of datasets “A” and “F” side by side. Both contain a node with the same URI, http://…isbn/000651409X.]
  • 47. 3rd: start merging your data • [Figure: the merged graph. The two nodes for http://…isbn/000651409X are fused into one, so the f:original edge from http://…isbn/2020386682 now leads to the node that carries the a:author, title, year, and publisher data of dataset “A”.]
  • 48. Start making queries… • A user of dataset “F” can now ask queries like: • “give me the title of the original” • well, … « donne-moi le titre de l’original » • This information is not in dataset “F”… • …but can be retrieved by merging with dataset “A”!
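
The effect of the merge on this query can be sketched with plain Python sets. This is an illustration, not the method used on the slides: the URIs under `http://example.org/` are placeholders for the elided ones, and the predicates `a:title`, `a:year` and `f:titre` are hypothetical names introduced for the example. Merging is simply a set union, and it works because both datasets use the same URI for the original book.

```python
# Placeholder URIs (assumption): the slides elide the real base URI.
BOOK_A = "http://example.org/isbn/000651409X"
BOOK_F = "http://example.org/isbn/2020386682"

dataset_a = {
    (BOOK_A, "a:title", "The Glass Palace"),   # a:title is an assumed predicate
    (BOOK_A, "a:year", 2000),                  # a:year is an assumed predicate
    (BOOK_A, "a:author", "_:a1"),
    ("_:a1", "a:name", "Ghosh, Amitav"),
}
dataset_f = {
    (BOOK_F, "f:titre", "Le palais des miroirs"),  # f:titre is an assumed predicate
    (BOOK_F, "f:original", BOOK_A),
    (BOOK_F, "f:traducteur", "_:f2"),
    (BOOK_A, "f:auteur", "_:f1"),
    ("_:f1", "f:nom", "Ghosh, Amitav"),
    ("_:f2", "f:nom", "Besse, Christianne"),
}

# Merging two sets of triples is a set union; shared URIs fuse the graphs.
merged = dataset_a | dataset_f


def objects(graph, subject, predicate):
    """All objects o such that (subject, predicate, o) is in the graph."""
    return [o for s, p, o in graph if s == subject and p == predicate]


def title_of_original(graph, book):
    """Follow f:original from the book, then read the title of the target."""
    return [title
            for original in objects(graph, book, "f:original")
            for title in objects(graph, original, "a:title")]


print(title_of_original(dataset_f, BOOK_F))  # [] : dataset "F" alone has no title
print(title_of_original(merged, BOOK_F))     # ['The Glass Palace']
```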
  • 49. However, more can be achieved… • We “feel” that a:author and f:auteur should be the same • But an automatic merge does not know that! • Let us add some extra information to the merged data: • a:author is the same as f:auteur • both identify a “Person” • a term that a community may have already defined: • a “Person” is uniquely identified by his/her name and, say, homepage • it can be used as a “category” for certain types of resources
  • 50. 3rd revisited: use the extra knowledge • [Figure: the merged graph with the extra knowledge applied. The author nodes of the two datasets are identified with each other, so a single node of r:type http://…foaf/Person carries a:name, a:homepage, and f:nom and is reached from http://…isbn/000651409X via both a:author and f:auteur; the translator node is also of r:type http://…foaf/Person.]
  • 51. Start making richer queries! • A user of dataset “F” can now query: • « donne-moi la page d’accueil de l’auteur de l’original » • well… “give me the home page of the original’s ‘auteur’” • The information is not in dataset “F” or “A” alone… • …but was made available by: • merging dataset “A” and dataset “F” • adding three simple extra statements as extra “glue”
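
How a glue statement unlocks the richer query can be sketched as follows. This is a deliberately simplified illustration: it applies only the first glue statement (a:author is the same as f:auteur) by copying edges between the two predicates, it does not implement the “Person” identification by name and homepage, and the `http://example.org/` URIs are placeholders for the elided ones.

```python
# Placeholder URIs (assumption): the slides elide the real base URI.
BOOK_A = "http://example.org/isbn/000651409X"
BOOK_F = "http://example.org/isbn/2020386682"

# The relevant part of the merged graph of datasets "A" and "F".
merged = {
    (BOOK_A, "a:author", "_:a1"),
    ("_:a1", "a:name", "Ghosh, Amitav"),
    ("_:a1", "a:homepage", "http://www.amitavghosh.com"),
    (BOOK_F, "f:original", BOOK_A),
    (BOOK_A, "f:auteur", "_:f1"),
    ("_:f1", "f:nom", "Ghosh, Amitav"),
}


def objects(graph, subject, predicate):
    """All objects o such that (subject, predicate, o) is in the graph."""
    return [o for s, p, o in graph if s == subject and p == predicate]


def add_same_property(graph, p1, p2):
    """Apply the glue 'p1 is the same as p2' by mirroring every edge."""
    extra = set()
    for s, p, o in graph:
        if p == p1:
            extra.add((s, p2, o))
        elif p == p2:
            extra.add((s, p1, o))
    return graph | extra


def homepage_of_original_auteur(graph, book):
    """Home page of the 'auteur' of the original of the given book."""
    pages = []
    for original in objects(graph, book, "f:original"):
        for person in objects(graph, original, "f:auteur"):
            pages.extend(objects(graph, person, "a:homepage"))
    return pages


before = homepage_of_original_auteur(merged, BOOK_F)
glued = add_same_property(merged, "a:author", "f:auteur")
after = homepage_of_original_auteur(glued, BOOK_F)

print(before)  # [] : f:auteur only reaches the node that has f:nom
print(after)   # ['http://www.amitavghosh.com']
```

In a real deployment this kind of glue would be expressed in a vocabulary such as RDFS or OWL and applied by a reasoner rather than by hand-written code.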
  • 52. Combine with different datasets • Using, e.g., the “Person” category, the dataset can be combined with other sources • For example, data in Wikipedia can be extracted using dedicated tools • e.g., the DBpedia project already extracts the “infobox” information from Wikipedia
  • 53. Merge with Wikipedia data • [Figure: the merged graph linked to DBpedia. The author node is connected to http://dbpedia.org/../Amitav_Ghosh (r:type http://…foaf/Person, foaf:name, w:reference), which has w:author_of edges to http://dbpedia.org/../The_Hungry_Tide, http://dbpedia.org/../The_Calcutta_Chromosome, and http://dbpedia.org/../The_Glass_Palace, and a w:born_in edge to http://dbpedia.org/../Kolkata (w:lat, w:long); the book nodes are matched through w:isbn.]
  • 54. Search Engines – Fact Retrieval • Query string: International Space Station - 17th March 2016 • What is the International Space Station? • Is it in orbit on 17th March 2016? • How can the position of the satellite on that date be computed? • External data to be considered: • Constellation data • Planet data • Satellite data
  • 55. RDF • RDF stands for • Resource: pages, dogs, ideas... anything that can have a URI • Description: attributes, features, and relations of the resources • Framework: model, languages and syntaxes for these descriptions • RDF is a triple model, i.e. every piece of knowledge is broken down into (subject, predicate, object)
  • 56. RDF • “doc.html has Ankur as its author and Research as its theme” • splits into “doc.html has author Ankur” and “doc.html has theme Research” • as (subject, predicate, object) triples: (doc.html, author, Ankur) and (doc.html, theme, Research)
  • 57. RDF is also a graph model to link the descriptions of resources • RDF triples can be seen as arcs of a graph (vertex, edge, vertex) • [Figure: node doc.html with an edge “author” to node Ankur and an edge “theme” to node Research.]
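
The correspondence between triples and graph arcs can be made concrete with a small sketch in plain Python (no RDF library; the adjacency-list representation is just one possible choice). Each triple becomes a labelled edge from the subject vertex to the object vertex.

```python
from collections import defaultdict

# The two triples from the doc.html example.
triples = [
    ("doc.html", "author", "Ankur"),
    ("doc.html", "theme", "Research"),
]

# Adjacency list: subject -> list of (predicate, object) outgoing edges.
graph = defaultdict(list)
for subject, predicate, obj in triples:
    graph[subject].append((predicate, obj))

# Every subject and object is a vertex of the graph.
vertices = {s for s, _, _ in triples} | {o for _, _, o in triples}

for predicate, obj in graph["doc.html"]:
    print(f"doc.html --{predicate}--> {obj}")
```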
  • 58. Resource Description Framework (RDF) • Another triple model:
      Subject | Predicate | Object
      Renee Miller | Teaches | CSC433
      Renee Miller | Lives in | Toronto
      <URI> | <URI> | <URI> or “Literal”
      <http://cs.toronto.edu/~miller> | <http://xmlns.com/foaf/0.1/based_near> | <http://dbpedia.org/resource/Toronto>
      <http://cs.toronto.edu/~miller> | <http://xmlns.com/foaf/0.1/based_near> | “Toronto”
    • [Figure: node bb:renee-j-miller with edges rdf:type to foaf:Person, foaf:name to “Renee J. Miller”, and foaf:based_near to dbpedia:Toronto.] • foaf: Friend of a Friend
  • 59. A Simple RDF Example (in RDF/XML)
      <?xml version="1.0"?>
      <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
               xmlns:foaf="http://xmlns.com/foaf/0.1/"
               xmlns:bb="http://data.bibbase.org/ontology/">
        <rdf:Description rdf:about="http://.../author/renee-j-miller/">
          <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person"/>
          <foaf:name xml:lang="en">Renée J. Miller</foaf:name>
          <foaf:based_near rdf:resource="http://dbpedia.org/resource/Toronto"/>
        </rdf:Description>
      </rdf:RDF>
  • 60. A Simple RDF Example (in Turtle)
      @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
      @prefix foaf: <http://xmlns.com/foaf/0.1/> .
      @prefix bb: <http://data.bibbase.org/ontology/> .
      <http://data.bibbase.org/author/renee-j-miller/>
          rdf:type foaf:Person ;
          foaf:name "Renée J. Miller"@en ;
          foaf:based_near <http://dbpedia.org/resource/Toronto> .
  • 61. A Simple RDF Example (in RDFa)
      …
      <p about="http://.../author/renee-j-miller">The author
        “<span property="foaf:name" lang="en">Renée J. Miller</span>”
        lives in the city
        “<span rel="foaf:based_near" resource="http://…/Toronto">Toronto</span>”.
      </p>
      …
  • 62. • SPARQL stands for “SPARQL Protocol and RDF Query Language”. • It is the standard query language for RDF data, standardized by the W3C. • It is based on matching graph patterns against RDF graphs. • The simplest kind of graph pattern is a triple pattern. – A triple pattern is like an RDF triple, but with the option of a variable in the subject, predicate or object position.
  • 63. Example Dataset
      @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
      @prefix foaf: <http://xmlns.com/foaf/0.1/> .
      @prefix bb: <http://data.bibbase.org/ontology/> .
      <http://data.bibbase.org/author/renee-j-miller/>
          rdf:type foaf:Person ;
          foaf:name "Renée J. Miller"@en ;
          foaf:based_near [ rdf:type foaf:Place ; foaf:name "Toronto" ] .
  • 64. Example SPARQL Query
      # PREFIX declarations for rdf: and foaf: as in the example dataset
      SELECT ?name
      WHERE {
        ?x foaf:name ?name .
        ?x rdf:type foaf:Person .
        ?x foaf:based_near ?y .
        ?y foaf:name "Toronto" .
      }
    • Result: ?name = "Renée J. Miller"@en
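
To show what “matching graph patterns” means operationally, here is a minimal sketch of a triple-pattern matcher in plain Python. It is an illustration of the idea only, not how a SPARQL engine is implemented: terms are plain strings, prefixed names are not expanded, language tags are ignored, and `_:place` is a made-up label for the blank node in the example dataset.

```python
AUTHOR = "http://data.bibbase.org/author/renee-j-miller/"

# The example dataset as (subject, predicate, object) tuples.
graph = [
    (AUTHOR, "rdf:type", "foaf:Person"),
    (AUTHOR, "foaf:name", "Renée J. Miller"),
    (AUTHOR, "foaf:based_near", "_:place"),
    ("_:place", "rdf:type", "foaf:Place"),
    ("_:place", "foaf:name", "Toronto"),
]

# The WHERE clause of the example query; strings starting with "?" are variables.
patterns = [
    ("?x", "foaf:name", "?name"),
    ("?x", "rdf:type", "foaf:Person"),
    ("?x", "foaf:based_near", "?y"),
    ("?y", "foaf:name", "Toronto"),
]


def match_pattern(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    new_binding = dict(binding)
    for term, value in zip(pattern, triple):
        if isinstance(term, str) and term.startswith("?"):
            # A variable must keep the value it was bound to earlier.
            if new_binding.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return new_binding


def query(graph, patterns):
    """Return all variable bindings that satisfy every triple pattern."""
    bindings = [{}]
    for pattern in patterns:
        extended = []
        for binding in bindings:
            for triple in graph:
                candidate = match_pattern(pattern, triple, binding)
                if candidate is not None:
                    extended.append(candidate)
        bindings = extended
    return bindings


names = [binding["?name"] for binding in query(graph, patterns)]
print(names)  # ['Renée J. Miller']
```

The first pattern alone also binds ?name to “Toronto”; that candidate is discarded by the second pattern because the place is not a foaf:Person.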
  • 67. SPARQL 1.0 allows • Extraction of data as • URIs, blank nodes, typed & untyped literals • RDF subgraphs • Exploration of data via queries for unknown relations • Execution of complex join operations across heterogeneous databases in a single query • Transformation of RDF data from one vocabulary to another • Construction of new RDF graphs based on RDF query subgraphs
  • 68. SPARQL 1.1 (W3C Recommendation since March 2013) adds • Additional query features • Aggregate functions, subqueries, negation, project expressions, property paths • Logical entailment regimes for • RDF, RDFS, OWL Direct and RDF-Based Semantics, and RIF Core • Update of RDF graphs, as a full data manipulation language • Discovery of information about the SPARQL service • Federated queries distributed over different SPARQL endpoints
  • 69. SPARQL usage in practice • SPARQL is usually used over the network • Separate documents define the protocol and the result format • SPARQL Protocol for RDF with HTTP and SOAP bindings • SPARQL results in XML or JSON formats • Big datasets often offer “SPARQL endpoints” using this protocol • Typical example: SPARQL endpoint to DBpedia
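
As a rough illustration of the HTTP binding, the sketch below builds (but does not send) a GET request for a SPARQL endpoint using only the Python standard library. The endpoint URL `https://dbpedia.org/sparql` and the sample query are assumptions made for the example; the `query` parameter and the `application/sparql-results+json` media type come from the SPARQL protocol and JSON results format.

```python
from urllib.parse import urlencode, urlsplit, parse_qs
from urllib.request import Request

# Assumption: DBpedia's public endpoint address; adjust for other datasets.
ENDPOINT = "https://dbpedia.org/sparql"

# Hypothetical sample query: a few resources typed as books.
QUERY = """
SELECT ?book WHERE {
  ?book a <http://dbpedia.org/ontology/Book> .
} LIMIT 5
"""


def build_sparql_request(endpoint, query):
    """Build a GET request that carries the query in the 'query' parameter."""
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for the JSON serialization of the result set.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_sparql_request(ENDPOINT, QUERY)
print(request.get_method())
print(request.full_url)
```

Passing `request` to `urllib.request.urlopen` would send it; for a SELECT query the JSON response lists the result rows under `results` and then `bindings`. That step is left out here so the sketch runs without network access.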
  • 70. SPARQL as a unifying point • [Figure: applications query a SPARQL processor over an RDF graph. The data behind it comes from triple stores and from relational databases (through SQL⇔RDF mappings), both exposed as SPARQL endpoints, from HTML and XML/XHTML documents, and from unstructured text via NLP techniques.] • Based on a presentation by Ivan Herman, available at http://www.w3.org/2010/Talks/0622-SemTech-IH/
  • 71. Other Semantic Web Technologies • Web Ontology Language (OWL) • A family of knowledge representation languages for authoring ontologies for the Web • RDF Schema (RDFS) • RDF Vocabulary Description Language • http://www.w3.org/TR/rdf-schema/ • How to use RDF to describe RDF vocabularies • Other RDF vocabularies • Simple Knowledge Organization System (SKOS) • Designed for representation of thesauri, classification schemes, taxonomies, subject-heading systems, or any other type of structured controlled vocabulary • FOAF (Friend of a Friend) • A machine-readable ontology describing persons, their activities and their relations to other people and objects
  • 73. Ontologies • An ontology is a formal, explicit, shared specification of a conceptualization of a domain (Gruber, 1993). • Conceptualization: the objects, concepts, and other entities that are assumed to exist in some area of interest and the relationships that hold among them. A conceptualization is an abstract, simplified view of the world that we wish to represent for some purpose. • The term ontology is borrowed from Philosophy, where ontology is a systematic account of existence (what things exist, how they can be differentiated from each other etc.). • Today the word ontology is a synonym for a shared knowledge base.
  • 74. Ontologies – Components & Models • Classes, Relations & Instances • Classes represent concepts • Classes are described by attributes • Attributes are name-value pairs • Informal description: the address contains the name, title and place of address of a person • Semi-informal description: Address • First name <string> • Family name <string> • Street <string> • PIN Code <int> • City <string> • …
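
The step from a semi-informal description to something a machine can check can be sketched with a Python dataclass. This is only an analogy, since ontology languages such as RDFS and OWL express classes and attributes differently; the field names are transliterations of the slide's attribute list and the instance values are made up.

```python
from dataclasses import dataclass, asdict


@dataclass
class Address:
    """The class 'Address' with its attributes as typed name-value pairs."""
    first_name: str
    family_name: str
    street: str
    pin_code: int
    city: str


# An instance of the class; all values are invented for illustration.
example = Address(
    first_name="Asha",
    family_name="Roy",
    street="12 Example Street",
    pin_code=700001,
    city="Kolkata",
)

# The attributes of the instance as name-value pairs.
for name, value in asdict(example).items():
    print(f"{name}: {value!r}")
```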
  • 76. Very Large Ontologies • Recently there has been a lot of work on developing very large ontologies that capture various areas of human knowledge and on deploying this knowledge in applications such as search engines or question answering. • Example: Watson, IBM’s question answering system that beat human champions in the quiz show Jeopardy! (http://www-03.ibm.com/innovation/us/watson/index.html).
  • 77. 5 ★ Open Data – by Tim Berners-Lee • Tim Berners-Lee, the inventor of the Web and Linked Data initiator, suggested a 5-star deployment scheme for Open Data. Here, we give examples for each step of the stars and explain the costs and benefits that come along with it.
  • 78. BY EXAMPLE … • ★ make your stuff available on the Web (whatever format) under an open license • ★★ make it available as structured data (e.g., Excel instead of an image scan of a table) • ★★★ make it available in a non-proprietary open format (e.g., CSV instead of Excel) • ★★★★ use URIs to denote things, so that people can point at your stuff • ★★★★★ link your data to other data to provide context
  • 79. What are the costs & benefits of ★ Web data? • As a consumer … • You can look at it. • You can print it. • You can store it locally (on your hard drive or on a USB stick). • You can enter the data into any other system. • You can change the data as you wish. • You can share the data with anyone you like. • As a publisher … • It’s simple to publish. • You do not have to explain repeatedly to others that they can use your data.
  • 80. What are the costs & benefits of ★★ Web data? • As a consumer, you can do everything you can do with ★ Web data and additionally: • You can directly process it with proprietary software to aggregate it, perform calculations, visualize it, etc. • You can export it into another (structured) format. • As a publisher … • It’s still simple to publish.
  • 81. What are the costs & benefits of ★★★ Web data? • As a consumer, you can do everything you can do with ★★ Web data and additionally: • You can manipulate the data in any way you like, without the need to own any proprietary software package. • As a publisher … • You might need converters or plug-ins to export the data from the proprietary format. • It’s still rather simple to publish.
  • 82. What are the costs & benefits of ★★★★ Web data? • As a consumer, you can do everything you can do with ★★★ Web data and additionally: • You can link to it from any other place (on the Web or locally). • You can bookmark it. • You can reuse parts of the data. • You may be able to reuse existing tools and libraries, even if they only understand parts of the pattern the publisher used. • Understanding the structure of an RDF “graph” of data can take more effort than tabular (Excel/CSV) or tree (XML/JSON) data. • You can combine the data safely with other data. URIs are a global scheme, so if two things have the same URI then it’s intentional, and if so that’s well on its way to being 5-star data! • As a publisher … • You have fine-grained control over the data items and can optimize their access (load balancing, caching, etc.) • Other data publishers can now link into your data, promoting it to 5 star! • You typically invest some time slicing and dicing your data. • You’ll need to assign URIs to data items and think about how to represent the data. • You need to either find existing patterns to reuse or create your own.
  • 83. What are the costs & benefits of ★★★★★ Web data? • As a consumer, you can do everything you can do with ★★★★ Web data and additionally: • You can discover more (related) data while consuming the data. • You can directly learn about the data schema. • You now have to deal with broken data links, just like 404 errors in web pages. • Presenting data from an arbitrary link as fact is as risky as letting people include content from any website in your pages. Caution, trust and common sense are all still necessary. • As a publisher … • You make your data discoverable. • You increase the value of your data. • Your own organization will gain the same benefits from the links as the consumers. • You’ll need to invest resources to link your data to other data on the Web. • You may need to repair broken or incorrect links.
  • 84. Applications • Data integration (e.g., see project Optique http://www.optique-project.eu/) • E-government (e.g., open data) • E-commerce • Tourism • Medicine • Biology • Earth Observation (see the work of my group in projects TELEIOS http://www.earthobservatory.eu/ and LEO http://www.linkedeodata.eu/) • …
  • 85. References: • Books: • Antoniou, Grigoris, and Frank van Harmelen. A Semantic Web Primer. Cambridge, MA: The MIT Press, 2004. • Segaran, Toby, Colin Evans, and Jamie Taylor. Programming the Semantic Web. O'Reilly Media, Inc., 2009. • Davies, John, Dieter Fensel, and Frank van Harmelen. Towards the Semantic Web: Ontology-Driven Knowledge Management. Chichester, 2003. • Scientific Papers: • Maedche, Alexander. Ontology Learning for the Semantic Web. Vol. 665. Springer Science & Business Media, 2012. • Schmachtenberg, Max, Christian Bizer, and Heiko Paulheim. "Adoption of the Linked Data Best Practices in Different Topical Domains." The Semantic Web – ISWC 2014. Springer International Publishing, 2014. 245-260. • Video Lectures & Slides: • Video lectures on Semantic Web by Dr. Harald Sack, Hasso Plattner Institute, University of Potsdam, Germany • www.cs.toronto.edu/~oktie/slides/web-of-data-intro.pdf • https://www.w3.org/2010/Talks/0622-SemTech-IH/ • Websites: • http://dbpedia.org/snorql/ • http://5stardata.info/en/