The document provides an overview of semantic technologies and discusses their increasing mainstream adoption. It notes that Microsoft purchased Powerset in 2008, Apple purchased Siri in 2010, and Google bought Metaweb and released semantic search in 2013. It discusses how semantic technologies allow for interoperability through shared representations and reasoning. Examples are given of early semantic search applications from 1999-2002 and an operational semantic electronic medical record application deployed in 2006.
1. Semantic Web: intro & overview
A conversation with students – 1 Sept 2015
Amit Sheth http://knoesis.org/amit
Kno.e.sis – Ohio Center of Excellence in Knowledge-enabled Computing
Wright State University, Dayton, OH, USA
2. What are the most important
recent software/Internet
success stories?
5. Semantic technologies in the mainstream
• Microsoft purchased Powerset in 2008
• Apple purchased Siri [Apr 2010]
– “Once Again The Back Story Is About Semantic Web”
• Google buys Metaweb [June 2010]: “Google Snaps Up Metaweb in Semantic Web Play”; releases Semantic search in 2013
– Now see: “Google Knowledge Graph Could Change Search Forever”
• Facebook OpenGraph, Twitter annotation
… “another example of semantic web going mainstream”
“Google, Twitter and Facebook build the semantic web”
6. • RDFa adoption … Search engines (esp. Bing) started using domain models, and all of them
use background knowledge/structured databases with large entity bases
(these are part of the Knowledge Graph and its equivalents)
• Bing, Yahoo! and Google are using schema.org
in a big way
7. A bit of history
• Semantics with metadata and ontologies for heterogeneous
documents and multiple repositories of data including the
Web was discussed in 1990s (semantic information brokering,
faceted search, InfoHarness, SIMS, Ariadne, OBSERVER, SHOE,
MREF, InfoQuilt, …). Also DAML and OIL.
• Tim Berners-Lee used “Semantic Web” in his 1999 book
• I had founded a company Taalee in 1999, gave a keynote on
Semantic Web & commercialization in 2000 and filed for a
patent in 2000 (awarded 2001).
• Well known TBL, Hendler, Lassila paper in Scientific American
took AI-ish approach (agents,…) to Semantic Web
• First 5 years saw too much of AI/DL, but more
practical/applied work has dominated recently
8. Different foci
• TBL – focus on data: Data Web (“In a way, the Semantic
Web is a bit like having all the databases out there as one
big database.”)
• Others focus on reasoning and intelligent processing
• But the biggest current use seems to be about Search:
– 15 years of Semantic Search and Ontology-enabled Semantic
Applications
10. 1
• Ontology: Agreement with a common
vocabulary/nomenclature, conceptual models
and domain Knowledge
• Schema + Knowledge base
• Agreement is what enables interoperability
• Formal description - Machine processability is
what leads to automation
11. 2
• Semantic Annotation (Metadata Extraction):
Associating meaning with data, or labeling
data so it is more meaningful to the system
and people.
• Can be manual, semi-automatic (automatic
with human verification), automatic.
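As a rough illustration of the automatic case, the sketch below labels entity mentions in text by looking them up in a tiny knowledge base. The knowledge base contents, the `annotate` function, and the annotation format are assumptions made for this example; real annotators also handle ambiguity and context, which this sketch does not.

```python
import re

# Hypothetical mini knowledge base: surface form -> ontology class.
KNOWLEDGE_BASE = {"Commerce One": "Company", "Ariba": "Company"}

def annotate(text, kb=KNOWLEDGE_BASE):
    """Return one annotation per knowledge-base entity mentioned in text."""
    annotations = []
    for name, cls in kb.items():
        # Whole-word match of the entity name anywhere in the text.
        for m in re.finditer(r"\b" + re.escape(name) + r"\b", text):
            annotations.append(
                {"start": m.start(), "end": m.end(), "text": name, "type": cls}
            )
    # Report annotations in reading order.
    return sorted(annotations, key=lambda a: a["start"])

result = annotate("Ariba competes with Commerce One.")
for a in result:
    print(a)
```

A semi-automatic workflow would pass `result` to a person for verification before storing it.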
12. From Syntax to Semantics
Shallow semantics
Deep semantics
Changing Focus on Interoperability in Information Systems: From System, Syntax, Structure to Semantics
13. Levels of Abstraction
• 0 Raw Data [in TEXT], e.g., number: “150”
• 1 Annotated Data [in RDF], e.g., label: Systolic blood pressure of 150 mmHg
• 2 Interpreted data (deductive) [in OWL], e.g., threshold: Elevated Blood Pressure
• 3 Interpreted data (abductive) [in OWL], e.g., diagnosis: Hyperthyroidism, ……
Components named in the figure: SSN Ontology, Intellego
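The progression from raw data to a candidate diagnosis can be sketched in a few lines. This is only an illustration: the 140 mmHg threshold and the table of which disorder explains which finding are assumptions made for the example, not clinical guidance and not the logic of the systems named on the slide.

```python
# Assumed cut-off for "elevated"; illustrative only.
SYSTOLIC_THRESHOLD_MMHG = 140

# Assumed background knowledge: disorder -> findings it would explain.
EXPLAINS = {
    "Hyperthyroidism": {"Elevated Blood Pressure", "Weight Loss"},
    "Hypothyroidism": {"Weight Gain"},
}

def annotate(raw):
    """Level 0 -> 1: attach a label and a unit to a raw number."""
    return {"property": "systolic blood pressure", "value": int(raw), "unit": "mmHg"}

def interpret_deductive(obs):
    """Level 1 -> 2: apply a threshold rule to the annotated value."""
    if obs["value"] >= SYSTOLIC_THRESHOLD_MMHG:
        return "Elevated Blood Pressure"
    return "Normal Blood Pressure"

def interpret_abductive(finding):
    """Level 2 -> 3: list disorders that would explain the finding."""
    return sorted(d for d, effects in EXPLAINS.items() if finding in effects)

observation = annotate("150")
finding = interpret_deductive(observation)
candidates = interpret_abductive(finding)
print(observation, finding, candidates)
```

The deductive step derives a certain conclusion from a rule, while the abductive step only proposes explanations that would need further evidence.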
15. Semantic Web Stack
• Web of Linked Data
• Introduced by Berners-Lee et al. as the next step for the
Web of Documents
• Allow “machine
understanding” of data,
• Create “common” models
of domains using formal
language - ontologies
Layer cake image source: http://www.w3.org; see W3C SW publications
Semantic Web Layer Cake
16. Characteristics of Semantic Web
The Semantic Web: XML, RDF & Ontology
• Self Describing
• Machine & Human Readable
• Issued by a Trusted Authority
• Easy to Understand
• Convertible
• Can be Secured
Adapted from William Ruh (CISCO)
17. • Resource Description Framework – Recommended by
W3C for metadata modeling [RDF]
• A standard common modeling framework – usable by
humans and machine understandable
Resource Description Framework
[Figure: RDF graph with IBM (Company) and the locations Armonk, New York, United States and Zurich, Switzerland (Location)]
RDF/OWL slides From: Semantic Web in Health Informatics (thanks: Satya)
18. • RDF Triple
o Subject: The resource that the triple is about
o Predicate: The property of the subject that is described by the triple
o Object: The value of the property
• Web Addressable Resource: Uniform Resource Locator (URL),
Uniform Resource Identifier (URI), Internationalized Resource Identifier (IRI)
• Qualified Namespace: http://www.w3.org/2001/XMLSchema# as
xsd:
o xsd:string instead of
http://www.w3.org/2001/XMLSchema#string
RDF: Triple Structure, IRI, Namespace
[Figure: triple IBM, Headquarters located in, Armonk, New York, United States]
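A minimal sketch of the triple structure and of prefix expansion follows. The `ex` prefix, its IRI, and the local names are made up for illustration; only the `xsd` namespace comes from the slide.

```python
from typing import NamedTuple

# Prefix table. "ex" and its IRI are hypothetical; "xsd" is the
# XML Schema namespace quoted on the slide.
PREFIXES = {
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "ex": "http://example.org/company#",
}

class Triple(NamedTuple):
    subject: str    # the resource the triple is about
    predicate: str  # the property being described
    object: str     # the value of the property

def expand(qname):
    """Expand a prefixed name such as xsd:string into a full IRI."""
    prefix, local = qname.split(":", 1)
    return PREFIXES[prefix] + local

hq = Triple(
    expand("ex:IBM"),
    expand("ex:headquartersLocatedIn"),
    expand("ex:Armonk"),
)
print(expand("xsd:string"))
print(hq)
```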
19. • Two types of property values in a triple
o Web resource
o Typed literal
RDF Representation
[Figure: IBM, Headquarters located in, Armonk, New York, United States (Web resource); IBM, Has total employees, “430000”^^xsd:integer (typed literal)]
• The graph model of RDF: node-arc-node is the
primary representation model
• Secondary notations: Triple notation
o companyExample:IBM companyExample:has-Total-Employee “430000”^^xsd:integer .
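To make the typed-literal idea concrete, the sketch below parses a literal written in N-Triples style into a Python value. The lexical form of xsd:integer allows digits and an optional sign only, so the employee count is written as 430000 here and a form containing a thousands separator is rejected. The function names, the `ex` IRIs, and the small datatype table are assumptions for this example.

```python
import re

XSD = "http://www.w3.org/2001/XMLSchema#"

# Datatype IRI -> Python constructor; deliberately tiny for the example.
CASTS = {XSD + "integer": int, XSD + "string": str}

# "lexical form"^^<datatype IRI>
LITERAL = re.compile(r'^"(?P<lexical>[^"]*)"\^\^<(?P<datatype>[^>]+)>$')

def parse_typed_literal(text):
    """Convert a typed literal into the matching Python value."""
    m = LITERAL.match(text)
    if m is None:
        raise ValueError(f"not a typed literal: {text!r}")
    return CASTS[m["datatype"]](m["lexical"])

def is_valid(text):
    """True if the literal parses under its declared datatype."""
    try:
        parse_typed_literal(text)
        return True
    except (ValueError, KeyError):
        return False

def to_ntriple(subject, predicate, literal):
    """Serialize one triple whose object is a literal."""
    return f"<{subject}> <{predicate}> {literal} ."

employees = '"430000"^^<' + XSD + "integer>"
print(parse_typed_literal(employees))
print(to_ntriple("http://example.org/IBM", "http://example.org/hasTotalEmployees", employees))
```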
20. • RDF Schema: Vocabulary for describing groups of
resources [RDFS]
RDF Schema
[Figure: instance triples IBM, Headquarters located in, Armonk, New York, United States and Oracle, Headquarters located in, Redwood Shores, California, United States; class-level pattern Company, Headquarters located in, Geographical Location]
21. RDF Schema
• Property domain (rdfs:domain) and range (rdfs:range)
o Headquarters located in: domain Company, range Geographical Location
• Class Hierarchy/Taxonomy: rdfs:subClassOf
o Computer Technology Company (subclass) rdfs:subClassOf Company (parent class)
o Other classes in the figure: Banking Company, Insurance Company
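The effect of rdfs:domain, rdfs:range, and rdfs:subClassOf can be sketched as a small forward-chaining loop over a set of triples. This covers only three RDFS entailment patterns plus subclass transitivity, and the `ex:` names are hypothetical; it is not a complete RDFS reasoner.

```python
TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"
DOMAIN = "rdfs:domain"
RANGE = "rdfs:range"

# Hypothetical ex: names modeled on the slide's example.
triples = {
    ("ex:headquartersLocatedIn", DOMAIN, "ex:Company"),
    ("ex:headquartersLocatedIn", RANGE, "ex:GeographicalLocation"),
    ("ex:ComputerTechnologyCompany", SUBCLASS, "ex:Company"),
    ("ex:IBM", TYPE, "ex:ComputerTechnologyCompany"),
    ("ex:Oracle", "ex:headquartersLocatedIn", "ex:RedwoodShores"),
}

def rdfs_closure(graph):
    """Apply the rules repeatedly until no new triple appears."""
    inferred = set(graph)
    while True:
        new = set()
        for s, p, o in inferred:
            for s2, p2, o2 in inferred:
                if p2 == DOMAIN and s2 == p:
                    new.add((s, TYPE, o2))      # subject gets the domain class
                if p2 == RANGE and s2 == p:
                    new.add((o, TYPE, o2))      # object gets the range class
                if p == TYPE and p2 == SUBCLASS and s2 == o:
                    new.add((s, TYPE, o2))      # instance of subclass is instance of parent
                if p == SUBCLASS and p2 == SUBCLASS and s2 == o:
                    new.add((s, SUBCLASS, o2))  # subClassOf is transitive
        if new <= inferred:
            return inferred
        inferred |= new

closure = rdfs_closure(triples)
for t in sorted(closure - triples):
    print(t)
```

Nothing in the input says that Oracle is a company; that fact follows from the domain declaration alone.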
22. Ontology: A Working Definition
• Ontologies are shared conceptualizations of a
domain represented in a formal language*
• Ontologies:
o Common representation model - facilitate
interoperability, integration across different
projects, and enforce consistent use of
terminology
o Closely reflect domain-specific details (domain
semantics) essential to answer end-user questions
o Support reasoning to discover implicit knowledge
* Paraphrased from Gruber, 1993
23. Expressiveness Range: Knowledge Representation and Ontologies
Ontology Dimensions After McGuinness and Finin
• From Simple Taxonomies to Expressive Ontologies, in order of increasing expressiveness: Catalog/ID; Terms/glossary; Thesauri, “narrower term” relation; Informal is-a; Formal is-a; Formal instance; Frames (properties); Value Restriction; Disjointness, Inverse, part of…; General Logical constraints
• Examples placed along the range: Wordnet, CYC, RDF, DAML, OO, DB Schema, RDFS, IEEE SUO, OWL, UMLS, GO, KEGG, TAMBIS, EcoCyc, BioPAX, GlycO, SWETO, Pharma
24. • A language for modeling ontologies [OWL]
• OWL2 is declarative
• An OWL2 ontology (schema) consists of:
o Entities: Company, Person
o Axioms: Company employs Person
o Expressions: A Person Employed by a Company =
CompanyEmployee
• Reasoning: Draw a conclusion given certain
constraints are satisfied
o RDF(S) Entailment
o OWL2 Entailment
OWL2 Web Ontology Language
25. • Class Disjointness: Instance of class A cannot be
instance of class B
• Complex Classes: Combining multiple classes
with set theory operators:
o Union: Parent = ObjectUnionOf (:Mother :Father)
o Logical negation: UnemployedPerson =
ObjectComplementOf (:EmployedPerson)
o Intersection: Mother = ObjectIntersectionOf (:Parent
:Woman)
OWL2 Constructs
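The set-theoretic reading of these class expressions can be sketched with Python sets, treating each class as the set of its known instances. Negation is modeled here as a complement, which is what OWL 2 calls ObjectComplementOf. The individuals are hypothetical, and the sketch assumes a closed, finite set of individuals, whereas OWL reasoning is open-world; it illustrates the operators, not an OWL reasoner.

```python
# Hypothetical individuals and class extensions.
people = {"alice", "bob", "carol", "dave"}
mother = {"alice"}
father = {"bob"}
woman = {"alice", "carol"}
employed = {"alice", "dave"}

def object_union_of(*classes):
    """Individuals that belong to at least one class."""
    return set().union(*classes)

def object_intersection_of(*classes):
    """Individuals that belong to every class."""
    return set.intersection(*map(set, classes))

def object_complement_of(cls, domain):
    """Individuals of the (closed) domain that are not in cls."""
    return domain - cls

parent = object_union_of(mother, father)
unemployed = object_complement_of(employed, people)
mother_again = object_intersection_of(parent, woman)
print(sorted(parent), sorted(unemployed), sorted(mother_again))
```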
26. • Property restrictions: defined over property
• Existential Quantification:
o Parent = ObjectSomeValuesFrom (:hasChild :Person)
o To capture incomplete knowledge
• Universal Quantification:
o US President = ObjectAllValuesFrom (:hasBirthPlace
:UnitedStates)
• Cardinality Restriction
OWL2 Constructs
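A hedged sketch of how the three kinds of restriction select individuals, with a property represented as a set of (subject, value) pairs. All names and data are hypothetical, and the closed-world evaluation is a simplification of OWL semantics. Note that the universal restriction is vacuously satisfied by individuals with no value at all for the property.

```python
# Hypothetical individuals and property assertions.
individuals = {"ann", "ben", "cy"}
person = set(individuals)
has_child = {("ann", "cy")}
has_birth_place = {("ann", "UnitedStates"), ("ben", "Germany")}

def some_values_from(prop, cls, domain):
    """Existential: at least one value of prop is in cls."""
    return {x for x in domain if any(s == x and o in cls for s, o in prop)}

def all_values_from(prop, cls, domain):
    """Universal: every value of prop (if any) is in cls."""
    return {x for x in domain if all(o in cls for s, o in prop if s == x)}

def min_cardinality(n, prop, domain):
    """Cardinality: at least n distinct values of prop."""
    return {x for x in domain if len({o for s, o in prop if s == x}) >= n}

parents = some_values_from(has_child, person, individuals)
us_born_only = all_values_from(has_birth_place, {"UnitedStates"}, individuals)
with_children = min_cardinality(1, has_child, individuals)
print(sorted(parents), sorted(us_born_only), sorted(with_children))
```

Here "cy" satisfies the universal restriction only because no birthplace is recorded, which is why universal restrictions are usually combined with an existential or cardinality restriction.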
27. SPARQL: Querying Semantic Web Data
• A SPARQL query pattern composed of triples
• Triples correspond to RDF triple structure, but
have variable at:
o Subject: ?company ex:hasHeadquaterLocation ex:NewYork.
o Predicate: ex:IBM ?whatislocatedin ex:NewYork.
o Object: ex:IBM ex:hasHeadquaterLocation
?location.
• Result of SPARQL query is list of values – values
can replace variable in query pattern
28. SPARQL: Query Patterns
• An example query pattern
PREFIX ex:<http://www.eecs600.case.edu/>
SELECT ?company ?location WHERE
{?company ex:hasHeadquaterLocation ?location.}
• Query Result
company               location
IBM                   NewYork
Oracle                RedwoodCity
MicrosoftCorporation  Bellevue
(Multiple matches)
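How a single triple pattern produces multiple matches can be sketched without a SPARQL engine: each term of the pattern is either a constant that must equal the triple's term or a variable that gets bound to it. The `match` function and the in-memory triple list are assumptions for this illustration, and the property name is spelled as it appears in the query above.

```python
# Small in-memory graph using the slide's example data.
triples = [
    ("ex:IBM", "ex:hasHeadquaterLocation", "ex:NewYork"),
    ("ex:Oracle", "ex:hasHeadquaterLocation", "ex:RedwoodCity"),
    ("ex:MicrosoftCorporation", "ex:hasHeadquaterLocation", "ex:Bellevue"),
    ("ex:IBM", "rdf:type", "ex:Company"),
]

def match(pattern, graph):
    """Yield one variable binding per triple that matches the pattern."""
    for triple in graph:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # Bind the variable, or check it against an earlier binding.
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield binding

rows = list(match(("?company", "ex:hasHeadquaterLocation", "?location"), triples))
for row in rows:
    print(row["?company"], row["?location"])
```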
29. SPARQL: Query Forms
• SELECT: Returns the values bound to the variables
• CONSTRUCT: Returns an RDF graph
• DESCRIBE: Returns a description (RDF graph) of a
resource (e.g. IBM)
o The contents of RDF graph is determined by SPARQL
query processor
• ASK: Returns a Boolean
o True
o False
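The difference between the four query forms can be sketched on top of one pattern matcher: they share the matching step and differ only in what they return. Everything below is an assumption for illustration, including the `describe` policy (all triples mentioning the resource), which is just one choice a query processor might make.

```python
# Small in-memory graph; the ex: names follow the slide's examples.
graph = [
    ("ex:IBM", "ex:hasHeadquaterLocation", "ex:NewYork"),
    ("ex:Oracle", "ex:hasHeadquaterLocation", "ex:RedwoodCity"),
    ("ex:IBM", "rdf:type", "ex:Company"),
]

def match(pattern, triples):
    """Yield one variable binding per matching triple."""
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield binding

def select(variables, pattern, triples):
    """SELECT: the values bound to the requested variables."""
    return [tuple(b[v] for v in variables) for b in match(pattern, triples)]

def ask(pattern, triples):
    """ASK: whether at least one match exists."""
    return any(True for _ in match(pattern, triples))

def construct(template, pattern, triples):
    """CONSTRUCT: a new graph built by filling the template per match."""
    return {tuple(b.get(t, t) for t in template) for b in match(pattern, triples)}

def describe(resource, triples):
    """DESCRIBE: here, every triple that mentions the resource."""
    return {t for t in triples if resource in (t[0], t[2])}

hq = ("?company", "ex:hasHeadquaterLocation", "?location")
print(select(("?company",), hq, graph))
print(ask(("ex:IBM", "ex:hasHeadquaterLocation", "ex:Bellevue"), graph))
print(construct(("?location", "ex:hosts", "?company"), hq, graph))
print(describe("ex:IBM", graph))
```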
40. Sample applications
• Early Semantic Search: the baby steps of
today’s engines
• Enterprise applications – healthcare & life
sciences, financial, security
• Driving the innovation with new types of data:
sensor (Semantic Sensor Web), social
(Semantic Social Web), semantic IoT/WoT
41. BLENDED BROWSING & QUERYING INTERFACE
ATTRIBUTE & KEYWORD
QUERYING
uniform view of worldwide
distributed assets of similar type
SEMANTIC BROWSING
Targeted e-shopping/e-commerce
assets access
Taalee Semantic/Faceted Search & Browsing (1999-2001)
42. Search for company
‘Commerce One’
Links to news on companies that
compete against Commerce One
Links to news on companies Commerce
One competes against
(To view news on Ariba, click on the link
for Ariba)
Crucial news on Commerce
One’s competitors (Ariba) can
be accessed easily and
automatically
Semantic Search/Browsing/Directory (2001-….)
43. System recognizes ENTITY & CATEGORY
Relevant portion
of the Directory is
automatically
presented.
Semantic Search/Browsing/Directory (2001-….)
46. Semagix Freedom for building
ontology-driven information system
Extracting Semantic Metadata from
Semistructured and Structured Sources (1999 – 2002)
Managing Semantic Content on the Web
48. 2004 SEMAGIX
[Figure: classes Watch list, Organization, Company; instances FBI Watchlist, Hamas, WorldCom; Ahmed Yaseer linked by appears on Watchlist, member of organization, works for Company]
Ahmed Yaseer:
• Appears on Watchlist ‘FBI’
• Works for Company ‘WorldCom’
• Member of a banned organization
Semantic Associations - Connecting the Dots
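One simple way to "connect the dots" is to search a graph of relationships for a path between two entities. The sketch below uses breadth-first search over the relationships shown on this slide, following edges in both directions. It is an assumed illustration of the idea, not the ranking or discovery method used in the Semagix product.

```python
from collections import deque

# Relationships taken from the slide's example.
edges = [
    ("Ahmed Yaseer", "appears on", "FBI Watchlist"),
    ("Ahmed Yaseer", "works for", "WorldCom"),
    ("Ahmed Yaseer", "member of", "Hamas"),
]

def find_association(start, goal, relations):
    """Return the shortest chain of relationships linking start to goal."""
    neighbors = {}
    for s, p, o in relations:
        neighbors.setdefault(s, []).append((p, o))
        neighbors.setdefault(o, []).append((f"inverse of {p}", s))
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for p, nxt in neighbors.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [(node, p, nxt)]))
    return None  # no association found

association = find_association("WorldCom", "FBI Watchlist", edges)
for step in association:
    print(step)
```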
49. Global Investment Bank
Fraud Prevention application used in
financial services – Related KYC
application is deployed at Majority
of Global Banks
User will be able to navigate
the ontology using a number
of different interfaces
World Wide
Web content
Public
Records
BLOGS,
RSS
Unstructured text, Semi-structured Data
Watch Lists
Law
Enforcement Regulators
Semi-structured Government Data
Scores the entity
based on the
content and entity
relationships
Establishing
New Account
51. Semantic Web + Clinical Practice Informatics =
Active Semantic Electronic Medical Record (ASEMR)
Operationally deployed in January 2006, in use (as of 2012)
52. ASEMR: SW application in use
In daily use at Athens Heart Center
– 28 person staff
• Interventional Cardiologists
• Electrophysiology Cardiologists
– Deployed since January 2006
– 40-60 patients seen daily
– 3000+ active patients
– Serves a population of 250,000 people
53. Information Overload in Clinical
Practice
• New drugs added to market
– Adds interactions with current drugs
– Changes possible procedures to treat an illness
• Insurance coverage changes
– Insurance may pay for drug X but not drug Y even
though drug X and Y are equivalent
– Patient may need a certain diagnosis before some
expensive tests are run
• Physicians need a system to keep track of the ever-changing
landscape
54. Active Semantic Document (ASD)
A document (typically in XML) with the following features:
• Semantic annotations
– Linking entities found in a document to ontology
– Linking terms to a specialized lexicon [TR]
• Actionable information
– Rules over semantic annotations
– Violated rules can modify the appearance of the document (Show an
alert)
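The "rules over semantic annotations" idea can be sketched as a function that takes the annotations of a document and returns the alerts to display. The annotation format, the drug names (DrugX, DrugY, DrugZ), and the interaction and allergy tables are all hypothetical placeholders; the deployed system's rule language is not shown in these slides.

```python
from itertools import combinations

# Hypothetical annotations extracted from one patient document.
annotations = [
    {"entity": "DrugX", "type": "Drug"},
    {"entity": "DrugY", "type": "Drug"},
    {"entity": "DrugZ", "type": "Drug"},
    {"entity": "Hypertension", "type": "Diagnosis"},
]

# Hypothetical knowledge that would come from a drug ontology and the chart.
INTERACTIONS = {frozenset({"DrugX", "DrugY"})}
PATIENT_ALLERGIES = {"DrugZ"}

def violated_rules(annots, interactions, allergies):
    """Return alert messages for every rule the annotations violate."""
    drugs = sorted(a["entity"] for a in annots if a["type"] == "Drug")
    alerts = []
    for a, b in combinations(drugs, 2):
        if frozenset((a, b)) in interactions:
            alerts.append(f"interaction: {a} + {b}")
    for d in drugs:
        if d in allergies:
            alerts.append(f"allergy: {d}")
    return alerts

alerts = violated_rules(annotations, INTERACTIONS, PATIENT_ALLERGIES)
print(alerts)
```

A document viewer could then highlight the annotated terms named in each alert, which is the "modify the appearance" behavior the slide describes.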
55. Active Semantic Patient Record
• An application of ASD
• Three Ontologies
– Practice
Information about practice such as patient/physician data
– Drug
Information about drugs, interaction, formularies, etc.
– ICD/CPT
Describes the relationships between CPT and ICD codes
• Medical Records in XML created from database
56. Active Semantic Electronic Medical Record App
In Use Today at Athens Heart Center For Clinical Decision Support since January 2006
Amit P. Sheth, S. Agrawal, Jonathan Lathem, Nicole Oldham, H. Wingate, P. Yadav, and K. Gallagher, Active Semantic
Electronic Medical Record, Proc. of the 5th International Semantic Web Conference, 2006
57. Demo of ASEMR and other
applications
http://knoesis.org/showcase
http://archive.knoesis.org/library/demos/
58. Benefits of ASEMR
• Error prevention (drug interactions, allergy)
– Patient care
– insurance
• Decision Support (formulary, billing)
– Patient satisfaction
– Reimbursement
• Efficiency/time
– Real-time chart completion
– “semantic” and automated linking with billing
59. Using large data sets for Structured
Data on the web:
Linked Open Data – samples from
2005 to 2010
60. Linked Open Data
Publish Open Data Sets in RDF
By 2010, 203 data sets
25 billion Triples
Image: http://richard.cyganiak.de/2007/10/lod/
61. You publish the raw data…
Semantic Web Adoption and Application
62. … and others can use it
Semantic Web Adoption and Application
63. Using the LOD to build Web site: BBC
Semantic Web Adoption and Application
64. Using the LOD to build Web site:
BBC
Semantic Web Adoption and Application
70. Twitris: Semantic Social Web Mash-up
Select topic
Select date
Topic tree
Spatial Marker
N-gram summaries
Wikipedia articles
Reference news
Related tweets
Images & Videos
Tweet traffic
Sentiment Analysis
TWITRIS
71. Web (and associated computing) is
evolving
Web of pages
- text, manually created links
- extensive navigation
2007
1997
Web of databases
- dynamically generated pages
- web query interfaces
Web of resources
- data, service, data, mashups
- 4 billion mobile computing
Web of people, Sensor Web
- social networks, user-created casual content
- 40 billion sensors, 500M+ FB users, 1B tweets/wk
Web as an oracle / assistant / partner
- “ask the Web”: using semantics to leverage text
+ data + services
- Powerset
Computing for Human Experience
Keywords
Patterns
Objects
Situations,
Events
Enhanced Experience,
Tech assimilated in life
72. Structured text
(Scientific
publications /
white papers)
Experimental
Results Clinical Trial Data
Public domain
knowledge
(PubMed)
Metadata Extraction/Semantic Annotations
Ontologies/Domain Models/Knowledge
Metadata / Semantic Annotations
Semantic Search/
Browsing/Personalization/
Analysis, Knowledge
Discovery,
Visualization,
Situational Awareness
Big data
Search and
browsing
Patterns / Inference / Reasoning
2D-3D & Immersive
Visualization, Human
Computer Interfaces
Impacting
bottom line
Knowledge
discovery
Migraine
Stress
Patient
affects
isa
Magnesium
Calcium Channel
Blockers
inhibit
SEMANTICS, MEANING PROCESSING
74. Take Home Message (Cont.)
Semantics play a key role in inferring the
"meaning" behind the data. This requires
progress from keywords -> entities ->
relationships -> events, from raw data to
human-centric abstractions.
75. Take Home Message (Cont.)
Wide variety of semantic models and KBs
(vocabularies, social dictionaries, community created semi-structured
knowledge, domain-specific datasets, ontologies) empower
semantic solutions. This can lead to Semantic
Scalability – scalability that is meaningful to
human activities and decision making.
76. Interested in more?
Kno.e.sis Wiki for the following and more:
• Computing for Human Experience
• Continuous Semantics to Analyze Real-Time Data
• Semantic Modeling for Cloud Computing
• Citizen Sensing, Social Signals, and Enriching Human Experience
• Semantics-Empowered Social Computing
• Semantic Sensor Web
• Traveling the Semantic Web through Space, Theme and Time
• Relationship Web: Blazing Semantic Trails between Web Resources
• SA-REST: Semantically Interoperable and Easier-to-Use Services and Mashups
• Semantically Annotating a Web Service
Tutorials: Semantic Web: Technologies and Applications for the Real-World (WWW2007)
Citizen Sensor Data Mining, Social Media Analytics and Development Centric Web Applications (WWW2011)
Partial Funding: NSF (Semantic Discovery: IIS: 071441, Spatio Temporal Thematic: IIS-0842129), AFRL and
DAGSI (Semantic Sensor Web), Microsoft Research (Semantic Search), IBM Research (Analysis of Social
Media Content), and HP Research (Knowledge Extraction from Community-Generated Content).
77. http://knoesis.org
Kno.e.sis – Ohio Center of Excellence in Knowledge-enabled Computing
Wright State University, Dayton, Ohio, USA
Vision Paper: Computing for Human
Experience: http://wiki.knoesis.org/index.php/Computing_For_Human_Experience
Future: Computing for Human Experience
Editor's notes
RDF: Triple structure
Review types of heterogeneity. Why we need to reconcile data heterogeneity.
Uniform Resource Locator: A network location, used as an identifier for resources on the Web. URL is a specific type of URI. URI can be used to refer to anything.
IRI: In addition to the ASCII character set, contains the Universal Character Set (from RFC 3987).
RDF uses XML Schema datatypes
Allows creation of an abstract representation of domain
Review types of heterogeneity. Why we need to reconcile data heterogeneity
Taalee (subsequently Voquette and Semagix) was founded in 1999 as an Audio/Video Web Search Company (focus on A/V mainly for scalability and market focus reasons; service name: MediaAnywhere). Domain models/ontologies were created in major areas (many more than what you can find on Bing in 2011) and automatically populated to build knowledge bases (populated ontologies or WorldModel) from a variety of structured and semistructured sources, and periodically kept up to date. This was then used for semantic annotation/metadata extraction to drive semantic search, browsing, and other applications over data crawled from Web sites.
The important thing is that the system knew that Robert Duval, a movie actor, is a different person from David Duval, a golfer and sportsperson, and had an understanding of a variety of relationships Robert Duval participates in – such as
Obtained from Ivan’s slide
Let me give a technological introduction to what our center is about. We all face a fire hose of data: PubMed adds 2000 to 4000 citations per day, it is usual to add about 5 gig from a single run of a scientific experiment, and just imagine how much data is created by all the cameras and 40 billion mobile sensors in the world! But even with all the search and browsing tools we have, we face a huge information glut. How do we make sense of the data? Just as humans apply their knowledge and experience to understand what they see, we apply a domain model or knowledge to attach meaningful labels to these data. Then we can apply computational techniques to visualize, provide situational awareness, and discover nuggets of knowledge, information, and insight. For example, from all that biomedical data, what a scientist may be looking for is: how can we treat Migraine? What has Magnesium to do with Migraine? Why does Magnesium deficiency cause Migraine? What is the process by which Magnesium affects Migraine?
Kno.e.sis has 15 faculty in Computer Science, life sciences and health care, cognitive science and business. It has about 50 PhD students and postdocs, about 2/3 of these in Computer Science. Its faculty members have 40 labs, and it occupies a majority of the 50K sq ft Joshi Research Center. Its students are highly successful, e.g., tenure-track faculty at Case Western Reserve Univ or Researcher at IBM Almaden. It has received recent funding from Microsoft Research, IBM Research, HP Labs, Google, and small companies (Janya, EZdi,…) and collaborates with many more (Yahoo! Labs, NLM, …).