Amit Sheth, "Semantic Interoperability and Information Brokering in Global Information Systems," Keynote given at IEEE Meta-Data, Bethesda, MD, April 6, 1999.
1. Bethesda, Maryland, April 6, 1999
Amit Sheth
Large Scale Distributed Information Systems Lab
University of Georgia
http://lsdis.cs.uga.edu
2. Three perspectives to GlobIS
autonomy
Information Integration Perspective
distribution
heterogeneity (terminological,
semantic
contextual)
Information Brokering Perspective meta-data
data
knowledge
information "Vision" Perspective
connectivity computing data
3. Evolving targets and approaches in integrating
data and information (a personal perspective)
a society for ubiquitous exchange of (tradeable)
information in all digital forms of representation;
information anywhere, anytime, any forms
Generation III (1997...): InfoQuilt, ADEPT, DL-II projects
Generation II (1990s): InfoHarness, VisualHarness, InfoSleuth, KMed, DL-I projects, Infoscopes, HERMES, SIMS, Garlic, TSIMMIS, Harvest, RUFUS, ...
Generation I (1980s): Mermaid, DDTS, Multibase, MRDSM, ADDS, IISS, Omnibase, ...
4. Generation I
• Data recognized as corporate resource: leverage it!
• Data predominantly in structured databases, different data models,
transitioning from network and hierarchical to relational DBMSs
• Heterogeneity (system, modeling and schematic) as well as need to
support autonomy posed main challenges;
major issues were data access and connectivity
• Information integration through Federated architecture
• Support for corporate IS applications as the primary objective,
update often required, data integrity important
5. Generation I
(heterogeneity in FDBMSs)
Layers of heterogeneity, top to bottom, with Communication spanning all layers (time axis: 1970s at the bottom, 1980s at the top)
Database System
• Semantic Heterogeneity
• Differences in DBMS
  • data models (abstractions, constraints, query languages)
  • System level support (concurrency control, commit, recovery)
Operating System
• file system
• naming, file types, operation
• transaction support
• IPC
Hardware/System
• instruction set
• data representation/coding
• configuration
6. Generation I
(Federated Database Systems: Schema Architecture)
Schema levels, top to bottom:
External Schema ... External Schema
Federated Schema (built by schema integration)
Export Schema ... Export Schema
Component Schema ... Component Schema (built by schema translation)
Local Schema ... Local Schema
Component DBS ... Component DBS
• Dimensions for interoperability and integration: distribution, autonomy and heterogeneity
• Model Heterogeneity: Common/Canonical Data Model, Schema Translation
• Information sharing while preserving autonomy
7. Generation I
(characterization of schematic conflicts in multidatabase systems)
Schematic Conflicts
• Domain Definition Incompatibility: Naming Conflicts, Data Representation Conflicts, Data Scaling Conflicts, Data Precision Conflicts, Default Value Conflicts, Attribute Integrity Constraint Conflicts
• Data Value Incompatibility: Known Inconsistency, Temporal Inconsistency, Acceptable Inconsistency
• Abstraction Level Incompatibility: Generalization Conflicts, Aggregation Conflicts
• Schematic Discrepancies: Data Value Attribute Conflict, Entity Attribute Conflict, Data Value Entity Conflict
• Entity Definition Incompatibility: Naming Conflicts, Database Identifier Conflicts, Schema Isomorphism Conflicts, Missing Data Items Conflicts
Sheth & Kashyap, Kim & Seo
BUT these techniques for dealing with schematic heterogeneity do not directly map to dealing with much larger variety of heterogeneous media
8. Generation II
• Significant improvements in computing and connectivity (standardization
of protocol, public network, Internet/Web); remote data access as given;
• Increasing diversity in data formats, with focus on variety of textual data
and semi-structured documents
• Many more data sources, heterogeneous information sources,
but not necessarily better understanding of data
• Use of data beyond traditional business applications:
mining + warehousing, marketing, e-commerce
• Web search engines for keyword based querying against HTML pages;
attribute-based querying available in a few search systems
• Use of metadata for information access; early work on ontology support
distribution applied to metadata in some cases
• Mediator architecture for information management
9. Generation II
(limited types of metadata, extractors, mappers, wrappers)
Sources: Nexis, UPI, AP, ...; Documents, Data Stores, Digital Videos, Digital Maps, Digital Images, Digital Audios, Global/Enterprise Web Repositories, ...
EXTRACTORS produce METADATA from these sources
Example request: Find Marketing Manager positions in a company that is within 15 miles of San Francisco and whose stock price has been growing at a rate of at least 25% per year over the last three years
Junglee, SIGMOD Record, Dec. 1997
10. Generation II
(a metadata classification: the information pyramid)
METADATA STANDARDS
General Purpose: Dublin Core, MCF
Domain/industry specific: Geographic (FGDC, UDK, …), Library (MARC, …)
The pyramid, top to bottom (move toward the top to tackle information overload!!):
User
Ontologies: Classifications, Domain Models
Domain Specific Metadata: area, population (Census), land-cover, relief (GIS), metadata concept descriptions from ontologies
Domain Independent (structural) Metadata (C++ class-subclass relationships, HTML/SGML Document Type Definitions, C program structure...)
Direct Content Based Metadata (inverted lists, document vectors, WAIS, Glimpse, LSI)
Content Dependent Metadata (size, max colors, rows, columns...)
Content Independent Metadata (creation-date, location, type-of-sensor...)
Data (Heterogeneous Types/Media)
12. What's next (after comprehensive use of metadata)?
Query processing and information requests
NOW
traditional queries based on keywords
attribute based queries
content-based queries
NEXT
'high level' information requests involving
ontology-based, iconic, mixed-media, and
media-independent information requests
user selected ontology, use of profiles
13. GIS Data Representation – Example
multiple heterogeneous metadata models with different
tag names for the same data in the same GIS domain
Kansas State
Tag in FGDC Metadata Model / tag in UDK Metadata Model: shared value
Theme keywords / Search terms: digital line graph, hydrography, transportation...
Title / Topic: Dakota Aquifer
Online linkage / Adress Id: http://gisdasc.kgs.ukans.edu/dasc/
Direct Spatial Reference Method / Measuring Techniques: Vector
Horizontal Coordinate System Definition / Co-ordinate System: Universal Transverse Mercator
… … … ...
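The tag mismatch on this slide can be illustrated with a small crosswalk. This is only a sketch: the tag names come from the slide, while the shared field names (keywords, title, url, spatial_method, coordinate_system) and the normalize function are invented for illustration and are not part of FGDC, UDK, or InfoQuilt.

```python
# Hypothetical crosswalk from model-specific tag names (taken from the slide)
# to shared field names (invented for this sketch).
CROSSWALK = {
    "FGDC": {
        "Theme keywords": "keywords",
        "Title": "title",
        "Online linkage": "url",
        "Direct Spatial Reference Method": "spatial_method",
        "Horizontal Coordinate System Definition": "coordinate_system",
    },
    "UDK": {
        "Search terms": "keywords",
        "Topic": "title",
        "Adress Id": "url",
        "Measuring Techniques": "spatial_method",
        "Co-ordinate System": "coordinate_system",
    },
}


def normalize(model, record):
    """Rename a record's model-specific tags to the shared field names."""
    mapping = CROSSWALK[model]
    return {mapping[tag]: value for tag, value in record.items()}


fgdc_record = {
    "Title": "Dakota Aquifer",
    "Online linkage": "http://gisdasc.kgs.ukans.edu/dasc/",
    "Direct Spatial Reference Method": "Vector",
    "Horizontal Coordinate System Definition": "Universal Transverse Mercator",
}
udk_record = {
    "Topic": "Dakota Aquifer",
    "Adress Id": "http://gisdasc.kgs.ukans.edu/dasc/",
    "Measuring Techniques": "Vector",
    "Co-ordinate System": "Universal Transverse Mercator",
}

fgdc_normalized = normalize("FGDC", fgdc_record)
udk_normalized = normalize("UDK", udk_record)
print(fgdc_normalized == udk_normalized)
```

Once both records use the same field names, they can be compared or merged directly, which is the point the slide makes about needing a shared vocabulary above the individual metadata models.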
14. Generation III
• Increasing information overload and broader variety of information
content (video content, audio clips etc) with increasing amount of visual
information, scientific/engineering data
• Continued standardization related to Web for representational and metadata
issues (MCF, RDF, XML)
• Changes in Web architecture; distributed computing (CORBA, Java)
• Users demand simplicity, but complexities continue to rise
• Web is no longer just another information source, but decision support through
"data mining and information discovery, information fusion, information
dissemination, knowledge creation and management", "information management
complemented by cooperation between the information system and humans"
• Information Brokering Architecture proposed for information management
15. Information Brokering: An Enabler for the Infocosm
INFORMATION CONSUMERS: People, Corporations, Programs, Universities, Government (issuing User Queries and Information Requests)
INFORMATION BROKERING (in the face of INFORMATION/DATA OVERLOAD):
• arbitration between information consumers and providers for resolving information impedance
• dynamic reinterpretation of information requests for determination of relevant information services and products
• dynamic creation and composition of information products
INFORMATION PROVIDERS: Newswires, Corporations, Universities, Research Labs (offering Information Systems and Data Repositories)
16. Information Brokering: Three Dimensions
THREE DIMENSIONS
• CONSUMERS, BROKERS, PROVIDERS
• DATA, METADATA, VOCABULARY
• SYSTEM, SYNTAX, STRUCTURE, SEMANTICS
Objective:
Reduce the problem of knowing structure and semantics of data in the huge
number of information sources on a global scale to: understanding and
navigating a significantly smaller number of domain ontologies
17. What else can Information Brokering do?
WWW
• a confusing heterogeneity of media, formats (Tower of Babel)
• information correlation using physical (HREF) links at the extensional data level
• location dependent browsing of information using physical (HREF) links
• user has to keep track of information content !!
WWW + Information Brokering
• Domain Specific Ontologies as “semantic conceptual views”
• Information correlation using concept mappings at the intensional concept level
• Browsing of information using terminological relationships across ontologies
• Higher level of abstraction, closer to user view of information !!
18. Concepts, tools and techniques to support semantics
• context
• semantic proximity
• inter-ontological relations
• media-independent information correlations
• ontologies (esp. domain-specific)
• profiles
• domain-specific metadata
19. Tools to support semantics
• Context, context, context
• Media-independent information correlations
• Multiple ontologies
– Semantic Proximity (relationships between concepts within
and across ontologies) using domain, context,
modeling/abstraction/representation, state
– Characterizing Loss of Information incurred due to
differences in vocabulary
BIG challenge: identifying relationship or
similarity between objects of different media,
developed and managed by different persons and systems
20. Heterogeneity... … is a Babel Tower!!
SEMANTIC HETEROGENEITY
metadata
ontologies
contexts
SEMANTIC INTEROPERABILITY
21. The InfoQuilt Project
THE INFOQUILT VISION
Semantic interoperability between systems, sharing knowledge
using multiple ontologies
Logical correlation of information
Media independent information processing
REALIZATION OF THE VISION
fully distributed, adaptable, agent-based system
information/knowledge management supported by collaborative
processes
http://lsdis.cs.uga.edu/proj/iq/iq.html
22. InfoQuilt Project: using the Metadata REFerence link
MREF
Complements HREF, creating a "logical web" through media
independent ontology & metadata based correlation
It is a description of the information asset we want to retrieve
Semantic Correlation using MREF
• MREF Concept: model for logical correlation using ontological terms and metadata
  inputs: constraints, relations, attributes; domain ontologies; IQ_Asset ontology + extension ontologies; keywords; content attributes (color, scene cuts, …)
• RDF: framework for representing MREFs
• XML: MREF serialization (one implementation choice)
http://lsdis.cs.uga.edu/proj/iq/iq.html
23. Domain Specific Correlation – example
Potential locations for a future shopping mall identified by all regions having
a population greater than 5000, and area greater than 50 sq. ft. having an urban
land cover and moderate relief <A MREF ATTRIBUTES(population > 5000; area > 50;
region-type = ‘block’; land-cover = ‘urban’; relief = ‘moderate’)>can be viewed here</A>
domain specific metadata: terms chosen from domain specific ontologies
Sources and the metadata they supply:
• Census DB: Population, Area
• TIGER/Line DB: Boundaries, Regions (SQL)
• US Geological Survey: Land cover, Relief, Boundaries, from Image Features (image processing routines)
=> media-independent relationships between domain specific metadata: population, area, land cover, relief
=> correlation between image and structured data at a higher domain specific level
as opposed to physical "link-chasing" in the WWW
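The correlation on this slide can be sketched as merging per-region metadata from several sources and then checking the MREF attribute constraints. The constraints are the ones on the slide; the region ids, the sample values, the assignment of region-type to TIGER/Line, and the function names are assumptions made for illustration, not InfoQuilt's implementation.

```python
import operator

# Toy per-region metadata. Ids and values are invented for illustration.
census = {
    "r1": {"population": 7200, "area": 64},
    "r2": {"population": 3100, "area": 80},
    "r3": {"population": 9800, "area": 55},
}
tiger_line = {rid: {"region-type": "block"} for rid in ("r1", "r2", "r3")}
usgs_image_features = {
    "r1": {"land-cover": "urban", "relief": "moderate"},
    "r2": {"land-cover": "urban", "relief": "moderate"},
    "r3": {"land-cover": "forest", "relief": "moderate"},
}

# The MREF attribute constraints from the slide as (attribute, test, value).
CONSTRAINTS = [
    ("population", operator.gt, 5000),
    ("area", operator.gt, 50),
    ("region-type", operator.eq, "block"),
    ("land-cover", operator.eq, "urban"),
    ("relief", operator.eq, "moderate"),
]


def correlate(*sources):
    """Merge metadata from every source into one record per region id."""
    merged = {}
    for source in sources:
        for region_id, metadata in source.items():
            merged.setdefault(region_id, {}).update(metadata)
    return merged


def satisfies(metadata, constraints):
    """True when the record carries every attribute and passes every test."""
    return all(
        attribute in metadata and test(metadata[attribute], value)
        for attribute, test, value in constraints
    )


regions = correlate(census, tiger_line, usgs_image_features)
candidates = sorted(rid for rid, md in regions.items() if satisfies(md, CONSTRAINTS))
print(candidates)
```

The constraints never mention a file, table, or URL; they are stated over domain specific metadata only, which is what distinguishes this from following HREF links.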
25. A DL II approach for Information Brokering
Layers, top to bottom:
Iscape 1 ... Iscape N
CONSTRUCTING APPROPRIATE INFORMATION LANDSCAPES
CONSTRUCTING ADDITIONAL META-INFORMATION RESOURCES
DISCOVERING COLLECTIONS OF HETEROGENEOUS INFORMATION AND META-INFORMATION RESOURCES
Resources: Domain Specific Ontologies, Domain Independent Ontologies, Images, Data Stores, Documents, Digital Media
Physical/Simulation World
26. ADEPT Information Landscape Concept Prototype
(a scenario for Digital Earth:
learning in the context of the “El Niño” phenomenon)
Sample Iscapes Requests:
– How does El Niño affect sea animals? Look for
broadcast videos of less than 2 minutes.
– How are some regions affected by El Niño? Look at
East/West Pacific regions.
– What disasters have been related to El Niño?
– What storm occurrences are attributed to El Niño?
– Show reports related to El Niño that contain Clinton.
request information using keywords, domain-specific attributes, domain-independent attributes
TRY ISCAPE CONCEPT DEMO
27. Putting MREFs to work
Components shown:
• IQ_Asset ontology + extension ontologies, domain ontologies
• MREF Builder: User constructs new MREF
• MREF repository
• User Agent
• Broker Agent
• User Profile Manager, profiles
28. Context: the lynchpin of semantics
Cricket
"For instance, if you were to use Yahoo! or Infoseek to
search the web for pizza, your results would probably
be hundreds of matches for the word pizza. Many of
these could be pizza parlors around the world. Yet if
you run the same search within NeighborNet, you will
allows you to order pizza to be delivered instead of
shipped."
From a Press Release of FutureOne, Inc. March 24, 1999
http://home.futureone.com/about/pr/021699.asp
29. Constructing c-contexts from ontological terms
C-CONTEXT:
"All documents stored in the database
have been published by some agency"
=> Cdef(DOC) = <(hasOrganization, AgencyConcept)>
DATABASE OBJECTS:
AGENCY(RegNo, Name, Affiliation)
DOC(Id, Title, Agency)
ONTOLOGICAL TERMS: Agency Concept, Document Concept
C-Context = <(C1, V1) (C2, V2) ... (Ck, Vk)>
a collection of contextual coordinates Ci (roles) and
values Vi (concepts/concept descriptions)
Advantages:
• Use of ontologies for an intensional domain specific description of data
• Representation of extra information
• Relationships between objects not represented in the database schema
• Using terminological relationships in the ontology
30. Using c-contexts to reason about information in database
EXAMPLE
Cdef(DOC) = <(hasOrganization, AgencyConcept)>
CQ = <(hasOrganization, {"USGS"})>
glb(Cdef(DOC), CQ) = <(self, DocumentConcept), (hasOrganization, {"USGS"})>
- Reasoning with c-contexts: glb(Cdef(DOC), CQ)
- Ontological Inferences:
  - DocumentConcept
  - (hasOrganization, {"USGS"})
Challenge 1: use of multiple ontologies
Challenge 2: estimating the loss of information
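The glb (greatest lower bound) of two c-contexts can be sketched as follows. This is a toy model under stated assumptions: a c-context is a dictionary from contextual coordinate to value, a value is either a concept name or a set of instances, and the membership table CONCEPT_EXTENSION is invented. The self coordinate is written into Cdef(DOC) explicitly here, which is an assumption about how the slide's result acquires it.

```python
# Assumed instance membership for the concepts used in this sketch.
CONCEPT_EXTENSION = {"AgencyConcept": {"USGS", "NASA", "EPA"}}


def resolve(value):
    """Return the set of instances denoted by a concept name or an explicit set."""
    if isinstance(value, str):
        return frozenset(CONCEPT_EXTENSION[value])
    return frozenset(value)


def meet(a, b):
    """Most specific value implied by both a and b for one coordinate."""
    if a == b:
        return a
    return resolve(a) & resolve(b)


def glb(c1, c2):
    """Combine two c-contexts: keep every coordinate, meet the shared ones."""
    result = dict(c1)
    for coordinate, value in c2.items():
        if coordinate in result:
            result[coordinate] = meet(result[coordinate], value)
        else:
            result[coordinate] = value
    return result


cdef_doc = {"self": "DocumentConcept", "hasOrganization": "AgencyConcept"}
cq = {"hasOrganization": {"USGS"}}

combined = glb(cdef_doc, cq)
print(combined)
```

In this toy model the shared coordinate hasOrganization narrows from the whole AgencyConcept to the single value USGS, while the coordinate that only one context constrains is carried over unchanged.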
31. Estimating information loss for multi-ontology based
query processing in the OBSERVER/InfoQuilt system
OBSERVER architecture
• IRM NODE: IRM (Interontologies Terminological Relationships)
• USER NODE: User Query, Query Processor, Ontology Server, Ontologies, Mappings, Data Repositories
• COMPONENT NODE (several): Query Processor, Ontology Server, Ontologies, Mappings, Data Repositories
Eduardo Mena (III’98)
32. Estimating information loss for multi-ontology based
query processing in the OBSERVER/InfoQuilt system
Query construction - Example
“Get title and number of pages of books written by Carl Sagan”
User ontology: WN
[name pages] for
(AND book (FILLS creator “Carl Sagan”))
Target ontology: Stanford-I
Integrated ontology WN-Stanford-I
[title number-of-pages] for
(AND book (FILLS doc-author-name “Carl Sagan”))
Ontologies sites: http://www.cogsci.princeton.edu/~wn/w3wn.html
http://www-ksl.stanford.edu/knowledge-sharing/ontologies/html/bibliographic-data/
Eduardo Mena (III’98)
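The rewriting from the user ontology to the target ontology can be sketched as a term-by-term substitution. The term pairs below are read off the two queries on this slide; the function names and the tuple representation of a query are assumptions for illustration and do not reflect OBSERVER's actual query processor.

```python
# Term correspondences between WN (user ontology) and Stanford-I (target),
# limited to the terms that appear in the slide's two queries.
WN_TO_STANFORD_I = {
    "name": "title",
    "pages": "number-of-pages",
    "creator": "doc-author-name",
    "book": "book",
}


def translate(projection, concept, role, filler, mapping):
    """Rewrite every ontology term; the filler is data, so it is left as-is."""
    return [mapping[term] for term in projection], mapping[concept], mapping[role], filler


def render(projection, concept, role, filler):
    """Format a query in the notation used on the slide."""
    return f'[{" ".join(projection)}] for (AND {concept} (FILLS {role} "{filler}"))'


user_query = (["name", "pages"], "book", "creator", "Carl Sagan")
target_query = translate(*user_query, WN_TO_STANFORD_I)

print(render(*user_query))
print(render(*target_query))
```

A term with no entry in the mapping raises KeyError in this sketch; that is the situation in which a system has to substitute a related term and accept some loss of information.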
33. Estimating information loss for multi-ontology based
query processing in the OBSERVER/InfoQuilt system
Query construction - Example
Re-use of Knowledge: Bibliography Data Ontology (Stanford-I)
Concepts shown: Biblio-Thing, Document, Conference, Agent, Person, Organization, Author, Publisher, University, Book, Technical-Report, Miscellaneous-Publication, Proceedings, Edited-Book, Thesis, Doctoral-Thesis, Master-Thesis, Periodical-Publication, Journal, Newspaper, Magazine, Technical-Manual, Cartographic-Map, Computer-Program, Multimedia-Document, Artwork
Ontologies sites: http://www.cogsci.princeton.edu/~wn/w3wn.html
http://www-ksl.stanford.edu/knowledge-sharing/ontologies/html/bibliographic-data/
Eduardo Mena (III’98)
34. Estimating information loss for multi-ontology based
query processing in the OBSERVER/InfoQuilt system
Query construction - Example
Re-use of Knowledge: A subset of WordNet 1.5 (WN)
Concepts shown: Print-Media, Press, Journalism, Publication, Newspaper, Magazine, Periodical, Journals, Pictorial, Series, Book, Trade-Book, Brochure, TextBook, SongBook, Reference-Book, PrayerBook, CookBook, Encyclopedia, WordBook, Instruction-Book, HandBook, Directory, Annual, GuideBook, Manual, Bible, Instructions, Reference-Manual
Ontologies sites: http://www.cogsci.princeton.edu/~wn/w3wn.html
http://www-ksl.stanford.edu/knowledge-sharing/ontologies/html/bibliographic-data/
Eduardo Mena (III’98)
35. Estimating information loss for multi-ontology based
query processing in the OBSERVER/InfoQuilt system
WN ontology and user query
Query construction - Example
“Get title and number of pages of books written by Carl Sagan”
User ontology: WN
[name pages] for
(AND book (FILLS creator “Carl Sagan”))
Target ontology: Stanford-I
Integrated ontology WN-Stanford-I
[title number-of-pages] for
(AND book (FILLS doc-author-name “Carl Sagan”))
Ontologies sites: http://www.cogsci.princeton.edu/~wn/w3wn.html
http://www-ksl.stanford.edu/knowledge-sharing/ontologies/html/bibliographic-data/
Eduardo Mena (III’98)
36. Estimating information loss for multi-ontology based
query processing in the OBSERVER/InfoQuilt system
Estimating the loss of information
To choose the plan with the least loss
To present a level of confidence in the answer
Based on intensional information (terminological difference)
Based on extensional information (precision and recall)
Plans in the example
User Query: (AND book (FILLS doc-author-name “Carl Sagan”))
Plan 1: (AND document (FILLS doc-author-name “Carl Sagan”))
Plan 2: (AND periodical-publication (FILLS doc-author-name “Carl Sagan”))
Plan 3: (AND journal (FILLS doc-author-name “Carl Sagan”))
Plan 4: (AND UNION(book, proceedings, thesis, misc-publication, technical-report)
(FILLS doc-author-name “Carl Sagan”))
Eduardo Mena (III’98)
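A loss estimate based on extensional information can be sketched from precision and recall. The slide only says that precision and recall are used; the way they are combined below (a weighted harmonic combination with weight gamma) is an assumption of this sketch, and the extents and plan names are toy data, not the figures from the talk.

```python
def precision_recall(requested, returned):
    """Precision and recall of the returned extent against the requested one."""
    overlap = len(requested & returned)
    precision = overlap / len(returned) if returned else 0.0
    recall = overlap / len(requested) if requested else 0.0
    return precision, recall


def loss(requested, returned, gamma=0.5):
    """Assumed combination: 1 minus a gamma-weighted harmonic mean."""
    precision, recall = precision_recall(requested, returned)
    if precision == 0.0 or recall == 0.0:
        return 1.0
    return 1.0 - 1.0 / (gamma / precision + (1.0 - gamma) / recall)


# Toy extents: the objects the user's term denotes and what each plan returns.
book = {"b1", "b2", "b3", "b4"}
plans = {
    "plan_broader": book | {"d1", "d2", "d3", "d4", "d5", "d6"},  # extra objects
    "plan_narrower": {"b1", "b2", "b3"},  # misses one object
}

losses = {name: loss(book, extent) for name, extent in plans.items()}
best_plan = min(losses, key=losses.get)
print(losses, best_plan)
```

Here the broader plan has perfect recall but low precision and the narrower plan the reverse; the selection step is the slide's "choose the plan with the least loss".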
37. Estimating information loss for multi-ontology based
query processing in the OBSERVER/InfoQuilt system
Loss of information based on intensional information
User Query: (AND book (FILLS doc-author-name “Carl Sagan”))
Plan 1:
(AND document (FILLS doc-author-name “Carl Sagan”))
book := (AND publication (AT-LEAST 1 ISBN))
publication := (AND document (AT-LEAST 1 place-of-publication))
Loss: “Instead of books written by Carl Sagan, OBSERVER is
providing all the documents written by Carl Sagan (even if they
do not have an ISBN and place of publication)”
Eduardo Mena (III’98)
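The intensional loss description on this slide can be derived by walking the term definitions from the requested term up to its substitute and collecting the constraints dropped on the way. The two definitions are the ones on the slide; the dictionary layout and the function name are assumptions for illustration.

```python
# term -> (parent term, constraints added on top of the parent), from the slide:
#   book        := (AND publication (AT-LEAST 1 ISBN))
#   publication := (AND document (AT-LEAST 1 place-of-publication))
DEFINITIONS = {
    "book": ("publication", ["AT-LEAST 1 ISBN"]),
    "publication": ("document", ["AT-LEAST 1 place-of-publication"]),
}


def dropped_constraints(term, substitute):
    """Constraints lost when `term` is replaced by its ancestor `substitute`."""
    dropped = []
    current = term
    while current != substitute:
        if current not in DEFINITIONS:
            raise ValueError(f"{substitute!r} is not an ancestor of {term!r}")
        parent, constraints = DEFINITIONS[current]
        dropped.extend(constraints)
        current = parent
    return dropped


lost = dropped_constraints("book", "document")
print(lost)
```

The two collected constraints correspond to the "even if they do not have an ISBN and place of publication" clause of the loss message.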
38. Estimating information loss for multi-ontology based
query processing in the OBSERVER/InfoQuilt system
Example: loss for the plans
Plan 1 [case 2]: (AND document (FILLS doc-author-name “Carl Sagan”))
91.57% < (1-Loss) < 91.75%
Plan 2 [case 3]: (AND periodical-publication (FILLS doc-author-name “Carl Sagan”))
94.03% < (1-Loss) < 100%
Plan 3 [case 3]: (AND journal (FILLS doc-author-name “Carl Sagan”))
98.56% < (1-Loss) < 100%
Plan 4 [case 1]: (AND UNION(book, proceedings, thesis, misc-publication, technical-report)
(FILLS doc-author-name “Carl Sagan”))
0% < (1-Loss) < 7.22%
Eduardo Mena (III’98)
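39. Summary
Three columns (kind of data / level of interoperability / architecture), rows as laid out on the slide from top to bottom:
• Visual, Scientific/Eng.; Knowledge / Semantic / Knowledge Mgmt., Information Brokering, Cooperative IS
• Semi-structured, Text / Structural, Schematic; Metadata / Mediator, Federated IS
• Structured Databases; Data / Syntax, System / Federated DB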
40. Agenda for research
Interoperation not at systems level, but at informational and
possibly knowledge level
– traditional database and information retrieval solutions
do not suffice
– need to understand context; measures of similarities
Need to increase impetus on semantic level issues involving
terminological and contextual differences, possible perceptual
or cognitive differences in future
– information systems and humans need to cooperate,
possibly involving coordination and collaborative
processes
41. Related Reading
Books:
Information Brokering for Digital Media, Kashyap and Sheth, Kluwer,
1999 (to appear)
Multimedia Data Management: Using Metadata to Integrate and Apply
Digital Media, Sheth and Klas Eds, McGraw-Hill, 1998
Cooperative Information Systems, Papazoglou and Schlageter Eds.,
Academic Press, 1998
Management of Heterogeneous and Autonomous Database Systems,
Elmagarmid, Rusinkiewicz, Sheth Eds, Morgan Kaufmann, 1998.
Special Issues and Proceedings:
Formal Ontologies in Information Systems, Guarino Ed., IOS Press, 1998
Semantic Interoperability in Global Information Systems, Ouksel and
Sheth, SIGMOD Record, March 1999.
http://lsdis.cs.uga.edu
[See publications on Metadata, Semantics, Context, InfoHarness/InfoQuilt]
amit@cs.uga.edu
Acknowledgements: Tarcisio Lima, Vipul Kashyap