Yang Yu is proposing research on improving machine learning based ontology mapping by automatically obtaining training samples from the web. The proposed system would parse two input ontologies to generate queries to search engines and collect documents to use as samples for each ontology class. These samples would then be used to train text classifiers, which would produce probabilistic mappings between classes in the two ontologies. The results would be evaluated by comparing to mappings from human experts. Current work involves exploring alternative text classification tools and ways to utilize the probabilistic mapping values generated by the classifiers.
1. YANG YU (yangyu1, UMBC)
Research on how to improve
machine-learning-based
ontology mapping
Is Apple the Same as Orange?
To: nicholas@csee.umbc.edu
Subject: Yu
2. YANG YU (yangyu1, UMBC)
Presentation Overview
Semantic Web
Ontology
Ontology Mapping
Motivation
Methods (Machine Learning, Text Classification)
Problem
My Proposed Research
Evaluation
Current Results
Future Work
Comments & Questions
I may have gotten some things wrong
EMAIL: yangyu1, UMBC
3. YANG YU (yangyu1, UMBC)
The Semantic Web
“in general, computers have no reliable way to process the
semantics”
Some achievements through complicated algorithms (search engines)
Apple and orange: an apple is a kind of fruit. Is there another way?
Knowledge Base, Databases, standalone(?) structured
information
HTML-Web, information not encoded, post-process
Database, information encoded, pre-process
Tim Berners-Lee, James Hendler, and Ora Lassila, 2001, The
Semantic Web, Scientific American
"The Semantic Web is an extension of the current web in which
information is given well-defined meaning, better enabling
computers and people to work in cooperation."
4. YANG YU (yangyu1, UMBC)
RDF -- well-defined meaning
“uses URIs to encode information”,
“the URIs ensure that concepts are not just
words in a document but are tied to a unique
definition that everyone can find on the Web”.
(quoted from The Semantic Web)
Example:
http://www.amk.ca/talks/2003-03/
6. YANG YU (yangyu1, UMBC)
RDF Example
Description of
the Author
Even the Author’s
Name is Apple, X
well-defined meaning
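A minimal sketch of the idea on this slide, using plain Python tuples as (subject, predicate, object) triples. The example.org URIs and the talk URI are hypothetical and only illustrate the point; the rdf:type and Dublin Core creator URIs are the standard ones. It shows that an author named Apple and the fruit apple stay distinct because each is tied to its own URI.

```python
# Standard vocabulary URIs.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DC_CREATOR = "http://purl.org/dc/elements/1.1/creator"

# Hypothetical URIs, invented for this illustration only.
PERSON_APPLE = "http://example.org/people#Apple"
FRUIT_APPLE = "http://example.org/food#Apple"

# Each statement is a (subject, predicate, object) triple.
triples = [
    ("http://example.org/talks/rdf-intro", DC_CREATOR, PERSON_APPLE),
    (PERSON_APPLE, RDF_TYPE, "http://example.org/people#Person"),
    (FRUIT_APPLE, RDF_TYPE, "http://example.org/food#Fruit"),
]


def types_of(subject, graph):
    """Return every class URI that the graph asserts for a subject."""
    return [o for s, p, o in graph if s == subject and p == RDF_TYPE]


print(types_of(PERSON_APPLE, triples))
print(types_of(FRUIT_APPLE, triples))
```

The two "Apple" resources share a word but not a URI, so a program never has to guess which one is meant.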
7. YANG YU (yangyu1, UMBC)
Ontology
What is it?
“Short answer: an ontology is a specification of a
conceptualization”
http://www-ksl.stanford.edu/kst/what-is-an-ontology.html
“The most typical kind of ontology for the Web has a taxonomy and
a set of inference rules”
From The Semantic Web
RDF, RDF-S, OWL (www.w3c.org)
A sample ontology
Wine Ontology
http://www.w3.org/TR/owl-guide/wine.rdf
How to use it?
More sophisticated computing services will be based on ontologies
8. YANG YU (yangyu1, UMBC)
Some Large Ontologies
OpenCyc (www.opencyc.org)
the world's largest and most complete general knowledge base and commonsense
reasoning engine.
47,000 concepts: an upper ontology whose domain is all of human consensus reality,
interrelated and constrained by 306,000 assertions
WordNet (wordnet.princeton.edu)
Synonym sets (synsets) of English nouns, verbs, adjectives and adverbs,
each representing one underlying lexical concept. Different relations link the
synonym sets.
OBO (obo.sourceforge.net)
Open Biomedical Ontology project, supported by NIH, NSF, etc.
Biological and medical domains: Sequence, Plant, etc. E.g., Gene Ontology: 17746 terms,
93.9% with definitions.
SUMO (IEEE)
Suggested Upper Merged Ontology
General-purpose concepts, foundation for more specific ontologies for different
domains.
9. YANG YU (yangyu1, UMBC)
More ontologies
www.google.com/search?q=filetype:owl+owl
UMBC Swoogle (swoogle.umbc.edu)
My question: how to use ontologies? Is this still a
research topic?
10. YANG YU (yangyu1, UMBC)
Why Ontology Mapping
The same term in two ontologies may mean different things (previous
example).
Different organizations may use different ontologies for the
same domain, resulting in different terms representing the same
concept (e.g., AI & CI); problems arise when they try to
communicate with each other – “interoperability problem”
H. S. Pinto. 1999, Some issues on ontology integration. In IJCAI-99
workshop on Ontologies and Problem-Solving Methods (KRR5)
Hi, I want to buy
some apples.
What are you talking about?
I only sell Red and Delicious
11. YANG YU (yangyu1, UMBC)
Ontology Mapping
Try to find the relationship between each pair of concepts used in
two different ontologies, for example Equivalent, Subclass_Of,
Superclass_Of, Siblings, Similar (how similar?), Different
(how different?)
Ontology A1
Ontology A2
Obtaining probabilistic values (N * M)
that show how well class ni in Ontology A1 (N classes)
maps to class mj in Ontology A2 (M classes)
12. YANG YU (yangyu1, UMBC)
Manual Mapping
OpenCyc
SENSUS, FIPS 10-4, several large (300k-term) pharmaceutical
thesauri, large portions of WordNet, MeSH/Snomed/UMLS,
and the CIA World Factbook.
Knowledge worker + domain expert
Interactive clarification tool + domain expert
Mapping Ontologies into Cyc, Cyc Corp, 2002
SUMO WordNet
Mapping WordNet to the SUMO Ontology, Teknowledge
Corp, 2002
Advantages and Disadvantages
13. YANG YU (yangyu1, UMBC)
Lexical Based Approach
John Li, 2003, LOM – a Lexicon based
ontology mapping tool. Information
Interpretation and Integration Conference
String matching, plus some additional techniques
such as word stemming
MeetingPlace and the_Place_of_Meeting
Write and Written
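A rough sketch of how the two examples above could be matched lexically. This is not LOM's actual algorithm: the stopword list, the crude suffix stripping (a stand-in for a real stemmer) and the Jaccard score are all assumptions made for illustration.

```python
import re

# Assumed stopword list; a real tool would use a larger one.
STOPWORDS = {"the", "of", "a", "an"}


def stem(word):
    """Crude suffix stripping, standing in for a real stemmer."""
    for suffix in ("ing", "ten", "s", "e"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokens(name):
    """Split a class name on underscores and camel-case, then stem."""
    spaced = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", name).replace("_", " ")
    words = [w.lower() for w in spaced.split()]
    return {stem(w) for w in words if w not in STOPWORDS}


def name_similarity(a, b):
    """Jaccard overlap of the two stemmed token sets (0.0 to 1.0)."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


print(name_similarity("MeetingPlace", "the_Place_of_Meeting"))
print(name_similarity("Write", "Written"))
print(name_similarity("Apple", "Orange"))
```

A purely lexical score like this cannot tell that two differently named classes mean the same thing, which is where the learning-based approaches come in.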
14. YANG YU (yangyu1, UMBC)
Machine Learning Based
Approach
Machine Learning
Learning is a process that, if successful,
enables one to do something one could not do
before.
“Machine learning refers to a system capable of
the autonomous acquisition and integration of
knowledge” (AAAI)
Text Classification
Supervised Machine Learning
single-category text classification
15. YANG YU (yangyu1, UMBC)
Some Machine Learning Based
Ontology Mapping Systems
CAIMAN
Lacher, M.; and Groh, G. May 2001. Facilitating the
Exchange of Explicit Knowledge through Ontology
Mappings. In Proceedings of the 14th International
FLAIRS Conference. Key West, FL, USA
Glue
Doan, AnHai, et al. 2003. Learning to match
ontologies on the Semantic Web. VLDB Journal,
Volume 12, Issue 4
16. YANG YU (yangyu1, UMBC)
UMBC OntoMapper
Prasad, S.; Peng, Y.; and Finin, T. 2002. A Tool For Mapping
Between Two Ontologies (Poster), International Semantic
Web Conference (ISWC02).
According to the
researchers:
Results were not encouraging
because of very
few samples
17. YANG YU (yangyu1, UMBC)
A Problem of Machine Learning
Based Ontology Mapping
Samples used to train the learner are
collected or created manually by ontology
workers
May ensure quality?
Lack of quantity
If there are not enough samples, a concept may not be well
represented.
18. YANG YU (yangyu1, UMBC)
My Proposed Research
Obtaining Samples from the Web Automatically
for Machine Learning Based Ontology Mapping
Advantages:
Ensures sample quantity
Web documents: a large number of documents created in a distributed
environment, representing various aspects of a concept well.
Low cost
By using search engines like Google, documents can be easily
collected
Disadvantages:
Quality issue
19. YANG YU (yangyu1, UMBC)
System Overview
Ontology A1
Ontology A2
parser
Samples
By Classes
Samples
By Classes
Queries A1
Queries A2
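A minimal sketch of the parser-to-queries step in this diagram. The slides do not say how queries are formed, so everything here is an assumption: the ontology is given as a hypothetical class-to-parent dictionary, and each query is the quoted class name plus its quoted parent name, so that "Apple" under "Fruit" is less likely to return pages about the company. Sending the queries to a search engine is left out.

```python
def build_queries(classes):
    """Build one search query per class.

    `classes` maps a class name to its parent's name (None for a root).
    This input shape is an assumption made for the sketch.
    """
    queries = {}
    for name, parent in classes.items():
        terms = ['"%s"' % name.replace("_", " ")]
        if parent is not None:
            # Add the parent as context to disambiguate the class name.
            terms.append('"%s"' % parent.replace("_", " "))
        queries[name] = " ".join(terms)
    return queries


# Toy ontology, invented for illustration.
ontology_a1 = {"Fruit": None, "Apple": "Fruit", "Orange": "Fruit"}
queries_a1 = build_queries(ontology_a1)

for class_name, query in queries_a1.items():
    print(class_name, "->", query)
```

The documents returned for each query would then be stored under the class that produced it.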
20. YANG YU (yangyu1, UMBC)
System Overview (Cont.)
Samples
For A1
Samples
For A2
Model A1
Model A2
Text Classifier
1
1
2
2
21. YANG YU (yangyu1, UMBC)
Text Classifier
System Overview (Cont.)
Ontology A1
Model A2
Samples
For A1
Samples for
N classes
Suppose A1 has N classes
models for
M classes
Obtaining probabilistic values
(N * M)
that show how well class ni in
Ontology A1
maps to class mj in Ontology A2
models for
M classes
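A sketch of the cross-classification in this diagram under stated assumptions. The slides do not fix the classifier or how per-document outputs are combined; here a small multinomial naive Bayes model (uniform priors, add-one smoothing) is trained on the A2 samples, each A1 sample is classified against it, and the class probabilities are averaged per A1 class. The toy documents are invented.

```python
import math
from collections import Counter


def train(samples_by_class):
    """Count words per class; returns (word counts per class, vocabulary)."""
    counts = {c: Counter(w for doc in docs for w in doc.lower().split())
              for c, docs in samples_by_class.items()}
    vocab = set().union(*counts.values())
    return counts, vocab


def class_probabilities(doc, model):
    """P(class | doc) with uniform priors and add-one smoothing."""
    counts, vocab = model
    log_scores = {}
    for c, wc in counts.items():
        total = sum(wc.values()) + len(vocab)
        log_scores[c] = sum(math.log((wc[w] + 1) / total)
                            for w in doc.lower().split() if w in vocab)
    top = max(log_scores.values())
    exp = {c: math.exp(s - top) for c, s in log_scores.items()}
    norm = sum(exp.values())
    return {c: v / norm for c, v in exp.items()}


def mapping_matrix(samples_a1, model_a2):
    """Average P(A2 class | doc) over each A1 class's samples (N x M)."""
    matrix = {}
    for a1_class, docs in samples_a1.items():
        sums = Counter()
        for doc in docs:
            for a2_class, p in class_probabilities(doc, model_a2).items():
                sums[a2_class] += p
        matrix[a1_class] = {c: sums[c] / len(docs) for c in model_a2[0]}
    return matrix


# Toy samples, invented for illustration.
samples_a1 = {"Apple": ["sweet red apple fruit tree",
                        "apple fruit orchard harvest"],
              "Laptop": ["laptop computer screen keyboard"]}
samples_a2 = {"Fruit": ["fruit tree orchard sweet harvest", "ripe fruit juice"],
              "Computer": ["computer keyboard screen software",
                           "computer hardware chip"]}

matrix = mapping_matrix(samples_a1, train(samples_a2))
for a1_class, row in matrix.items():
    print(a1_class, row)
```

Each row of the result holds M values for one of the N classes in A1, which is the N * M table the slide refers to.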
22. YANG YU (yangyu1, UMBC)
Evaluation
Compare the mapping results of the “enhanced” system with
mapping results obtained from human experts.
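One possible way to make this comparison concrete, offered as a sketch only. The slides name no metric; precision and recall against the expert mapping, the 0.5 threshold, and the rule of taking the highest-valued class per row are assumptions, and the matrix and expert pairs below are invented.

```python
def best_matches(matrix, threshold=0.5):
    """For each A1 class, keep its top A2 class if the value is high enough."""
    pairs = set()
    for a1_class, row in matrix.items():
        a2_class, p = max(row.items(), key=lambda kv: kv[1])
        if p >= threshold:
            pairs.add((a1_class, a2_class))
    return pairs


def precision_recall(predicted, expert):
    """Compare predicted (A1, A2) pairs with the expert's pairs."""
    correct = len(predicted & expert)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(expert) if expert else 0.0
    return precision, recall


# Toy mapping values and expert judgments, invented for illustration.
toy_matrix = {
    "Apple": {"Fruit": 0.9, "Computer": 0.1},
    "Laptop": {"Fruit": 0.2, "Computer": 0.8},
    "Pear": {"Fruit": 0.4, "Computer": 0.6},
}
expert_pairs = {("Apple", "Fruit"), ("Laptop", "Computer"), ("Pear", "Fruit")}

predicted_pairs = best_matches(toy_matrix)
precision, recall = precision_recall(predicted_pairs, expert_pairs)
print(precision, recall)
```

Here two of the three predicted pairs agree with the expert, so both precision and recall come out at two thirds.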
23. YANG YU (yangyu1, UMBC)
Current Results & Future Work
The Rainbow text classifier doesn’t work well; considering switching
to another text classification tool, for example Weka or some
SourceForge projects.
Trying to find out how to utilize the raw probabilistic values obtained
from the cross-classification.
Trying to use clustering algorithms to improve the quality of the
samples