Semantic Web Nature

Nature Inspired Methods for the Semantic Web

Monica Macoveiciuc and Constantin Stan

Faculty of Computer Science, Alexandru Ioan Cuza University, Iasi

Abstract. The Semantic Web is a vision of information that is under-
standable by computers. Although there is great exploitable potential,
we are still in “Generation Zero” of the Semantic Web, since there are
few real-world compelling applications. The heterogeneity, the volume
of data and the lack of standards are problems that could be addressed
through some nature inspired methods. The paper presents the most im-
portant aspects of the Semantic Web, as well as its biggest issues; it then
describes some methods inspired from nature - genetic algorithms, arti-
ﬁcial neural networks, swarm intelligence, and the way these techniques
can be used to deal with Semantic Web problems.

Introduction

The World Wide Web is a universal medium for information and data exchange.
Exploiting the huge amount of knowledge distributed on the Web is a significant
challenge. Humans can understand the information, but it takes great effort to
find and combine data from such a large number of sources; on the other hand,
computers can easily browse through millions of pages in no time, but they are
not capable of understanding the content. The Semantic Web is a new paradigm
for the Web in which the semantics of information is defined, making it possible
for the web to understand and satisfy the requests of people and machines to use
the web resources[1]. In other words, the Semantic Web is a vision of information
that is understandable by computers. It contains a set of design principles and
a variety of enabling technologies. Some of the elements are expressed in formal
specifications, while others are still to be rigorously described.

The ontology is a key aspect of the Semantic Web, although it does not have
a universally accepted definition. It is described as “a formal specification of a
shared conceptualization” [2]. There is no commonly agreed ontology that every
data provider would rely on; the information is heterogeneous and distributed.
Existing reasoning techniques may not be able to deal with the different ontolo-
gies describing the same piece of knowledge, with the high number of instances,
with the lack of maintenance, the unreliability of the network, the variety of qual-
ity of the information available on the web. Given this context, soft computing
has an important role in coping with knowledge, and the methods inspired from
natures might be able to suggest interesting solutions for these problems.

This paper presents nature inspired techniques that can address some of the
main issues of the Semantic Web. Genetic algorithms, swarm intelligence or
neural networks could represent viable solutions for overcoming problems such
as ontology alignment, concept classification, RDF query path optimization etc.

Semantic Web

Advanced management information is the main benefit brought by the Semantic
Web vision. One should stop browsing documents and start performing concrete
queries. New knowledge should be inffered from the existing facts. The potential
advantages of these achievements are multiple:

– information can be located based on its meaning;
– information from different sources can be combined, summarized, presented
to the user in a improved format;
– information can be integrated across different sources.

1 Technologies

Semantic Web technologies can be considered in terms of layers, each of them
resting on and extending the functionality of the layers beneath it. The hierarchy
of the most important languages and technologies is described in the famous
“Layer Cake” diagram [3].

Semantic Web Layer Cake

The core technologies are RDF (Resource Description Framework) and RDFS
(RDF Schema). RDF is a markup language for describing information and re-
sources on the web. Any object that is uniquely identifiable by an URI (or
Uniform Resource Identifier) is considered a resource. Resources have properties
(attributes or characteristics).

The RDF model is a collection of facts, represented by statements (triples).
Each triple consists of a subject, a predicate and an object. The most common
representation of the triple is the graph-based one: subject-predicate-object is
seen as a node-arch-node link. The statements are unambiguous and have a uni-
form structure; each concept is defined in a dedicated space on the web. For
example, the statement “Jane is Tom’s mother” can be expressed is RDF as:

<rdf:Description rdf:about=www.persons.org/#jane>
<s:isWoman>Jane</s:isWoman>
<s:hasChild rdf:resource= www.persons.org/#tom>
</rdf:Description>

In order to describe general statements about classes or groups of objects, we
use RDF Schema, or RDFS. RDFS provides a basic object model, while RDF
refers to specific objects. The statement above can be described in RDFS as “A
woman is someone’s mother”.

RDF and RDFS allow us to describe aspects of a domain, but the modeling
primitives are too restrictive to be of general use. The taxonomic structure of
the domain, the restrictions and constraints cannot be described through this
model. It is also not possible to reason over inference rules. All these limitations
are overcome with the use of ontologies. Ontologies provide a common under-
standing of a domain of interest. The specification is formal, which means that
computers can perform reasoning about it. OWL (Web Ontology Language) is
a family of ontology languages, and it is the W3C specification for creating
Semantic Web applications. OWL builds upon RDF and RDFS and defines hi-
erarchies and relationships between resources. Semantic Web ontologies consist
of a taxonomy and a set of inference rules from which machines can make logical
conclusions. A taxonomy is a system of classification that groups resources into
classes and sub-classes based on their relationships and shared properties.

The top layers of the Layer Cake are very important in the context of Seman-
tic Web applications deployment. The trust layer deals with authentication and
reliability of data and services, through the use of digital signatures, ratings by
certification agencies, recommendations by trusted agents etc. The proof layer
allows applications to give proof of their conclusions and it includes the actual
deductive process, validation etc. Several refinements have been proposed for the
Semantic Web Layer Cake. One of them, suggested by sir Tim Berners-Lee in
2006, includes new features, such as:

– Rules and Inferencing Systems (RIF). It is a language for representing rules
and for linking rule-based systems; the formalisms are being extended in
order to encapsulate probabilistic, temporal and causal knowledge.
– RDF Extraction. GRDDL “Gleaning Resource Descriptions from Dialects of
Languages”) is a language that identifies when an XML document contains
data compatible with RDF and it is capable of extracting the data.
– Database Support for RDF. Oracle provides support for RDF and OWL
databases; for the moment, the focus is on storage, rather than inferencing
capabilities. There are various open source projects that offer solutions for
storage - such as Jena, as well as query languages for RDF (SPARQL being
the most important).

Revised Semantic Web Layer Cake

2 Current Problems

Although the Semantic Web vision has great potential, for more than a decade
it has been “a kind of academic exercise rather than a practical technology” [4].
One of the main reasons is the lack of a common understanding of what the
Semantic Web can offer and, more particularly, of the role of ontologies.

RDF and OWL can be confusing and complicated to understand for less-technical
people. There is a huge amount of information that needs to be annotated in
order to be processed and infered, and the two possible solutions for this are
both hard to put into practice: either an automatic process should apply an al-
gorithm that takes a piece of text and produces RDF, or people should manually
annotate existing documents. The first approach - of an intelligent algorithm -
is unlikely, since having such an algorithm would make the RDF and OWL seem
deprecated. Manual annotation is inefficient and prone to error.

One of the biggest issues of the Semantic Web is that it seems to be scat-
tered into small pieces. The existing initiatives and applications focus on small
domains and the access to the Semantic Web seems limited from the perspective
of the average user.

However, there are already a wide range of applications in existence or under
development. Some typical areas seem to offer a great potential (although not
fully exploited for the moment) for the development of such applications.

1. E-Science. These kind of applications involve large data collections that require
computationally intensive processing. The participants are usually distributed

across the world. A representative project is the Gene Ontology (GO) [5] one.
GO is a major bioinformatics initiative with the aim of standardizing the repre-
sentation of gene and gene product attributes across species and databases. The
Human Genome Project, finalized in 2003, is probably the most famous e-science
project.

2. Travel Information Systems. There are efforts in the direction of building
XML based specifications which would allow the interchange of information be-
tween companies. The benefits would be major for the users, since they would
be able to easily plan the whole trip - accommodation, transportation etc. The
big issue for the moment is the inexistence of an agreed ontology for this domain.

3. Digital Libraries. Over the past years, institutions such as universities, li-
braries and museums have made their large inventories of materials available
online. Although they have the same goal, the implementations of these sys-
tems are totally different. It is difficult for an institution to access another one’s
catalogues. One solution for this problem is the use of ontologies and of some
ontology mapping techniques, that would help achieve semantic interoperability.

4. Health Care. This domain stands to gain tremendous benefit by adoption
of Semantic Web technologies, as it depends on the interoperability of informa-
tion from many domains and processes for efficient decision support.

At present, the Semantic Web is increasingly used by small and large business.
Oracle (RDF management platform), IBM, Adobe (tool for adding RDF-based
metadata to most of their file formats), Software AG, or Yahoo! are the most im-
portant corporations that have already started working with these technologies
and are already selling tools, as well as complete business solutions. In August
2008, Microsoft bought Powerset, a semantic search engine, for a reported $ 100
million.

There are also open source applications, such as Protege [6] and Kowari [7],
that provide building blocks for application development, making it more cost
effective to develop Semantic Web products.

Nature Inspired Methods in the Context of
Semantic Web

The vast amount, the variety and heterogeneity of the data involved in the Se-
mantic Web vision makes it sometimes difficult for applications to deal with it,
turning many real world problems into NP-hard problems. Nature inspired rea-
soning might be able to adress and solve some of these issues.

Natural computing finds its source of inspiration in biological phenomena and
social behaviors from mainly insects and birds. Such algorithms are able to find
acceptable results for NP-hard problems within a reasonable amount of time,
rather than guarantee the optimal solution. The most important methods in-
spired from nature include genetic algorithms, neural networks, particle swarm
or ant colony optimization etc.

3 Genetic Algorithms

Genetic Algorithms (GAs) consist of some model techniques used by simple bi-
ological system. These systems use reproduction to produce offspring that can
better survive in their environment. Genetic algorithms use reproduction oper-
ators (mutation and crossover) and strategies (’survival of the fittest’) inspired
from these realities, in order to improve the quality solutions to a particular
problem. The advantage of GAs compared to other algorithms and methods is
that they make only few assumptions about the underlying fitness landscape
and, therefore, they perform well in many different problem categories.

These algorithms proceed according to a simple scheme:

1. a population of random individuals is created;
2. each individual is tested in order to determine its utility as solution;
3. a fitness value is assigned to each individual, based on the previous evalua-
tion;
4. a selection process filters out the individuals with low fitness and allows those
with good fitness to enter the mating pool with a higher probability;
5. a reproduction process creates offspring by combining or varying the solution
candidates;
6. if the termination criterion is met, the evolution stops; otherwise, it continues
starting with step 2.

Genetic algorithms can be a viable solutions for different problems that Seman-
tic Web is confronting with, such as RDF query path and ontology alignment
optimization, Semantic Web services composition etc.

The possibility of querying large amounts of data from different, heterogeneous
sources, in an efficient way, is an unsolved problem at the moment. In this con-
text, an interesting research field is the determination of query paths - the order
in which the parts of a query are evaluated. The order has a major role when it
comes to the execution time of the query, thus a good algorithm for determining
the query path can contribute to quick, efficient querying. Genetic algorithms
have been already tested, with some success, in problems related to this field.
The Iterative Improvement algorithm, followed by Simulated Annealing - also
known as the ’Two-Phase Optimization’ - addresses the optimal determination
of query paths.

A RDF query can be seen as a chain of subject-predicate-object triples. It can
be visualized as a tree, in which the leaf nodes represent the inputs and the
internal nodes are relational algebra operations. The nodes in such a query can
be ordered in many different ways, all of them producing the same result, but
with different execution time. In these conditions, the challenge consists in de-
termining the order in which the nodes should be placed, in order to optimize
the response time.

It is not difficult to identify the solution space of the problem as the set of
all the possible RDF trees. A population can be created by randomly select-
ing some of these threes (the chromosomes). A simple mutation operator would
switch the order of two random nodes (triples) in a chromosome. A crossover
operator would pick some of the nodes from a chromosome, conserving the order,
and put them together with the missing nodes taken from a second chromosome
(also conserving the order in this second chromosome). The fitness function is
calculated based on the execution time. Long executing times are not desirable
for a GA in an RDF query execution environment, therefore the stopping con-
dition should also consider (or be complemented with) a time limit.

Another interesting problem is the ontology alignment optimization. At the mo-
ment, there is no general agreed standard when it comes to ontologies. The
diversity of data makes it even less likely that such a standard would be possible
in the near future - the standards do not often fit to the specific needs of all
the participants in a potential standardization process; and it is very difficult
and expensive for many organizations to reach an agreement. Thus, ontology
alignment is a key aspect in order to make the knowledge exchange possible in
the context of Semantic Web.

Many attempts have been made to solve this issue using different combinations
of matchers, such as string normalization or similarity, data type comparison,
linguistic methods, inheritance analysis, graph mapping, taxonomy analysis etc.
A solution involving genetic algorithms would be able to cope with huge amounts
of data, without requiring human intervention. There are two difficult tasks when
defining the problem from the GA point of view: the content of a tentative solu-

tion should be encoded in a string of values, and a good fitness function should
be provided (a similarity measure function between two ontologies). Genetics
for Ontology ALignments (GOAL) [8] is a software tool for optimizing ontol-
ogy matching functions. GOAL defines the alignment evaluation process based
on four goals: optimizing the precision, optimizing the recall, optimizing the
f-measure or reducing the number of false positives. A chromosome is defined
through a method that converts a bit representation to a set of floating-point
numbers in the real range [0, 1]. The fitness function consists of selecting one of
the parameters retrieved by an alignment evaluation. The parameters are:
– precision - the percentage of items returned that are relevant;
– recall - the fraction of the items that are relevant to a query;
– f-measure - a harmonic mean from precision and recall;
– false positives - relationships which have been provided although they are
false.
The algorithm has its limitations, but it has managed to find the optimal so-
lution for different instances of the ontology mapping problem, in an efficient way.

Semantic Web service composition consists in finding web services (available
in a repository) that are able to accomplish a certain task. The task is defined
in a form of a composition request that contains a set of available input pa-
rameters and a set of wanted output parameters. The parameters are not the
explicit values, but concepts from an ontology describing the semantics of the
values. A sequence of services is called a composition. If the input parameters
given in the request are provided, the services from this sequence can be subse-
quently executed and will finally produce the desired output parameters. For a
genetic algorithm approach, one needs to find a way of representing a web service
sequence as a chromosome. A simple solution is to use strings of service identi-
fiers, which can be processed by standard genetic algorithms. Considering that
the chromosomes can have variable length, the normal GA operators could be
modified in order to make the search more efficient. This operation either deletes
the first service from a sequence, or adds a promising service to the sequence.
The other standard GA operations can be easily applied.

4 Neural Networks
An artificial neural network (ANN) is a system loosely modeled based on the
human brain, an emulation of the biological neural system. It consists of an in-
terconnected group of artificial neurons. The information is processed using a
connectionist approach to computation. Generally, an ANN is an adaptive sys-
tem, changing its structure according to the information that flows through the
network during the learning phase. They can be used to model complex rela-
tionships between inputs and outputs or to find patterns in data.

In the context of Semantic Web, artificial neural networks can be used in the

process of ontology mapping. The heterogeneity among different ontologies is
one of the biggest issues in this field nowadays. Web applications are developed
by different parties, that design their own ontologies, according to their own
views of the world. Many approaches have been proposed, in order to deal with
this heterogeneity, but each of them has its drawbacks. A centralized ontology
is very unlikely, so the efforts are now focused on distributed solutions: trying
to match the individual ontologies, and possibly reuse each other as well. Most
of the existing techniques are either rule-based, or learning-based, but both cat-
egories have their disadvantages.

A different approach combines rule-based and learning-based solutions, integrat-
ing machine learning techniques, such that the weights of a concept’s semantic
aspects can be learned from training examples, instead of being pre-defined. In
the real world, a common problem that can occur is the lack of instance data
- either in quantity or quality. This method avoids this problem, because the
learning process is carried out at the schema level, instead of the instance level.

Artificial neural networks are a good solution for the learning process, for many
reasons: instances are represented by attribute-value pairs; the target function
output is a real-valued one; fast evaluation of the learned target function is
preferable. ANN are also known to perform well in the presence of noisy data.
If the ontologies are to be learned from uncontrolled data, such as real existing
web pages, the handling of noise becomes a real issue.

Another interesting approach to the problem of ontology mapping is the use
of interactive activation and competition (IAC) neural networks (NN) to search
for a global optimal solution to best satisfy the ontology constraints. An IAC neu-
ral network consists of a number of competitive nodes connected to each other.
Each of these nodes represents a hypothesis, while the connection between two
nodes is a constraint between their hypotheses. The connection can be either
positive (activation) - if the hypotheses support each other - or negative (com-
petition). Each connection has a weight, which is proportional to the strength
of the constraint. The activation of a node is determined by more sources:

– the initial activation;
– the input from its adjacent nodes;
– its bias;
– the external input.

The characteristics of ontology mapping and the mechanisms of the IAC network
have common properties. The constraints in ontology mapping can be interactive
or competitive between mapping hypotheses. Before applying a neural network
based algorithm for learning, a preliminary mapping is made, which estimates
both the linguistic and structure information of ontologies. These prior knowl-
edge can be sees as external inputs or bias of a node in the IAC network.

5 Swarm Intelligence

Swarm intelligence is another approach to problem solving that takes inspiration
from social behaviors of insects and of other animals. Particularly, ant colony
optimization is one of the most successful techniques. Ant colony optimization
(ACO) is inspired by the ants that deposit pheromone on the ground in order
to mark some favorable path that should be followed by other members of the
colony. A similar mechanism has be transposed in an algorithm for solving op-
timization problems.

Semantic Web reasoning systems deal with growing amounts of distributed, dy-
namic resources. Swarm intelligence could be used in order to implement a RDF
graph traversal algorithm. Among the main properties of swarms are adaptive-
ness, robustness and scalability. These correspond to three concepts - no central
control, locality and simplicity. Thus, the combination of reasoning and swarm
intelligence can be a viable solution for obtaining reasoning performance by ba-
sic means.

A model of a decentralized system implies the traversal of a graph in order
to calculate the deductive closure of the graph, with respect to the RDFS se-
mantics. The role of swarm intelligence is to reduce the computational cost. In
order to calculate the RDFS closure over an RDF graph, a set of rules need
to be applied repeatedly to the triples in the graph. In the metaphor of ants,
each insect represent one of these rules, which might be (partially) instantiated.
Ants communicate with each other only locally and indirectly. Whenever the
condition of a rule matches the node an ant is on, it locally adds the newly
derived triple to the graph. Only the active reasoning rules are moving in the
network and not the data, minimizing network traﬃc, since schema-data is far
less numerous than instance-data. Having some transition capabilities between
graph-boundaries, the method converges towards the closure. This model has
been successfully implemented and the results are described in [9].

Conclusions

The Semantic Web vision comes with the promise of a world in which a common
understanding of the meaning of data can help humans and computers coop-
erate. However, it takes great effort to put in practice the revolutionary ideas,
since it is very difficult to agree upon standards and, afterwards, to update the
existing resources according to the potential standards. For the moment, the
Semantic Web seems to be scattered into small pieces, being available only on
a small scale and for very specific domains. On the other hand, there is a huge
amount of knowledge that can be exploited through automated processing and
adapted in order to be used. In this context, methods inspired from nature seem
to have the potential to address the currently unresolved problems of the Se-
mantic Web. These methods can deal with numerous data and can be used to
build high scalable applications. Since there is no perfection, therefore, no op-
timal solution for these problems, concepts such as genetic algorithms, artificial
neural networks or swarm intelligence might be able to provide good results.

This paper presented some ideas of applying nature inspired methods in or-
der to deal with Semantic Web’s challenges. The main aspects of Semantic Web
have been described, as well as the evolution during the past decade. The ar-
eas of interest and some (potential) applications have been presented and the
most important problems have been introduced and explained. Finally, the pa-
per presented the way methods inspired from nature can address the problems
of Semantic Web. Three of the most important techniques - genetic algorithms,
swarm intelligence, artificial neural networks - have been briefly described, along
with the efforts of applying them for solving problems such as ontology mapping,
RDF path optimization, RDF graph traversal, ontology alignment optimization.

References
[1] Tim Berners-Lee, James Hendler and Ora Lassila. The Semantic Web. Scientific
American, May 2001.
[2] Tom Gruber. What is an Ontology? http://www-ksl.stanford.edu/kst/what-is-an-
ontology.html
[3] Semantic Web Layer Cake. http://c2.com/cgi/wiki?SemanticWebLayerCake
[4] Alex Iskold. Semantic Web: Difficulties with the Classic Approach.
http://www.readwriteweb.com
[5] Gene Ontology Project. http://www.geneontology.org/
[6] Protege. http://protege.stanford.edu/
[7] Kowari. http://kowari.sourceforge.net/
[8] Jorge Martinez-Gil, Enrique Alba, Jose F. Aldana Montes. Optimizing Ontology
Alignments by Using Genetic Algorithms. Proceedings of Nature inspired Reasoning
for the Semantic Web (NatuReS), 2008.
[9] Kathrin Dentler, Stefan Schlobach, Christophe Guret. Semantic Web Reasoning by
Swarm Intelligence. Vrije Universiteit Amsterdam

[10] Human Genome Project.
http://www.ornl.gov/sci/techresources/Human Genome/home.shtml
[11] Riccardo Leardi. Nature Inspired Methods in Chemometrics: Genetic Algorithms
and Artificial Neural Networks, Volume 23 (Data Handling in Science and Technol-
ogy). Elsevier BV, 2003.
[12] Alexander Hogenboom, Viorel Milea, Flavius Frasincar, Uzay Kaymak. Genetic
Algorithms for RDF Query Path Optimization. Proceedings of Nature inspired Rea-
soning for the Semantic Web (NatuReS), 2008.
[13] Thomas Weise, Steffen Bleul, Diana Comes, and Kurt Geihs. Different Approaches
to Semantic Web Service Composition. WowKiVS, 2009.
[14] http://en.wikipedia.org/wiki/Artificial neural network
[15] John Cardiff. The Evolution of the Semantic Web. Social Media Research Group,
Institute of Technology Tallaght, Dublin, Ireland.

Semantic Web Nature

Recommandé

Recommandé

Contenu connexe

Tendances

Tendances (20)

Similaire à Semantic Web Nature

Similaire à Semantic Web Nature (20)

Dernier

Dernier (20)

Semantic Web Nature