A SEMINAR REPORT ON

SEMANTIC WEB

SUBMITTED TO THE UNIVERSITY OF PUNE,
IN THE PARTIAL FULFILLMENT OF THE REQUIREMENTS
FOR THE AWARD OF THE DEGREE

OF

POST GRADUATE PROGRAM
MASTER OF COMPUTER ENGINEERING

BY

MANEPATIL ABHIJIT CHANDRASEN

UNDER THE GUIDANCE OF

PROF. P. R. BARAPATRE

STES’s SKN SINHGAD INSTITUTE OF TECHNOLOGY

Gat No. 309/310, Kusgaon (Bk.) Off Mumbai-Pune Expressway,
Lonavala, Tal - Maval, Dist - Pune - 410401.

ACADEMIC YEAR: 2012-2013

OCT- NOV 2012

CERTIFICATE

This is to certify that the project report entitled

“SEMANTIC WEB”

Submitted by

Manepatil Abhijit Chandrasen
Roll No: SKN CE-19

is a bonafide work carried out by him under the supervision of Prof. P.R.
Barapatre and it is approved for the partial fulfillment of the requirement of
University of Pune, for the award of the degree of Master of Computer
Engineering.

The seminar work has not been earlier submitted to any other institute or
university for the award of degree or diploma.

Prof. P.R. Barapatre Prof. A. M. Kanthe Dr. J.S. Inamdar
(Seminar Guide) P.G. Co-ordinator Principal,
Comp. Engg. Dept SKN-SIT, Lonavala

Place : Pune
Date :

ACKNOWLEDGEMENT

I express my sincere thanks to Prof. P. R. Barapatre, whose supervision, inspiration
and valuable discussions have helped me tremendously to complete this Seminar. His guidance
proved most valuable in overcoming all the hurdles in the fulfillment of this Seminar.
I am grateful to Prof. A. M. Kanthe, P.G. Co-ordinator of the Computer Department, for
his co-operation and inspiration.
I would also like to express my appreciation and thanks to all my friends who,
knowingly or unknowingly, have assisted and encouraged me throughout my hard work.

INDEX

Topics

1. Introduction

2. History
2.1 Web 1.0
2.2 Web 2.0

3. Web 3.0 - A Basic Introduction

4. The Semantic Web Vision

5. A Layered Approach

6. Key Components
6.1 URI
6.2 RDF
6.3 RDFS
6.4 OWL
6.5 Microformat

7. Practical Illustration

8. Difference between Web 1.0, Web 2.0 and Web 3.0

9. Challenges

10. Project Implementation

11. Conclusion

12. References

FIGURE INDEX

Figure

1. Figure 1 Web 1.0 Example
2. Figure 2 Web 2.0 Example
3. Figure 3 Layered Approach of Semantic Web
4. Figure 4 RDF Example
5. Figure 5 Traditional Web Model
6. Figure 6 Semantic Web Model

ABSTRACT

The Semantic Web is an evolving development of the World Wide Web in which the
meaning (semantics) of information and services on the web is defined, making it possible for
the web to "understand" and satisfy the requests of people and machines to use the web
content. At its core, the Semantic Web comprises a set of design principles, collaborative
working groups, and a variety of enabling technologies.
Some elements of the Semantic Web are expressed as prospective future
possibilities that are yet to be implemented or realized. Other elements are
expressed in formal specifications. These include the Resource Description Framework
(RDF), a variety of data interchange formats (e.g. RDF/XML, N3, Turtle, N-Triples), and
notations such as RDF Schema (RDFS) and the Web Ontology Language (OWL), all of which
are intended to provide a formal description of concepts, terms, and relationships within a
given knowledge domain.
The key components of semantic web technology are as follows:

1. OWL: The Web Ontology Language (OWL) is a family of knowledge representation
languages for authoring ontologies endorsed by the World Wide Web Consortium.
They are characterized by formal semantics and RDF/XML-based serializations for the
Semantic Web. OWL has attracted academic, medical and commercial interest.
2. Resource Description Framework: The Resource Description Framework (RDF) is a family
of World Wide Web Consortium (W3C) specifications originally designed as a
metadata data model. It has come to be used as a general method for conceptual
description or modeling of information that is implemented in web resources, using a
variety of syntax formats.
3. RDF Schema: RDF Schema (variously abbreviated as RDFS, RDF(S), RDF-S, or
RDF/S) is an extensible knowledge representation language, providing basic elements
for the description of ontologies, otherwise called Resource Description Framework
(RDF) vocabularies, intended to structure RDF resources.
4. Microformat: A microformat (sometimes abbreviated μF) is a web-based approach to
semantic markup that seeks to re-use existing HTML/XHTML tags to convey metadata
and other attributes, in web pages and other contexts that support (X)HTML, such as
RSS. This approach allows information intended for end-users to be processed
automatically by software as well.

1. INTRODUCTION

Currently the focus of a W3C working group, the Semantic Web vision was
conceived by Tim Berners-Lee, the inventor of the World Wide Web. The World Wide Web
changed the way we communicate, the way we do business, the way we seek information and
entertainment – the very way most of us live our daily lives. Calling it the next step in Web
evolution, Berners-Lee defines the Semantic Web as "a web of data that can be processed
directly and indirectly by machines."

In the Semantic Web data itself becomes part of the Web and is able to be
processed independently of application, platform, or domain. This is in contrast to the World
Wide Web as we know it today, which contains virtually boundless information in the form of
documents. We can use computers to search for these documents, but they still have to be read
and interpreted by humans before any useful information can be extrapolated. Computers can
present you with information but can’t understand what the information is well enough to
display the data that is most relevant in a given circumstance. The Semantic Web, on the other
hand, is about having data as well as documents on the Web so that machines can process,
transform, assemble, and even act on the data in useful ways.

Imagine this scenario. You’re a software consultant and have just received a
new project. You’re to create a series of SOAP-based Web services for one of your biggest
clients. First, you need to learn a bit about SOAP, so you search for the term using your
favorite search engine. Unfortunately, the results you’re presented with are hardly helpful.
There are listings for dish detergents, facial soaps, and even soap operas mixed into the results.
Only after sifting through multiple listings and reading through the linked pages are you able
to find information about the W3C’s SOAP specifications.

Because of the different semantic associations of the word "soap," the results
you receive vary in relevance and you still have to do a lot of work to find the
information you’re looking for. However, in a Semantic Web-enabled environment, you could
use a Semantic Web agent to search the Web for "SOAP" where SOAP is a type of technology
specification used in Web services. This time, the results of your search will be relevant. Your
Semantic Web agent can also search your corporate network for the SOAP specification and
discover if your colleagues have completed similar projects or have posted SOAP-related
research on the network. Based on the semantic information available for SOAP, your agent
also presents you with a list of related technologies. Now you know that WSDL, XML, and
URI are all technologies related to SOAP, and that you’ll need to do some research on them,
too, before beginning your project. Armed with the information returned by your Semantic
Web agent, you read the related technology specifications and send emails to the colleagues
who have made SOAP-related materials available on the network to ask for their input before
starting your new project.

2. HISTORY

2.1 Web 1.0:
Web 1.0 (1991-2003) is a retronym which refers to the state of the World
Wide Web, and any website design style used before the advent of the Web 2.0
phenomenon. Web 1.0 began with the release of the WWW to the public in 1991, and
is the general term that has been created to describe the Web before the "bursting of the
Dot-com bubble" in 2001, which is seen by many as a turning point for the internet.

2.1.1 WEB 1.0 DESIGN ELEMENTS

Some typical design elements of a Web 1.0 site include:

• Static pages instead of dynamic user-generated content.
• The use of framesets.
• Proprietary HTML extensions such as the <blink> and <marquee> tags introduced
during the first browser war.
• Online guestbooks.
• GIF buttons, typically 88x31 pixels in size, promoting web browsers and other products.
• HTML forms sent via email. A user would fill in a form, and upon clicking submit their
email client would attempt to send an email containing the form's details.

2.1.2 Web 1.0 Example:

Figure 1. Web 1.0 Example

Wikipedia, when used only to view pages or search for information, is an example of
Web 1.0 usage: user interaction is minimal and the pages appear essentially static.

2.2 Web 2.0 :

The term "Web 2.0" (2004–present) is commonly associated with web
applications that facilitate interactive information sharing, interoperability, user-centered
design and collaboration on the World Wide Web. Examples of Web 2.0 include web-based
communities, hosted services, web applications, social-networking sites, video-sharing sites,
wikis, blogs, mashups, and folksonomies. A Web 2.0 site allows its users to interact with other
users or to change website content, in contrast to non-interactive websites where users are
limited to the passive viewing of information that is provided to them.

Although the term suggests a new version of the World Wide Web, it does
not refer to an update to any technical specifications, but rather to cumulative changes in the
ways software developers and end-users use the Web.

2.2.1 Web 2.0 Characteristics :

Web 2.0 websites allow users to do more than just retrieve information. They
build on the interactive facilities of "Web 1.0" to provide "Network as platform" computing,
allowing users to run software applications entirely through a browser. Users can own the data
on a Web 2.0 site and exercise control over that data. These sites may have an "Architecture of
participation" that encourages users to add value to the application as they use it.

The concept of Web-as-participation-platform captures many of these
characteristics. Bart Decrem, a founder and former CEO of Flock, calls Web 2.0 the
"participatory Web" and regards the Web-as-information-source as Web 1.0.

The impossibility of excluding group members who don’t contribute to the provision
of goods from sharing profits gives rise to the possibility that rational members will prefer to
withhold their contribution of effort and free-ride on the contribution of others. This requires
what is sometimes called Radical Trust by the management of the website. According to Best,
the characteristics of Web 2.0 are: rich user experience, user participation, dynamic content,
metadata, web standards and scalability. Further characteristics, such as openness, freedom
and collective intelligence by way of user participation, can also be viewed as essential
attributes of Web 2.0.

2.2.2 Web 2.0 Examples:

Figure 2 Web 2.0 Examples

Facebook is a social networking site and a prominent example of Web 2.0. The site allows
users to make friends, write them messages, chat with them, and upload and share photos,
among other activities.

3. Web 3.0- A Basic Introduction:

The Semantic Web is a mesh of information linked up in such a way as to be easily
processable by machines, on a global scale. You can think of it as being an efficient way of
representing data on the World Wide Web, or as a globally linked database.

The Semantic Web was thought up by Tim Berners-Lee, inventor of the WWW,
URIs, HTTP, and HTML. There is a dedicated team of people at the World Wide Web
Consortium (W3C) working to improve, extend and standardize the system, and many
languages, publications, tools and so on have already been developed. However, Semantic
Web technologies are still very much in their infancy, and although the future of the project
in general appears to be bright, there seems to be little consensus about the likely direction and
characteristics of the early Semantic Web.

What's the rationale for such a system? Data that is generally hidden away in HTML
files is often useful in some contexts, but not in others. The problem with the majority of data
on the Web that is in this form at the moment is that it is difficult to use on a large scale,
because there is no global system for publishing data in such a way that it can be easily
processed by anyone. For example, just think of information about local sports events, weather
information, plane times, Major League Baseball statistics, and television guides... all of this
information is presented by numerous sites, but all in HTML. The problem is that, in
some contexts, it is difficult to use this data in the ways that one might want to.

The Semantic Web is a web of data. There is lots of data we all use every day, and
it is not part of the web. I can see my bank statements on the web, and my photographs, and I
can see my appointments in a calendar. But can I see my photos in a calendar to see what I was
doing when I took them? Can I see bank statement lines in a calendar?

Why not? Because we don't have a web of data. Because data is controlled by
applications, and each application keeps it to itself.

The Semantic Web is about two things. It is about common formats for
integration and combination of data drawn from diverse sources, whereas the original Web
mainly concentrated on the interchange of documents. It is also about language for recording
how the data relates to real world objects. That allows a person, or a machine, to start off in
one database, and then move through an unending set of databases which are connected not by
wires but by being about the same thing.

4.0 The Semantic Web Vision
Today’s Web
The World Wide Web has changed the way people communicate with each other and
the way business is conducted. It lies at the heart of a revolution which is currently
transforming the developed world towards a knowledge economy, and more broadly speaking,
to a knowledge society. This development has also changed the way we think of computers.
Originally they were used for computing numerical calculations. Currently their predominant
use is information processing, typical applications being data bases, text processing, and
games. At present there is a transition of focus towards the view of computers as entry points
to the information highways. Most of today’s Web content is suitable for human consumption.
Even Web content that is generated automatically from data bases is usually presented without
the original structural information found in data bases. Typical uses of the Web today involve
humans seeking and consuming information, searching and getting in touch with other
humans, reviewing the catalogs of online stores and ordering products by filling out forms, and
viewing adult material. These activities are not particularly well supported by software tools.
Apart from the existence of links which establish connections between documents, the main
valuable, indeed indispensable, kind of tools are search engines. Keyword-based search
engines, such as AltaVista, Yahoo and Google, are the main tool for using today’s Web. It is
clear that the Web would not have been the huge success it was, were it not for search engines.
However there are serious problems associated with their use. Here we list the main ones:

• High recall, low precision: Even if the main relevant pages are retrieved, they are of little use if
another 28,758 mildly relevant or irrelevant documents were also retrieved. Too much can
easily become as bad as too little.

• Low or no recall: Often it happens that we don’t get any answer for our request, or that
important and relevant pages are not retrieved. Although low recall is a less frequent problem
with current search engines, it does occur. This is often due to the third problem:
• Results highly sensitive to vocabulary: Often we have to use semantically similar keywords
to get the results we wish; in these cases the relevant documents use different terminology
from the original query. This behaviour is unsatisfactory, since semantically similar queries
should return similar results.

• Results are single Web pages: If we need information that is spread over various documents,
then we must initiate several queries to collect the relevant documents, and then we must
manually extract the partial information and put it together.
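
The first two problems can be made concrete with the usual definitions of precision and recall. The sketch below is only illustrative: the figure of 28,758 unwanted documents comes from the text, while the number of relevant pages (10 retrieved out of 10 that exist) is an assumption made for the sake of the arithmetic.

```python
def precision(relevant_retrieved: int, total_retrieved: int) -> float:
    """Fraction of the retrieved documents that are relevant."""
    return relevant_retrieved / total_retrieved if total_retrieved else 0.0


def recall(relevant_retrieved: int, total_relevant: int) -> float:
    """Fraction of all relevant documents that were retrieved."""
    return relevant_retrieved / total_relevant if total_relevant else 0.0


# Assumed scenario: all 10 relevant pages are found, but they are buried
# among 28,758 mildly relevant or irrelevant ones.
relevant_found = 10
noise = 28_758
retrieved = relevant_found + noise

high_recall = recall(relevant_found, total_relevant=10)
low_precision = precision(relevant_found, retrieved)

print(f"recall    = {high_recall:.2f}")
print(f"precision = {low_precision:.5f}")
```

Under these assumed counts recall is perfect while precision is a small fraction of one percent, which is the situation the first bullet describes.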

Interestingly, despite obvious improvements in search engine technology, the difficulties
remain essentially the same. It seems that the amount of Web content outgrows the
technological progress. But even if a search is successful, it is the human who has to browse
selected retrieved documents to extract the information he is actually looking for. In other
words, there is not much support for retrieving the information (for some limited exceptions
see the next section), an activity that can be very time-consuming. Therefore the term
information retrieval, used in association with search engines, is somewhat misleading;
location finder might be a more appropriate term. Also, results of Web searches are not readily
accessible by other software tools; search engines are often isolated applications. The main
obstacle to providing better support to Web users is that, at present, the meaning of Web
content is not machine-accessible. Of course there are tools that can retrieve texts, split them
into parts, check the spelling, decompose them, put them together in various ways, and count
their words. But when it comes to interpreting sentences and extracting useful information
for users, the capabilities of current software are still very limited.

5.0 A Layered Approach :

Fig. 3 : A Layered Approach of Semantic Web

The development of the Semantic Web proceeds in steps, each step building a layer on
top of another. The pragmatic justification for this approach is that it is easier to achieve
consensus on small steps, while it is much harder to get everyone on board if too much is
attempted. Usually there are several research groups moving in different directions; this
competition of ideas is a major driving force for scientific progress. However, from an
engineering perspective there is a need to standardize. So if most researchers agree on certain
issues and disagree on others, it makes sense to fix the points of agreement. This way, even if
the more ambitious research efforts should fail, there will be at least partial positive outcomes.
Once a standard has been established, many more groups and companies will adopt it,
instead of waiting to see which of the alternative research lines will be successful in the end.
The nature of the Semantic Web is such that companies and single users must build tools, add
content and use that content. We cannot wait until the full Semantic Web vision materializes –
it may take another 10 years for it to be realized to its full extent (as envisioned today, of
course!). In building one layer of the Semantic Web on top of another, there are some principles
that should be followed:

1. Downward compatibility: Agents fully aware of a layer should also be able to interpret and
use information written at lower levels. For example, agents aware of the semantics of OWL
can take full advantage of information written in RDF and RDF Schema.
2. Upward partial understanding: On the other hand, agents fully aware of a layer should take
at least partial advantage of information at higher levels. For example, an agent aware only of
the RDF and RDF Schema semantics can interpret knowledge written in OWL partly, by
disregarding those elements that go beyond RDF and RDF Schema.
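
Upward partial understanding can be sketched as a filter: an agent that knows only RDF and RDF Schema keeps the statements it understands and sets the rest aside. The following is a minimal illustration rather than a real reasoner. It decides purely on the namespace prefix of the predicate, which is a simplification, and the `ex:` triples and the helper `partition_by_vocabulary` are hypothetical names introduced here.

```python
# Prefixes that an RDF/RDFS-only agent is assumed to understand.
KNOWN_PREFIXES = ("rdf:", "rdfs:")


def partition_by_vocabulary(triples, known_prefixes=KNOWN_PREFIXES):
    """Split triples into (understood, ignored) by the predicate's prefix."""
    understood, ignored = [], []
    for triple in triples:
        _subject, predicate, _obj = triple
        if predicate.startswith(known_prefixes):
            understood.append(triple)
        else:
            ignored.append(triple)
    return understood, ignored


# Hypothetical ontology mixing RDFS statements with one OWL statement.
ontology = [
    ("ex:Student", "rdfs:subClassOf", "ex:Person"),
    ("ex:alice", "rdf:type", "ex:Student"),
    ("ex:Student", "owl:disjointWith", "ex:Course"),
]

understood, ignored = partition_by_vocabulary(ontology)
print("understood:", understood)
print("ignored:   ", ignored)
```

The RDFS-level agent still learns the class hierarchy and the typing of `ex:alice`; it only loses the OWL disjointness statement, which goes beyond its layer.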

Figure 3 shows the "layer cake" of the Semantic Web, which is due to Tim Berners-Lee and
describes the main layers of the Semantic Web design and vision. At the bottom we find XML,
a language that lets one write structured Web documents with a user-defined vocabulary. XML
is particularly suitable for sending documents across the Web. RDF is a basic data model, like
the entity-relationship model, for writing simple statements about Web objects (resources).
The RDF data model does not rely on XML, but RDF has an XML-based syntax. Therefore in
Figure 3 it is located on top of the XML layer.

RDF Schema provides modelling primitives for organizing Web objects into hierarchies. Key
primitives are classes and properties, subclass and subproperty relationships, and domain and
range restrictions. RDF Schema is based on RDF. RDF Schema can be viewed as a primitive
language for writing ontologies. But there is a need for more powerful ontology languages that
expand RDF Schema and allow the representation of more complex relationships between
Web objects.

The logic layer is used to enhance the ontology language further, and to allow the
writing of application-specific declarative knowledge. The proof layer involves the actual deductive
process, as well as the representation of proofs in Web languages (from lower levels) and proof
validation. Finally, trust will emerge through the use of digital signatures and other kinds of
knowledge, based on recommendations by agents we trust, or rating and certification agencies
and consumer bodies. Sometimes the term Web of Trust is used, to indicate that trust will be
organised in the same distributed and chaotic way as the WWW itself. Being located at the top
of the pyramid, trust is a high-level and crucial concept: the Web will only achieve its full
potential when users have trust in its operations (security) and the quality of information
provided.

Description:

The basic architecture of the Semantic Web rests on identifiers (Uniform Resource
Identifiers) and a character encoding (Unicode). Above this layer is the syntax layer, which
defines the syntactic structure and is based on XML. Above it is the data interchange
layer, defined by RDF. Above that, querying is handled by SPARQL
and taxonomies are defined by RDFS. Ontologies are governed by OWL and rules
by RIF/SWRL. Above these are the unifying logic and proof layers. Cryptography runs
alongside all of the aforementioned layers to secure them. At the top is the Trust layer.

A brief description of all the aforementioned layers and components shall be given in the
upcoming segments of the report.

6. Key Components

Semantic Web has five main components which help in accomplishing the required
task and define the functioning of the web:

6.1 Uniform Resource Identifier :

A URI is simply a Web identifier: like the strings starting with "http:" or "ftp:" that
you often find on the World Wide Web. Anyone can create a URI, and the ownership of them
is clearly delegated, so they form an ideal base technology with which to build a global Web
on top of. In fact, the World Wide Web is such a thing: anything that has a URI is considered
to be "on the Web".
A URI may be classified as a locator (URL), or a name (URN), or both. A Uniform
Resource Name (URN) functions like a person's name, while a Uniform Resource Locator
(URL) resembles that person's street address. In other words, the URN defines an item's
identity, while the URL provides a method for finding it.
The URI syntax consists of a URI scheme name followed by a colon character,
and then by a scheme-specific part. The specifications that govern the schemes determine the
syntax and semantics of the scheme-specific part, although the URI syntax does force all
schemes to adhere to a certain generic syntax that, among other things, reserves certain
characters for special purposes (without always identifying those purposes). The URI syntax
also enforces restrictions on the scheme-specific part, in order to, for example, provide for a
degree of consistency when the part has a hierarchical structure. Percent-encoding can
represent characters in a URI that would otherwise not be allowed there.
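
The generic syntax can be explored with Python's standard `urllib.parse` module. This is a small sketch, not part of the original report; the URI under `example.org` and the sample string passed to `quote` are made-up values chosen only to show the scheme, the components of the scheme-specific part, and percent-encoding.

```python
from urllib.parse import quote, unquote, urlsplit

# Hypothetical URI used only for illustration.
uri = "http://example.org/reports/semantic-web?year=2012#abstract"

parts = urlsplit(uri)
print("scheme:   ", parts.scheme)    # the name before the first colon
print("authority:", parts.netloc)    # hierarchical part: host
print("path:     ", parts.path)
print("query:    ", parts.query)
print("fragment: ", parts.fragment)  # the part after '#'

# Percent-encoding replaces characters that may not appear literally
# (here a space and non-ASCII letters) with %XX escapes of their UTF-8 bytes.
raw_segment = "semantic web/Protégé"
encoded = quote(raw_segment)  # '/' is left unescaped by default
decoded = unquote(encoded)
print(encoded)
print(decoded)
```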

A URI reference is another type of string that represents a URI, and (in turn)
represents the resource identified by that URI. Informal usage does not often maintain the
distinction between a URI and a URI reference, but protocol documents should not allow for
ambiguity.

A URI reference may take the form of a full URI, or just the scheme-specific portion
of one, or even some trailing component thereof – even the empty string. An optional fragment
identifier, preceded by #, may be present at the end of a URI reference. The part of the
reference before the # indirectly identifies a resource, and the fragment identifier identifies
some portion of that resource.

In order to derive a URI from a URI reference, software converts the URI reference
to 'absolute' form by merging it with an absolute 'base' URI according to a fixed algorithm. The
system treats the URI reference as relative to the base URI, although in the case of an absolute
reference, the base has no relevance. The base URI typically identifies the document
containing the URI reference, although this can be overridden by declarations made within the
document or as part of an external data transmission protocol. If the base URI includes a
fragment identifier, it is ignored during the merging process. If a fragment identifier is present
in the URI reference, it is preserved during the merging process.
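
The merging of a URI reference with a base URI can be tried out with `urljoin` from Python's standard library, which implements this kind of reference resolution. The base URI and the references below are hypothetical; the sketch only shows that the base fragment is dropped, that a fragment in the reference is kept, and that an absolute reference ignores the base.

```python
from urllib.parse import urldefrag, urljoin

# Hypothetical base URI: the document that contains the references.
# Its own fragment (#intro) plays no part in the merge.
base = "http://example.org/docs/report.html#intro"

# Relative reference: resolved against the directory of the base document.
relative = urljoin(base, "figures/layers.png")

# Fragment-only reference: same document, the reference's fragment is kept.
same_doc = urljoin(base, "#section-6")

# Parent-directory reference with a fragment.
parent = urljoin(base, "../index.html#top")

# Absolute reference: the base has no relevance.
absolute = urljoin(base, "ftp://files.example.net/data.rdf")

# Splitting a resolved URI back into the resource part and the fragment.
resource, fragment = urldefrag(same_doc)

for label, value in [("relative", relative), ("same_doc", same_doc),
                     ("parent", parent), ("absolute", absolute)]:
    print(f"{label:9s} -> {value}")
```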

Web document markup languages frequently use URI references to point to other
resources, such as external documents or specific portions of the same logical document.

6.2 RDF:

The Resource Description Framework (RDF) is a family of World Wide Web
Consortium (W3C) specifications originally designed as a metadata data model. It has come to
be used as a general method for conceptual description or modeling of information that is
implemented in web resources, using a variety of syntax formats.

The RDF data model is similar to classic conceptual modeling approaches such
as Entity-Relationship or Class diagrams, as it is based upon the idea of making statements
about resources (in particular Web resources) in the form of subject-predicate-object
expressions. These expressions are known as triples in RDF terminology. The subject denotes
the resource, and the predicate denotes traits or aspects of the resource and expresses a
relationship between the subject and the object. For example, one way to represent the notion
"The sky has the color blue" in RDF is as the triple: a subject denoting "the sky", a predicate
denoting "has the color", and an object denoting "blue". RDF is an abstract model with several
serialization formats (i.e., file formats), and so the particular way in which a resource or triple
is encoded varies from format to format.
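
As a rough sketch of the idea, the "sky has the color blue" statement can be held as a subject-predicate-object tuple and written out in one serialization, here an N-Triples style line. The `http://example.org/` namespace, the `Triple` type and the `to_ntriples` helper are illustrative assumptions; the helper does not escape special characters in literals and is not a complete serializer.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str                  # IRI of the resource being described
    predicate: str                # IRI naming the relationship
    obj: str                      # IRI, or the text of a plain literal
    obj_is_literal: bool = False


def to_ntriples(t: Triple) -> str:
    """Write one triple as an N-Triples style line (no literal escaping)."""
    obj = f'"{t.obj}"' if t.obj_is_literal else f"<{t.obj}>"
    return f"<{t.subject}> <{t.predicate}> {obj} ."


EX = "http://example.org/"  # hypothetical namespace

sky_is_blue = Triple(EX + "sky", EX + "hasColor", "blue", obj_is_literal=True)
line = to_ntriples(sky_is_blue)
print(line)
```

The same abstract triple could equally be written in RDF/XML or Turtle; only the surface syntax would change.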

A collection of RDF statements intrinsically represents a labeled, directed multi-
graph. As such, an RDF-based data model is more naturally suited to certain kinds of
knowledge representation than the relational model and other ontological models. However, in
practice, RDF data is often persisted in relational databases or in native representations called
triplestores, or quad stores if context (i.e. the named graph) is also persisted for each RDF
triple. As RDFS and OWL demonstrate, additional ontology languages can be built upon RDF.

The subject of an RDF statement is either a Uniform Resource Identifier (URI) or a
blank node, both of which denote resources. Resources indicated by blank nodes are called
anonymous resources. They are not directly identifiable from the RDF statement. The
predicate is a URI which also indicates a resource, representing a relationship. The object is a
URI, blank node or a Unicode string literal.
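
These positional rules can be expressed as a small check. The classes `IRI`, `BlankNode` and `Literal` and the function `is_valid_triple` are hypothetical names for this sketch and do not come from any particular RDF library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IRI:
    value: str


@dataclass(frozen=True)
class BlankNode:
    label: str


@dataclass(frozen=True)
class Literal:
    text: str


def is_valid_triple(subject, predicate, obj) -> bool:
    """Check which kinds of term are allowed in each position."""
    return (
        isinstance(subject, (IRI, BlankNode))             # URI or blank node
        and isinstance(predicate, IRI)                    # always a URI
        and isinstance(obj, (IRI, BlankNode, Literal))    # any of the three
    )


name = IRI("http://example.org/name")  # hypothetical property
anonymous = BlankNode("b0")            # an anonymous resource

ok = is_valid_triple(anonymous, name, Literal("John"))
bad_subject = is_valid_triple(Literal("John"), name, anonymous)
bad_predicate = is_valid_triple(anonymous, BlankNode("b1"), Literal("John"))
print(ok, bad_subject, bad_predicate)
```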

In Semantic Web applications, and in relatively popular applications of RDF like
RSS and FOAF (Friend of a Friend), resources tend to be represented by URIs that
intentionally denote, and can be used to access, actual data on the World Wide Web. But RDF,
in general, is not limited to the description of Internet-based resources. In fact, the URI that
names a resource does not have to be dereferenceable at all. For example, a URI that begins
with "http:" and is used as the subject of an RDF statement does not necessarily have to
represent a resource that is accessible via HTTP, nor does it need to represent a tangible,
network-accessible resource; such a URI could represent absolutely anything. However,
there is broad agreement that a bare URI (without a # symbol) which returns a 300-level coded
response when used in an HTTP GET request should be treated as denoting the internet resource
that it succeeds in accessing.

6.3 RDFS:

RDF Schema (variously abbreviated as RDFS, RDF(S), RDF-S, or RDF/S) is an
extensible knowledge representation language, providing basic elements for the description of
ontologies, otherwise called Resource Description Framework (RDF) vocabularies, intended to
structure RDF resources. The first version was published by the World-Wide Web Consortium
(W3C) in April 1998, and the final W3C recommendation was released in February 2004.
Many RDFS components are included in the more expressive language Web Ontology
Language (OWL).

For example, rdfs:Class declares a resource as a class for other resources.

A typical example of an rdfs:Class is foaf:Person in the Friend of a Friend (FOAF)
vocabulary. An instance of foaf:Person is a resource that is linked to the class using the
rdf:type predicate, such as in the following formal expression of the natural language
sentence : 'John is a Person'.

ex:John rdf:type foaf:Person

The definition of rdfs:Class is recursive: rdfs:Class is the rdfs:Class of any rdfs:Class.

rdfs:subClassOf allows hierarchies of classes to be declared.

For example, the following declares that 'Every Person is an Agent':

foaf:Person rdfs:subClassOf foaf:Agent

Hierarchies of classes support inheritance of a property domain and range from a class
to its subclasses. The RDF Schema specification describes rdf:Property as the class of RDF
properties. Each member of the class is an RDF predicate.

rdfs:domain of an rdf:Property declares the class of the subject in a triple whose second
component is that property.

rdfs:range of an rdf:Property declares the class or datatype of the object in a triple whose
second component is that property.

For example, the following declarations are used to express that the property
ex:employer relates a subject, which is of type foaf:Person, to an object, which is of type
foaf:Organization:

ex:employer rdfs:domain foaf:Person

ex:employer rdfs:range foaf:Organization

Given the previous two declarations, the following triple entails that ex:John is a
foaf:Person, and ex:CompanyX is a foaf:Organization:

ex:John ex:employer ex:CompanyX
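
In RDFS, domain, range and subclass declarations work as inference rules: new rdf:type statements follow from the existing triples. The sketch below applies three such rules until nothing new appears. It is a toy forward-chaining loop over prefixed names written as plain strings, not a complete RDFS reasoner, and the function name `rdfs_closure` is an assumption of this example. The triples themselves are the ones used in this section.

```python
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"
DOMAIN = "rdfs:domain"
RANGE = "rdfs:range"


def rdfs_closure(triples):
    """Apply three RDFS-style rules repeatedly until no new triple appears."""
    inferred = set(triples)
    while True:
        new = set()
        for s, p, o in inferred:
            for s2, p2, o2 in inferred:
                # Domain: (p domain C) and (s p o)  =>  (s type C)
                if p2 == DOMAIN and s2 == p:
                    new.add((s, RDF_TYPE, o2))
                # Range: (p range C) and (s p o)  =>  (o type C)
                if p2 == RANGE and s2 == p:
                    new.add((o, RDF_TYPE, o2))
                # Subclass: (s type C) and (C subClassOf D)  =>  (s type D)
                if p == RDF_TYPE and p2 == SUBCLASS and s2 == o:
                    new.add((s, RDF_TYPE, o2))
        if new <= inferred:
            return inferred
        inferred |= new


schema = {
    ("foaf:Person", SUBCLASS, "foaf:Agent"),
    ("ex:employer", DOMAIN, "foaf:Person"),
    ("ex:employer", RANGE, "foaf:Organization"),
}
data = {("ex:John", "ex:employer", "ex:CompanyX")}

closure = rdfs_closure(schema | data)
for triple in sorted(closure - schema - data):
    print(triple)
```

From the single data triple the loop derives that ex:John is a foaf:Person and ex:CompanyX is a foaf:Organization, and, through the subclass declaration, that ex:John is also a foaf:Agent.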

rdfs:subPropertyOf is an instance of rdf:Property that is used to state that all resources related
by one property are also related by another.

Example Statement: "Abhijit stays in Pune."

Figure 4. RDF Example

RDF Triple: (Abhijit, stays in, Pune)

This can be mapped to a schema which contains the classes "Citizen" and
"Country". If a Citizen "abc" stays in a Country "X", then "X" also involves "abc".

The class Citizen has the subclasses "Voting citizen" and "Non-voting citizen", and
the Country class has the subclass "State", which in turn has the subclasses "City", "Town" and
"Taluka", represented by the "subClassOf" property.

The rectangles represent properties, the ellipses in the RDFS layer represent classes,
and the ellipses in the RDF layer represent instances. The domain and range place
constraints on the subjects and objects of a property.

So, the above diagram suggests that the subject (Abhijit Thatte) is a "type"
of voting citizen, the object (Pune) is a "type" of city, and the relationship between them is
"stays in" or "resides in".

6.4 OWL:

The Web Ontology Language (OWL) is a family of knowledge representation
languages for authoring ontologies endorsed by the World Wide Web Consortium. They are
characterised by formal semantics and RDF/XML-based serializations for the Semantic Web.
OWL has attracted academic, medical and commercial interest.

In October 2007, a new W3C working group was started to extend OWL with
several new features as proposed in the OWL 1.1 member submission. This new version,
called OWL 2, soon found its way into semantic editors such as Protégé and semantic
reasoners such as Pellet, RacerPro and FaCT++. W3C announced the new version on 27
October 2009.

The OWL family contains many species, serializations, syntaxes and specifications
with similar names. This may be confusing unless a consistent approach is adopted. OWL and
OWL2 will be used to refer to the 2004 and 2009 specifications, respectively. Full species
names will be used, including specification version (for example, OWL2 EL). When referring
more generally, OWL Family will be used.

The data described by an ontology in the OWL family is interpreted as a set of
"individuals" and a set of "property assertions" which relate these individuals to each other. An
ontology consists of a set of axioms which place constraints on sets of individuals (called
"classes") and the types of relationships permitted between them. These axioms provide
semantics by allowing systems to infer additional information based on the data explicitly
provided. A full introduction to the expressive power of OWL is provided in the W3C's
OWL Guide.

Example:

An ontology describing families might include axioms stating that a "hasMother" property
is only present between two individuals when "hasParent" is also present, and individuals of
class "HasTypeOBlood" are never related via "hasParent" to members of the
"HasTypeABBlood" class. If it is stated that the individual Harriet is related via "hasMother"
to the individual Sue, and that Harriet is a member of the "HasTypeOBlood" class, then it can
be inferred that Sue is not a member of "HasTypeABBlood".
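The inference in this family example can be traced with a small Python sketch. It is not an OWL reasoner: the two axioms are hard-coded as plain data, and the names SUB_PROPERTY, FORBIDDEN and excluded_classes are hypothetical helpers introduced only for illustration.

```python
# Axiom 1: every hasMother assertion implies a hasParent assertion.
SUB_PROPERTY = {"hasMother": "hasParent"}

# Axiom 2: a member of the first class is never related, via the given
# property, to a member of the last class.
FORBIDDEN = [("HasTypeOBlood", "hasParent", "HasTypeABBlood")]


def excluded_classes(individual, types, relations):
    """Return the classes that `individual` cannot belong to, given the
    asserted class memberships and property assertions."""
    # Add the super-property assertion for each stated assertion
    # (a single level is enough for this example).
    expanded = set(relations)
    for s, p, o in relations:
        if p in SUB_PROPERTY:
            expanded.add((s, SUB_PROPERTY[p], o))

    result = set()
    for s, p, o in expanded:
        if o != individual:
            continue
        for subject_class, prop, object_class in FORBIDDEN:
            if p == prop and subject_class in types.get(s, set()):
                result.add(object_class)
    return result


types = {"Harriet": {"HasTypeOBlood"}}
relations = [("Harriet", "hasMother", "Sue")]
cannot_be = excluded_classes("Sue", types, relations)
```

Here cannot_be holds the single class HasTypeABBlood, matching the conclusion drawn in the text. Reasoners such as Pellet or FaCT++ derive conclusions of this kind from general axioms rather than from rules written by hand.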

6.5 Microformat:

A microformat (sometimes abbreviated μF) is a web-based approach to semantic
markup that seeks to re-use existing HTML/XHTML tags to convey metadata and other
attributes, in web pages and other contexts that support (X)HTML, such as RSS. This approach
allows information intended for end-users (such as contact information, geographic
coordinates, calendar events, and the like) to also be automatically processed by software.

Although the content of web pages is technically already capable of "automated
processing," and has been since the inception of the web, such processing is difficult because
the traditional markup tags used to display information on the web do not describe what the
information means. Microformats are intended to bridge this gap by attaching semantics, and
thereby obviate other, more complicated, methods of automated processing, such as natural
language processing or screen scraping. The use, adoption and processing of microformats
enables data items to be indexed, searched for, saved or cross-referenced, so that information
can be reused or combined.

Current microformats allow the encoding and extraction of events, contact
information, social relationships and so on. More are being developed. Version 3 of the Firefox
browser, as well as version 8 of Internet Explorer, were expected to include native support for
microformats.

Microformats emerged as part of a grassroots movement to make recognizable data
items (such as events, contact details or geographical locations) capable of automated
processing by software, as well as directly readable by end-users. Link-based microformats
emerged first. These include vote links that express opinions of the linked page, which can be
tallied into instant polls by search engines.

As the microformats community grew, CommerceNet, a nonprofit organization that
promotes electronic commerce on the Internet, helped sponsor and promote the technology and
support the microformats community in various ways. CommerceNet also helped co-found the
Microformats.org community site.

Neither CommerceNet nor Microformats.org is a standards body. The microformats
community is an open wiki, mailing list, and Internet relay chat (IRC) channel. Most of the
existing microformats were created at the Microformats.org wiki and associated mailing list,
by a process of gathering examples of web publishing behaviour, then codifying it. Some other
microformats (such as rel=nofollow and unAPI) have been proposed, or developed, elsewhere.

Example:

In this example, the contact information is presented as follows:

<div>
<div>Joe Doe</div>
<div>The Example Company</div>
<div>604-555-1234</div>
<a href="http://example.com/">http://example.com/</a>
</div>

With hCard microformat markup, that becomes:

<div class="vcard">
<div class="fn">Joe Doe</div>
<div class="org">The Example Company</div>
<div class="tel">604-555-1234</div>
<a class="url" href="http://example.com/">http://example.com/</a>
</div>

Here, the formatted name (fn), organisation (org), telephone number (tel) and web
address (url) have been identified using specific class names, and the whole thing is wrapped in
class="vcard", which indicates that the other classes form an hCard (short for "HTML vCard") and
are not merely coincidentally named. Other, optional, hCard classes also exist. It is now
possible for software, such as browser plug-ins, to extract the information and transfer it to
other applications, such as an address book.
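As a rough illustration of such extraction, the following Python sketch reads the four hCard properties from the markup above using only the standard library's html.parser module. The class name HCardParser and the restriction to four properties are assumptions made for this example. The sketch also takes the url value from the link text rather than the href attribute, and it does not handle void elements such as br.

```python
from html.parser import HTMLParser

# The four hCard properties used in the example above.
HCARD_FIELDS = {"fn", "org", "tel", "url"}


class HCardParser(HTMLParser):
    """Collect the text of elements whose class is a known hCard property,
    but only while inside an element carrying class="vcard"."""

    def __init__(self):
        super().__init__()
        self.depth = 0       # nesting depth inside the vcard element (0 = outside)
        self.current = None  # hCard property whose text is being read
        self.card = {}

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self.depth:
            self.depth += 1
        elif "vcard" in classes:
            self.depth = 1
        if self.depth:
            for name in classes:
                if name in HCARD_FIELDS:
                    self.current = name

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1
        self.current = None

    def handle_data(self, data):
        if self.current and data.strip():
            self.card[self.current] = data.strip()


HCARD_HTML = """
<div class="vcard">
<div class="fn">Joe Doe</div>
<div class="org">The Example Company</div>
<div class="tel">604-555-1234</div>
<a class="url" href="http://example.com/">http://example.com/</a>
</div>
"""

parser = HCardParser()
parser.feed(HCARD_HTML)
contact = parser.card
```

Feeding the first version of the markup, which has no class attributes, to the same parser yields an empty dictionary. The visible text is the same in both versions, and only the class names make the fields recognisable to software.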

7. Practical Illustration Of Semantic Web Application:

Suppose that a certain Professor Anjali Sharma wishes to
create a website of her own comprising a faculty page, a research page, a blog site and a
staff listing page. Using traditional web modelling, the pages would look like this:

Figure 5. Traditional Web Model (nodes: Prof. xyz; a faculty page; a research page; a blog site; a staff listing page)

Now, if she decides to use the semantic web instead of the traditional web model, the
pages become far more richly connected and more presentable. We can link
Professor Sharma's faculty page to her research, then link data in her blog to both of these,
and link profile data to her staff listing. Her staff listing could show some of the other
academics she works with, while her research page shows her links with worldwide research
collaborators, who also know one of her colleagues and who comment on Professor Sharma's
blog regularly. Because all this data can be displayed simply, it provides a much richer
user experience and offers information that previously might not have been exposed.

The web page would now look like:

Figure 6. Semantic Web Model

The straight lines show the relationships between the various web pages, researchers,
staff and other web entities. The intertwined relationships show the complex relations between
the data that can be viewed and the entities.

8. Difference between Web 1.0, Web 2.0 and Web 3.0:

Web 1.0: Experts call the Internet before 1999 the Read-Only era. The average internet user's
role was limited to reading the information presented to him. The best examples are the
millions of static websites which mushroomed during the dot-com boom. There was no active
communication or information flow from the consumer of the information to the producer of the
information.

Web 2.0: The lack of active interaction between the common user and the web led to the birth of
Web 2.0. The year 1999 marked the beginning of a Read-Write-Publish era, with notable
contributions from LiveJournal (launched in April 1999) and Blogger (launched in August
1999). Now even a non-technical user can actively interact with and contribute to the web using
different blog platforms. This era empowered the common user with a few new concepts, viz.
blogs, social media and video streaming. Publishing your content is only a few clicks away!
A few remarkable developments of Web 2.0 are Twitter, YouTube, eZineArticles, Flickr and
Facebook.

Web 3.0: It seems we have everything we had wished for in Web 2.0, but it lags far
behind when it comes to intelligence. Perhaps a six-year-old child has better analytical
abilities than the existing search technologies! Keyword-based search in Web 2.0 resulted in
information overload. The following attributes are going to be a part of Web 3.0:

Contextual Search
Tailor-made Search
Personalized Search
Evolution of 3D Web
Deductive Reasoning

Though the Web is yet to see something that can be termed fairly intelligent, the efforts to
achieve this goal have already begun. Two weeks ago, the Official Google Blog described
how the Google search algorithm is becoming more intelligent, as it can now identify many synonyms.

For example, Pictures and Photos are now treated as similar in meaning. From now onwards, the
search query GM crop will not lead you to the GM (General Motors) website. Why? Because, first,
by synonym identification Google will understand that GM may mean General Motors or
Genetically Modified. Then by context, i.e. by the keyword crop, it will deduce that the user
wants information on genetically modified crops and not on General Motors. Similarly, GM
car will not lead you to genetically modified crops. Try it yourself to check how this newly
added artificial intelligence works in Google. Also, there are many websites built on Web 3.0
that personalize your search. The web is indeed getting intelligent.
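The two steps described above, listing the possible meanings of a term and then choosing one from the surrounding keywords, can be imitated with a toy Python sketch. It says nothing about how Google actually works: the SENSES table, its context words and the function disambiguate are all invented for illustration.

```python
# Hypothetical sense table: each ambiguous term maps its possible meanings
# to a set of context words that suggest that meaning.
SENSES = {
    "gm": {
        "General Motors": {"car", "truck", "dealer", "vehicle"},
        "genetically modified": {"crop", "food", "seed", "maize"},
    }
}


def disambiguate(query):
    """Replace each ambiguous term with the meaning that shares the most
    words with the rest of the query; leave it unchanged if none match."""
    words = query.lower().split()
    resolved = []
    for word in words:
        senses = SENSES.get(word)
        if senses is None:
            resolved.append(word)
            continue
        context = set(words) - {word}
        best = max(senses, key=lambda sense: len(senses[sense] & context))
        if senses[best] & context:
            resolved.append(best)
        else:
            resolved.append(word)
    return " ".join(resolved)


crop_query = disambiguate("GM crop")
car_query = disambiguate("GM car")
```

With this table, crop_query becomes "genetically modified crop" and car_query becomes "General Motors car", while a query with no helpful context word is left as it was.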

9. Challenges:

1. Vastness: The World Wide Web contains at least 48 billion pages as of this writing
(August 2, 2009). The SNOMED CT medical terminology ontology contains 370,000
class names, and existing technology has not yet been able to eliminate all semantically
duplicated terms. Any automated reasoning system will have to deal with truly huge
inputs.
2. Vagueness: These are imprecise concepts like "young" or "tall". This arises from the
vagueness of user queries, of concepts represented by content providers, of matching
query terms to provider terms and of trying to combine different knowledge bases with
overlapping but subtly different concepts. Fuzzy logic is the most common technique
for dealing with vagueness.
3. Uncertainty: These are precise concepts with uncertain values. For example, a patient
might present a set of symptoms which correspond to a number of different distinct
diagnoses each with a different probability. Probabilistic reasoning techniques are
generally employed to address uncertainty.
4. Inconsistency: These are logical contradictions which will inevitably arise during the
development of large ontologies. Deductive reasoning fails catastrophically when faced
with inconsistency, because "anything follows from a contradiction".
5. Deceit: This is when the producer of the information is intentionally misleading the
consumer of the information. Cryptography techniques are currently utilized to
alleviate this threat.
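To make the fuzzy logic approach to vagueness (item 2 above) concrete, the following Python sketch assigns a degree of membership between 0.0 and 1.0 to the imprecise concept "young" instead of a yes/no answer. The breakpoints of 25 and 45 years and the function names are arbitrary assumptions chosen only for illustration.

```python
def young(age):
    """Degree (0.0 to 1.0) to which an age counts as 'young'.
    Fully young up to 25, not young from 45, linear in between."""
    if age <= 25:
        return 1.0
    if age >= 45:
        return 0.0
    return (45 - age) / 20


def fuzzy_and(a, b):
    """A common fuzzy conjunction: the minimum of the two degrees."""
    return min(a, b)


degree_at_35 = young(35)
both = fuzzy_and(young(20), young(35))
```

An age of 35 is young to degree 0.5 under these assumptions, so a query for "young" people can rank results by degree rather than accept or reject them outright.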

10. Project Implementation:

This section provides some example projects and tools, but is very incomplete. The choice of
projects is somewhat arbitrary but may serve illustrative purposes. It is also remarkable that in
this early stage of the development of semantic web technology, it is already possible to
compile a list of hundreds of components that in one way or another can be used in building or
extending semantic webs.

A). DBPEDIA

DBpedia is an effort to publish structured data extracted from Wikipedia: the data is published
in RDF and made available on the Web for use under the GNU Free Documentation License,
thus allowing Semantic Web agents to provide inferencing and advanced querying over the
Wikipedia-derived dataset and facilitating interlinking, re-use and extension in other data
sources.

B). FOAF

A popular application of the semantic web is Friend of a Friend (FOAF), which uses RDF to
describe the relationships people have to other people and the "things" around them. FOAF
permits intelligent agents to make sense of the thousands of connections people have with each
other, their jobs and the items important to their lives; connections that may or may not be
enumerated in searches using traditional web search engines. Because the connections are so
vast in number, human interpretation of the information may not be the best way of analyzing
them.

FOAF is an example of how the Semantic Web attempts to make use of the relationships
within a social context.
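The kind of connection an agent can follow in FOAF data is easy to sketch in Python. The example below treats foaf:knows statements as plain tuples and finds the "friends of friends" of one person. The people under the ex: prefix and the function name friends_of_friends are hypothetical, and a real application would read the triples from published FOAF files rather than from a hand-written list.

```python
KNOWS = "foaf:knows"


def friends_of_friends(triples, person):
    """People known by someone `person` knows, excluding `person`
    and the people `person` already knows directly."""
    direct = {o for s, p, o in triples if s == person and p == KNOWS}
    second = {o for s, p, o in triples if s in direct and p == KNOWS}
    return second - direct - {person}


# Hypothetical FOAF statements.
network = [
    ("ex:Alice", KNOWS, "ex:Bob"),
    ("ex:Bob", KNOWS, "ex:Alice"),
    ("ex:Bob", KNOWS, "ex:Carol"),
    ("ex:Carol", KNOWS, "ex:Dave"),
]
suggestions = friends_of_friends(network, "ex:Alice")
```

For this small network, suggestions contains only ex:Carol: she is known by ex:Bob, whom ex:Alice knows, while ex:Dave is one step further away.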

C). GOODRELATIONS FOR E-COMMERCE

A huge potential for Semantic Web technologies lies in adding data structure and typed links
to the vast amount of offer data, product model features, and tendering / request for quotation
data.

The GoodRelations ontology is a popular vocabulary for expressing product information,
prices, payment options, etc. It also allows expressing demand in a straightforward fashion.

GoodRelations has been adopted by BestBuy, Yahoo, OpenLink Software, O'Reilly Media, the
Book Mashup, and many others.

D). SIOC

The SIOC Project - Semantically-Interlinked Online Communities provides a vocabulary of
terms and relationships that model web data spaces. Examples of such data spaces include,
among others: discussion forums, weblogs, blogrolls / feed subscriptions, mailing lists, shared
bookmarks, image galleries.

E). SIMILE

Semantic Interoperability of Metadata and Information in unLike Environments

SIMILE is a joint project, conducted by the MIT Libraries and MIT CSAIL, which seeks to
enhance interoperability among digital assets, schemata/vocabularies/ontologies, metadata,
and services.

F). NEXTBIO

A database consolidating high-throughput life sciences experimental data tagged and
connected via biomedical ontologies. Nextbio is accessible via a search engine interface.
Researchers can contribute their findings for incorporation into the database. The database
currently supports gene or protein expression data and is steadily expanding to support other
biological data types.

G). LINKING OPEN DATA

Figure: Datasets in the Linking Open Data project, as of Sept 2008

Figure: Class linkages within the Linking Open Data datasets

The Linking Open Data project is a W3C-led effort to create openly accessible and
interlinked RDF data on the Web. The data in question takes the form of RDF data sets
drawn from a broad collection of data sources. There is a focus on the Linked Data style of
publishing RDF on the Web.

H). OPENPSI

OpenPSI (the OpenPSI project) is a community effort to create a UK government linked data
service that supports research. It is a collaboration between the University of Southampton and
the UK government, led by OPSI at the National Archives, and is supported by JISC funding.

I). ERFGOEDPLUS.BE

Erfgoedplus.be ('heritage-plus') is a Belgian project aimed at disclosing all types of heritage
from the provinces of Limburg and Flemish Brabant and the city of Leuven to the public by
applying semantic web technology. Erfgoedplus.be uses RDF/XML, OWL and SKOS to
describe relationships to heritage types, concepts, objects, people, place and time. Data are
normalized and enriched by means of thesauri (AAT) and an ontology (CIDOC CRM),
available for input, conversion and navigation.

Erfgoedplus.be is a regional aggregator for EuropeanaLocal (Europeana) and an example of
how semantic web technology is applied within the heterogeneous context of heritage.

11. CONCLUSION

The Semantic Web is the future of the Internet. It is expected to rewrite the
Internet as we know it and change the way we search for information online. Searches will
become personalized, and the results will be more accurate and more relevant. The use of the
Resource Description Framework and Microformats will help bring this technology about.
Although many challenges must still be overcome, the prospects of this technology
superseding the traditional web model currently seem bright.
The traditional model of the Internet does not allow for intelligent searches and
wastes time because irrelevant results are displayed as well. The Semantic Web can
overcome these problems and provide a better, richer user experience to consumers all
over the globe. The next generation of the web will better connect people and will further
advance the information technology revolution.
