SAnno: a unifying framework for semantic annotation

SAnno: a unifying framework for semantic annotation

Davide Eynard
IDSIA, 01/06/2010

2

Introduction

• S(emantic)Anno(tations)
• … in Italy, “sanno” also means “they know”

• Basic principle: anyone should be able to say anything about
anything else
• Well, this should hold in general :-)
• Actually, in our case it is “anything about any URI”
• And we would like everyone to say that in a formal way

• But first, a little step back in time...

IDSIA, 01/06/2010 Davide Eynard

3

Participation and semantics

Data

Structure


4

Sanno's grandfather: Speakinabout [1]

• Purpose: produce semantic annotations about named entities
• When you read “Harry Potter”, is it the book or the movie?

• Plays with user gratifications
• When users annotate a string as matching a specific concept, they
are shown a list of services/search engines which are related to it

• Relies on user provided data:
• Freebase types
• User generated search templates, built inside a wiki system


5



6



7

Sanno's father: RDFMonkey [2]

• Purpose: augment browsing experience by providing
information/services related to the visited URL

• Relies on Freebase types
• … as in SpeakinAbout, but without requiring user interaction
• Types are found by searching backlinks in Freebase (which topics
are linking the visited page)

• Related services as widgets inside a browser extension
• The app could load widgets at runtime (from Freebase itself or
another collaborative system)


8

Sanno's father: RDFMonkey [2]
Cities
Musical Artists

Books


9

The problem

• We already have semantics on the annotation (i.e. Annotea), but
how can we have semantics within the annotation?

• Good starting points:
• Some participative systems already provide semi-structured
information (i.e. infoboxes in Wikipedia)
• Some communities of practice already built their own bottom-up
way to structure information (i.e. machine tags)
• Some (relatively new) systems allow, with some additional effort, to
save information in a structured way almost without requiring users
to know that (i.e. semantic wikis)

• Challenges
• Provide a shared way to describe annotations coming from
heterogeneous systems
• Aggregate this information to provide something new and useful


10

SAnno as a framework

• Sanno is built up of many different parts, which all together
provide something (we consider) new and useful
• An ontology to describe annotations (the “shells” that contain
metadata about a resource)
• An ontology describing the types of properties we are already able
to aggregate
• A set of conversion tools which are able to translate existing
annotations from other systems into our notation
• A system to show the results of the aggregation of different
annotations
• A system to manage provenance, authorship, and filters on
incoming annotations


11

The annotations ontology

• Every annotation can be considered as a “Post-it”, a piece of
paper where something is written about something else
• … you can say things about what is written there, but also about the
Post-it itself

• The annotation is about a resource, it is created by someone in a
specific date, it comes from a particular annotation system and
might be connected to a specific community

• Main goal: do not reinvent everything from scratch
• Reuse well-known ontologies such as DC, SIOC, etc.
• Use named graphs as an alternative to reifications

• Start in an easy way: restriction to URLs
• Also a way to provide instant gratification to users: show
annotations while they are browsing a website


12

The aggregation ontology

• Aggregation deals with the contents of the annotation (i.e. The
triples found in the NG)

• Objectives
• Avoid constraining users to a specific vocabulary for annotations
• Find a way to collect different annotations and provide something
new and interesting by aggregating them

 Our approach
• Properties used inside annotations could be described as belonging
to families we already know how to deal with
• Examples: very specific (tags, ratings), more general (transitive
relations)
• Properties inside some external vocabulary are mapped as
subproperties of ours
• … by whom? High-experience users who have incentives to do this
(think about users building templates in Wikipedia...)


13

Conversion tools

• Our worst enemy: the bootstrap
• who is going to annotate the first resources? I don't have time!

• Our best friends: already existing annotation systems
• why don't we convert existing data to our notation and show the
advantages of our approach?

 Different families of conversion tools
• Easy: already existing APIs, with realtime search functionalities
(i.e. del.icio.us)
• Medium: conversions from existing structured repositories such as
SPARQL endpoints (advantage: the conversion is very clean, you
just need one tool and different CONSTRUCTs)
• A little harder: Web scraping when no other sources are available


14

Annotation client

• Actually, two possible clients in our mind:
• a browser extension which shows annotations while users are
browsing the Web
• an independent service which is able to aggregate heterogeneous
information related to similar resources (i.e. URLs marked as being
MP3 files)

• Filter annotations according to author, date, originating system,
and community
• Users should be able to “subscribe” to some annotating
communities and ignore others
• System is thought as distributed, as data can come from different,
unrelated sources


15

The prototype

• Early annotation ontology
• Property families: tag, rating, generically related URI
• Conversions from SMW, Delicious
• Visualization as a web service + Firefox extension
• No subscriptions yet


16

The prototype


17

The end

Thank you! Questions?

References:
• [0] D.Laniado, D.Eynard and M.Colombetti. Using WordNet to turn a folksonomy into a
hierarchy of concepts. Semantic Web Application and Perspectives 192–201, 2007.
• [1] D.Eynard and M.Colombetti. Exploiting User Gratification for Collaborative Semantic
Annotation. Proceedings of SWUI 2008. April 2008.
• [2] D.Eynard. Using semantics and user participation to customize personalization. HP Labs
Technical Report HPL-2008-197. September 2008.
• [3] L.Mazzola, D.Eynard and R.Mazza. GVIS: a framework for graphical mashups of
heterogeneous sources to support data interpretation. HSI 2010. May 2010.


Contact Davide Eynard

Tel. 02 2399 4010
Fax 02 2399 3411

eynard@elet.polimi.it
http://davide.eynard.it
Project page @AIRLab: http://airwiki.elet.polimi.it

Back

SAnno: a unifying framework for semantic annotation

Recommended

Recommended

More Related Content

Similar to SAnno: a unifying framework for semantic annotation

Similar to SAnno: a unifying framework for semantic annotation (20)

More from Davide Eynard

More from Davide Eynard (15)

Recently uploaded

Recently uploaded (20)

SAnno: a unifying framework for semantic annotation