SAnno: a unifying framework for semantic annotation
1. SAnno: a unifying framework for semantic annotation
Davide Eynard
IDSIA, 01/06/2010
2. 2
Introduction
• S(emantic)Anno(tations)
• … in Italy, “sanno” also means “they know”
• Basic principle: anyone should be able to say anything about
anything else
• Well, this should hold in general :-)
• Actually, in our case it is “anything about any URI”
• And we would like everyone to say that in a formal way
• But first, a little step back in time...
IDSIA, 01/06/2010 Davide Eynard
4. 4
Sanno's grandfather: Speakinabout [1]
• Purpose: produce semantic annotations about named entities
• When you read “Harry Potter”, is it the book or the movie?
• Plays with user gratifications
• When users annotate a string as matching a specific concept, they
are shown a list of services/search engines which are related to it
• Relies on user provided data:
• Freebase types
• User generated search templates, built inside a wiki system
IDSIA, 01/06/2010 Davide Eynard
7. 7
Sanno's father: RDFMonkey [2]
• Purpose: augment browsing experience by providing
information/services related to the visited URL
• Relies on Freebase types
• … as in SpeakinAbout, but without requiring user interaction
• Types are found by searching backlinks in Freebase (which topics
are linking the visited page)
• Related services as widgets inside a browser extension
• The app could load widgets at runtime (from Freebase itself or
another collaborative system)
IDSIA, 01/06/2010 Davide Eynard
9. 9
The problem
• We already have semantics on the annotation (i.e. Annotea), but
how can we have semantics within the annotation?
• Good starting points:
• Some participative systems already provide semi-structured
information (i.e. infoboxes in Wikipedia)
• Some communities of practice already built their own bottom-up
way to structure information (i.e. machine tags)
• Some (relatively new) systems allow, with some additional effort, to
save information in a structured way almost without requiring users
to know that (i.e. semantic wikis)
• Challenges
• Provide a shared way to describe annotations coming from
heterogeneous systems
• Aggregate this information to provide something new and useful
IDSIA, 01/06/2010 Davide Eynard
10. 10
SAnno as a framework
• Sanno is built up of many different parts, which all together
provide something (we consider) new and useful
• An ontology to describe annotations (the “shells” that contain
metadata about a resource)
• An ontology describing the types of properties we are already able
to aggregate
• A set of conversion tools which are able to translate existing
annotations from other systems into our notation
• A system to show the results of the aggregation of different
annotations
• A system to manage provenance, authorship, and filters on
incoming annotations
IDSIA, 01/06/2010 Davide Eynard
11. 11
The annotations ontology
• Every annotation can be considered as a “Post-it”, a piece of
paper where something is written about something else
• … you can say things about what is written there, but also about the
Post-it itself
• The annotation is about a resource, it is created by someone in a
specific date, it comes from a particular annotation system and
might be connected to a specific community
• Main goal: do not reinvent everything from scratch
• Reuse well-known ontologies such as DC, SIOC, etc.
• Use named graphs as an alternative to reifications
• Start in an easy way: restriction to URLs
• Also a way to provide instant gratification to users: show
annotations while they are browsing a website
IDSIA, 01/06/2010 Davide Eynard
12. 12
The aggregation ontology
• Aggregation deals with the contents of the annotation (i.e. The
triples found in the NG)
• Objectives
• Avoid constraining users to a specific vocabulary for annotations
• Find a way to collect different annotations and provide something
new and interesting by aggregating them
Our approach
• Properties used inside annotations could be described as belonging
to families we already know how to deal with
• Examples: very specific (tags, ratings), more general (transitive
relations)
• Properties inside some external vocabulary are mapped as
subproperties of ours
• … by whom? High-experience users who have incentives to do this
(think about users building templates in Wikipedia...)
IDSIA, 01/06/2010 Davide Eynard
13. 13
Conversion tools
• Our worst enemy: the bootstrap
• who is going to annotate the first resources? I don't have time!
• Our best friends: already existing annotation systems
• why don't we convert existing data to our notation and show the
advantages of our approach?
Different families of conversion tools
• Easy: already existing APIs, with realtime search functionalities
(i.e. del.icio.us)
• Medium: conversions from existing structured repositories such as
SPARQL endpoints (advantage: the conversion is very clean, you
just need one tool and different CONSTRUCTs)
• A little harder: Web scraping when no other sources are available
IDSIA, 01/06/2010 Davide Eynard
14. 14
Annotation client
• Actually, two possible clients in our mind:
• a browser extension which shows annotations while users are
browsing the Web
• an independent service which is able to aggregate heterogeneous
information related to similar resources (i.e. URLs marked as being
MP3 files)
• Filter annotations according to author, date, originating system,
and community
• Users should be able to “subscribe” to some annotating
communities and ignore others
• System is thought as distributed, as data can come from different,
unrelated sources
IDSIA, 01/06/2010 Davide Eynard
15. 15
The prototype
• Early annotation ontology
• Property families: tag, rating, generically related URI
• Conversions from SMW, Delicious
• Visualization as a web service + Firefox extension
• No subscriptions yet
IDSIA, 01/06/2010 Davide Eynard
17. 17
The end
Thank you! Questions?
References:
• [0] D.Laniado, D.Eynard and M.Colombetti. Using WordNet to turn a folksonomy into a
hierarchy of concepts. Semantic Web Application and Perspectives 192–201, 2007.
• [1] D.Eynard and M.Colombetti. Exploiting User Gratification for Collaborative Semantic
Annotation. Proceedings of SWUI 2008. April 2008.
• [2] D.Eynard. Using semantics and user participation to customize personalization. HP Labs
Technical Report HPL-2008-197. September 2008.
• [3] L.Mazzola, D.Eynard and R.Mazza. GVIS: a framework for graphical mashups of
heterogeneous sources to support data interpretation. HSI 2010. May 2010.
IDSIA, 01/06/2010 Davide Eynard