Pundit is a semantic annotation tool developed by Semedia to enrich digital libraries with semantically structured annotations. It allows users to create annotations linked to controlled vocabularies and ontologies that can then be shared and accessed by other users and applications. Annotations are collected in notebooks that can be organized and shared privately or publicly.
DH2012: Enriching digital libraries contents with the Pundit system
1. ENRICHING DIGITAL LIBRARIES CONTENTS WITH SEMLIB SEMANTIC ANNOTATION SYSTEM (PUNDIT!)
Michele Nucci, Marco Grassi, Christian Morbidoni and Francesco Piazza
Semedia (Semantic Web and Multimedia)
http://semedia.dii.univpm.it
DII - Department of Information Engineering. Polytechnic University of Le Marche, Ancona, Italy
Tuesday, July 24, 2012
2. DIGITAL EVOLUTION
• Most of the resources of interest for the Humanities are:
  • in digital format (digitized or born digital)
  • available on the Web
• Information is multiplying faster and faster:
  • classification and management are an increasingly complex task
  • well-structured metadata are a key requirement
• Semantic Web technologies in Digital Libraries:
  • publish DL content as Linked Data
  • define ontologies or vocabularies for metadata encoding (Europeana Data Model, OAI-ORE…)
Enriching digital libraries contents with Pundit m.grassi@univpm.it
3. THE WEB SCENARIO
• The Web (> 2.0) has become more social and interactive
• Annotation of Web content is beneficial:
  • More engaging and productive user experience
  • Exploits social engagement to improve resource ranking and classification
• Annotating web content has become a common task
  • Comments and tags are widely supported by mainstream applications (Facebook picture tags, Flickr picture comments, etc.)
  • Many tools to bookmark, highlight, and comment on web page fragments (e.g. sharedcopy.com, annotateit.org, diigo.com)
  • Some tools support collaborative annotations
4. DL SCENARIO
• Digital Libraries (DL) are no longer simple “expositions” of digital objects but provide users with more interaction
[Figure: three Digital Library models ordered by increasing User Interaction: Expert model, Social Engagement, Crowdsourcing. Labels: Experts, Users, Create Contents, Consume Contents, Commenting, Tagging, Linking, Add Content, Add Annotations]
• Crowdsourcing experiments for enriching DL, curating contents or uploading digital material of interest for the DL (BBC WW2 People’s War, …)
5. SEMANTICALLY STRUCTURED ANNOTATIONS
• ... so what’s missing?
• Most existing annotation tools are limited to simple textual tags and comments.
  • limited by the ambiguity of natural language (is “orange” a fruit or a color?)
  • their semantics are not machine-interpretable
• Semantically structured annotations make smart use of such added knowledge:
  • Unambiguously express semantics to be processed by software agents (e.g. annotations can be harvested and used by recommender systems, search engines, etc.)
  • Power Digital Libraries (improving browsing, search, automatic content classification, ...)
  • Reuse such collaborative knowledge in different contexts and different applications
6. SEMANTICALLY STRUCTURED ANNOTATIONS
Allow users to create knowledge graphs in which web content fragments, concepts and entities are meaningfully connected.
7. SEMANTICALLY STRUCTURED ANNOTATIONS
• Rely on controlled vocabularies and ontologies
  • share the same terminology and “talk about the same things”
  • annotations can be meaningfully mashed up
• Link to the emerging Web of Data
  • software can automatically retrieve additional, useful semantic data (e.g. date and place of birth, pictures, citations, multi-language data)
• Augment the information in the original annotation content to support smarter applications
• Example: we have discovered that the two images contain American film actors showing the emotion of anger!
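The mash-up idea on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `ex:` URIs, the property names and the tiny "Web of Data" list are hypothetical stand-ins, not Pundit data or any real vocabulary.

```python
# Hypothetical annotation triples created by users: (subject, property, object).
ANNOTATIONS = [
    ("http://example.org/img/1", "ex:depicts", "ex:ActorA"),
    ("http://example.org/img/1", "ex:showsEmotion", "ex:Anger"),
    ("http://example.org/img/2", "ex:depicts", "ex:ActorB"),
    ("http://example.org/img/2", "ex:showsEmotion", "ex:Anger"),
    ("http://example.org/img/3", "ex:depicts", "ex:ActorC"),
    ("http://example.org/img/3", "ex:showsEmotion", "ex:Joy"),
]

# Hypothetical facts fetched from the Web of Data about the linked entities.
WEB_OF_DATA = [
    ("ex:ActorA", "rdf:type", "ex:AmericanFilmActor"),
    ("ex:ActorB", "rdf:type", "ex:AmericanFilmActor"),
    ("ex:ActorC", "rdf:type", "ex:ItalianFilmActor"),
]


def objects(triples, subject, predicate):
    """Return all objects of the triples matching subject and predicate."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def angry_american_actor_images(annotations, web_of_data):
    """Join user annotations with external facts about the depicted people."""
    result = []
    for image in sorted({s for s, _, _ in annotations}):
        if "ex:Anger" not in objects(annotations, image, "ex:showsEmotion"):
            continue
        people = objects(annotations, image, "ex:depicts")
        if any("ex:AmericanFilmActor" in objects(web_of_data, person, "rdf:type")
               for person in people):
            result.append(image)
    return result


matches = angry_american_actor_images(ANNOTATIONS, WEB_OF_DATA)
```

Because both data sets use the same identifiers for the actors, the join needs no text matching; that is the benefit of annotating with shared vocabularies instead of free-text tags.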
8. • Pundit is a novel semantic annotation tool:
• developed by: Semedia (Semantic Web and Multimedia), http://semedia.dii.univpm.it, with the collaboration of NET7
• funded by: Semlib Project (EU Project)
• supported and further developed in: DM2E EU Project (http://dm2e.edu/) and AGORA EU Project (http://project-agora.eu/)
9. SEMLIB PROJECT
Semlib Project
Semantic Web Tools for DL
http://www.semlibproject.eu/
• R&D project supported by EU FP7 Theme: Research for SMEs (no. FP7-SME-2010-01-262301 - SEMLIB)
• 24 months (commenced in January 2011, currently at month 19)
www.semedia.dii.univpm.it/ www.deri.ie/
www.in-two.com www.liberologico.com/ www.knowledgehives.com/ www.netseven.it/
12. ANNOTATION MODEL
• Based on Open Annotation Collaboration (OAC) ontology*
[Figure: annotation model. Labels: Semantically Structured Content, Contextual Information, Annotation Content]
13. ANNOTATION MODEL
• Based on Open Annotation Collaboration (OAC) ontology*
• SPARQL support to query slices of knowledge
[Figure: annotation model. Labels: Named Graph, Contextual Information, Annotation Content]
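A rough sketch of how contextual information, annotation content and named graphs might fit together. It assumes one named graph per annotation and uses hypothetical `ex:` identifiers and field names; it does not reproduce the OAC ontology terms or the actual Pundit storage layout.

```python
from dataclasses import dataclass


@dataclass
class Annotation:
    """Contextual information about one annotation (field names are hypothetical)."""
    uri: str
    creator: str
    created: str
    target: str         # the annotated resource
    content_graph: str  # name of the graph that holds the annotation content


class QuadStore:
    """Minimal in-memory store of (graph, subject, property, object) quads."""

    def __init__(self):
        self.quads = []

    def add(self, graph, subject, predicate, obj):
        self.quads.append((graph, subject, predicate, obj))

    def slice(self, graphs):
        """Return only the triples stored in the given named graphs."""
        wanted = set(graphs)
        return [(s, p, o) for g, s, p, o in self.quads if g in wanted]


store = QuadStore()
first = Annotation("ex:annotation/1", "ex:user/alice", "2012-07-24T10:00:00",
                   "http://example.org/contents/123", "ex:graph/1")
second = Annotation("ex:annotation/2", "ex:user/bob", "2012-07-24T11:00:00",
                    "http://example.org/contents/456", "ex:graph/2")
store.add(first.content_graph, first.target, "ex:talksAbout", "ex:DigitalLibrary")
store.add(second.content_graph, second.target, "ex:talksAbout", "ex:Annotation")

# A "slice of knowledge": only the content of Alice's annotation.
alice_slice = store.slice([first.content_graph])
```

In a SPARQL-capable triple store the same selection would typically be expressed with a `GRAPH` clause over the chosen graph names.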
14. NOTEBOOKS
• Annotations are collected in notebooks
• Provide users with the capability to organize their annotations
  • each user has a default notebook
  • can create more
• Put together annotations so that they can be retrieved and queried
• Different UNIX-style read/write privileges (from private to completely public)
• Identified by a URI
[Figure: example notebook model. NotebookURI with dcterms:creator, dcterms:created “2011-01-27 10:30:56”, rdfs:label “My Example Notebook”, rdfs:comment “An Example Notebook used to show the model”]
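The notebook idea can be sketched as follows. The slide only says that privileges are UNIX-style and range from private to completely public; the two boolean flags and all names below are assumptions made for illustration, not the Pundit data model.

```python
from dataclasses import dataclass, field


@dataclass
class Notebook:
    """A notebook identified by a URI, with simplified read/write privileges."""
    uri: str
    owner: str
    label: str
    others_can_read: bool = False   # False + False = private notebook
    others_can_write: bool = False  # True + True = completely public notebook
    annotations: list = field(default_factory=list)

    def can_read(self, user):
        return user == self.owner or self.others_can_read

    def can_write(self, user):
        return user == self.owner or self.others_can_write

    def add(self, user, annotation_uri):
        """Collect an annotation in this notebook if the user may write to it."""
        if not self.can_write(user):
            raise PermissionError(f"{user} cannot write to {self.uri}")
        self.annotations.append(annotation_uri)


notebook = Notebook(uri="http://example.org/notebooks/1", owner="alice",
                    label="My Example Notebook", others_can_read=True)
notebook.add("alice", "http://example.org/annotations/1")
```

With these flags, a read-only shared notebook lets other users retrieve its annotations while only the owner can add new ones.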
15. NOTEBOOKS
• Notebooks allow annotation sharing
[Figure: an example notebook whose NotebookURI is shared (SHARE) with a SINGLE USER, a WIKI, COMMUNITIES, or the PUBLIC]
• Sharing a notebook is as easy as sharing its URL on the web (similarly to popular file sharing platforms)
16. USER AUTHENTICATION
• Authentication is based on OpenID:
  • No need to store users’ credentials
  • Already implemented by mainstream companies (Google, Yahoo, ...)
  • Can spare users multiple registrations (waste of time, yet another password)
  • A single identity can be used across different Pundit-enabled Digital Libraries
  • Adding an OpenID provider is easy and transparent to the Pundit server.
21. ANNOTATION SHARING SCENARIO
• Annotation Clients create structured annotations and send them to the Annotation Server (Annotation Authoring API)
• Store them into a unique knowledge base (COLLECTIVE KB)
• ...whose slices can be accessed (Annotation Consuming API) not only by their creator...
• ...but also by other users and third party applications!
• The DL administrator can select annotations and publish them back as trusted/official annotations to enrich DL content
[Figure: Annotation Clients, Annotation Server with Annotation Authoring API and Annotation Consuming API, COLLECTIVE KB, Third Party Application, selected annotations, trusted/official annotations]
22. NAMED CONTENT
• DLs change over time
  • Presentation can be restyled and content can be re-organized
  • Same content in different pages
  • Some parts of the page should not be annotated (menu, ...)
• Specific markup can be added in the pages to allow Pundit to:
  • identify atomic pieces of content (by means of URIs)
  • attach the annotations to such contents
  • avoid the annotation of accessory page components

<div class="pundit-content" about="http://example.org/contents/123">
  <!-- HTML goes here. -->
  <p>This is a named content and contains both text and a picture</p>
  <img src="http://example.org/pictires/pictire123.png" />
  <p><em>Caption:</em> this is a caption.</p>
</div>
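To illustrate how such markup could be consumed, the following Python sketch collects the URIs of the named contents in a page using the standard library HTML parser. Only the `pundit-content` class and the `about` attribute come from the slide; the helper names and the sample pages are hypothetical, and the real Pundit client is written in JavaScript.

```python
from html.parser import HTMLParser


class NamedContentFinder(HTMLParser):
    """Collect the `about` URI of every element marked as pundit-content."""

    def __init__(self):
        super().__init__()
        self.uris = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        classes = (attributes.get("class") or "").split()
        if "pundit-content" in classes and attributes.get("about"):
            self.uris.append(attributes["about"])


def named_content_uris(html):
    """Return the named-content URIs found in an HTML page, in document order."""
    finder = NamedContentFinder()
    finder.feed(html)
    return finder.uris


# Two hypothetical pages that present the same named content differently.
PAGE_A = """
<ul class="menu"><li>Home</li></ul>
<div class="pundit-content" about="http://example.org/contents/123">
  <p>This is a named content</p>
</div>
"""
PAGE_B = """
<section class="restyled pundit-content" about="http://example.org/contents/123">
  <p>This is a named content</p>
</section>
"""

uris_a = named_content_uris(PAGE_A)
uris_b = named_content_uris(PAGE_B)
```

Both pages yield the same URI, so annotations keyed by that URI can follow the content when the page is restyled or the content is moved, while the menu carries no `about` URI and is skipped.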
24. NAMED CONTENT
The same content in different pages shows the same annotations!
26. PUNDIT ARCHITECTURE
CLIENT
• Set of JavaScript modules (Dojo framework)
  • Easily extendable
  • Highly customizable
SERVER
• Open Source RESTful Web Service (Java Jersey framework)
• Cross-origin requests
  • CORS (Cross-Origin Resource Sharing)
  • JSONP
• Sesame triple store
  • SPARQL and inference
  • Different SAILs are provided to implement different storages (BigOWLIM, MySQL, PostgreSQL, Virtuoso, ...)
• MySQL for user data
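The two cross-origin options listed for the server can be contrasted with a small Python sketch. This is not the Pundit server code (which is a Java Jersey service); the function name, the allow-list and the callback check are assumptions made for illustration.

```python
import json
import re

# Accept only simple JavaScript identifiers (optionally dotted) as JSONP callbacks.
_CALLBACK_NAME = re.compile(r"^[A-Za-z_$][A-Za-z0-9_$]*(\.[A-Za-z_$][A-Za-z0-9_$]*)*$")


def build_response(payload, origin=None, callback=None, allowed_origins=()):
    """Return (headers, body) for a cross-origin request.

    With a callback, answer in JSONP style: the JSON is wrapped in a function
    call that the page loads through a <script> tag. Otherwise answer with
    plain JSON and, for an allowed origin, a CORS header.
    """
    body = json.dumps(payload)
    if callback is not None:
        if not _CALLBACK_NAME.match(callback):
            raise ValueError("invalid JSONP callback name")
        return {"Content-Type": "application/javascript"}, f"{callback}({body});"
    headers = {"Content-Type": "application/json"}
    if origin is not None and origin in allowed_origins:
        headers["Access-Control-Allow-Origin"] = origin
    return headers, body


jsonp_headers, jsonp_body = build_response({"ok": True}, callback="handleAnnotations")
cors_headers, cors_body = build_response(
    {"ok": True},
    origin="http://dl.example.org",
    allowed_origins={"http://dl.example.org"},
)
```

JSONP only works for GET-style reads, so a client that needs to create annotations from another domain would rely on CORS where the browser supports it.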
27. DIFFERENT ANNOTABLE CONTENTS
• Pundit allows the annotation of different types of content at different levels of granularity
  • Text fragments
  • Images
  • Image fragments (under development)
  • Videos and video fragments (experimental, in Semtube)
28. • Semantic annotation of YouTube videos (alpha state) based on Pundit JavaScript libraries and annotation server
http://semedia.dii.univpm.it/semtube
33. DIFFERENT TYPES OF ANNOTATIONS
Annotation with different levels of expressivity and structure
Comment/Tag Panel
• Textual comments
• Semantic Tags
  • Automatically extracted from textual comments (DBpedia Spotlight)
  • Popular Linked Data services (DBpedia, Freebase, WordNet, ...)
  • Define your own (SPARQL endpoint)
34. DIFFERENT TYPES OF ANNOTATIONS
Annotation with different levels of expressivity and structure
Triple Composer
• Textual comments
• Semantic Tags
  • Popular Linked Data services (DBpedia, Freebase, WordNet, ...)
  • Automatically extracted from textual comments (DBpedia Spotlight)
  • Define your own (SPARQL endpoint)
• Semantic Relations
  • Subject-Property-Object statements
  • Drag&Drop and suggestions
  • Connect different resources (user selection, linked data entities, ...) with semantically defined properties
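A subject-property-object statement like those built in the Triple Composer can be written down as one line of N-Triples. The Python helper below is a simplified sketch: the URIs are hypothetical, and it handles only IRIs and plain string literals, not language tags or datatypes.

```python
def to_ntriples(subject, predicate, obj, literal=False):
    """Serialize one subject-property-object statement as an N-Triples line.

    subject and predicate are IRIs; obj is an IRI unless literal=True,
    in which case it is written as a plain string literal.
    """
    if literal:
        escaped = (obj.replace("\\", "\\\\")
                      .replace('"', '\\"')
                      .replace("\n", "\\n")
                      .replace("\r", "\\r"))
        obj_term = f'"{escaped}"'
    else:
        obj_term = f"<{obj}>"
    return f"<{subject}> <{predicate}> {obj_term} ."


# A text fragment (user selection) connected to a linked data entity.
relation = to_ntriples(
    "http://example.org/contents/123#fragment-1",
    "http://example.org/vocab/talksAbout",
    "http://example.org/entities/DigitalLibrary",
)
# The same fragment with a textual comment as a literal object.
comment = to_ntriples(
    "http://example.org/contents/123#fragment-1",
    "http://example.org/vocab/comment",
    'A "key" passage',
    literal=True,
)
```

Keeping each statement as an independent triple is what allows annotations from different users and pages to be merged into one graph later.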
37. CUSTOM VOCABULARIES
• Pundit allows the use of custom vocabularies/taxonomies (and relations):
  • Create a JSONP file (manually or automatically from an ontology)
  • Put it online
  • Add its URL to the configuration to import and use it
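The first step, creating the JSONP file automatically, might look like the Python sketch below. The slide does not show the file format Pundit expects, so the JSON structure (`name`, `items`, `label`, `uri`), the callback name and the sample terms are all hypothetical; only the general JSONP shape (JSON wrapped in a function call) is standard.

```python
import json


def taxonomy_to_jsonp(name, terms, callback="loadVocabulary"):
    """Wrap a list of (label, uri) terms in a JSONP payload.

    The structure of the payload is an assumption for illustration only.
    """
    vocabulary = {
        "name": name,
        "items": [{"label": label, "uri": uri} for label, uri in terms],
    }
    return f"{callback}({json.dumps(vocabulary, indent=2)});"


jsonp_text = taxonomy_to_jsonp(
    "Emotions",
    [
        ("Anger", "http://example.org/vocab/Anger"),
        ("Joy", "http://example.org/vocab/Joy"),
    ],
)

# Write the file so that it can be put online.
with open("emotions.jsonp", "w", encoding="utf-8") as handle:
    handle.write(jsonp_text)
```

Once the file is online, its URL would be added to the Pundit configuration as the last bullet describes; the configuration key for that is not shown on the slide.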
39. CROSS PAGE / DOMAIN ANNOTATIONS
• A special bookmarklet allows launching Pundit on any Web page to perform annotations
• Selected resources (text fragments, images, ...) on different pages and domains can be added to “My Items” to be stored on the server and reused on different pages
[Figure: a selected resource is added to My Items (“Add to My Items”), used in another page (“Use in another page”) and linked with a “cites” relation to create cross-page semantic relations]
40. DEMO TIME!
http://thepund.it
41. THANK YOU!
http://thepund.it
Semedia (Semantic Web and Multimedia): http://semedia.dii.univpm.it
Semlib Project (EU Project): http://www.semlibproject.eu/
DM2E EU Project: http://dm2e.edu/
AGORA EU Project: http://project-agora.eu/