This document describes a system for creating digital journal clubs for emergency response. The system harvests bibliographic metadata from various sources and maps it semantically to expose focused collections. It augments the metadata with information on author relationships, georeferences, and concepts. Tools enable exploration of collections through visualizations and maps. Social features let users tag, comment, and collaborate; their contributions are stored as semantic triples to enable interoperability. The system aims to give responders timely access to vetted information, along with collaboration tools that help them address emergency situations.
Using Architectures for Semantic Interoperability to Create Journal Clubs for Emergency Response
1. LA-UR-09-02970
Approved for public release;
distribution is unlimited.
Title: Using Architectures for Semantic Interoperability to Create
Journal Clubs for Emergency Response
Author(s): James E. Powell
Linn Marks Collins
Mark L. B. Martinez
Intended for: 6th International Conference on Information Systems for
Crisis Response and Management
Special Session on Solutions for Information Overload
May 10-13, 2009
Göteborg, Sweden
Los Alamos National Laboratory, an affirmative action/equal opportunity employer, is operated by the Los Alamos National Security, LLC
for the National Nuclear Security Administration of the U.S. Department of Energy under contract DE-AC52-06NA25396. By acceptance
of this article, the publisher recognizes that the U.S. Government retains a nonexclusive, royalty-free license to publish or reproduce the
published form of this contribution, or to allow others to do so, for U.S. Government purposes. Los Alamos National Laboratory requests
that the publisher identify this article as work performed under the auspices of the U.S. Department of Energy. Los Alamos National
Laboratory strongly supports academic freedom and a researcher’s right to publish; as an institution, however, the Laboratory does not
endorse the viewpoint of a publication or guarantee its technical correctness.
2. Using Architectures for Semantic Interoperability to Create Journal Clubs for Emergency Response
James E. Powell, Linn Marks Collins, and Mark L.B. Martinez
{jepowell, linn, mlbm}@lanl.gov
Knowledge Systems and Human Factors Team
Research Library, Los Alamos National Laboratory
Slide 1
3. Bibliographic Metadata to Social, Semantic Collection
Metadata In
Social Semantic Repository Out
UNCLASSIFIED Slide 2
4. SARS event
(Severe Acute Respiratory Syndrome)
First cases in 2002 in Asia
Spread to 29 countries
8,000+ cases, nearly 800 deaths
Significant economic toll
Chaotic and secretive response
Took months to identify cause
Science played a critical role
But essentially, we're still here because we got lucky.
5. How we (re-)defined journal club
Traditional journal club
• Regular face-to-face meetings
• Reader summaries, supervised discussion
• Short reading list
• In support of research, clinical, and academic activities
Our version: social, topical collections
• Metadata for a few hundred to a few thousand selected papers (mini digital libraries)
• Various mechanisms to explore the collection
• Social tools for calling participants' attention to particular papers
• Interoperability with other collaboration frameworks
6. Digital Library - “thought in cold storage”*
What is a digital library?
• Collection of content – e-journal articles, e-books, etc.
• Metadata to expose that content – the “archive”
• Persistent identifiers
• Search engine(s) to explore metadata
• Citation counts enhance the utility of metadata
• Metadata export tools
Some examples:
Name | Domain | Size
ACM Digital Library | Computer science | >54,000
PubMed | Life sciences and biomedicine | >17,000,000
Web of Science | Broad scientific coverage | >38,000,000
Scopus | Broad scientific coverage | >33,000,000
arXiv | Physical sciences | >44,000
Oppie | Broad scientific coverage | >94,000,000
NSDL | Broad scientific coverage | >2,000,000
7. Research in Progress at the Los Alamos National Laboratory, Los Alamos, NM, US
Making digital library content relevant in emergencies
• Is deeper knowledge of value in this situation?
• What sources would be useful?
Develop strategy for creating a focused, or targeted, collection
— Topic specific harvest or
— results of a search against a larger digital library
• Match focused collections with people (expert user or librarian(s) + technology)
Normalize, import, and provide access to the data
• Map the data to RDF and expose it using tools for exploring moderately large, highly relevant collections, and
• Make the content social
8. Components of the System
Perform query or harvest content
• Topic harvester automatically maps content to RDF/XML
• Augments content with georeference and FOAF data
RDF repository hosts the semantic content
Middleware –
• REST web services that expose standards-compliant output in response to queries against the RDF repository
• SPARQL query templates at the core
• Depending on the service, output is GraphML, KML, GDF, etc.
Rendering layer –
• Combines output with a viewing app (e.g. Google Maps view, graph applet view)
RDF Digital Library –
• Search and social linking services tie it all together
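The middleware's reliance on SPARQL query templates can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' code: the template text, the `build_query` and `escape_sparql_string` helpers, and the use of SPARQL 1.1 `CONTAINS`/`LCASE` are all hypothetical. The step worth showing is escaping the user's search term before it is placed inside a string literal, so a stray quote cannot change the query's structure.

```python
from string import Template

# Hypothetical template in the spirit of the slides (not the authors' query).
# $term is replaced with the escaped user query; CONTAINS and LCASE are SPARQL 1.1.
QUERY_TEMPLATE = Template("""\
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?title ?name
WHERE {
  ?doc dc:title ?title ;
       dc:creator ?author .
  ?author foaf:name ?name .
  FILTER(CONTAINS(LCASE(STR(?title)), LCASE("$term")))
}""")

# Characters that must be escaped inside a double-quoted SPARQL string literal.
_ESCAPES = {"\\": "\\\\", '"': '\\"', "\n": "\\n", "\r": "\\r", "\t": "\\t"}


def escape_sparql_string(value: str) -> str:
    """Escape a value so it can sit safely between double quotes in SPARQL."""
    return "".join(_ESCAPES.get(ch, ch) for ch in value)


def build_query(user_query: str) -> str:
    """Fill the template with the request's search term (the q parameter)."""
    return QUERY_TEMPLATE.substitute(term=escape_sparql_string(user_query))


if __name__ == "__main__":
    print(build_query("Visualization"))
```

In a full service, the request's `format` parameter (graphml, kml, gdf) would then decide which serializer receives the result rows.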
9. From query to RDF repository
Bibliographic metadata search service that returns results as XML
Metadata mapped to RDF/XML representations (using the Dublin Core and FOAF ontologies)
Each unique author assigned a UUID
Social network of co-authors, author matching via UUID, FOAF
Georeference augmentation via placenames found in MARC bibliographic fields – geonames.org lookup
Option to perform additional augmentation by submitting abstracts to OpenCalais
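The author-identity steps (one UUID per unique author, co-authors linked with FOAF) can be sketched as follows. This is a hypothetical Python illustration: `normalize_name` uses a deliberately crude matching rule, the `urn:uuid:` IRI form is an assumption, and emitting `foaf:knows` in both directions is a choice made by the sketch rather than something the slides specify.

```python
import uuid
from itertools import combinations

FOAF = "http://xmlns.com/foaf/0.1/"

# Hypothetical helpers; the name-matching rule below is an assumption.


def normalize_name(name: str) -> str:
    """Crude matching key: drop periods and commas, lowercase, collapse spaces.

    It matches "James E. Powell" with "James E Powell" but not with
    "Powell, James E." (word order differs); real matching needs more care.
    """
    cleaned = name.replace(".", " ").replace(",", " ")
    return " ".join(cleaned.lower().split())


def build_coauthor_graph(papers):
    """papers: iterable of author-name lists, one list per paper.

    Returns (ids, triples). ids maps each normalized name to a urn:uuid IRI;
    triples is a set of (subject, predicate, object) tuples using foaf:name
    for each author and foaf:knows between every pair of co-authors.
    """
    ids = {}
    triples = set()
    for authors in papers:
        uris = []
        for name in authors:
            key = normalize_name(name)
            if key not in ids:
                ids[key] = f"urn:uuid:{uuid.uuid4()}"
                # The first spelling seen becomes the display name.
                triples.add((ids[key], FOAF + "name", name))
            uris.append(ids[key])
        # Link each distinct pair of co-authors in both directions.
        for a, b in combinations(sorted(set(uris)), 2):
            triples.add((a, FOAF + "knows", b))
            triples.add((b, FOAF + "knows", a))
    return ids, triples
```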
10. Harvest to RDF
OAI-PMH: Open Archives Initiative Protocol for Metadata Harvesting
• Data provider shares metadata
• Service provider harvests metadata, aggregates it, and exposes it
• OAI-PMH enables retrieval of bibliographic metadata from remote repositories
• Protocol allows for a limited set of verbs for exploring a target repository, including ListMetadataFormats, ListSets, and ListRecords
http://www.oaforum.org/tutorial/
Then, map and augment as with search results
• Map metadata to triples
• Represent authorship networks
• Augment with georeference information
• Store content in RDF repository
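One page of an OAI-PMH harvest can be sketched with Python's standard library. The sketch assumes the common `oai_dc` metadata format and a placeholder base URL; `list_records_url` and `parse_list_records` are hypothetical helpers. The protocol details it relies on (the `verb`, `metadataPrefix`, `set`, and `resumptionToken` arguments, and the OAI-PMH 2.0 and Dublin Core namespaces) come from the OAI-PMH specification.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Sketch of one OAI-PMH harvest page; helper names are hypothetical.
OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None, token=None):
    """Build a ListRecords request URL.

    resumptionToken is an exclusive argument: when continuing a harvest,
    it is sent with the verb and nothing else.
    """
    if token:
        params = {"verb": "ListRecords", "resumptionToken": token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if set_spec:
            params["set"] = set_spec
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return (records, token) from one ListRecords response page.

    Each record is a dict of Dublin Core element name -> list of values,
    plus the header identifier under "oai_identifier". token is None
    when the repository signals that the list is complete.
    """
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iter(OAI + "record"):
        fields = {"oai_identifier": [rec.findtext(f"{OAI}header/{OAI}identifier")]}
        metadata = rec.find(OAI + "metadata")  # absent for deleted records
        if metadata is not None:
            for el in metadata.iter():
                if el.tag.startswith(DC) and el.text and el.text.strip():
                    fields.setdefault(el.tag[len(DC):], []).append(el.text.strip())
        records.append(fields)
    token_el = root.find(f"{OAI}ListRecords/{OAI}resumptionToken")
    token = token_el.text.strip() if token_el is not None and token_el.text else None
    return records, token
```

A harvester would loop: request a page, map and store its records, then request again with the returned token until it comes back as None.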
11. Embracing Lossy
80% solution needs to be okay
• Need quick turnaround
• Accept that metadata mapping is not an exact science
• Handle mismatched or omitted elements gracefully
• Cope with metadata lacking sufficient granularity or type information
• Downgrade metadata when necessary, e.g. MARC to Dublin Core
• Normalize and extract maximum value from available data, both explicit and implicit
Mapping Metadata
Original element | RDF property name
identifier | dc:identifier
title | dc:title
creator, contributor | dc:creator, foaf:name
date | dc:date
subject | dc:subject, geo:placename
description | dc:description
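The mapping table and the lossy-mapping principles can be sketched as one small function. This is a hypothetical Python illustration: `map_record`, the `_:authorN` blank-node labels, and the `known_places` set (standing in for whatever placename detection a real pipeline uses) are assumptions. Elements outside the table are reported rather than rejected, so an imperfect record still yields usable triples.

```python
# Hypothetical sketch of the slide's mapping table; names are assumptions.
DC = "http://purl.org/dc/elements/1.1/"
FOAF = "http://xmlns.com/foaf/0.1/"
GEO = "http://www.geonames.org/ontology#"

# Elements that map one-to-one onto a Dublin Core property.
SIMPLE = {
    "identifier": DC + "identifier",
    "title": DC + "title",
    "date": DC + "date",
    "subject": DC + "subject",
    "description": DC + "description",
}


def map_record(doc_uri, record, known_places=frozenset()):
    """Map one record (element name -> list of values) to triples.

    Returns (triples, dropped). Objects are plain strings; a serializer
    would still have to tell IRIs, blank nodes, and literals apart.
    """
    triples, dropped = [], []
    author_count = 0
    for element, values in record.items():
        for value in values:
            value = value.strip()
            if not value:
                continue  # empty element: skip it quietly
            if element in ("creator", "contributor"):
                # Both roles collapse to dc:creator, as in the table (lossy).
                author_count += 1
                author = f"_:author{author_count}"
                triples.append((doc_uri, DC + "creator", author))
                triples.append((author, FOAF + "name", value))
            elif element in SIMPLE:
                triples.append((doc_uri, SIMPLE[element], value))
                if element == "subject" and value in known_places:
                    triples.append((doc_uri, GEO + "placename", value))
            else:
                dropped.append(element)  # no mapping: record the loss, keep going
    return triples, dropped
```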
12. Augmentation
Make explicit what is implicit: social networks
• Assign a UUID to each author
• Associate authors and co-authors using FOAF
• Match author names across publications, within query results/harvested set
Enhance what's there
• Find placenames in metadata
• Ask geonames service for coordinates
Leverage knowledge extraction services
• OpenCalais can identify placenames in full text
• Also identifies, exposes other concepts and RDF links
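The coordinate lookup can be sketched as two small steps: build a GeoNames search request, then read the first hit's latitude and longitude. The `searchJSON` endpoint and its `q`, `maxRows`, and `username` parameters belong to the current public GeoNames web service, which requires a registered username; the helper names and the placeholder username are hypothetical, and the network call itself is left out.

```python
import json
from urllib.parse import urlencode

# Current public GeoNames search endpoint (an assumption about deployment;
# a registered account name goes in the username parameter).
# Helper names below are hypothetical.
GEONAMES_SEARCH = "http://api.geonames.org/searchJSON"


def geonames_search_url(placename: str, username: str) -> str:
    """Build a full-text search request that asks for the single best match."""
    params = {"q": placename, "maxRows": 1, "username": username}
    return GEONAMES_SEARCH + "?" + urlencode(params)


def first_coordinates(response_text: str):
    """Return (lat, lng) as floats from a searchJSON response, or None.

    float() accepts both string and numeric coordinate values.
    """
    hits = json.loads(response_text).get("geonames", [])
    if not hits:
        return None
    return float(hits[0]["lat"]), float(hits[0]["lng"])
```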
13. Pre- and post-augmentation results for sample queries
situation: ("situation awareness")
infoviz: ("information visualization")
humanitarian: ("humanitarian assistance" OR "disaster relief")
sars: (“sars” and “coronavirus”)
Query | Measure | Original | Post-processing
situation | # records | 6652 | 4609
situation | # georefs | 5 | 445
infoviz | # records | 6807 | 4636
infoviz | # georefs | 7 | 194
humanitarian | # records | 1626 | 1312
humanitarian | # georefs | 86 | 455
sars | # records | 4674 | 2937
sars | # georefs | 369
15. Core ontologies used in mapped data
Dublin Core (xmlns:dc="http://purl.org/dc/elements/1.1/")
• Metadata about publications
  – <object> dc:title “SARS: How a Global Epidemic was Stopped”.
  – <object> dc:identifier “92-9061-213-4”.
FOAF (xmlns:foaf="http://xmlns.com/foaf/0.1/")
• Properties describing people
  – <object> foaf:name “James Powell”.
  – uuid foaf:knows uuid.
Geonames (xmlns:geo="http://www.geonames.org/ontology#")
• Properties describing places
  – <object> geo:name “Ohio, United States”.
  – “Ohio, United States” geo:lat “40.5”.
  – “Ohio, United States” geo:long “-82.5”.
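Because the slides mix prefixed names with full IRIs, it helps to see how a prefixed name expands under the namespace declarations above: the namespace IRI and the local name are simply concatenated, so dc:title becomes http://purl.org/dc/elements/1.1/title with no extra "#". A minimal Python sketch, with hypothetical helper names and a placeholder subject IRI, that expands prefixed names and writes one N-Triples line:

```python
# Namespace declarations taken from the slide above; helpers are hypothetical.
PREFIXES = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "geo": "http://www.geonames.org/ontology#",
}


def expand(prefixed_name: str) -> str:
    """Expand a prefixed name such as dc:title into a full IRI."""
    prefix, sep, local = prefixed_name.partition(":")
    if not sep or prefix not in PREFIXES:
        raise ValueError(f"unknown prefix in {prefixed_name!r}")
    return PREFIXES[prefix] + local


def ntriple(subject_iri: str, prefixed_name: str, literal: str) -> str:
    """Serialize one triple with a plain literal object as an N-Triples line."""
    escaped = (literal.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'<{subject_iri}> <{expand(prefixed_name)}> "{escaped}" .'


if __name__ == "__main__":
    print(ntriple("urn:example:book", "dc:title", "SARS: How a Global Epidemic was Stopped"))
```

In stored RDF, a literal cannot be the subject of a triple, so a place such as “Ohio, United States” needs its own IRI or blank node to carry geo:lat and geo:long.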
16. Social (Author) Awareness Tool
Query layer
/srdf?q=Visualization&repo=cr&format=graphml
• Output could be used with any visualization tool that supports GraphML markup
Rendering layer
/sat?q=Visualization&repo=cr
• Uses GUESS Java visualization applet
• Combines Java applet with output from above REST web service
SPARQL query
SELECT ?title ?name ?selfuuid ?knowsuuid
WHERE
{ ?y <http://purl.org/dc/elements/1.1/title> ?title.
...
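The query layer's GraphML output can be sketched by turning result rows into nodes and edges. This Python sketch assumes each row is a dict keyed by the SELECT variables shown above (the rest of the query is elided on the slide); `rows_to_graphml`, the graph id, and the `name` data key are hypothetical. Duplicate co-author pairs collapse into a single undirected edge.

```python
import xml.etree.ElementTree as ET

GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"


def rows_to_graphml(rows):
    """Convert rows with name/selfuuid/knowsuuid bindings to a GraphML string.

    Hypothetical: row keys follow the slide's SELECT variables.
    """
    names = {}     # author UUID -> display name
    edges = set()  # undirected co-author pairs, stored sorted
    for row in rows:
        names.setdefault(row["selfuuid"], row["name"])
        other = row.get("knowsuuid")
        if other and other != row["selfuuid"]:
            edges.add(tuple(sorted((row["selfuuid"], other))))
    # Every edge endpoint must exist as a node, even without a name binding.
    for pair in edges:
        for node_id in pair:
            names.setdefault(node_id, "")

    root = ET.Element("graphml", {"xmlns": GRAPHML_NS})
    ET.SubElement(root, "key", {"id": "name", "for": "node",
                                "attr.name": "name", "attr.type": "string"})
    graph = ET.SubElement(root, "graph", {"id": "coauthors", "edgedefault": "undirected"})
    for node_id, name in names.items():
        node = ET.SubElement(graph, "node", {"id": node_id})
        ET.SubElement(node, "data", {"key": "name"}).text = name
    for source, target in sorted(edges):
        ET.SubElement(graph, "edge", {"source": source, "target": target})
    return ET.tostring(root, encoding="unicode")
```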
17. Geographic Awareness Tool
Query layer
/grdf?query=avian+flu&repo=influenza&format=kml
• KML output could be used with Google Earth
Rendering layer
/gat?query=avian+flu&repo=influenza
• Combines Java applet with output from above REST web service
• Uses Google Maps API
SPARQL query
SELECT ?title ?name ?selfuuid ?knowsuuid WHERE {
?y <http://purl.org/dc/elements/1.1/title> ?title.
?y <http://purl.org/dc/elements/1.1/creator> ?selfuuid.
?selfuuid <http://xmlns.com/foaf/0.1/name> ?name.
...
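Likewise, the KML output for the geographic tool can be sketched from result rows. The query above is truncated, so the row keys `title`, `place`, `lat`, and `long` in this Python sketch are hypothetical, as is the helper name; the KML structure (Document, Placemark, Point, and coordinates written longitude first) follows the KML 2.2 format.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"


def rows_to_kml(rows):
    """Build a KML document with one Placemark per result row.

    Hypothetical row keys: title, place, lat, long (strings, as a SPARQL
    endpoint would bind them). KML coordinates are longitude,latitude.
    """
    kml = ET.Element("kml", {"xmlns": KML_NS})
    doc = ET.SubElement(kml, "Document")
    for row in rows:
        placemark = ET.SubElement(doc, "Placemark")
        ET.SubElement(placemark, "name").text = row["title"]
        ET.SubElement(placemark, "description").text = row["place"]
        point = ET.SubElement(placemark, "Point")
        lon, lat = float(row["long"]), float(row["lat"])
        ET.SubElement(point, "coordinates").text = f"{lon},{lat}"
    return ET.tostring(kml, encoding="unicode")
```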
19. Enabling Journal Club style interactions
Practitioners reviewing and discussing a set of published journal articles
Intersects nicely with social linking capabilities
• Tagging – user-provided keywords related to a paper
• Rating – a user's personal ranking of the paper on a simple numeric scale
• Comment – a user's thoughts about a particular paper
Technical Issues
• Unique identifier ensures social data is connected to appropriate object
• User identity, and overlap with authorship
• Users may be interacting all over the place – how can we be interoperable with other collaboration spaces?
20. Use SIOC
SIOC stands for Semantically Interlinked Online Communities
• enables harvesting and aggregation of social linking data from disparate social networking sites
We use SIOC properties to describe users of our social linking tools, as well as some aspects of the content they input when they annotate a particular record.
We also use the Comment class (sioct:Comment) from the SIOC Types module
A Java servlet converts social linking form input into triples and uses the openrdf API to insert the triples into the triplestore
Social linking – totally native RDF, no intermediary – immediately accessible to aggregation services
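The servlet's job, turning a comment form into triples, can be sketched in Python. The original is a Java servlet using the openrdf API, which this sketch does not reproduce. The form field names, the `example.org` user IRI base, and the `urn:uuid` comment IRIs are hypothetical; the SIOC, SIOC Types, Dublin Core, and XSD namespaces are the published ones. The output is N-Triples text that a triplestore could load.

```python
import uuid
from datetime import datetime, timezone

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SIOC = "http://rdfs.org/sioc/ns#"
SIOCT = "http://rdfs.org/sioc/types#"
DC = "http://purl.org/dc/elements/1.1/"
DCTERMS = "http://purl.org/dc/terms/"
XSD_DATETIME = "http://www.w3.org/2001/XMLSchema#dateTime"
USER_BASE = "http://example.org/users/"  # hypothetical user namespace


def literal(text: str) -> str:
    """Quote and escape a string as an N-Triples literal."""
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def comment_to_ntriples(form, now=None):
    """Convert hypothetical form fields (record, user, title, content) to N-Triples lines.

    Assumes form['record'] is already a valid IRI and form['user'] a simple id.
    """
    now = now or datetime.now(timezone.utc)
    comment = f"<urn:uuid:{uuid.uuid4()}>"
    user = f"<{USER_BASE}{form['user']}>"
    created = f'"{now.isoformat(timespec="seconds")}"^^<{XSD_DATETIME}>'
    return [
        f"{comment} <{RDF_TYPE}> <{SIOCT}Comment> .",
        f"{comment} <{SIOC}about> <{form['record']}> .",
        f"{comment} <{DC}title> {literal(form['title'])} .",
        f"{comment} <{DCTERMS}created> {created} .",
        f"{comment} <{SIOC}has_creator> {user} .",
        f"{user} <{RDF_TYPE}> <{SIOC}User> .",
        f"{comment} <{SIOC}content> {literal(form['content'])} .",
    ]
```

Unlike the example that follows, the sketch records the creation time as an `xsd:dateTime`, which keeps comments sortable by aggregation services.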
21. A User Comment in SIOC
<sioct:Comment>
  <sioc:about rdf:resource="oai:casi.ntrs.nasa.gov:20000081109"/>
  <dc:title>Pele</dc:title>
  <dcterms:created>2009.04.08 03:41:26 MDT</dcterms:created>
  <sioc:has_container rdf:resource="uuid:1234-1234-1234"/>
  <sioc:has_creator>
    <sioc:User rdf:about="http://rdfsearch/queryRdf/search.jsp#jpowell">
      <rdfs:seeAlso rdf:resource="http://rdfsearch/queryRdf/showComments?userid=jpowell"/>
    </sioc:User>
  </sioc:has_creator>
  <sioc:content>Does Pele know about Io (or for that matter, Jupiter?)</sioc:content>
</sioct:Comment>
22. Next up, user tags
Users will be able to tag content
Tags will receive the same treatment as comments:
• stored as triples,
• using various ontologies including Dublin Core, FOAF, and SIOC;
• applicable tagging ontologies include SCOT (Social Semantic Cloud of Tags) and/or MOAT (Meaning Of A Tag)
Tags will link to DBpedia concepts
RDF linking will greatly enhance the utility of the tags:
• The tag “infectious disease” would be associated with http://dbpedia.org/page/Infectious_disease
• Reuse of “infectious disease” will point to other library content
Our tags could be aggregated with tags from other collaboration spaces
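The tag-to-concept linking can be sketched as follows. This hypothetical Python sketch guesses a DBpedia IRI from the tag text using Wikipedia's title convention (first letter capitalized, spaces replaced by underscores) and uses DBpedia's /resource/ identifiers, since a /page/ address is the HTML view of the same resource. A real system would confirm each guess against DBpedia and handle disambiguation. Grouping records by concept then shows how reuse of a tag points to other library content.

```python
from collections import defaultdict

DBPEDIA_RESOURCE = "http://dbpedia.org/resource/"


def tag_to_dbpedia(tag: str) -> str:
    """Guess a DBpedia resource IRI for a free-text tag (hypothetical; no lookup is made)."""
    words = " ".join(tag.split())  # collapse runs of whitespace
    if not words:
        raise ValueError("empty tag")
    title = words[0].upper() + words[1:]  # Wikipedia-style first-letter capital
    return DBPEDIA_RESOURCE + title.replace(" ", "_")


def index_by_concept(taggings):
    """Group record ids by the concept their tags point to.

    taggings: iterable of (record_id, tag) pairs supplied by users.
    """
    index = defaultdict(set)
    for record_id, tag in taggings:
        index[tag_to_dbpedia(tag)].add(record_id)
    return dict(index)
```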
23. Summary
Disease outbreaks can be disastrous, but diseases often
• emerge slowly, sometimes crossing species barriers
• appear in one region, then move with populations
• are caused by a pathogen that may eventually be identified
• may be treatable, corralled by quarantine, curtailed by hygiene, halted by vaccines
Science can prevent or reduce casualties from disease outbreaks and, if we're lucky, reduce or eliminate the risk of future outbreaks
• but only if scientists have access to topical, vetted, high-quality information
• and tools to explore and share this knowledge
Our semantic digital library solution has normalized collections of enhanced metadata, organized by topic (search) or source (harvest), with
• Middleware layers that enable exploration of the data
• Rendering layers that make results usable (e.g. via visualization tools, or by overlaying results onto a map)
• Social linking capabilities for collaboration and interoperability
25. Awareness tools for E-SOS
Our content exploration tools started out as a suite of Just In Time Information Retrieval (JITIR) tools that
• leverage the user's context
• proactively perform tasks in support of the user's main activity
• manage the user's attention appropriately
The “context” is content authoring, e.g. a blog post, forum entry, or email message
Input text is “intercepted” and parsed for tokens that may be used to automatically construct queries
Stop words are removed from the query, leaving significant terms linked with Boolean operators; alternatively, the terms come from an intermediate call to a term extraction service (Calais, Yahoo)
The query is submitted to the query middleware tool
Results are returned when available, using the appropriate rendering layer
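The stop-word step can be sketched in a few lines. The stop list, the tokenizer pattern, the five-term cap, and the `build_boolean_query` name in this Python sketch are hypothetical choices, and the term-extraction alternative (Calais, Yahoo) is not shown.

```python
import re

# Hypothetical, deliberately small stop list.
STOP_WORDS = frozenset(
    "a an and are as at be by for from has have in is it of on or "
    "that the this to was were will with".split()
)


def build_boolean_query(text: str, max_terms: int = 5) -> str:
    """Drop stop words and AND together the first distinct significant terms."""
    tokens = re.findall(r"[a-z][a-z0-9-]*", text.lower())
    terms = []
    for token in tokens:
        if token not in STOP_WORDS and token not in terms:
            terms.append(token)
    return " AND ".join(terms[:max_terms])
```

Capping the number of terms keeps an automatically built query from becoming so specific that it matches nothing while the user is still writing.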