About the Webinar
The library and cultural institution communities have generally accepted the vision of moving to a Linked Data environment that will align and integrate their resources with those of the greater Semantic Web. But moving from vision to implementation is not easy or well-understood. A number of institutions have begun the needed infrastructure and tools development with pilot projects to provide structured data in support of discovery and navigation services for their collections and resources.
Join NISO for this webinar where speakers will highlight actual Linked Data projects within their institutions—from envisioning the model to implementation and lessons learned—and present their thoughts on how linked data benefits research, scholarly communications, and publishing.
Speakers:
Jon Voss - Strategic Partnerships Director, We Are What We Do
LODLAM + Historypin: A Collaborative Global Community
Matt Miller - Front End Developer, NYPL Labs at the New York Public Library
The Linked Jazz Project: Revealing the Relationships of the Jazz Community
Cory Lampert - Head, Digital Collections , UNLV University Libraries
Silvia Southwick - Digital Collections Metadata Librarian, UNLV University Libraries
Linked Data Demystified: The UNLV Linked Data Project
NISO Webinar: Library Linked Data: From Vision to Reality
1. http://www.niso.org/news/events/2013/webinars/linked_data
NISO Webinar:
Library Linked Data:
From Vision to Reality
December 11, 2013
Speakers:
Jon Voss - Strategic Partnerships Director, We Are What We Do
Matt Miller - Front End Developer, NYPL Labs at the New York Public Library
Silvia Southwick - Digital Collections Metadata Librarian, UNLV University
Libraries
Cory Lampert - Head, Digital Collections , UNLV University Libraries
81. Project Overview
• Investigating the application of Linked Open Data
to enhance the discovery and visibility of digital
cultural heritage materials.
• Build new methods of connecting cultural data.
• Uncover meaningful connections between
documents and data related to the personal and
professional lives of musicians who often practice
in rich and diverse social networks.
Professor Cristina Pattuelli at the Pratt Institute School of Library Information Science is the
director of the project which began in 2011.
82. Linked Data Now!
Why?
• Bootstrap your project with existing data.
• Highlights knowledge you have created and
knowledge that is missing.
• Facilitates sharing, but also growing your own
project.
83. Bootstrapping – Identifying
Research Question
How can we discover and analyze
the rich and diverse network of
relationships between jazz
musicians?
Primary Sources
Oral history interview transcripts
of jazz musicians.
84. Bootstrapping – Identifying
Research Question
How can we discover and analyze
the rich and diverse network of
relationships between jazz
musicians?
Primary Sources
Oral history interview transcripts
of jazz musicians.
We need to know the names
(and variants) of jazz
musicians in a structured
controlled vocabulary.
85. Bootstrapping – Identifying
Charlie Parker
Many different LOD datasets contain this
information. We need to access, query and link it
for only jazz related individuals.
87. Bootstrapping – Querying
• Processing the DBpedia dataset resulted in around
9,000 URIs.
– DBpedia is fluid! After each release (currently 3.9) we
reprocess the files resulting in the addition of 500-700
URIs.
• We now have a name directory, but we want
additional forms of personal names. To accomplish
this we try mapping to Library of Congress.
• Matching DBpedia and LC URIs is not automatic.
88. Bootstrapping – Mapping
• We matched identities based on:
• Name
• Life Dates
• White listed words found in sources
(http://www.loc.gov/mads/rdf/v1#Source)
• Reconciling authorities is difficult!
• Use others work: http://viaf.org/viaf/data/
• But don’t discount your own processes.
• Using our relatively simple process we
were able to match about 1500 more URIs
than VIAF.org.
• This is due to a smaller domain (jazz).
Our name directory creation and authority
matching is documented:
https://github.com/thisismattmiller/linkedjazz-name-directory
90. Bootstrapping – Review
• Start small, think big.
– Specific subject domain.
– Large infrastructure not required (triple stores, etc.)
• Can get started with extract files and python scripting.
• Reuse as much as possible, but try new processes
leveraging domain specificity.
• Always be curating, use tools to facilitate process but
a human hand is often required.
91. Applying the Data
• Use the name directory to locate individuals in
the interview transcript.
• This project phase involves 50 transcripts.
• Because the names are tied to URIs we can
infer a relationship triple between two
individuals.
<foaf:Person> <rel:knowsOf> <foaf:Person>
94. Transcript Analyzer
• An interface to curate the transcripts and verify
detected names.
• Implements off the shelf NLP (NLTK) to attempt
to locate additional names not in our directory as
well as corporate names and locations.
• Global rule system, as we process more
transcripts the system is being trained.
• Using URIs to represent entities we can quickly
see where we are discovering new material.
– 50 Transcripts
• 1800 person entities tagged.
• 250 names tagged without authoritative URI.
– Knowledge Creation
95. New Dataset
• We have created a new LOD dataset now of
jazz musician’s relationships.
• Our next steps are:
– Visualize.
– Further qualify the rel:knowsOf relationships.
– Provide access to the data created.
97. Qualify Relationships – 52nd St.
• Recruit jazz experts and enthusiasts to help
categorize relationships based on transcript
text.
• We use existing vocabularies to build the data
set: Foaf, Relationship Vocabulary, Music
Ontology
• The interface is critical for crowdsourcing tools,
we work with user experience experts and
conduct user studies to refine our public facing
tools
99. Provide Access
• We provide a SPARQL endpoint.
• But also a traditional API:
– http://linkedjazz.org/api/
– Can return:
• JSON
• N-Triples
• Gephi graph files (GXEF)
100. Learn and Grow as a Team
• Experience through doing.
• Empower graduate
students with skills and
practical experience
working with a LOD
project.
• Use the project as a
vehicle to make intra- and
inter-intuitional
collaborations.
Linked Jazz Team July 2013
101. Next Steps
• Refactor our prototype tools into sustainable open
source projects.
• Redesign 52nd St. based on user study groups.
• Work on emerging collaborations with Jazz Centers.
103. Linked, Exposed Data: UNLV
Linked Data Project
NISO Webinar: Library Linked Data: From Vision to
Reality
December 11, 2013
Silvia B. Southwick
Digital Collections Metadata Librarian
UNLV Libraries
Cory K. Lampert
Head, Digital Collections
UNLV Libraries
105. How it Started
•
•
•
•
Conferences and “buzz”
Curiousity and professional development
Exploration and pilot project
Compelling results; sharing impact of what
we’ve learned
• Assessment
• Much more to do...
106. Current Practice
• Data (or metadata) encapsulated in records
• Records contained in collections
• Very few links are created within and/or across
collections
• Links have to be manually created
• Existing links do not specify the nature of the
relationships among records
This structure hides potential links within and
across collections
107. What we can do with linked data
•
•
•
•
•
•
Free data from silos
Expose relationships
Powerful, seamless, interlinking of our data
Users interact or query data in new ways
Search results would be more precise
Data can be easily repurposed
108. Making the Case for Linked Data in
Academic Library Digital Collections
– Problem: Rich metadata is being lost in dumbed down
DC records
– Issue: Investment and resource allocation (Item-level
philosophy)
– Goal: Increased: exposure, collaboration, and
openness
– Outcome: Increased discovery and user-focus
109. Gaining Buy In
Administration
• Innovative project, high impact
• Pilot, experiment, learn by doing, share results
Staff
• We already have the metadata; We need to
transform them into triples
• Managing change
113. Implications (Internal)
• Cross-unit collaboration is necessary
• Staff expertise will evolve
• Staff roles will change to accommodate new /
parallel workflow
• Data clean-up will be an investment
• Management of data becomes critical
• Discovery issues = user interfaces still need
development
114. Implications (External)
• Publish data from our collections in the Linked
Data Cloud to improve discoverability and
connections with other related data sets on
the Web
• Sharing data in new ways with new partners
may raise new issues
• Need to engage with linked data community
for technologies, tools, best practices, and to
demand library vendor support for LOD.
115. UNLV Linked Data Project
Goals:
• Study the feasibility of developing a common
process that would allow the conversion of our
collection records into linked data preserving
their original expressivity and richness
• Publish data from our collections in the Linked
Data Cloud to improve discoverability and
connections with other related data sets on the
Web
117. Actions
Prepare data
Export data
Import data
Clean data
Reconcile
Generate
triples
Export RDF
Import data
Publish
Technologies
CONTENTdm
Open Refine
Mulgara /
Virtuoso
118. Prepare / Export Data
Technology: CONTENTdm
• Increase consistency across collections:
– metadata element labels
– use of CV, share local CVs
– etc.
• Export data as spreadsheet
Create mapping between metadata elements and
EDM model predicates
119. OpenRefine
• Open source
• It is a server – can communicate with other
datasets via http
• Open Refine and its RDF extension should be
installed
Screenshots to show some of the functions we have
used
138. Actions
Prepare data
Export data
Import data
Clean data
Reconcile
Generate
triples
Export RDF
Import data
Publish
Query
Technologies
CONTENTdm
Open Refine
Mulgara /
Virtuoso
148. Next steps for the UNLV project
• Transform all digital collections into linked data
(parallel structure)
• Increase linkage with other datasets
• Design interfaces to access and display our data
and related data from other datasets
• Evaluate alternative interfaces from user’s
perspective
• Produce a cost benefit analysis to inform future
plans for the development of digital collections
150. NISO Webinar:
Library Linked Data: From Vision to Reality
Questions?
All questions will be posted with presenter answers on
the NISO website following the webinar:
http://www.niso.org/news/events/2013/webinars/linked_data
NISO Webinar • December 11, 2013
151. THANK YOU
Thank you for joining us today.
Please take a moment to fill out the brief online survey.
We look forward to hearing from you!
Notes de l'éditeur
Introduce SelfNYPL RoleLinked Jazz Role as a developer, a practical look at how we used linked data in our project and why
Guiding principle of the project is to develop practical and applicable ways to use linked data.Not some far off ideal, there are practical applications of it right now.
The General layout of a project, research questions and primary documents