Riley, Jenn. “Designing the Garden: Getting Grounded in Linked Data.” Beyond the Looking Glass: Real World Linked Data. What Does it Take to Make it Work? ALCTS Preconference, San Francisco, CA, June 26, 2015.
Tending the Metadata Garden: Our Role in a Linked Data Ecosystem
1. Designing the Garden: Getting Grounded in Linked Data
#rwlod
Jenn Riley (@jenlrile)
Associate Dean, Digital Initiatives
McGill University
2. “Full fathom five thy father lies,
Of his bones are coral made,
Those are pearls that were his eyes,
Nothing of him that doth fade,
But doth suffer a sea-change,
Into something rich and strange,
Sea-nymphs hourly ring his knell,
Ding-dong.
Hark! now I hear them, ding-dong, bell.”
--Shakespeare, The Tempest
#rwlod Beyond the Looking Glass, ALCTS Preconference 2015
3. Photo by Liam Moloney, https://flic.kr/p/7Qux27, CC BY-SA
4. 1. It’s not “our” data and “their” data – it’s one big graph.
5. It’s about the connections
https://linkedjazz.org/network/
6. Graph-based world views
http://ebiquity.umbc.edu/blogger/2015/06/06/querying-rdf-data-with-text-annotated-graphs/
7. Connecting things together
[Graph diagram. Labels recovered from the slide:]
Resources: dbpedia:William_Shakespeare; dbpedia:Stratford-upon-Avon; viaf:96994048 (# William Shakespeare); worldcat-work:1074526681; fast:1069678 (# Political Refugees)
Classes: foaf:person; schema:CreativeWork
Literals: 1615-04-23; 1564-04-26; “Tragicomedy”; “Shakespeare’s Comedy of the Tempest”
Properties: dbpedia-owl:birthDate; schema:genre; rdf:type (twice)
8. Creating so much data!
Where the scope of “library data” ends
What it is and isn’t “our job” to do
Making “complete” descriptions ourselves
Mapping data from “other” vocabularies into “library” vocabularies
This means we can stop worrying about…
9. Learning other metadata cultures
Being Linked Data ecosystem good citizens
How the technology and the data can most effectively work together
Making connections between things
Understanding other vocabularies and communities
And start worrying about…
10. 2. We can expect more intelligence in the system.
12. Guido Reni (1575-1642)
Hercules Killing the Hydra of Lerna
c. 1620-1621
oil on canvas
commissioned along with other scenes from the mythology of Hercules in 1617 by Ferdinando Gonzaga, for a room in the Villa Favorita in Mantua, Italy
inv. 535
Musée du Louvre, Paris, France
13. Mine usage data to enhance relevance and utility in discovery
Start from the most relevant information and provide easy means for quick expansion on demand
Coherently display conflicting information
Give indications of provenance of information
What systems must do for users
14. If we’re serious about information literacy, we have to give our users tools and then trust them.
15. Flag dead ends for review and action
Normalize most string-based data
Highlight potentially conflicting information
Hide complexity (URIs, etc)
Mine and show candidate connections for review
What systems must do for metadata maintainers
16. Choosing one authoritative “correct” assertion in the face of conflicting data
Whether or not a given source meets a certain standard for authority
Authorized headings, access points
And textual justifications for them
A large proportion of the data cleanup tasks we used to do
For example, the formatting of strings
But we can expect a new set of these to emerge!
This means we can stop worrying about…
17. How to enhance system algorithms
How users best interact with complicated information
Methods for automated metadata creation and cleanup
Getting large amounts of new data into the system
And start worrying about…
18. 3. The information age has provided a new definition of “authority.”
19. Current narrative:
Libraries create good metadata!
Because we’re trained to do so
We’re consistent and follow rules
That’s what makes good data
Other people create bad data
Because it’s not consistent
Using a well built record structure is a key part of good metadata
But…
We don’t read the books we catalogue
We don’t typically have expertise in the subjects of the works we describe
Sometimes we don’t even read or speak the language those works use
Our perspective is pretty different from our users’
Number of things to describe is quickly expanding and our budgets are shrinking
Let’s take a good, hard look at our “authoritative” data
20. Remember, we’re looking at more intelligent systems in the LD world
That can deal with inconsistency, masking it or cleaning it up
The LD graph allows us to not worry about metadata structures
So consistency and rules are no longer the primary drivers of good metadata
Which means we can turn large swaths of the creation of metadata over to domain experts
Wait, it’s not about consistency and rules?
22. If we’re serious about good metadata, we need to start from expert information.
23. We need this guy
http://www.betterlivingthroughbeowulf.com/scholars-lose-themselves-in-their-research/
24. And we need these folks too!
Photo by veggiesosage, https://flic.kr/p/5WjAsK, CC-BY-NC-ND
25. Really, it’s going to be OK
https://xkcd.com/386/
26. The Linked Data community cares about provenance
http://www.w3.org/standards/techs/provenance#w3c_all
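One way to picture "coherently display conflicting information" with "indications of provenance" is to store every assertion alongside its source instead of choosing a single winner. The sketch below is only an illustration in Python: the `ex:` identifiers and source names are hypothetical, the second birth date is an illustrative conflicting value, and a real system would more likely use named graphs or the W3C provenance vocabularies linked above.

```python
from collections import defaultdict

# Each assertion is (subject, predicate, value, source).
# All identifiers and source names are hypothetical.
assertions = [
    ("ex:shakespeare", "ex:birthDate", "1564-04-26", "ex:sourceA"),
    ("ex:shakespeare", "ex:birthDate", "1564-04-23", "ex:sourceB"),
    ("ex:shakespeare", "ex:birthPlace", "Stratford-upon-Avon", "ex:sourceA"),
    ("ex:shakespeare", "ex:birthPlace", "Stratford-upon-Avon", "ex:sourceB"),
]


def group_by_statement(assertions):
    """Map (subject, predicate) -> value -> list of sources asserting that value."""
    values = defaultdict(lambda: defaultdict(list))
    for subject, predicate, value, source in assertions:
        values[(subject, predicate)][value].append(source)
    return values


def conflicts(assertions):
    """Return only the statements that have more than one distinct value."""
    grouped = group_by_statement(assertions)
    return {key: dict(vals) for key, vals in grouped.items() if len(vals) > 1}
```

Nothing is thrown away here: a display layer can show both dates with their sources and leave the judgement to the user, while a maintenance tool can list the output of `conflicts` for review.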
27. Deep research on materials for which there’s already a knowledgeable community
Descriptions being “complete”
Making sure everything is “right”
This means we can stop worrying about…
28. Mining the data that’s already out there
Promoting voices of those who engage with content
How usable systems can be built to generate Linked Data from activities real people already partake in
Seeding basic information for the rare and unique materials we hold that have never been released
Connecting data from communities operating in different languages
And start worrying about…
29. 4. Our job is to tend the garden.
30. Metadata is an ecosystem
Photo by Temari 09, https://flic.kr/p/6UskT1, CC-BY-NC
31. The garden needs tending
Photo by Center for International Forestry Research, https://flic.kr/p/dbx1Gt, CC-BY-NC-ND
32. A new model – making connections
https://thenounproject.com/term/connection/25392/
33. “Original cataloguing” and “copy cataloguing”
Getting data into “our systems”
Doing all the work that needs to be done the first time we think about a specific item
This means we can stop worrying about…
34. Understanding the Linked Data environment
Locating large and useful datasets
Understanding vocabularies developed elsewhere
Finding good people that can analyze relationships and make new connections
And start worrying about…
35. The data and systems are all in the cloud
Library-based discovery less important but likely still around for a while
Several ways systems can navigate the graph
Crawling
Dereferencing
Query federation
(See http://linkeddatabook.com/editions/1.0/#htoc84)
So how will Linked Data systems work?
36. This is all the way it should work
It’s going to be a while before we get there
Big effort needed to start connecting up these data sets
Data sets and tools will get better as we start using the data in this way and demand more
The library community can help to shape this evolution, but only if we fully understand and engage with the assumptions and mechanisms in play in the Linked Data community
Reality check
37. This will be hard. But that won’t stop us.
We need to redefine our baseline.
We need to rethink what new models mean for us.
And, most importantly, we need to put the right people in positions to work through these issues and get the details settled.
A sea-change? Most definitely.
We got this.
Spirit sings this song to Ferdinand (Prince of Naples) when he believes his father Alonso is drowned.
To me, this means:
A recognized shift to something new
Guidance through the transition process
Comfort with change
Respect for the previous iteration (grounded)
This is what we’re facing with LD
We’ve learned a ton over the last 50 years (and more!)
Now what we’ve learned is turning into something else
We need comfort and support through this process
We can mourn, but we must move forward
Note there’s an ecosystem note here too which I’ll cycle back to
Another interesting parallel – the “MARC is dead” meme.
I’ll note that Alonso (the father) is not actually dead at this point in the story
Analogies only go so far – don’t let that derail the need to manage change!
We are in the midst of a sea change
Our community is redefining itself
It needs both technologists and metadata specialists
And people who can get past the HOW to the WHY and WHAT and redesign new HOWs that fit internet-age information models
I’m going to lay out 4 different fundamental changes that we’re facing in how library metadata operates
And speculate a bit on what each means for us
The value in LD is bringing together data from different sources
LD model inherently doesn’t care where the data comes from
And not just one type of data
People, books, journal articles, relationships, events, facts
This is what the metadata universe is starting to look like
Not just bibliographic information but everything anyone would want to know
Why shouldn’t the library catalogue be an encyclopedia?
We learn about triples but LD is really about the graph
The big shift = we’re not describing resources, we’re helping all the right connections be made
Discovery tools as entry points into the graph
Facilitate exploration from known points
A search as a way to get suggestions for useful entry points
Which causes some rethinking about the role of library-based discovery interfaces
Why can’t our holdings just be hooked to information navigation/discovery tools?
We can’t replicate Google
But we also can’t assume Google is the only game in town
So the strategy must be:
Expose holdings (data we uniquely have)
Participate in description but don’t own it
This is how this works in practice
Lots of data sources
The graph is built when connections are made between them
Note we’re not being really pedantic about work/manifestation in this example, or the genre as an entity rather than a string; still a lot of modeling to do
More on this at the end of the presentation
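A minimal sketch of "the graph is built when connections are made", using plain Python sets of triples. The identifiers come from the graph slide, but only dbpedia-owl:birthDate, schema:genre and rdf:type survive as property labels in the slide text; every triple marked ASSUMED below is an illustrative guess at how the sources might be linked, not a reading of the diagram.

```python
# Each source publishes its own small set of (subject, predicate, object) triples.
dbpedia = {
    ("dbpedia:William_Shakespeare", "dbpedia-owl:birthDate", "1564-04-26"),
    # ASSUMED: the slide does not preserve which property links person and place.
    ("dbpedia:William_Shakespeare", "dbpedia-owl:birthPlace", "dbpedia:Stratford-upon-Avon"),
}
viaf = {
    ("viaf:96994048", "rdf:type", "foaf:person"),
    # ASSUMED: a cross-source identity link.
    ("viaf:96994048", "owl:sameAs", "dbpedia:William_Shakespeare"),
}
worldcat = {
    ("worldcat-work:1074526681", "rdf:type", "schema:CreativeWork"),
    ("worldcat-work:1074526681", "schema:genre", "Tragicomedy"),
    # ASSUMED: creator and subject links from the work to other sources.
    ("worldcat-work:1074526681", "schema:creator", "viaf:96994048"),
    ("worldcat-work:1074526681", "schema:about", "fast:1069678"),
}

# "One big graph": merging sources is just a set union.
graph = dbpedia | viaf | worldcat


def neighbours(triples, node):
    """Return every node one hop away from `node`, in either direction."""
    outgoing = {o for s, p, o in triples if s == node}
    incoming = {s for s, p, o in triples if o == node}
    return outgoing | incoming


def reachable(triples, start):
    """Walk the graph from `start` and collect everything connected to it."""
    seen, todo = {start}, [start]
    while todo:
        for nxt in neighbours(triples, todo.pop()):
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return seen
```

Starting from the WorldCat work, the birthplace is reachable only in the merged graph; inside the WorldCat triples alone it is not. The value comes from the links between sources rather than from any single description.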
These are astonishing numbers
Need maintenance/caretaking
But also a different way of thinking
Can’t manually manage
Have to let the ecosystem (technology) keep things going, only intervene at key points
Be pragmatic, can’t lovingly curate as we did
This volume helps us get over Not Invented Here
If we were to try to manage this volume of data using our current mindsets and approaches, we’d be like Hercules facing the Hydra
Note here again the analogy only goes so far – Hercules did eventually defeat the Hydra. But we’re not Hercules, and we want information to thrive, not die.
Theme: we don’t decide beforehand, we provide data and systems that let people decide
There really is a wisdom in the crowd
We cannot predict what a user is going to care about
The long tail is as powerful as the most important stuff
Kathleen Fitzpatrick – Director of Scholarly Communication at the MLA, English/media studies Ph.D. and scholar, author of Planned Obsolescence: Publishing, Technology, and the Future of the Academy
We need filters, not gatekeepers
With this much data it’s not possible for libraries to pay people to serve a gatekeeping function
Or to only let in what we review first
Therefore we need systems to do that for us
And we don’t have to build them ourselves, this is an internet-wide problem
And by them, I mean both the users and the tools.
The “we know best” attitude has to go.
For the metadata maintenance we will do (more on this in a minute)
Volume of data makes this necessary
Data normalization – turn strings into URIs
Lots of system intelligence suggesting the things that do need human expertise
Don’t have to interact directly with the data model for all things
No:
typing/verifying URIs
sorting out namespaces
worrying about RDF syntax
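A rough sketch of what "turn strings into URIs" with dead ends flagged for review could look like. The lookup table `KNOWN` and both function names are hypothetical; a real system would query an authority service rather than a hard-coded dictionary. The VIAF identifier is the one shown on the graph slide.

```python
# Hypothetical lookup table from normalized name strings to URIs.
KNOWN = {
    "shakespeare, william": "viaf:96994048",
}


def normalize(text):
    """Collapse whitespace, lowercase, and drop a trailing period so variants compare equal."""
    return " ".join(text.split()).lower().rstrip(".")


def reconcile(strings):
    """Split input strings into those matched to a URI and those needing human review."""
    matched, review = {}, []
    for original in strings:
        uri = KNOWN.get(normalize(original))
        if uri:
            matched[original] = uri
        else:
            review.append(original)  # a dead end: flag it rather than guess
    return matched, review
```

The maintainer never types or verifies a URI here. The system resolves what it can and surfaces only the leftovers that need human expertise.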
Data cleanup not needed
- Date formats
Note large amounts of new data requires automated means, not manual metadata creation
This is all the metadata the BPL could provide for this image
Train people know better.
We MUST leverage them
We can tap into scholars, established researchers
But think about the bitter fights they get into
How authoritative are they, really?
Go to the crowd
The enthusiasts
The hobbyists
They’re already producing data
And can be incentivized to produce more
Just look at eBay
And online forums
Flickr
Reddit
We need to locate the info they already produce
And tap into the tools they use
Metadata creation and maintenance as gardening
Lots of variety
Needs upkeep
But never perfect
Things are introduced, grow, and die
Gardener as an overseer not a creator
Sun, soil, rain all occur out of our control – the system does a lot to maintain itself
We just poke at it from time to time
Metadata creation when none already exists
Take seeds from elsewhere (do basic research, create new data – think author research, not necessarily subject analysis)
Give them some TLC and get them into the ground
They grow, change, interact with the environment and give rise to new things
This is the most important part
You’re creating hybrids
Merging things to make new things that others will continue to build on
Not “is this good enough” but “how can this be most useful?”
You’re enhancing the graph
E.g. sameAs links between entities, with the equivalent links for classes and properties
Find and connect good sources of topical/analytical data
Find good vocabularies and integrate them into library managed and external vocabularies
And flagging/processing/marking importance of data for use in library run discovery
Poke a bit to make things more machine readable (e.g. make a template to parse some data)
Cataloguers as part of LD world, not just library world
Remember – it’s not about describing resources, it’s about making connections
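To make "making connections" concrete, here is a small sketch that groups identifiers joined by sameAs-style links, treating the links as symmetric and transitive. The function name and the `ex:` identifiers are hypothetical. In OWL, owl:sameAs relates individuals; the corresponding links for classes and properties are owl:equivalentClass and owl:equivalentProperty, and the same grouping logic would apply to those.

```python
def same_as_clusters(links):
    """Group identifiers connected by sameAs links into clusters (union-find)."""
    parent = {}

    def find(x):
        # Follow parent pointers to the cluster representative.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps chains short
            x = parent[x]
        return x

    for a, b in links:
        parent[find(a)] = find(b)

    clusters = {}
    for node in parent:
        clusters.setdefault(find(node), set()).add(node)
    return list(clusters.values())


# Hypothetical links asserted by different communities.
links = [
    ("dbpedia:William_Shakespeare", "viaf:96994048"),
    ("viaf:96994048", "ex:localAuthority42"),
    ("dbpedia:Stratford-upon-Avon", "ex:place7"),
]
clusters = same_as_clusters(links)
```

Each new link a cataloguer adds can merge whole clusters, so a single connection may expose a local record to everything already known about the entity elsewhere.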
Some will say we should provide evaluative information on sources
But there are too many for us to do so
Distinction between local and external data minimized or gone
No local copies of records that we edit directly
To the degree that libraries run discovery systems, those systems will need design work and effective use of data
But we can’t write all this intelligence ourselves
Will utilize software written elsewhere
Patterns:
Crawling -> local index (all automated – doesn’t get out of sync easily!)
Dereferencing – on the fly go grab a URI to learn about it; can be slow
Query federation – complex queries to predetermined data sources
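The three patterns can be contrasted in a toy sketch. There are no network calls here: the `WEB` dictionary stands in for the web of data, and every URI, triple and function name is hypothetical. Real systems would use HTTP content negotiation for dereferencing and SPARQL for federation.

```python
# A stand-in for the web: each URI "dereferences" to the triples published about it.
WEB = {
    "ex:work1": [("ex:work1", "ex:creator", "ex:person1")],
    "ex:person1": [("ex:person1", "ex:birthPlace", "ex:place1")],
    "ex:place1": [("ex:place1", "ex:label", "Stratford-upon-Avon")],
}


def dereference(uri):
    """Dereferencing: look a URI up on the fly (one simulated request per call)."""
    return list(WEB.get(uri, []))


def crawl(seed):
    """Crawling: follow links ahead of time and keep everything in a local index."""
    index, seen, todo = [], set(), [seed]
    while todo:
        uri = todo.pop()
        if uri in seen:
            continue
        seen.add(uri)
        for triple in dereference(uri):
            index.append(triple)
            if triple[2] in WEB:  # the object is itself a dereferenceable URI
                todo.append(triple[2])
    return index


def federated_query(predicate, endpoints):
    """Query federation: ask a predetermined list of sources and merge the answers."""
    results = []
    for endpoint in endpoints:
        results.extend(t for t in endpoint if t[1] == predicate)
    return results
```

The trade-offs in the notes show up directly: `crawl` pays its lookups up front and answers later from the local index, `dereference` is always current but costs a request each time, and `federated_query` only sees the sources it was told about.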
This is going to get better
Saying ‘you should do it our way’ isn’t going to work
Look at the OCLC data – start at WorldCat
See the current state, but also see the possibilities