This presentation discusses how a model of “data sharing as publishing” can contribute to developing Linked Open Data resources in archaeology and the study of the ancient world. The paper gives examples from Open Context’s developing approach to data editing, documentation and quality improvement processes. The goal of these efforts is to better align the professional interests of individual researchers with the needs of the larger community to access and use high-quality data in Linked Data scenarios.
Open Context and Publishing to the Web of Data: Eric Kansa's LAWDI Presentation
1. Publishing to the “Web of Data” in Archaeology: Quality and Workflows
Eric Kansa
UC Berkeley / OpenContext.org
Unless otherwise indicated, this work is licensed under a Creative Commons
Attribution 3.0 License <http://creativecommons.org/licenses/by/3.0/>
2. Web of Data (2011)
Main Contributors:
● Institutions (esp. government)
● Thematic collections / projects
3. Thousand Flowers
● Open access, open licensed data
● Archiving by California Digital Library
● Persistent Identifiers (DOIs, ARKs)
● Web services
● NSF/NEH links for data management plans
4. Thousand Flowers
Fills a Gap:
Most data sources are institutional.
Open Context publishes individual,
small group contributions
5. Thousand Flowers
Fills a Gap:
Most data sources are institutional.
Open Context publishes individual,
small group contributions
Challenge:
Diverse contributions, needing lots of work to clean-up and “link”
6.
• 3-year project, Oct 2010 – Sep 2013
• Funded with a National Leadership Grant from the Institute of Museum and Library Services, LG-06-10-0140-10, “Dissemination Information Packages for Information Reuse”
• Ixchel Faniel, PI & Elizabeth Yakel, Co-PI
http://www.dipir.org
7. Open Context Interviewees
• 22 Ph.D. or graduate students interviewed
  – 13 men
  – 9 women
• Novices / Experts
  – 19 experts
  – 3 novices
• Interviewees who were curators or professors also with a curatorial role = 6
9. Data Documentation Practices
“I use an Excel spreadsheet…which I … inherited from my research
advisers. …my dissertation advisor was still recording data for each
specimen on paper when I was in graduate school so that's what I
started …then quickly, I was like, ‘This is ridiculous.’… I just started
using an Excel spreadsheet that has sort of slowly gotten bigger and
bigger over time with more variables or columns…I've added …color
coding…I also use…a very sort of primitive numerical coding system,
again, that I inherited from my research advisers…So, this little book
that goes with me of codes which is sort of odd, but …we all know
that a 14 is a sheep.” (CCU13)
10. Data Documentation Practices
A long way to go before we
get Linked Data
12. Thousand Flowers
● Clean-up and document contributed data
● Map to ArchaeoML
● Mint URIs to entities (potsherds, projects, contexts, people)
● Link to important vocabularies / collections (Pleiades, Encyclopedia of Life)
● Working on CLAROS-based CIDOC-CRM (RDF) representations (not straightforward)
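As a rough illustration of the clean-up, URI-minting, and linking steps listed on this slide, the following Python sketch decodes a contributor's numeric code, mints a URI for the record, and attaches a link to an external vocabulary when one is known. The base URI, the code book, the vocabulary URI, and the function names are all hypothetical placeholders chosen for this example; they are not Open Context's actual identifiers or implementation.

```python
import uuid

# Hypothetical base for minted URIs (an assumption, not Open Context's real scheme).
BASE_URI = "http://example.org/subjects/"

# Hypothetical contributor code book; the interview quote only tells us that 14 means sheep.
CODE_BOOK = {14: "sheep"}

# Hypothetical lookup from cleaned labels to external vocabulary URIs (placeholder URI).
VOCAB_LINKS = {"sheep": "http://example.org/vocab/sheep"}


def mint_uri(project, local_id):
    """Mint a stable URI: the same project and local ID always yield the same URI."""
    token = uuid.uuid5(uuid.NAMESPACE_URL, f"{BASE_URI}{project}/{local_id}")
    return f"{BASE_URI}{token}"


def publish_record(project, local_id, taxon_code):
    """Decode a numeric code, mint a URI, and attach a vocabulary link if known."""
    label = CODE_BOOK.get(taxon_code)  # None when the code is undocumented
    return {
        "uri": mint_uri(project, local_id),
        "taxon_label": label,
        "taxon_link": VOCAB_LINKS.get(label),  # None when no link is known
    }


record = publish_record("demo-project", "bone-001", 14)
print(record)
```

Deriving the URI from the project and local identifier, rather than from a random value, is one possible design choice: re-running the workflow on the same contributed data then does not produce new identifiers. Undocumented codes come back with empty label and link fields so that an editor can follow up with the contributor.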
13. My Precious Data
Image Credit: “Lord of the Rings” (2003, New
Line), All Rights Reserved Copyright
16. Publishing
Data Quality and Standards Alignment
(1) Check consistency
(2) Edit functions
(3) Align to common standards (“Linked Data” if applicable)
(4) Issue tracking, version control
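The consistency check in step (1) can be sketched in a few lines of Python. The example below groups values in a column by a normalized "fingerprint" and reports groups that contain more than one spelling, similar in spirit to the key-collision clustering offered by data clean-up tools. The function names and sample column are assumptions made for this illustration, not part of Open Context's workflow.

```python
import re
from collections import defaultdict


def fingerprint(value):
    """Normalize a value: trim, lowercase, drop punctuation, sort unique tokens."""
    cleaned = re.sub(r"[^\w\s]", "", value.strip().lower())
    return " ".join(sorted(set(cleaned.split())))


def find_inconsistencies(values):
    """Group values by fingerprint; return only groups with more than one spelling."""
    groups = defaultdict(set)
    for value in values:
        groups[fingerprint(value)].add(value)
    return {key: sorted(variants) for key, variants in groups.items() if len(variants) > 1}


# Hypothetical column of contributed taxon labels.
column = ["Sheep", "sheep ", "SHEEP.", "Cattle"]
print(find_inconsistencies(column))
```

Each reported group is a candidate for an edit (step 2) rather than an automatic correction; whether the variants really mean the same thing is still an editorial decision.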
17. Publishing
Tools of the Trade
(1) Google Refine (check, edit, consistency)
(2) Mantis (issue-tracker, coordinate edits, metadata creation)
19. Publishing
Entity Reconciliation
(1) With Google Refine
(2) Implemented, EOL and Pleiades
(3) Need more vocabularies!
(4) Simple model, not complex ontology mapping
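A "simple model" of entity reconciliation, as opposed to full ontology mapping, might look like the Python sketch below: each local term is matched against the labels of a vocabulary, first exactly and then approximately, and anything unmatched is left for manual review. The gazetteer entries, URIs, function name, and similarity cutoff are hypothetical and chosen only for illustration; this is not the reconciliation service the presentation describes.

```python
import difflib

# Hypothetical vocabulary: normalized label -> URI (placeholder URIs).
GAZETTEER = {
    "athens": "http://example.org/places/athens",
    "corinth": "http://example.org/places/corinth",
}


def reconcile(term, vocabulary, cutoff=0.85):
    """Return (uri, method) for a term, or (None, 'unmatched') for manual review."""
    key = " ".join(term.lower().split())  # lowercase and collapse whitespace
    if key in vocabulary:
        return vocabulary[key], "exact"
    # Fall back to approximate string matching against the known labels.
    close = difflib.get_close_matches(key, list(vocabulary), n=1, cutoff=cutoff)
    if close:
        return vocabulary[close[0]], "fuzzy"
    return None, "unmatched"


for term in ["Athens", "Corinthh", "Unknown Site"]:
    print(term, reconcile(term, GAZETTEER))
```

Returning the matching method alongside the URI lets an editor confirm fuzzy matches before they are published as links, which fits the issue-tracking workflow better than silently accepting every approximate match.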
20.
● CDL Archiving Service
● How do DOIs, ARKs, etc. work with Web and Linked Data?
● Question of granularity and emphasis (archive “objects”)
21. Summary
Outcomes of Publishing Data:
(1) Communicate and set expectations about content and quality
(2) Organize workflows to improve data quality and usability
(3) Make “datasets” first class citizens in world of scholarly communications
22. Final Thoughts
Publication needs to evolve!
(1) Participating in Linked Data is a great goal, but far removed from most everyday practice
(2) Researchers need help.
(3) 19th century publication norms poorly suited to 21st century methods, research, public goals