ARK identifiers: lessons learnt at BnF: paths forward

John Kunze

ARK identifiers: lessons learned at BnF and potential changes to the ARK standard.

ARK IDENTIFIERS
Lessons learnt at BnF
Paths forward

OUTLINE
1. Reminder: what are ARKs
2. 8 years of implementing ARKs at BnF
3. Considerations about evolving the ARK standard

REMINDER: WHAT ARE ARKS
- A maintaining institution
- A specification: http://tools.ietf.org/pdf/draft-kunze-ark-18.pdf
- A user registry: http://www.cdlib.org/uc3/naan_registry.txt
- A discussion list: http://groups.google.com/group/arks-forum

REMINDER, 2: ARK ANATOMY
[Diagram: the parts of http://gallica.bnf.fr/ark:/12148/bpt6k103039f/f26.thumbnail]
- http://gallica.bnf.fr/ : name mapping authority, the delivery service (resolves identifiers for the world)
- ark:/ : scheme
- 12148 : name assigning authority number (NAAN), the naming authority (assigns identifiers)
- bpt6k103039f : name, the resource
- /f26.thumbnail : qualifiers (page, variant)
Image: http://www.flickr.com/photos/jenwaller/2207918246/
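
The anatomy above can be illustrated with a short Python sketch that splits the Gallica example URL into its parts. The names `ArkParts` and `split_ark` are hypothetical, and the rule that the name ends at the first "/" or "." is a simplifying assumption made for this example, not a statement of how the BnF resolver works.

```python
from typing import NamedTuple
from urllib.parse import urlsplit


class ArkParts(NamedTuple):
    nma: str         # name mapping authority (the delivery service)
    naan: str        # name assigning authority number
    name: str        # name assigned by the naming authority
    qualifiers: str  # everything after the name, e.g. "/f26.thumbnail"


def split_ark(url: str) -> ArkParts:
    """Split an ARK URL into the parts named on the anatomy slide."""
    parts = urlsplit(url)
    marker = "ark:/"
    start = parts.path.find(marker)
    if start < 0:
        raise ValueError(f"no 'ark:/' label in {url!r}")
    rest = parts.path[start + len(marker):]
    naan, _, tail = rest.partition("/")
    # Assumption: the name stops at the first "/" or "."; the rest is qualifiers.
    hits = [i for i in (tail.find("/"), tail.find(".")) if i >= 0]
    cut = min(hits, default=len(tail))
    return ArkParts(f"{parts.scheme}://{parts.netloc}/", naan, tail[:cut], tail[cut:])


if __name__ == "__main__":
    print(split_ark("http://gallica.bnf.fr/ark:/12148/bpt6k103039f/f26.thumbnail"))
```

Keeping the qualifiers as one opaque string reflects that only the naming authority knows how to interpret them.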

8 YEARS LATER
Lessons learnt at BnF

RISKS IN PRACTICE: WHAT OCCURRED?
Originally: ARKs for
- digitized items
- bibliographic records from the main catalogue
New objects
- finding aids
- illuminations
- museographic descriptions
- born digital documents
- virtual exhibitions
New applications
- for new objects
- for existing objects: preservation repository, linked data services
Existing apps, additional features
- full text OCR
- full text search
- audio rendering
Changing domain names
Changing technical environment
Changing organization
Image: http://www.flickr.com/photos/jenwaller/2207918246/

WHAT CONCLUSIONS?
- Anything can happen in 8 years, especially unforeseen cases
- The only feasible response is organizational:
  - Documentation
  - Commitment
  - Internal advocacy
  - Internal “ARK master” task force

LESSONS LEARNT
- DON’T reassign identifiers!
- Only reveal what is meaningful for the end user to cite
- Be consistent
- Keep it simple
- Stay in touch with ARK users
- Document procedures
- Address needs without overkill

3/ ARK EVOLUTIONS
ARK REDUX
SUGGESTED EVOLUTION
http://gallica/ark:/12148/btv1b8496236q

SEMANTIC WEB: NEW ARK QUALIFIERS?
- Semantic web best practices:
  - Distinguish the document from the described object: <URI-web-page> <URI-person>
  - One way to do it: <URI-123456> <URI-123456#classifier>
- This is not compatible with the ARK spec:
  - ARKs can only be followed by “/”, “.” or “?”
  - Could change if nobody used “#” in ARK names
http://data.bnf.fr/ark:/12148/cb11908252k
http://data.bnf.fr/ark:/12148/cb11908252k#foaf:Person
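
To illustrate the "#" approach, the following Python sketch shows how a generic URL parser treats the two data.bnf.fr examples: the part after "#" is a fragment, which an HTTP client does not send to the server, so both URIs reach the resolver as the same request while remaining distinct identifiers. The helper name `request_target` is hypothetical and exists only for this example.

```python
from urllib.parse import urlsplit


def request_target(url: str) -> str:
    """Return the path (and query, if any) that an HTTP client sends to the server."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    # The fragment (the part after "#") is handled client-side and left out.
    return target


document_uri = "http://data.bnf.fr/ark:/12148/cb11908252k"
person_uri = document_uri + "#foaf:Person"

if __name__ == "__main__":
    print(urlsplit(person_uri).fragment)
    print(request_target(document_uri) == request_target(person_uri))
```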

ARK INFLECTIONS
- Inflections get A (an ARK) to different things
  - A by itself (no inflection) means get the named thing
  - A? means get the thing’s metadata
  - A?? means get its commitment statement
- Possible landing page debate stopper?
  - A/ means get the thing’s landing page (if any)
  - A./ means get the preferred payload (if any)
Image: mikebaird on flickr
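
The inflection rules on this slide can be sketched as a small Python function. This is an illustration under stated assumptions, not resolver code: `classify_inflection` is a hypothetical name, and the "/" and "./" cases are the proposals from the slide rather than established behavior.

```python
def classify_inflection(ark: str) -> str:
    """Map the ending of an ARK string to what the slide says it requests."""
    # Longer endings are tested first so that "??" is not read as "?"
    # and "./" is not read as "/".
    if ark.endswith("??"):
        return "commitment statement"
    if ark.endswith("?"):
        return "metadata"
    if ark.endswith("./"):
        return "preferred payload"  # proposed inflection
    if ark.endswith("/"):
        return "landing page"       # proposed inflection
    return "named thing"


if __name__ == "__main__":
    base = "ark:/12148/bpt6k103039f"
    for suffix in ("", "?", "??", "/", "./"):
        print(base + suffix, "->", classify_inflection(base + suffix))
```

A provider that does not support the proposed inflections could simply strip them and serve the named thing.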

WHAT KIND OF PERSISTENCE IS PROMISED?
- ARKs need metadata to express things like:
  - Persistent and unchanging content (rare)
  - Persistent but dynamic content (e.g. NLM home page)
  - Persistent but correctable (e.g. most curated content)
  - Persistent but growing (e.g. streaming data, journal)
- And who are you to promise that?
  - Your organizational mission
  - Your private/public/non-profit status
  - Any inspectable track record (e.g. link uptime stats)

GOING FORWARD
- Discussion about evolving the ARK specification
- Sharing best practices and implementation experiences
- Interested? Stay tuned on http://groups.google.com/group/arks-forum

THANK YOU FOR YOUR ATTENTION!
BnF implementation
sebastien.peyrard AT bnf.fr
jean-philippe.tramoni AT bnf.fr
ARKs at CDL:
john.kunze AT ucop.edu

Editor's notes

Back in 2006 we laid the foundations: we chose ARK as our persistent identifier scheme. Now we have around 20 million ARK identifiers assigned. What changed over time? In an ideal world, "nothing", because it is persistent. Actually, almost everything changed. That is the point of this short feedback section, which in a way states the obvious: persistence should not be seen as eternity. Eternity is paralysing. We need to find a workable time span within which the real questions occur. Looking back at 8 years of implementation and moving forward is a way to do this.
The risk is increased complexity:
- New objects: the risk is to multiply identifier assignment procedures.
- New applications: as the person responsible for ARKs, you have a growing number of apps to watch over as they evolve, to make sure there is no regression in resolving ARKs.
- Existing objects, new applications: e.g. our ARKs for catalogue data are displayed by our main MARC catalogue AND by our linked data service, data.bnf.fr, but you need to be clear that you are talking about the same thing. I will elaborate upon this in the second part of the presentation.
- Existing apps, additional features: the people who maintain and evolve the apps more or less own them. They find ARKs work pretty well, but they tended to define their own qualifiers each time there was a new service, hence qualifier proliferation. Good news in a sense: ARKs became business as usual. It works! So what?
- Changing technical environment: among other things, a hugely increasing flow of incoming requests on our resolver and an increasing number of applications; we needed to modularize the resolver architecture.
- Changing organization: 8 years ago, a team of 7 experts from 2 departments. Now only one person from the original team remains, and one of the departments no longer exists! The audience is now less "pioneer", less technical. ARKs became business as usual (curators who use ARKs for citations, web application managers, linked data experts, ...).
Build an organization around our persistent identifier implementation that is responsive to changes or emerging risks, which means: adapt the documentation and communication to non-experts, so that people can understand the key requirements and what is at stake! This must be solid but lightweight at the same time: 2 people, from the librarian side and the IT side, who are internal consultants on ARKs. Set up an internal communication task force with around 40 different people using ARKs at several levels, to explain the basics and so that the 2 “ARK masters” are identified.
Another thing: it is much easier to start with what you should NOT do than with what you should do. So we start with the don'ts when we communicate with people, to set clear limits (apart from that, everything is open to discussion and negotiation). Qualifiers: any technical parameters, like the search keywords for digitized books, tend to be stuffed into the URI as ARK qualifiers. Tell them we do not need that, because end users do not want to cite the searched words; they want to cite the page, or the digitized book.
The previous part was about practice. What follows are considerations, either grounded in BnF use cases or initiated by CDL, that we believe could translate into useful evolutions of the standard.
There is a debate in the research data citation community about requiring the default behavior for persistent ids to take you to a landing page. On the one hand, a landing page can give you all the context you need to find out more about the dataset, such as newer versions. On the other hand, landing pages are not machine actionable, so you cannot link persistently to, say, an inline image or a CSV file. Requiring only one behavior or the other would be a hard choice. The default behavior for ARKs is not specified, but one way out is to permit a user or a provider to construct or publish ARKs with an indicator of what to expect. With a random ARK found in the wild, the user cannot know in advance whether a landing page exists, but the user might still request a landing page experience if there is one. Similarly the user could request the canonical (provider preferred) “immersive” experience of the object. The provider would be free to ignore the “./” and “/” inflections (as if they hadn’t been supplied) or to support them.
Persistence isn’t either/or, on/off – it’s nuanced. The ARK spec describes different kinds of persistence but has no metadata vocabulary for providers to express it to users. Unchanging content is rare – political pressure usually trumps preservation except where legal requirements hold. Some institutions, for example, are required to hold certain content unchanged for a period of time, and to delete it after that. Dynamic content is common – probably every national library in the world will claim that their home page is persistent and persistently thematically relevant, but all of those home pages are dynamic. Most deliberately curated content is correctable – with responsibility comes political pressure to make sure it’s right, non-impinging, safe, respectable, etc. Lots of debate about datasets that are appended to every 6 seconds, appearing to be highly dynamic and therefore unsuitable for citing, would suddenly stop if we could assure people that any content, once written, will be persistent, perhaps even unchanging, but that new content is likely or possible to show up at the end of the dataset. Anyone can promise anything. What is the nature of your organization?
