Persistent Identifiers, Herbarium workshop at Kongsvold, September 1 to 4, 2014

Implementation of persistent and globally unique identifiers for specimens held in natural history collections worldwide will open up new opportunities for referring to these physical resources in an interlinked digital context such as the Internet. Here, we describe the approach for persistent identification of collection specimens developed and implemented at the Natural History Museum in Oslo (NHM-UiO) by the Norwegian participant node to the Global Biodiversity Information Facility (GBIF-Norway). The Norwegian university museums are invited to use our resolver service at "http://purl.org/gbifnorway/id/<uuid>" when publishing biodiversity data to GBIF. All occurrence records published through GBIF-Norway, with appropriate PURL-UUID identifiers mapped to the Darwin Core occurrenceID, will automatically be added to our resolver service and kept updated.
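
As a rough illustration of the PURL-UUID pattern described above, the following Python sketch mints a random UUID, prefixes it with the resolver address quoted in the abstract, and stores the result as the Darwin Core occurrenceID of a record. The function names, the record layout and the placeholder catalog number are assumptions made for this example; they are not taken from the GBIF-Norway implementation.

```python
import uuid

# Resolver prefix quoted in the abstract; everything else here is illustrative.
PURL_PREFIX = "http://purl.org/gbifnorway/id/"


def mint_occurrence_id() -> str:
    """Return a new PURL-UUID string built from a random (version 4) UUID."""
    return PURL_PREFIX + str(uuid.uuid4())


def is_purl_uuid(value: str) -> bool:
    """Check that a value is the resolver prefix followed by a hyphenated UUID."""
    if not value.startswith(PURL_PREFIX):
        return False
    suffix = value[len(PURL_PREFIX):]
    try:
        parsed = uuid.UUID(suffix)
    except ValueError:
        return False
    # uuid.UUID also accepts braces or missing hyphens; require the plain form.
    return str(parsed) == suffix.lower()


# Hypothetical occurrence record keyed by Darwin Core term names.
record = {
    "occurrenceID": mint_occurrence_id(),
    "catalogNumber": "EXAMPLE-000001",  # placeholder, not a real specimen
}
```

Because the UUID is generated locally, each collection could mint identifiers at the source without asking a central issuer, which seems to be the property the slides emphasise.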


  1. Persistent Identifiers, Herbarium workshop at Kongsvold Fjellstue, September 1-4, 2014. Dag Endresen, NHM-UiO, GBIF-Norway
  2. The purpose of identifiers … is to name things, making it possible to refer to them.
  3. Name ambiguity: George. Many things are named George.
  4. What is an identifier: “Each identifier refers to one and only one thing” (Coyle 2006). “An association between a string and a thing” (Kunze 2003). “A stated association between a symbol and a thing; that the symbol may be used to unambiguously refer to the thing within a given context” (Campbell 2007).
  5. (Image only: car license number plates, illustrating uniqueness of identifiers within a given context.)
  6. When is the identifier “good enough”? Unique and persistent - within a given context. “The common experience is that an identifier is created within a system or within a context, and that at a later date it needs to be used in another or larger context” (Coyle 2006). Expanding context: • Within one museum collection (catalog number). • Within a network between museum collections (collection code + catalog number). • Within a biodiversity information network (institution code + collection/dataset code + catalog number). • On the Internet (e.g. http URI, DOI, LSID, etc.). • … larger contexts are possible to imagine in the future!
  7. Expanding context
  8. Identify the thing that you care about • The specimen itself (the physical entity) • Image of the specimen • Description of the specimen • Location where the specimen was captured • The occurrence event when the specimen was captured • …
  9. Darwin Core terms, by class:
     Record-level Terms: dcterms:type | dcterms:modified | dcterms:language | dcterms:rights | dcterms:rightsHolder | dcterms:accessRights | dcterms:bibliographicCitation | dcterms:references | institutionID | collectionID | datasetID | institutionCode | collectionCode | datasetName | ownerInstitutionCode | basisOfRecord | informationWithheld | dataGeneralizations | dynamicProperties
     Occurrence: occurrenceID | catalogNumber | occurrenceRemarks | recordNumber | recordedBy | individualID | individualCount | sex | lifeStage | reproductiveCondition | behavior | establishmentMeans | occurrenceStatus | preparations | disposition | otherCatalogNumbers | previousIdentifications | associatedMedia | associatedReferences | associatedOccurrences | associatedSequences | associatedTaxa
     MaterialSample: materialSampleID
     Event: eventID | samplingProtocol | samplingEffort | eventDate | eventTime | startDayOfYear | endDayOfYear | year | month | day | verbatimEventDate | habitat | fieldNumber | fieldNotes | eventRemarks
     dcterms:Location: locationID | higherGeographyID | higherGeography | continent | waterBody | islandGroup | island | country | countryCode | stateProvince | county | municipality | locality | verbatimLocality | verbatimElevation | minimumElevationInMeters | maximumElevationInMeters | verbatimDepth | minimumDepthInMeters | maximumDepthInMeters | minimumDistanceAboveSurfaceInMeters | maximumDistanceAboveSurfaceInMeters | locationAccordingTo | locationRemarks | verbatimCoordinates | verbatimLatitude | verbatimLongitude | verbatimCoordinateSystem | verbatimSRS | decimalLatitude | decimalLongitude | geodeticDatum | coordinateUncertaintyInMeters | coordinatePrecision | pointRadiusSpatialFit | footprintWKT | footprintSRS | footprintSpatialFit | georeferencedBy | georeferencedDate | georeferenceProtocol | georeferenceSources | georeferenceVerificationStatus | georeferenceRemarks
     GeologicalContext: geologicalContextID | earliestEonOrLowestEonothem | latestEonOrHighestEonothem | earliestEraOrLowestErathem | latestEraOrHighestErathem | earliestPeriodOrLowestSystem | latestPeriodOrHighestSystem | earliestEpochOrLowestSeries | latestEpochOrHighestSeries | earliestAgeOrLowestStage | latestAgeOrHighestStage | lowestBiostratigraphicZone | highestBiostratigraphicZone | lithostratigraphicTerms | group | formation | member | bed
     Identification: identificationID | identifiedBy | dateIdentified | identificationReferences | identificationVerificationStatus | identificationRemarks | identificationQualifier | typeStatus
     Taxon: taxonID | scientificNameID | acceptedNameUsageID | parentNameUsageID | originalNameUsageID | nameAccordingToID | namePublishedInID | taxonConceptID | scientificName | acceptedNameUsage | parentNameUsage | originalNameUsage | nameAccordingTo | namePublishedIn | namePublishedInYear | higherClassification | kingdom | phylum | class | order | family | genus | subgenus | specificEpithet | infraspecificEpithet | taxonRank | verbatimTaxonRank | scientificNameAuthorship | vernacularName | nomenclaturalCode | taxonomicStatus | nomenclaturalStatus | taxonRemarks
     ResourceRelationship (Auxiliary Terms): resourceRelationshipID | resourceID | relatedResourceID | relationshipOfResource | relationshipAccordingTo | relationshipEstablishedDate | relationshipRemarks
     MeasurementOrFact (Auxiliary Terms): measurementID | measurementType | measurementValue | measurementAccuracy | measurementUnit | measurementDeterminedDate | measurementDeterminedBy | measurementMethod | measurementRemarks
  10. Term name: occurrenceID. Identifier: http://rs.tdwg.org/dwc/terms/occurrenceID. Class: http://rs.tdwg.org/dwc/terms/Occurrence. Definition: An identifier for the Occurrence (as opposed to a particular digital record of the occurrence). In the absence of a persistent global unique identifier, construct one from a combination of identifiers in the record that will most closely make the occurrenceID globally unique. Comment: For a specimen in the absence of a bona fide global unique identifier, for example, use the form: "urn:catalog:[institutionCode]:[collectionCode]:[catalogNumber]". Examples: "urn:lsid:nhm.ku.edu:Herps:32", "urn:catalog:FMNH:Mammal:145732". For discussion see http://code.google.com/p/darwincore/wiki/Occurrence
  11. Identifiers for museum collections. The longevity of museums leads to: “The need to use identifiers from our past in the current highly-networked digital systems” (Coyle 2006 [talking about libraries]). Specify a namespace for the identifiers? • URI – uniform resource identifier (unique in the context of the web). • URN – uniform resource name (name not tied to location). • URL – uniform resource locator (network location as identifier). • PURL – persistent URL (commitment to service longevity). Something else…? • DOI – digital object identifier • ARK – archival resource key • UUID – universally unique identifier
  12. • Persistent Identifier (PID) • Globally Unique Identifier (GUID) • Uniform Resource Identifier (URI) • Persistent Uniform Resource Locator (PURL) • Life Science Identifier (LSID) • Digital Object Identifier (DOI) • Handle system (Handle) • Archival Resource Key (ARK, EZID) • Universally Unique Identifier (UUID) • …
  13. Reuse existing identifiers. PURL. Photo: Smithsonian National Museum of Natural History, USNM-445024-Eutoxeres-aquila
  14. Reuse identifiers. http://purl.org/nhmuio/id/41d9cbb4-4590-4265-8079-ca44d46d27c3 Illustration by Miroslav Šašek (1963).
  15. • Globally unique • Scalability, number of IDs • Community acceptance • Long-term life-cycle • Resolvable, resolution service(s) • Cost per identifier • People-friendly or machine-friendly • Solution for the generation of new IDs – Central generation, PID issuer – Distributed generation at source
  16. • A UUID is a 16-octet (128-bit) number, usually written as a 36-character string. • Example: C37E3F9B-BCAF-4479-8EB7-3346A2DB2373 • The probability of one duplicate would be about 50% if every person on earth created 600 million UUIDs. • Allows for easy generation at source in a distributed network.
  17. Identifier, Resolver, Specimen, Location. The resolver is a system to resolve locations from identifiers, enabling retrieval even when the location changes.
  18. PURL technology provides a robust resolution service ready for the future - and a stable solution that is working well right now. PURL for the NHM-resolver: http://purl.org/nhmuio/id/[PID] The NHM-PURL redirects here: http://gbif.no/resolver/[PID] Could with few modifications redirect e.g. here: http://gbif.org/resolver/[PID]
  19. http – PURL – UUID: http://purl.org/nhmuio/id/41d9cbb4-4590-4265-8079-ca44d46d27c3
  20. http://purl.org/nhmuio/id/UUID → http://gbif.no/resolver/UUID; http://purl.org/gbifnorway/id/UUID → http://gbif.no/resolver/UUID
  21. Including machine readable formats
  22. Catalog number: O-L-000014, http://purl.org/nhmuio/id/41d9cbb4-4590-4265-8079-ca44d46d27c3
  23. Machine readable labels
  24. • Quick Response Code (QR code). • A type of matrix barcode (or two-dimensional code). • Popular due to its fast readability and large storage capacity. • The use of QR Codes is free of any license. • The QR Code is clearly defined and published as an ISO standard. • Invented in Japan by the Toyota subsidiary Denso Wave in 1994.
  25. http://purl.org/nhmuio/id/d91e8253-0ac1-4681-ac69-e50070af86a2
  26. UUID QR codes for museum objects at NHM-UiO provide: • Machine-readable identifiers (using a simple smart phone or a barcode reader). • New and efficient workflows for collection management. • Deployment of stable identifiers appropriate for databasing.
  27. Efficient workflow routines
  28. http://gbif.no/dugnad/
  29. • Peer review option for biodiversity data sets. • Authors get scientific credit for data publication. • Meeting concerns over data quality. • Meeting concerns over data citation mechanism. • Towards → each data set published through GBIF accompanied by a data paper…?
  30. Why publish your data • Citable publication • Establish scientific priority • Increase collaboration • Link data to bigger network • Re-use and multiply effect • Respond to funding requirements. http://biodiversitydatajournal.com/ Smith V, Georgiev T, Stoev P, Biserkov J, Miller J, Livermore L, Baker E, Mietchen D, Couvreur T, Mueller G, Dikow T, Helgen K, Frank J, Agosti D, Roberts D, Penev L (2013) Beyond dead trees: integrating the scientific process in the Biodiversity Data Journal. Biodiversity Data Journal 1: e995. DOI: 10.3897/BDJ.1.e995
  31. Globally unique identifiers are one of the three core components in the TDWG technical architecture.
  32. Status 27 August 2014. GBIF enables free and open access to biodiversity data online. We are an international government-initiated and funded initiative focused on making biodiversity data available to all and anyone, for scientific research, conservation and sustainable development.
  33. GBIF provides a data discovery system that is dependent on resolvable stable identifiers for efficient functionality. Global registry, data portal.
  34. Dag Endresen, dag.endresen@nhm.uio.no. Herbarium workshop at Kongsvold Fjellstue, September 1 to 4, 2014. Gary Larson, 1987
  35. Image sources and fair use rationales:
     Slide 1: Image source: TU GRAZ, Austria, http://campusonline.tugraz.at/organisation/campusonline. Fair use rationale: The image is used to illustrate the principle of stable and persistent identifiers forming the glue to connect data objects.
     Slide 3: George: George Orwell, George Harrison, George Bush, George Bush jr, George Soros, George Washington, Boy George, George (Seinfeld), George Lucas, George Clooney, Prince George of Cambridge, King George III of England, George Armstrong Custer, Georges Enescu, Curious George, St George in New Brunswick, George Coleman, George Eliot. Fair use rationale: Images of people and places named George from an Internet search. These images are used here to illustrate the weakness of using a human-friendly identifier/name: in a global context, many people and places are named George, leading to a name ambiguity problem. We will not know which George is referred to.
     Slide 5: Photo: Sancya/AP. Published: 03/31/2009 3:58:00, http://www.nydailynews.com/news/money/pile-unsold-cars-graveyards-gallery-1.45144 Fair use rationale: The image is used to illustrate the principle of uniqueness of identifiers within a given context, such as here car license number plates. The car license number is unlikely to be globally unique in a larger context such as the Internet.
     Slide 6: Illustration retrieved from http://www.hypnosisinmelbourne.com.au/index.php?p=49. Fair use rationale: The image is used to illustrate the principle of expanding context that stable identifiers can be subject to. An identifier used in a particular context, such as the Internet, could be exposed to a larger context at a later time.
     Slide 7: Fair use rationale: The image is of unknown source, retrieved from an Internet search. The image is used to illustrate the principle of expanding context that stable identifiers can be subject to. An identifier used in a particular context, such as the Internet, could be exposed to a larger context at a later time.
     Slide 14: Image: This is Cape Canaveral (M. Sasek, 1963), http://blog.miroslavsasek.com/wp-content/uploads/2009/05/moon-birdwatchers-400.jpg by Miroslav Šašek (1916-1980), http://www.miroslavsasek.com/, http://www.ilike.org.uk/2009/05/this_is_m_sasek.html. Fair use rationale: The image is used here to illustrate the principle of naming an observed organism by re-using common existing persistent identifiers.
     Slide 23: Photo: J.Schulzki. Fair use rationale: The image is used to illustrate the principle of machine-readable labels. The handling of luggage in an airport context (or the handling of parcels and letters in a postal service context) could serve as an inspiration for developing robotized handling of museum specimens, if these specimens are given machine-readable labels.
     Slide 34: Image: Gary Larson, The Far Side Observer, October 1987, http://i227.photobucket.com/albums/dd202/tomcat600/gary-larson-oct-1987.gif. Fair use rationale: This drawing is assumed to be copyrighted by Gary Larson and used here under a fair use claim. The image is used to illustrate the principle of naming all things using persistent identifiers.
     The images are used for an educational, not-for-profit, non-commercial purpose.
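
Slides 17 to 20 describe a resolver that maps identifiers to locations, with PURL prefixes redirecting to a resolver address. The Python sketch below illustrates that idea under stated assumptions: the two redirect rules are copied from slide 20, while the function name, the `Resolver` class and the example.org locations are hypothetical and do not describe how the PURL service or the GBIF-Norway resolver is actually implemented.

```python
# Redirect rules as listed on slide 20: PURL prefix -> resolver prefix.
REDIRECTS = {
    "http://purl.org/nhmuio/id/": "http://gbif.no/resolver/",
    "http://purl.org/gbifnorway/id/": "http://gbif.no/resolver/",
}


def redirect_target(purl: str) -> str:
    """Rewrite a PURL to the resolver URL it would redirect to."""
    for prefix, target in REDIRECTS.items():
        if purl.startswith(prefix):
            return target + purl[len(prefix):]
    raise ValueError(f"no redirect rule for {purl!r}")


class Resolver:
    """Toy in-memory resolver: identifier -> current location of the record."""

    def __init__(self):
        self._locations = {}

    def register(self, pid, location):
        # Registering the same identifier again simply updates its location.
        self._locations[pid] = location

    def resolve(self, pid):
        return self._locations[pid]


pid = "41d9cbb4-4590-4265-8079-ca44d46d27c3"  # UUID shown on slides 14, 19 and 22
resolver = Resolver()
resolver.register(pid, "http://old.example.org/specimen/1")  # hypothetical location
resolver.register(pid, "http://new.example.org/specimen/1")  # record has moved
```

The identifier stays the same when the record moves; only the entry held by the resolver changes, which is the property slide 17 points to.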

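
Slide 10 quotes the Darwin Core advice to fall back on the form "urn:catalog:[institutionCode]:[collectionCode]:[catalogNumber]" when a specimen has no persistent globally unique identifier. A minimal Python sketch of that fallback might look as follows. The function names and the rule that rejects empty parts or parts containing a colon are assumptions made for this example, not part of Darwin Core.

```python
def catalog_urn(institution_code: str, collection_code: str, catalog_number: str) -> str:
    """Build the fallback identifier in the form quoted on slide 10."""
    parts = (institution_code, collection_code, catalog_number)
    # Assumption for this sketch: reject empty parts and embedded colons,
    # so that the three components can still be told apart afterwards.
    if any(not part or ":" in part for part in parts):
        raise ValueError("each part must be non-empty and must not contain ':'")
    return "urn:catalog:" + ":".join(parts)


def occurrence_id(record: dict) -> str:
    """Prefer an existing occurrenceID; otherwise fall back to the catalog URN."""
    existing = record.get("occurrenceID")
    if existing:
        return existing
    return catalog_urn(
        record["institutionCode"],
        record["collectionCode"],
        record["catalogNumber"],
    )


# Reproduces the second example given on slide 10.
example = catalog_urn("FMNH", "Mammal", "145732")
```

A record that already carries a PURL-UUID keeps it unchanged; the URN form is only a last resort, and it is only as unique as the combination of the three codes.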