This presentation describes the changing landscape of biodiversity data publishing as of 2015. It was presented for the first time at the GB22 training event for GBIF nodes.
Slides produced and presented by L. Russell (VertNet), translated into French by M. Raymond (GBIF Secretariat).
Session 02, The data publishing landscape in 2015, at the GB22 training event for GBIF nodes
1. GB22 TRAINING EVENT FOR NODES – 4 OCTOBER 2015
Session 02: The data publishing landscape in 2015
Laura Russell
2. INDEX
The data publishing "landscape"
Publishing biodiversity data
Data types
Data standards
Data normalization and data quality
Data publishing methods
Promoting data publishing
Use cases
3. THE DATA PUBLISHING LANDSCAPE
(Timeline:)
2008: DiGIR/TAPIR widely used to publish biodiversity data
2008: The idea of publishing via a simple, compressed text file is presented at TDWG
2009: GBIF releases IPT 1.0
2010: GBIF overhauls the IPT
2011: GBIF releases IPT 2.0
2011: Data publishing is taught at nodes training
2011: Nodes and aggregators begin to install and use the IPT
2012 onwards: Occurrence and checklist datasets, and the number of IPT installations, show continued growth
4. THE DATA PUBLISHING LANDSCAPE - STATISTICS
http://www.gbif.org/ipt/stats
No. of IPT installations registered with GBIF
5. THE DATA PUBLISHING LANDSCAPE - STATISTICS
No. of datasets published with the IPT
6. THE DATA PUBLISHING LANDSCAPE IN 2015
GBIF's continued commitment to improving access to biodiversity data
Refinement and expansion of publishing standards and software
Evolving social norms
Most data are still published with the simple occurrence core
Portals do not yet have the features to support richer data
Many institutions still need to be convinced to publish biodiversity data
http://www.gbif.org/page/82104
7. INDEX
The data publishing "landscape"
Publishing biodiversity data
Data types
Data standards
Data normalization and data quality
Data publishing methods
Promoting data publishing
Use cases
8. WHAT ARE BIODIVERSITY DATA?
Digital text or multimedia data detailing facts about the occurrence of an organism, namely the what, where, when, how and by whom of the occurrence and of its recording.
9. WHAT IS DATA PUBLISHING?
"Publishing" means making biodiversity datasets publicly accessible and discoverable, in a standardized form, via an access point, usually a web address (URL).
10. TYPES OF BIODIVERSITY DATA
http://www.gbif.org/publishing-data/summary#datatypes
Checklists
Occurrences
Metadata
11. TYPES OF BIODIVERSITY DATA – SAMPLE-BASED DATA
http://www.gbif.org/newsroom/news/sample-based-data
Sampling
12. STANDARDS
http://www.tdwg.org/standards/
ABCD: Access to Biological Collection Data (2005)
DwC: Darwin Core (2009)
AC: Audubon Core Multimedia Resources Metadata Schema (2013)
NCD: Natural Collection Descriptions (draft)
13. DARWIN CORE
http://rs.tdwg.org/dwc
recordedBy: A list (concatenated and separated) of names of people, groups, or organizations responsible for recording the original occurrence. The primary collector or observer, especially one who applies a personal identifier (recordNumber), should be listed first. Examples: "José E. Crespo", "Oliver P. Pearson | Anita K. Pearson"
14. SIMPLE DARWIN CORE
Simple Darwin Core (SimpleDwC) is a specification for one particular way of using Darwin Core terms: sharing data about taxa and their occurrences in a simply structured way. It is probably what someone means when they suggest "formatting your data according to Darwin Core".
http://rs.tdwg.org/dwc/terms/simple/index.htm
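Simple Darwin Core is just a flat table whose header row uses Darwin Core term names. As a minimal sketch, such a file can be produced with Python's csv module; the column names below are real DwC terms, but the record values are invented illustration data:

```python
import csv
import io

# Simple Darwin Core: one flat table, first row = Darwin Core term names.
header = ["occurrenceID", "basisOfRecord", "scientificName",
          "eventDate", "decimalLatitude", "decimalLongitude", "recordedBy"]
rows = [
    # recordedBy uses the pipe-separated convention from the recordedBy example
    ["urn:example:occ:1", "PreservedSpecimen", "Puma concolor",
     "1999-03-07", "-12.45", "-71.30", "Oliver P. Pearson | Anita K. Pearson"],
]

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(header)
writer.writerows(rows)
occurrence_txt = buf.getvalue()
print(occurrence_txt.splitlines()[0])
```

The same table, saved as `occurrence.txt`, is the data file of the simplest possible Darwin Core Archive.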
15. DARWIN CORE ARCHIVES
A Darwin Core Archive (DwC-A) is a text-based representation of data formatted to Darwin Core.
A DwC-A is a compressed file containing a minimum of three files.
http://rs.tdwg.org/dwc/terms/guides/text/index.htm
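The three-file structure (a data file, meta.xml and eml.xml inside one zip) can be sketched with the Python standard library alone. Everything below, the record, the term mappings and the dataset title, is invented illustration data; a real archive would normally be produced by the IPT or a similar tool:

```python
import io
import zipfile

# A Darwin Core Archive is a zip with, at minimum, a data file,
# a meta.xml descriptor and an eml.xml metadata document.
occurrence_txt = ("occurrenceID,scientificName,eventDate\n"
                  "urn:example:occ:1,Puma concolor,1999-03-07\n")

# meta.xml tells a machine how to read the data file (delimiter, header,
# row type, and which DwC term each column maps to).
meta_xml = """<archive xmlns="http://rs.tdwg.org/dwc/text/">
  <core encoding="UTF-8" fieldsTerminatedBy="," linesTerminatedBy="\\n"
        ignoreHeaderLines="1" rowType="http://rs.tdwg.org/dwc/terms/Occurrence">
    <files><location>occurrence.txt</location></files>
    <id index="0"/>
    <field index="1" term="http://rs.tdwg.org/dwc/terms/scientificName"/>
    <field index="2" term="http://rs.tdwg.org/dwc/terms/eventDate"/>
  </core>
</archive>
"""

# eml.xml describes the dataset for human users (title, scope, contacts...).
eml_xml = ('<eml:eml xmlns:eml="eml://ecoinformatics.org/eml-2.1.1">'
           "<dataset><title>Demo dataset</title></dataset></eml:eml>\n")

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("occurrence.txt", occurrence_txt)
    zf.writestr("meta.xml", meta_xml)
    zf.writestr("eml.xml", eml_xml)

# Reopen the archive to confirm the three required files are present.
with zipfile.ZipFile(buf) as zf:
    names = sorted(zf.namelist())
print(names)
```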
17. "MAPPING CORES" OR CORE FILES
Taxon Core
The category of information pertaining to taxonomic names, taxon name usages, or taxon concepts. Released in April 2015, this version removes dcterms:source and dcterms:rights, and adds dcterms:license. 43 terms.
Occurrence Core
The category of information pertaining to evidence of an occurrence in nature, in a collection, or in a dataset (specimens, observations, etc.). Released in July 2015, this version removes the terms dcterms:source, dcterms:rights, dwc:individualID and dwc:occurrenceDetails, and adds dcterms:license, dwc:organismQuantity, dwc:organismQuantityType, dwc:organismID, dwc:organismName, dwc:organismScope, dwc:associatedOrganisms, dwc:organismRemarks, dwc:parentEventID, dwc:sampleSizeValue, dwc:sampleSizeUnit. 169 terms.
Event
The category of information pertaining to a sampling event. Released 29 May 2015. 95 terms.
18. EXTENSIONS
Darwin Core does not provide terms for every possible type of data.
• 22 registered
• 25 under development
Examples
• Audubon Media Description (or "Audubon Core")
• Darwin Core Identification History
• Darwin Core Measurement or Facts
http://tools.gbif.org/dwca-validator/extensions.do
19. EXAMPLE STAR SCHEMA - OCCURRENCE
(Diagram: an Occurrence Core at the centre, linked to extension files such as Media, Geographical, Determination, Occurrence and Germplasm; together with meta.xml and EML.xml they form a DwC Archive for occurrences.)
20. EXAMPLE STAR SCHEMA - CHECKLISTS
(Diagram: a Taxon Core at the centre, linked to extension files such as Literature, Description, Occurrences, Vernacular, Distribution and Types; together with meta.xml and EML.xml they form a DwC Archive for checklists.)
22. DATA NORMALIZATION
What is data normalization?
Reasons to normalize a database
Normal forms
http://www.essentialsql.com/get-ready-to-learn-sql-database-normalization-explained-in-simple-english/,
http://databases.about.com/od/specificproducts/a/normalization.htm, http://www.dotnet-tricks.com/Tutorial/sqlserver/756N210512-Database-Normalization-Basics.html
23. DATA QUALITY
Encodings
Tools
Why work on improving the data?
The importance of feedback
http://community.gbif.org/pg/pages/view/48546/precourse-activities
24. DATA PUBLISHING METHODS
(Diagram: publishing options plotted against technical capacity and data-management capacity: create your own DwC-A, publish with spreadsheets, or use a data hosting centre.)
26. DATA PUBLISHING METHODS – FIRST POLL
Poll: http://etc.ch/dQ68
Results: http://directpoll.com/r?XDbzPBd3ixYqg8RE6D9gU3CMFoU9fqOuh9n0P5P6
Which of the following methods have you ever used to publish data online?
28. DATA PUBLISHING METHODS – SECOND POLL
Poll: http://etc.ch/re74
Results:
http://directpoll.com/r?XDbzPBd3ixYqg8xmOHP25WFCV81TJYwb1aGgrVyX5
Which of the following methods do you use regularly to publish data online? (i.e. in the last year)
29. INDEX
The data publishing "landscape"
Publishing biodiversity data
Data types
Data standards
Data normalization and data quality
Data publishing methods
Promoting data publishing
Use cases
30. PROMOTING DATA PUBLISHING
A topic of discussion at the nodes training in Berlin in 2013.
A key element of node managers' daily work.
31. PROMOTING DATA PUBLISHING
Barriers: psychological and cultural, institutional, resource-related, and practical.
1. Lack of knowledge
2. Lack of understanding
3. Lack of will
4. Perceived value of the data
5. Privacy concerns
6. Lack of authorization
7. Lack of time / planning
8. Lack of capacity
9. Lack of funding
10. Lack of infrastructure
http://www.gbif.org/publishing-data/benefits, http://www.gbif.org/resource/81196
32. LEVELS OF DATA RESTRICTION
1. Refusal to share.
2. Refusal to share until the intended use of the data is complete.
3. Sharing data for a fee.
4. Sharing data with restrictions.
5. Sharing data freely.
33. PROMOTING DATA PUBLISHING - STRATEGIES
1. Facilitate access to financial support.
2. Call upon commitments or legal mandates.
3. Call upon open access / moral principles.
4. Show the benefits of better data management.
5. Show the benefits for scientific careers.
6. Peer pressure.
7. Start/support big digitization programmes.
8. Start/support data repatriation efforts.
34. PROMOTING DATA PUBLISHING - DISCUSSIONS
Strategies
• Start small: metadata only
• Promote the fact that, with a single publication to GBIF, the data will then be exposed in multiple networks
• Provide hosted IPTs to remove technical barriers
• Illustrate licences with telling examples
• Promote and organize training on "data papers"
Challenges
• Unwillingness to publish / to publish all the data
• Technical requirements/capacities needed to use the IPT
• Restrictive data licences
http://community.gbif.org/pg/forum/topic/48616/precourse-activity-promoting-data-publishing/
35. INDEX
The data publishing "landscape"
Publishing biodiversity data
Data types
Data standards
Data normalization and data quality
Data publishing methods
Promoting data publishing
Use cases
36. USE CASES - INTRODUCTION
Explore four use cases based on current publishing practices
• Literature
• Observational data
• Natural history collections
• Checklists
Complete two exercises
• Defining publishing strategies
• Publishing datasets
41. GB22 TRAINING EVENT FOR NODES – 4 OCTOBER 2015
Session 02: The data publishing landscape in 2015
Laura Russell
Editor's notes
Image from Piotr Lewandowski, shared via http://www.freeimages.com/photo/learning-with-pencil-1415671
Data/chart provided by Kyle Braak, GBIF.
Data/chart provided by Kyle Braak, GBIF.
Good and needs improvement
The data publishing area is in continuous evolution and expansion: standards are refined and extended, software is improved and debugged, and social norms evolve. That requires all of us to refresh our knowledge periodically.
Although publishing biodiversity data in a standard way has been possible for a long time, most data are still published in a very simple way: just the occurrence core, single identifications, few or no connections among objects, simple metadata... Much of the richness of the original data remains inaccessible because of the way the data are published. This is one of the main reasons for organizing this course.
· The data already published determines (although only to a certain extent) the technical developments in the GBIF network, namely in GBIF.org and its API. Only when a certain amount of data of a certain type is published (e.g. through an extension) does the priority of enabling discovery and retrieval of that information rise. Examples of this are the indexing of occurrences published using the occurrence extension of the taxon core, and the possibility to search and retrieve images from the simple multimedia extension.
Most data are still published with the simple occurrence core, missing the known richness of the original data.
Without the rich data, portal developers have no pressing reason to prioritize features that support rich data.
Reused slide from 1B Publishing Primary Biodiversity Data by Alberto González-Talaván ~ Data Sharing, Data Standards, and Demystifying the IPT ~ Gainesville, FL, USA. 13 January 2015
Modified from 1B Publishing Primary Biodiversity Data by Alberto González-Talaván ~ Data Sharing, Data Standards, and Demystifying the IPT ~ Gainesville, FL, USA. 13 January 2015
Modified from 1B Publishing Primary Biodiversity Data by Alberto González-Talaván ~ Data Sharing, Data Standards, and Demystifying the IPT ~ Gainesville, FL, USA. 13 January 2015
Review of the data types for publishing (http://www.gbif.org/publishing-data/summary#datatypes). This will be the first attempt to cover the instructional objectives 1a, 1b & 1c.
GBIF now deals with four types of biodiversity data:
Occurrences (observations, specimens etc)
Checklists (names)
Metadata (data about data) - http://www.gbif.org/dataset/search?type=METADATA
Occurrences are records that document a 'collection event'—evidence that a particular, named organism was found at a particular time and place. Also known as primary biodiversity data, occurrences document the 'what, where, when, how and by whom' of our exploration of the planet's species. An occurrence record can be based on an observation in the field, vouchered (labeled) specimen in a museum or herbarium, or other evidence.
Checklists are lists of scientific names of organisms grouped into taxonomic hierarchies. They serve two main functions: first, they provide data that help to enrich information about particular species, for example by including them on national checklists, and on lists of invasive or threatened species; and they provide taxonomic 'backbones' around which species information can be organized.
Metadata are structured descriptions of datasets giving essential details such as the geographic and taxonomic scope of the data, methods of collection or observation, contact details and citation requirements. They help to give context to datasets and enable users to assess whether data are fit for use in a particular research project or application.
introduce the need/push for sample-based datasets (introduction of the event core) (http://www.gbif.org/page/82105) - released March 24, 2015
beyond “presence only” data -- more quantitative information used in other areas of scientific discovery and research, particularly ecological monitoring and assessment.
Sample-based data (ecological monitoring and assessment data)
Sample-based data are records from thousands of different kinds of environmental, ecological, and natural resource monitoring and assessment investigations. These events range from one-off surveys to ongoing monitoring and includes activities like freshwater and marine sampling, plant cover and vegetation plots, and citizen science bird counts, among others.
Modified from 1B Publishing Primary Biodiversity Data by Alberto González-Talaván ~ Data Sharing, Data Standards, and Demystifying the IPT ~ Gainesville, FL, USA. 13 January 2015
This section will cover the instructional objective 2a.
Biodiversity Information Standards (TDWG), also known as the Taxonomic Databases Working Group, is a not for profit scientific and educational association that is affiliated with the International Union of Biological Sciences.
TDWG was formed to establish international collaboration among biological database projects. TDWG promoted the wider and more effective dissemination of information about the World's heritage of biological organisms for the benefit of the world at large. Biodiversity Information Standards (TDWG) now focuses on the development of standards for the exchange of biological/biodiversity data.
Our Mission
Develop, adopt and promote standards and guidelines for the recording and exchange of data about organisms
Promote the use of standards through the most appropriate and effective means and
Act as a forum for discussion through holding meetings and through publications
Modified from 1B Publishing Primary Biodiversity Data by Alberto González-Talaván ~ Data Sharing, Data Standards, and Demystifying the IPT ~ Gainesville, FL, USA. 13 January 2015
It includes a glossary of terms intended to facilitate the sharing of information about biological diversity by providing reference definitions, examples, and commentaries.
It is primarily based on taxa, their occurrence in nature as documented by observations, specimens, and samples, and related information.
Modified from 1B Publishing Primary Biodiversity Data by Alberto González-Talaván ~ Data Sharing, Data Standards, and Demystifying the IPT ~ Gainesville, FL, USA. 13 January 2015
Flat table
Few restrictions
A data file (occurrence.txt) conforming to the SIMPLEDWC in a CSV format. The first row includes Darwin Core standard term names.
A meta file (meta.xml) in an XML format. It contains technical details to instruct a computer on how to use the data file.
A meta file (eml.xml) in an XML format. It contains explanatory details about the records contained within the data file to instruct a user if the data will be fit for their use.
Modified from 1B Publishing Primary Biodiversity Data by Alberto González-Talaván ~ Data Sharing, Data Standards, and Demystifying the IPT ~ Gainesville, FL, USA. 13 January 2015
Cores updated based on the updated Darwin Core releases.
Modified from Standards and sharing complex primary biodiversity data; and what is an extension anyway? ~ Deb Paul ~ Data Sharing, Data Standards, and Demystifying the IPT Workshop – Day 1, Jan. 13, 2015 ~ Gainesville, FL
Modified from 1B Publishing Primary Biodiversity Data by Alberto González-Talaván ~ Data Sharing, Data Standards, and Demystifying the IPT ~ Gainesville, FL, USA. 13 January 2015
Modified from 1B Publishing Primary Biodiversity Data by Alberto González-Talaván ~ Data Sharing, Data Standards, and Demystifying the IPT ~ Gainesville, FL, USA. 13 January 2015
Modified from 1B Publishing Primary Biodiversity Data by Alberto González-Talaván ~ Data Sharing, Data Standards, and Demystifying the IPT ~ Gainesville, FL, USA. 13 January 2015
Database normalization is a process used to organize a database into tables and columns. The idea is that a table should be about a specific topic and that only the columns which support that topic are included.
There are three main reasons to normalize a database: to minimize duplicate data, to minimize or avoid data modification issues, and to simplify queries.
To help achieve these objectives, rules for database table organization have been developed. The stages of organization are called normal forms; most databases adhere to the first three.
First Normal Form – The information is stored in a relational table, each column contains atomic values, and there are no repeating groups of columns.
Second Normal Form – The table is in first normal form and all the columns depend on the table's primary key.
Third Normal Form – The table is in second normal form and none of its columns is transitively dependent on the primary key.
There are further normal forms for those interested in learning more.
For the purposes of the Star Schema, you'll find your data adhering to the…
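As a small illustration of what the normal forms buy you, the sketch below (using Python's built-in sqlite3; the table and column names are invented for the example) stores each collector once in its own table and joins it back to the occurrences, instead of repeating the collector's name on every row:

```python
import sqlite3

# A normalized design for occurrence data: collector details live in their
# own table instead of being duplicated on every occurrence row.
con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.executescript("""
CREATE TABLE collector (
    collector_id INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE occurrence (
    occurrence_id INTEGER PRIMARY KEY,
    scientific_name TEXT NOT NULL,
    event_date TEXT,
    collector_id INTEGER REFERENCES collector(collector_id)
);
""")
cur.execute("INSERT INTO collector VALUES (1, 'Oliver P. Pearson')")
cur.executemany(
    "INSERT INTO occurrence VALUES (?, ?, ?, ?)",
    [(1, "Puma concolor", "1999-03-07", 1),
     (2, "Puma concolor", "1999-03-08", 1)],
)

# The collector's name is stored once; a join reassembles the flat view
# (e.g. for export to a Darwin Core flat table).
rows = cur.execute("""
    SELECT o.scientific_name, c.name
    FROM occurrence o JOIN collector c USING (collector_id)
""").fetchall()
print(rows)
```

If the collector's name ever needs correcting, it changes in exactly one place, which is the "minimize data modification issues" argument from the notes above.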
Tweet image - https://twitter.com/Iteration23/status/646085874963337216
GBIF community group in conjunction with TDWG group on Data Quality
Excel is a wonderful tool, but you must understand how Excel works, or it can change your data in unexpected ways! Suggest watching --
Encoding
Excel
OpenRefine – Tutorials
See pre-course activities for some recommendations/tutorials
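Encoding problems are among the most common data-quality issues when files move between Excel and other tools. The small sketch below assumes a value exported in the Windows cp1252 encoding (both the encoding and the sample value are invented for the example) and shows why decoding with the wrong codec garbles accented characters while the right one recovers them:

```python
# A value exported from a spreadsheet in a legacy Windows encoding.
raw = "Crespo, José\r\n".encode("cp1252")

# Decoding with the wrong codec garbles the accented character...
garbled = raw.decode("utf-8", errors="replace")

# ...while the correct codec recovers it; normalize line endings too
# before publishing the file as UTF-8.
clean = raw.decode("cp1252").replace("\r\n", "\n")
print(clean)
```

Tools like OpenRefine perform the same kind of detection and conversion interactively.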
Slide from 1B Publishing Primary Biodiversity Data by Alberto González-Talaván ~ Data Sharing, Data Standards, and Demystifying the IPT ~ Gainesville, FL, USA. 13 January 2015
Ways to publish (strengths and weaknesses of each; include stats for numbers of datasets published via each way; how to identify what method was used when viewing datasets on gbif.org). This will cover the instructional objective 2b.
simple spreadsheets
IPT
custom-created DwCA
Slide from 1B Publishing Primary Biodiversity Data by Alberto González-Talaván ~ Data Sharing, Data Standards, and Demystifying the IPT ~ Gainesville, FL, USA. 13 January 2015
The IPT is currently under active development, with future updates planned.
Web tools and templates for Excel were contracted for development in ???? and have not been updated since then.
DiGIR protocol development ceased in 2006
TAPIR protocol last updated in 2010
BioCASE protocol last updated 2015
Online poll
Which of the following methods have you ever used to publish data online (or to help others to do so)?
o DiGIR provider
o TAPIR provider
o BioCASe provider
o IPT
o DwC-A through “DwC-A spreadsheet processor”
o Customized DwC-A through “DwC-A Assistant”
o Other custom-created DwC-A
o None
There are simple online poll tools that show the progress of the voting as you speak and can be displayed on the screen as people vote. This communicates very well and makes the exercise very dynamic.
Online poll
Which of the following methods do you use REGULARLY to publish data online? (i.e. in the last year)
o DiGIR provider
o TAPIR provider
o BioCASe provider
o IPT
o DwC-A via “DwC-A spreadsheet processor”
o Customized DwC-A via “DwC-A Assistant”
o Other custom-created DwC-A
o None
Which of the following methods do you use regularly to publish data online (or to help others to do so)? (i.e. used at least once in the last year)
There are simple online poll tools that show the progress of the voting as you speak and can be displayed on the screen as people vote. This communicates very well and makes the exercise very dynamic.
This section will aim to start covering the instructional objective 3.
A core element of node managers' daily work.
Review from Berlin
Extended documents --- review prior to use cases and exercises on day 2
Identify and assess data holders
Slide from Module 3 – Knowledge exchange I Supporting data digitization and publishing ~ Alberto González-Talaván ~ 4 October 2013, GBIF Nodes Training ~ Berlin, Germany
Barriers to publishing
On these points:
Lack of knowledge: the holder may not be aware of how sharing on the internet works, or of the existence of initiatives such as GBIF.
Lack of understanding: the holder may have heard about GBIF and data publishing, but thinks it must be complicated, bureaucratic, very technical…
Lack of will: the holder understands the process but does not want to go through it because of cultural issues or the perceived sensitivity of the data.
Perceived data value: the holder thinks that the data has economic or intrinsic value that (s)he wants to exploit.
Privacy concerns:
Lack of authorization: the holder would like to share the data, but institutional policies prevent it.
Lack of time / planning: the holder never finds an appropriate moment to start the digitization, data transformation or publishing, or got discouraged after poorly planned attempts.
Lack of capacity: the holder would like to digitize and share the data, but does not know the best (or any) way to do it.
Lack of resources/funding: the holder would like to digitize and share the data, but there is no spare capacity in the institution to carry out such tasks.
Lack of infrastructure: the holder would like to digitize and share the data, but does not have the technical infrastructure to do it.
Least to most open
Objective is to get to 5 or any advancement on the scale is positive
Slide from Module 3 – Knowledge exchange I Supporting data digitization and publishing ~ Alberto González-Talaván ~ 4 October 2013, GBIF Nodes Training ~ Berlin, Germany
Slide from Module 3 – Knowledge exchange I Supporting data digitization and publishing ~ Alberto González-Talaván ~ 4 October 2013, GBIF Nodes Training ~ Berlin, Germany
Strategies and arguments to overcome barriers/Incentives for publishing
On these points:
Facilitate access to financial support: provide digitization grants or help the data holders obtain funding that directly or indirectly supports digitization.
Call upon commitments or legal mandates: Try to use commitments or legal mandates that apply to the institution or the country as a way to convince the data holder.
Call upon open access / moral principles: the results of publicly funded research should be made public, access to science should not be restricted, etc.
Show the benefits of a better data management: management of digital information can facilitate the data holder’s daily work.
Show the benefit for their scientific careers: publishing data can provide scientific credit through data papers, citations and data usage indexes.
Peer pressure: competing/fellow institutions are already sharing data and the holder’s institution is being left behind.
Start / support big digitization programmes: promote the start of big digitization programmes that will benefit many holders at the same time.
Start / support data repatriation efforts: start programmes that will allow the return of digital data describing your country's biodiversity.
Summarize the community discussion on this topic:
Examples of publishing networks/nodes and how they have been successful or faced difficulties in publishing data.
Cees provided some great examples and strategies.
Nico introduced the topic of licensing, mentioning Peter Desmet's blog post "Why we should publish under CC0" as an illustrative example of what more restrictive licenses prevent users from doing with data.
http://www.canadensys.net/2012/why-we-should-publish-our-data-under-cc0
Faustin, Hanna, and Cees provided some additional discussion on licensing.
And Anne-Sophie introduced organizing trainings on topics like data papers as an easier sell to data publishers, who could observe the direct impact on the visibility and number of downloads of their datasets once the data papers were published.
4 use cases based on current publishing practices: literature, observational data, natural history collections and checklists.
The FIRST EXERCISE will last up to 20 minutes and will be around the definition of data publishing strategies. Based on the description included in their use case, each group will work on identifying suitable technical solutions, challenges and strategies. Each group will capture the outcome of their discussions on a single page.
The SECOND EXERCISE will use all the remaining time and will consist of publishing a dataset using the test IPT installation made available for the course. There are two datasets available, depending on the level of challenge the participant is seeking. Links to the datasets will be provided as part of the use case description document. Those seeking certification will need to fill in a template describing the process and send it to the group facilitator ONLY.
Bird occurrence records from "Birds at the Danish Lighthouses 1883-1939"
Camera trap database of tiger sightings from India
French and English
Prairie Habitat Restoration Study
VASSY, the database of vascular plants of Syldavia and Eskeastein
Image from Piotr Lewandowski, shared via http://www.freeimages.com/photo/learning-with-pencil-1415671