GBIF is exploring strategies to guide its work towards 2030. Key areas of focus include:
1. Increasing engagement with the scientific community through training, tools, and enabling nodes to better support national and regional research.
2. Filling data gaps in taxonomy, geography, and time through prioritizing mobilization of new data resources and checklists.
3. Developing new infrastructure and services like data annotation, machine learning tools, and metrics to improve data quality, reuse, and support digitization of legacy collections.
GBIF towards 2030 (November 2018)
1. GBIF towards 2030
Photo: ForBio/GBIF training at Baikal lake September 2018, CC-BY Dag Endresen
UiO Natural History Museum in Oslo, Department of Research and
Collections, November 8th 2018 CC-BY Dag Endresen
2. GBIF data surpassed 1 billion species occurrence data points in July 2018
So what …? “What can we do with a billion data points that we could NOT do with,
say, a hundred million?” (GBIF Science Chair Rod Page on Twitter 4 July 2018).
The milestone was passed with this observation of a frilled anemone (Metridium dianthus) off
Saint-Pierre and Miquelon, a French archipelago in the northwestern Atlantic.
#GBIF1Billion
4. Status 7th Nov 2018
Occurrence records 1 033 809 115
Datasets 41 536; Publishing institutions 1 305
5. GBIF is a success … so, do we just
continue to deliver more of the same?
Illustration by Rod Page (former GBIF Science committee chair) 5 July 2018.
11. GBIF.no towards a permanent
research infrastructure
Funding periods (15 years, 2005-2019, 50 MNOK)
• 2005-2007 (3 years, RCN 4,5 MNOK, total 5,6 MNOK)
• 2008-2011 (4 years, RCN 6,3 MNOK, total 12,1 MNOK)
• 2012-2016 (5 years, RCN 13,0 MNOK, total 20,0 MNOK)
• 2017-2019 (3 years, RCN 9,2 MNOK, total 12,1 MNOK)
• 2020 --> permanent long-term infrastructure
Forskningsrådet (RCN)
UiO Naturhistorisk museum
Artsdatabanken (NBIC)
13. Node team at NHM, University of Oslo
Dag Endresen, Node manager
Christian Svindseth, Data manager
Fridtjof Mehlum, Research director
Vidar Bakken, part-time (30%)
Artsdatabanken, Trondheim
Wouter Koch, node member
Nils Valland, board member
NTNU University Museum
Anders Finstad, GBIF Science committee
Solveig Bakken, board member
Research Council of Norway
Christian Wexels Riser
Per Backe-Hansen (until 2016)
Contact us at: helpdesk@gbif.no
Status 2018
14. Node team at NHM, University of Oslo
Dag Endresen, Node manager
Vidar Bakken, part-time
Vacancy, Data manager
Artsdatabanken, Trondheim
Wouter Koch, node member
Nils Valland, board member
Stein A. Hoem, IT Manager
NTNU University Museum
Anders Finstad, GBIF Science committee
Solveig Bakken, board member
Data scientist (?)
Norwegian Institute for Nature Research
Erlend B. Nilsen, Science ambassador
Roald Vang, IT Manager
Frank Hanssen, node member
Research Council of Norway
Research Infrastructure Team
16. GBIF Governing Board 2018, GB25, October 2018
GBIF Science Committee: “Focus Forward
on increase usage and relevance”.
Thomas M. Orrell (chair)
Smithsonian Institution, Washington, USA
Greg Riccardi (1st vice chair)
Florida State University, Tallahassee, USA
Anders G. Finstad (2nd vice chair )
NTNU University Museum, Trondheim, Norway
Philippe Grandcolas (3rd vice chair)
Muséum national d'Histoire naturelle, Paris, France
GBIF Science Committee
17. Almost 700 – about 2 papers a day
Peer-reviewed publications using GBIF-mediated data
Slide from the GBIF Science Committee Report, GB25, Kilkenny, Ireland, October 2018
18. GBIF Governing Board 2018, GB25, October 2018
Who are currently using GBIF?
▇ Using GBIF data
▇ All citations
Slide from the GBIF Science Committee Report, GB25, Kilkenny, Ireland, October 2018
We could focus on
increasing GBIF relevance
over here?
GBIF citation per category (status 2018-1-15)
19. ● Consolidate data indexing
● Expand data models
● Build strong linkages with
reference catalogues
GBIF infrastructure directions
Bringing data together
brings science together
Slide from the GBIF Science Committee Report, GB25, Kilkenny, Ireland, October 2018
20. Engaging the (wider) science community
● Proper recognition of data-users as GBIF
stakeholders.
● Engage and involve through teaching and
relevant tools (e.g. R).
● Enable nodes to engage more closely on
national / regional level.
Slide from the GBIF Science Committee Report, GB25, Kilkenny, Ireland, October 2018
21. Recommendations from the GBIF Nodes chair
● Focus on people (Secretariat, Nodes, Publishers
and Users).
● Training, especially for new Node managers.
● Identify a mechanism locally to:
○ Take part in the GBIF work program.
○ Invest in more sustainable Nodes:
■ Stable funding
■ Capacitated staff
■ Development plan
Slide from the GBIF Nodes Committee Report (Andre
Heughebaert), GB25, Kilkenny, Ireland, October 2018
22. GBIF Governing Board 2018, GB25, October 2018
Slide from the GBIF Executive Secretary (Donald Hobern) Report, GB25, Kilkenny, Ireland, October 2018
24. Priority 1: Empower global
network
Ensure that governments, researchers and users are
equipped and supported to share, improve and use data
through the GBIF network, regardless of geography,
language or institutional affiliation.
• Remove barriers to participation
• Increase benefits associated with publishing
biodiversity data
• Address capacity needs
GBIF Strategic plan 2017-2021
25. Priority 2: Enhance biodiversity
information infrastructure
Provide leadership, expertise and tools to support the
integration of all biodiversity information as an
interconnected digital knowledgebase.
• Coordinate vision and strengthen partnerships with major
biodiversity informatics initiatives
• Promote standardization and common mechanisms for
exchange of biodiversity data
• Provide stable and persistent data infrastructure to
support research
26. Priority 3. Fill data gaps
Prioritize and promote mobilization of new data resources
which combine with existing resources to maximize the
coverage, completeness and resolution of GBIF data,
particularly with respect to taxonomy, geography and time.
• Expand checklists to cover all taxonomic groups
• Identify and prioritize gaps in spatial and temporal data
• Engage institutions and researchers with
complementary data
27. Priority 4. Improve data quality
Ensure that all data within the GBIF network are of
the highest-possible quality and associated with
clear indicators enabling users to assess their origin,
relevance and usefulness for any application.
• Enhance automated data validation
• Implement tools for expert curation
• Provide clear quality indicators for all data
28. Priority 5. Deliver relevant data
Ensure that GBIF delivers data in the form and
completeness required to meet the highest-priority
needs of science and, through science, society.
• Engage with expert communities to manage data
to the highest quality possible
• Deliver well-organized and validated data to
support key applications
29. Does GBIF provide access to the
appropriate tools needed to
address the current challenges
for biological diversity?
If you have a hammer, everything looks like a nail …
31. • Darwin Core occurrence data
provide different types of evidence
for the occurrence of a species in
time and space.
• Museum specimens & collections
• Material samples & sequence data
• Species or ecosystem monitoring data
• Citizen species observations
• … focus on adding new data types?
32. Focus efforts on Data standards?
Genomic Standards Consortium (GSC)
MIMARKS - Minimum
information about a marker
gene sequence
Biodiversity Information
Standards TDWG
Darwin Core
occurrenceID
materialSampleID
eventID
Global Genome Biodiversity Network (GGBN)
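The three Darwin Core identifiers named on this slide can be illustrated with a small sketch. The records below are invented for illustration (the `urn:example:` identifiers are hypothetical), and `basisOfRecord` is an additional Darwin Core term, assumed here to distinguish the evidence types listed on the previous slide.

```python
from collections import Counter, defaultdict

# Illustrative records only: identifiers and values are made up.
records = [
    {"occurrenceID": "urn:example:occ:1", "eventID": "urn:example:event:1",
     "basisOfRecord": "PreservedSpecimen", "scientificName": "Metridium dianthus"},
    {"occurrenceID": "urn:example:occ:2", "eventID": "urn:example:event:1",
     "basisOfRecord": "MaterialSample", "scientificName": "Metridium dianthus",
     "materialSampleID": "urn:example:sample:1"},
    {"occurrenceID": "urn:example:occ:3", "eventID": "urn:example:event:2",
     "basisOfRecord": "HumanObservation", "scientificName": "Metridium dianthus"},
]

# Count records per type of evidence.
counts = Counter(record["basisOfRecord"] for record in records)

# eventID links occurrences that come from the same sampling event.
by_event = defaultdict(list)
for record in records:
    by_event[record["eventID"]].append(record["occurrenceID"])

print(dict(counts))
print(dict(by_event))
```

Here the specimen and the material sample share one eventID, which is how a voucher and its tissue or sequence sample could be tied to the same collecting event.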
33. 79.2% (citizen science) observation data
14.6% specimens
Rapid increase in GBIF of (citizen science) observation data…!
Data for natural history specimens was the beginning and remains at the core of
GBIF’s scope
Focus efforts on collection specimens and vouchered and curated physical
samples?
(biobank-samples)
Troudet et al. (2018) The Increasing Disconnection of Primary Biodiversity Data from Specimens doi:10.1093/sysbio/syy044
34. Bias in distribution from uneven reporting efforts!
Distribution of species occurrence records made available to GBIF by citizen
science data providers. https://www.gbif.org/citizen-science
Chandler et al. (2017) Contribution of citizen science towards international biodiversity monitoring. Biological Conservation doi:10.1016/j.biocon.2016.09.004
Focus efforts on filling gaps in species distribution coverage?
36. Total ≈ 8.7 million species?
(excluding bacteria and micro-organisms)
Mora C, Tittensor DP, Adl S, Simpson AGB, Worm B (2011) How Many Species Are
There on Earth and in the Ocean? PLoS Biology doi:10.1371/journal.pbio.1001127
Caley J, Fisher R, Mengersen K (2014) Global species richness estimates have not
converged. Trends in Ecology & Evolution doi:10.1016/j.tree.2014.02.002
Caley et al. 2014
The Catalogue of Life is a quality-assured checklist
of more than 1.8 million species known to science.
Focus efforts on mobilizing nomenclature resources?
37. New Species Concepts indexed in GBIF
Species concepts based on Operational Taxonomic Units (OTUs) (from
PlutoF UNITE) are indexed into the GBIF taxon backbone.
Species concepts based on BOLD barcode index numbers (BINs)
are indexed into the GBIF taxon backbone.
Focus efforts on mobilizing yet unnamed species concepts?
38. Capacity building: Data capture & data publishing
• Tajikistan, Belarus, Ukraine, Armenia & Norway
• UiO NHM, ForBio, GBIF Norway & GBIF Secretariat
• 64 students & staff trained
• 8 events over three years:
– 2018 Sep Oslo Kick-off
– 2019 Feb Minsk Belarus
– 2019 Jun Dushanbe Tajikistan
– 2019 Nov Minsk Belarus
– 2020 Apr Yerevan Armenia
– 2020 Oct Kiev Ukraine
– 2021 March Oslo Norway
• DIKU/SIU grant 2018–2021
Focus efforts on capacity building & training?
40. Example of data cleaning workflow
verbatimEventDate:
18 Mayo 2016
year: 2016
month: 5
day: 18
eventDate: 2016-05-18
startDayOfYear: 139
endDayOfYear: 139
[Diagram labels: Source, Data cleaning, DwC-Archive]
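The date example on this slide can be reproduced with a minimal sketch. It assumes the verbatim value always has the form "day month-name year" with a Spanish month name; the lookup table and the function name `clean_event_date` are illustrative, not part of any GBIF tool.

```python
from datetime import date

# Assumed lookup table for Spanish month names; a real workflow would need
# more languages and more date layouts.
SPANISH_MONTHS = {
    "enero": 1, "febrero": 2, "marzo": 3, "abril": 4, "mayo": 5, "junio": 6,
    "julio": 7, "agosto": 8, "septiembre": 9, "octubre": 10,
    "noviembre": 11, "diciembre": 12,
}

def clean_event_date(verbatim):
    """Turn a 'day month-name year' string into Darwin Core date terms."""
    day_text, month_text, year_text = verbatim.split()
    parsed = date(int(year_text), SPANISH_MONTHS[month_text.lower()], int(day_text))
    day_of_year = parsed.timetuple().tm_yday
    return {
        "verbatimEventDate": verbatim,  # the original value is kept unchanged
        "year": parsed.year,
        "month": parsed.month,
        "day": parsed.day,
        "eventDate": parsed.isoformat(),
        "startDayOfYear": day_of_year,
        "endDayOfYear": day_of_year,
    }

record = clean_event_date("18 Mayo 2016")
print(record["eventDate"], record["startDayOfYear"])  # 2016-05-18 139
```

Day 139 follows because 2016 is a leap year: 31 + 29 + 31 + 30 days before May, plus 18.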
41. Biased representation in country membership
Focus efforts on increasing the country membership coverage?
Low membership coverage in Asia and Africa
42. Asia (gap in data coverage)
Africa (gap in data)
Most data are from more recent dates
Focus on filling data coverage and gaps in space and time?
43. The total number of
specimens in natural history
collections worldwide is
estimated at 1.2 to 3 billion.
(Ariño 2010; Duckworth et al. 1993)
GBIF indexes 876 million records –
including 128 million specimens
=> 4% to 10% coverage?
Photo: Botany Collection, Algae, Smithsonian National
Museum of Natural History, by Chip Clark.
Focus efforts on services
for supporting digitizing of
legacy specimens?
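The "4% to 10%" range on this slide follows from dividing the 128 million specimen records by the two ends of the worldwide estimate. A short sketch of that arithmetic, using only the figures quoted above (the variable names are illustrative):

```python
specimens_in_gbif = 128_000_000   # specimen records indexed by GBIF (from the slide)
estimate_low = 1_200_000_000      # lower estimate of specimens worldwide
estimate_high = 3_000_000_000     # upper estimate of specimens worldwide

# The smallest coverage comes from the largest estimate, and vice versa.
coverage_min = specimens_in_gbif / estimate_high
coverage_max = specimens_in_gbif / estimate_low

print(f"{coverage_min:.1%} to {coverage_max:.1%}")  # 4.3% to 10.7%
```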
44. Data fitness depends on data being
• accessible
• timely
• easy to read
• relevant
• consistent
• complete
• specific
• comprehensive
The true value of biodiversity
data can be measured by the
extent to which it is used.
Focus efforts on data re-use metrics and other incentives?
47. New services for Annotating biodiversity data
Tschöpe et al. (2013) Annotating biodiversity data via the Internet.
48. "Machine learning algorithms have
successfully identified plant species in massive
herbaria just by looking at the dried
specimens. According to researchers, similar
AI approaches could also be used to identify the
likes of fly larvae and plant fossils"
Researchers trained... algorithms on more than 260,000 scans of
herbarium sheets, encompassing more than 1,000 species. The
computer program eventually identified species with nearly 80%
accuracy: the correct answer was within the algorithms’ top 5 picks
90% of the time. That, says (Penn State paleobotanist Peter) Wilf,
probably out-performs a human taxonomist by quite a bit.
Carranza-Rojas J, Goeau H, Bonnet P, Mata-Montero E, and Joly A (2017) Going deeper
in the automated identification of Herbarium specimens. BMC Evolutionary Biology
17:181. https://doi.org/10.1186/s12862-017-1014-z
Ledford H (2017) Artificial intelligence identifies plant species for science: Deep-learning
methods successfully classify thousands of herbarium samples. Nature News 11 August
2017. doi:10.1038/nature.2017.22442
Carranza-Rojas J, Joly A, Bonnet P, Goëau H, Mata-Montero E (2017) Automated
Herbarium Specimen Identification using Deep Learning. Proceedings of TDWG 1:
e20302. https://doi.org/10.3897/tdwgproceedings.1.20302
Focus efforts on new machine learning services?
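The two accuracy figures quoted on this slide (first pick versus "top 5 picks") are instances of top-k accuracy. The sketch below shows how such a figure is computed in general. It is not the evaluation code of the cited study, and the toy predictions are made up.

```python
def top_k_accuracy(ranked_predictions, true_labels, k):
    """Share of specimens whose true species is among the model's top k picks."""
    hits = sum(
        1
        for ranked, truth in zip(ranked_predictions, true_labels)
        if truth in ranked[:k]
    )
    return hits / len(true_labels)

# Toy data (invented): each inner list is a ranked set of guesses for one sheet.
ranked_predictions = [
    ["Quercus robur", "Quercus petraea", "Fagus sylvatica"],
    ["Fagus sylvatica", "Quercus robur", "Betula pendula"],
    ["Betula pendula", "Betula pubescens", "Quercus robur"],
    ["Quercus petraea", "Fagus sylvatica", "Betula pubescens"],
]
true_labels = ["Quercus robur", "Quercus robur", "Betula pubescens", "Betula pendula"]

top1 = top_k_accuracy(ranked_predictions, true_labels, k=1)
top2 = top_k_accuracy(ranked_predictions, true_labels, k=2)
print(top1, top2)  # 0.25 0.75
```

Accuracy can only rise as k grows, which is why a top-5 figure is always at least as high as the first-pick figure.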
51. "Scientific irreproducibility
— the inability to repeat
others' experiments and
reach the same conclusion
— is a growing concern”.
Baker (2016) Nature
doi:10.1038/533452a
52. Open Access (OA): Research results distributed online, free
of cost and other barriers – often meaning free access to
research articles.
Open Science: researchers to share their methods, computer
code and research data in central data repositories.
Open Data: based on FAIR principles: findable, accessible,
interoperable and reusable (biodiversity) data - is the primary
objective of GBIF.
For full reproducibility we also need access to the physical
biological material – to be deposited in museum collections
and biobank-repositories.
"Scientific irreproducibility — the inability to repeat others'
experiments and reach the same conclusion” (Nature 2016)
53. "FAIR" data
• Findable
– assign persistent IDs, provide rich metadata, register in
a searchable resource (such as GBIF)
• Accessible
– Retrievable by their ID using a standard protocol,
metadata remain accessible even if data aren’t
• Interoperable
– Use formal, broadly applicable languages, use standard
vocabularies, qualified references (e.g. Darwin Core)
• Reusable
– Rich, accurate metadata, clear licences, provenance,
use of community standards (e.g. Dublin Core, EML)
www.force11.org/group/fairgroup/fairprinciples
• Wilkinson, M. D. et al. (2016) The FAIR Guiding Principles for scientific data
management and stewardship. Sci. Data 3:160018
[doi:10.1038/sdata.2016.18]
Slide source: OpenAIRE & EUDAT, CC-BY-4.0, 2013
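A few of the FAIR points above can be turned into a simple checklist over a dataset's metadata. This is a rough sketch only: the field names (`identifier`, `title`, `description`, `access_url`, `license`) are hypothetical and do not correspond to a specific metadata standard, and passing the checks does not by itself make a dataset FAIR.

```python
def fair_gaps(metadata):
    """Return the checklist items that this metadata record does not satisfy."""
    checks = {
        "findable: persistent identifier": bool(metadata.get("identifier")),
        "findable: title and description": bool(metadata.get("title"))
        and bool(metadata.get("description")),
        "accessible: retrievable over a standard protocol": str(
            metadata.get("access_url", "")
        ).startswith(("http://", "https://")),
        "reusable: clear licence": bool(metadata.get("license")),
    }
    return [name for name, passed in checks.items() if not passed]

# Hypothetical dataset description with the licence left out.
dataset = {
    "identifier": "doi:10.0000/example",
    "title": "Example occurrence dataset",
    "description": "Specimen records from an example collection.",
    "access_url": "https://example.org/dataset/1",
}

print(fair_gaps(dataset))  # ['reusable: clear licence']
```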
54. Data Citation Principles
1. Data to be legitimate citable products of research.
2. Data citations giving scholarly credit and attribution.
3. In scholarly literature, whenever claims are based on data, data should
always be cited.
4. Persistent method for identification of data, that is machine actionable,
globally unique, universal.
5. Data citations facilitate access to data or at least to metadata.
6. Unique identifiers that persist even beyond the lifespan of the data.
7. Data citations identify and give access to the specific data that support
verification of the claim (provenance, time-slice, version).
8. Flexible, but attention to interoperability of practices across
communities.
Data Citation Synthesis Group: Joint Declaration of Data Citation Principles. Martone M. (ed.) San Diego CA: FORCE11; 2014
55. Open research data
Forskningsrådet (2014). ISBN: 978-82-12-03361-0
The Research Council of Norway expects all research data from projects
funded by the Research Council to be made freely available as open data.
In some situations there can be valid and justified reasons for exceptions.
(2014)
56. Open Science
Kunnskapsdepartementet (2016)
EU (2016) Competitiveness Council, 26-27/05/2016
EU (2007) INSPIRE Directive
Norway is to be a careful pioneer in open access to research results.
Norway to follow the ambition of EU on full open access to publicly
funded research by 2020.
Results of research supported by public and public-private funds freely available to and reusable by anyone.
57. ARCHIVING OF RESEARCH DATA AND
MATERIAL SAMPLES (BIOBANK)
• Open archiving and sharing of data and physical
material samples ensures that your research results are
reproducible.
• Professional curation of data and material samples saves
you research time because you, your collaborators
and others can find, understand and access your
research data and samples.
• Sharing data and material samples gives your research
wider reach and impact.
• Facilitating the reuse of research data and
material samples strengthens open, curiosity-driven
research and can lead to unexpected research
breakthroughs!