Presentation of the Global Biodiversity Information Facility (GBIF), GBIF-Norway and the Norwegian Biodiversity Information Centre (NBIC, Artsdatabanken) at the Norwegian Forest and Landscape Institute (Skog og landskap) at Ås, outside Oslo, on 17 October 2013. The seminar was held together with NBIC.
Global Biodiversity Information Facility - 2013
1. Seminar at the Norwegian Forest and Landscape Institute
Global Biodiversity Information Facility (GBIF)
A global infrastructure for publishing biodiversity data
Dag Endresen and Christian Svindseth
GBIF Norway, Natural History Museum, University of Oslo (NHM-UiO)
17 October 2013
2. Topics
• What is GBIF?
• International partners
• Darwin Core terminology
• GBIF data portal and services
• Norwegian collection portals
• Persistent identifiers (PID)
• Data paper
3. GBIF enables free and open access to biodiversity data online.
We are an international government-initiated and funded initiative focused on making biodiversity data available to all and anyone, for scientific research, conservation and sustainable development.
Status of the GBIF data portal, October 2013
4. GBIF's unique role
Slide by Donald Hobern, 2012
• Registry of biodiversity data resources.
• Tools and support for biodiversity data publication.
• Network development at national, regional and global levels.
• Global virtual natural history collection.
• Cross-domain linkage between data from collections, ecology and genomics.
• Access to global biodiversity data for GIS analysis and environmental monitoring.
– Aggregated presence data
– Site-based survey data (samples, presence/absence)
5. Norway joined GBIF in February 2004.
The low membership coverage in Africa and Asia is an important gap!
6. OECD Global Science Forum (1999):
“establish and support a distributed system of interlinked and interoperable modules (databases, software and networking tools, search engines, analytical algorithms, etc.) that together will form a Global Biodiversity Information Facility (GBIF)”.
7. The Millennium Ecosystem Assessment showed that human actions
often lead to irreversible losses in the diversity of life, and these losses
have been more rapid in the past 50 years than ever before in human
history.
Biological diversity is key to resilience – the ability of natural and social
systems to adapt to change, and is essential for nearly every aspect of
human well-being.
Because human threats to biodiversity occur across large spatial and
temporal scales, biodiversity and ecosystem monitoring, forecasting,
and risk assessments require data to be organised in a globally accessible, integrated infrastructure.
GBIF’s Data Portal provides this infrastructure.
8. Organisational partnerships
Based on slide by Donald Hobern, 2012
• Some potential data collaborations
– Taxon names and nomenclature
  • Catalogue of Life (CoL)
  • IPT to publish global and regional species databases
  • GBIF infrastructure to support construction of CoL
– Biodiversity literature
  • Biodiversity Heritage Library (BHL)
  • User annotations to extract occurrence records
  • Link original (and other) descriptions to taxonomy
– Species information and traits
  • Encyclopedia of Life (EOL)
  • Support EOL as global species information aggregator
  • Include EOL summary box on each GBIF species page
9. GBIF and GEO
GEO: the intergovernmental Group on Earth Observations
GEO BON: Biodiversity Observation Network
Data integration and interoperability
GBIF provides the infrastructure delivering species occurrence data.
10. GIASIP
Global Invasive Alien Species Information Partnership
GBIF provides the infrastructure delivering species occurrence data.
Launched at CBD COP11 October 2012 in Hyderabad, India.
11. GBIF and IPBES (in Norwegian: Naturpanelet)
Intergovernmental Science-Policy Platform on Biodiversity and Ecosystem Services (IPBES)
IPBES provides information to support policy decisions and scientific research on biodiversity.
GBIF operates within the data, information and knowledge domain of biodiversity informatics.
GBIF provides the infrastructure delivering species occurrence data in IPBES.
Diagram labels: Science, Biodiversity, Policy (IPBES); Data, information and knowledge (GBIF)
12.
1. Information infrastructure – an Internet-based index of a globally distributed network of interoperable databases that contain primary biodiversity data.
2. Community-developed tools, standards and protocols – the tools data providers need to format and share their data.
3. Capacity-building and training – and access to a global expert community.
13. Common discovery system
Based on slide by David Remsen, GBIF, January 2012
http://gbrds.gbif.org
www.gbif.org
14. Architecture
Slide by David Remsen, GBIF, November 2011
• Global Registry for resource discovery.
• Common and documented data standards.
– Metadata
– Data
– Vocabularies
• Data sharing tools.
• Common web service methods.
• Resolvable identifiers.
15. Darwin Core – a vocabulary of terms
Wieczorek J, Bloom D, Guralnick R, Blum S, Döring M, De Giovanni R, Robertson T, and
Vieglais D (2012) Darwin Core: An Evolving Community-Developed Biodiversity Data Standard.
PLoS ONE 7(1): e29715. (doi:10.1371/journal.pone.0029715)
17. Unifying species data
Slide by Donald Hobern, 2012
Darwin Core: integrated access for records of the occurrence of any species:
• What?
• When?
• Where?
• What evidence?
• Data owner?
• Link to full record
Presence only
Diagram labels: Collections, Ecological Monitoring, Genomics
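The six questions above can be sketched as a lookup against Darwin Core terms. This is a minimal illustration, not GBIF's own mapping: the term names are standard Darwin Core terms, but which term answers which question is an assumption made here, and the helper `missing_answers` and the sample record are hypothetical.

```python
# Assumed mapping from the slide's questions to Darwin Core terms.
# The terms exist in Darwin Core; the grouping is illustrative only.
QUESTION_TO_TERMS = {
    "What?": ["scientificName"],
    "When?": ["eventDate"],
    "Where?": ["decimalLatitude", "decimalLongitude", "country"],
    "What evidence?": ["basisOfRecord"],
    "Data owner?": ["institutionCode"],
    "Link to full record": ["occurrenceID", "references"],
}


def missing_answers(record):
    """Return the questions for which the record has no non-empty term."""
    return [
        question
        for question, terms in QUESTION_TO_TERMS.items()
        if not any(str(record.get(term, "")).strip() for term in terms)
    ]


# Hypothetical record; the values are made up for the example.
example_record = {
    "occurrenceID": "example-occurrence-1",
    "scientificName": "Catathelasma imperiale",
    "eventDate": "2010-09-01",
    "country": "Norway",
    "basisOfRecord": "PreservedSpecimen",
    "institutionCode": "O",
}

print(missing_answers(example_record))
print(missing_answers({"scientificName": "Amanita phalloides"}))
```

A check like this only tells a publisher which questions a record leaves unanswered; it says nothing about whether the answers are correct.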
18. Unifying species data
Slide by Donald Hobern, 2012
Darwin Core: integrated access for records of the occurrence of any species:
• What?
• When?
• Where?
• What evidence?
• Data owner?
• Link to full record
Presence only
Darwin Core + Core Survey Fields (Sample Id, Method Id, Relative abundance, ...): fully compatible with existing Darwin Core data, plus:
• Which species were recorded together?
• Which sets of data are directly comparable?
• Which species were most abundant in each sample?
Presence/absence
Diagram labels: Collections, Ecological Monitoring, Genomics
19. Darwin Core Archive (DwC-A)
• A DwC-A publishes DwC records, including terms from DwC-A extensions.
• Simple text-based format.
• Zipped single-file archive.
Germplasm.txt
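A minimal sketch of reading such an archive, assuming Python and only its standard library. A Darwin Core Archive is a zip file in which a `meta.xml` descriptor maps the columns of a delimited text file to Darwin Core terms. The file name `occurrence.txt`, the helper names and the two sample rows are illustrative assumptions (the dates are invented), not data from GBIF.

```python
import csv
import io
import xml.etree.ElementTree as ET
import zipfile

# Namespace used by the Darwin Core text descriptor (meta.xml).
NS = {"dwc": "http://rs.tdwg.org/dwc/text/"}

# Illustrative descriptor: one core file, tab-delimited, one header line.
META_XML = """<archive xmlns="http://rs.tdwg.org/dwc/text/">
  <core rowType="http://rs.tdwg.org/dwc/terms/Occurrence"
        fieldsTerminatedBy="\\t" linesTerminatedBy="\\n" ignoreHeaderLines="1">
    <files><location>occurrence.txt</location></files>
    <id index="0"/>
    <field index="1" term="http://rs.tdwg.org/dwc/terms/scientificName"/>
    <field index="2" term="http://rs.tdwg.org/dwc/terms/eventDate"/>
    <field index="3" term="http://rs.tdwg.org/dwc/terms/country"/>
  </core>
</archive>
"""

# Made-up sample rows for the example.
OCCURRENCE_TXT = (
    "id\tscientificName\teventDate\tcountry\n"
    "occ-1\tAmanita phalloides\t2012-09-14\tNorway\n"
    "occ-2\tHygrocybe vitellina\t2011-10-02\tNorway\n"
)


def build_example_archive():
    """Create a tiny DwC-A in memory and return its bytes."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("meta.xml", META_XML)
        archive.writestr("occurrence.txt", OCCURRENCE_TXT)
    return buffer.getvalue()


def read_core(archive_bytes):
    """Return the core rows as dicts keyed by short Darwin Core term names."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        core = ET.fromstring(archive.read("meta.xml")).find("dwc:core", NS)
        location = core.find("dwc:files/dwc:location", NS).text
        # meta.xml stores the tab as the two characters backslash + t.
        delimiter = core.get("fieldsTerminatedBy", ",").replace("\\t", "\t")
        skip = int(core.get("ignoreHeaderLines", "0"))
        columns = {int(core.find("dwc:id", NS).get("index")): "id"}
        for field in core.findall("dwc:field", NS):
            columns[int(field.get("index"))] = field.get("term").rsplit("/", 1)[-1]
        text = archive.read(location).decode("utf-8")
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))[skip:]
    return [{name: row[i] for i, name in columns.items()} for row in rows]


records = read_core(build_example_archive())
for record in records:
    print(record["id"], record["scientificName"], record["eventDate"])
```

Extension files and the `eml.xml` metadata document that an archive may also contain are left out of this sketch.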
20. Darwin Core Archive Assistant (GBIF, 2010)
The Darwin Core Archive Assistant is a web application that presents a
simple interface for describing the data elements a data publisher wishes to
serve to the GBIF network as basic text files and composes the appropriate
XML descriptor file as defined in the Darwin Core Text Guidelines to
accompany them. It communicates with the GBIF registry to provide an up-to-date listing of all relevant Darwin Core terms and available extensions
and presents these in a simple checklist format.
http://tools.gbif.org/dwca-assistant/
22. Fitness for use
Slide by Laura Russell, VertNet, September 2011
Definition
"The general intent of describing the quality of a particular dataset or record is to describe the fitness of that dataset or record for a particular use that one may have in mind for the data."
Chrisman, 1991
23. Improving fitness-for-use
Slide by Donald Hobern, 2012
• Progressive improvement
– Data indexes
  • Centralised discovery
  • Standardisation of persistent identifiers
  • Consistent metadata
– Data quality
  • Inconsistencies within records
  • Validation against metadata
  • Outlier detection
  • Metrics per record and per data set
– Expert curation
  • Interface with taxon expert groups
  • Incorporate findings of data users
  • Need efficient researcher-friendly tools
Diagram labels: Aggregate, Data Indexes, Data Quality, Expert Curation
24. Taxonomic data
Slide by Laura Russell, VertNet, September 2011
Names are often the first point of entry to biodiversity databases.
=> Risk of error propagation
Possible errors:
• Wrong identification
• Wrong format
• Spelling errors
25. The problem with scientific names
Slide by David Shorthouse, Canadensys, January 2013
• No comprehensive catalog of species
• Names ≠ species
• The species problem – species concepts
• Competing classifications / phylogenies
• Many names for one taxon
• One name for many taxa
• ‘Names’ are more than code-compliant scientific names
26. Proposed solution
Slide by David Shorthouse, Canadensys, January 2013
• Inclusive
– Accommodate alternate perspectives
• Reconciliation
– Map names among and between each other
• Disambiguation
– Context to assign homonymic names to rightful place
27. Improving data quality
The fish collection at NHM had some longitude and latitude columns swapped.
Indexed by GBIF 14 January 2013
Noticed and corrected in April 2013 (dataset 8102).
Indexed by GBIF 3 May 2013
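A swapped latitude and longitude of this kind can often be caught automatically. The sketch below is one possible check, not the procedure used at NHM or by GBIF: it flags a record when the coordinates fall outside an expected region but fit inside it once exchanged. The bounding box for Norway is a rough assumption chosen for illustration, and the names `BoundingBox`, `NORWAY_APPROX` and `classify` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    """Rectangular region in decimal degrees."""
    min_lat: float
    max_lat: float
    min_lon: float
    max_lon: float

    def contains(self, lat, lon):
        return (self.min_lat <= lat <= self.max_lat
                and self.min_lon <= lon <= self.max_lon)


# Rough box around mainland Norway; an assumption for this example only.
NORWAY_APPROX = BoundingBox(min_lat=57.0, max_lat=72.0, min_lon=4.0, max_lon=32.0)


def classify(lat, lon, box):
    """Label a coordinate pair as 'ok', 'swapped' or 'outside'."""
    if box.contains(lat, lon):
        return "ok"
    if box.contains(lon, lat):
        # The pair only fits the expected region when exchanged.
        return "swapped"
    return "outside"


print(classify(59.91, 10.75, NORWAY_APPROX))  # a point near Oslo
print(classify(10.75, 59.91, NORWAY_APPROX))  # the same point, columns exchanged
print(classify(0.0, 0.0, NORWAY_APPROX))
```

The check works here because the assumed latitude and longitude ranges for Norway do not overlap; for regions where they do, a swapped pair can still fall inside the box and go undetected.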
29. Data published through GBIF
Chart: primary biodiversity records (millions), axis range 80 to 440.
A modest decline in the total number of data records in January 2013 resulted from deletion of duplicates and withdrawn data, identified through software and processing upgrades.
Last updated: 2013-10-02
30. GBIF data publishers
Chart: number of institutions registered as GBIF data publishers, axis range 200 to 580.
A sharp rise in the number of data publishers in September 2013 results from institutions choosing to register as separate entities rather than sharing datasets through a single publisher at their national node institution. This helps to raise the visibility and branding of the institutions, and provides more accurate attribution, especially in the new GBIF portal coming online shortly.
Last updated: 2013-10-02
31. GBIF citation in research
Chart: number of peer-reviewed publications per year, 2008 to 2013 (Jan-Sep), in three series: GBIF mentioned, GBIF discussed, GBIF-mediated data used. Axis range 0 to 250.
Bar labels in extraction order (series and year not recoverable): 232, 197, 170, 148, 90, 89, 66, 66, 61, 57, 52, 43, 63, 64, 48, 35, 25, 17
Last updated: October 2013
33. GBIF portal:
12.5 million occurrences published from Norwegian institutes.
Covering 180 countries worldwide.
34. Status Nordic GBIF data sets (data hosted by…), Oct 2013
Country    Data sets    Occurrences
Denmark    45           9 311 741
Finland    57           14 666 474
Iceland    4            458 705
Norway     85           12 531 207
Sweden     47           43 374 550
37.
• Custom data portals for Norwegian collections.
• Upgrade to Darwin Core archives across Norway.
• Persistent identifiers (UUID, QR code).
• Data set metadata descriptions (data paper).
• GIS data server for spatial environment data.
39.
• Software from GBIF to implement online data portals for biodiversity data.
– National, thematic or regional.
– Based on data published using GBIF standards.
40. Different data portals will implement very different modules and functionality to meet their own needs.
Slide by David Remsen (2011)
41. Opportunities with Darwin Core:
Diagram labels: UiB, UiT, S&L; Darwin Core Archive; Artskart; GBIF Portal; Data portal for institute, region, or theme?
Collections and data sets published from the data owner as one single Darwin Core archive (DwC-A).
Different data types from the same DwC-A can be included in different data portals.
42.
43. The purpose of identifiers
…is to name things,
making it possible to refer to them.
What is an identifier:
“Each identifier refers to one and only one thing” (Coyle 2006).
“An association between a string and a thing” (Kunze 2003).
“A stated association between a symbol and a thing; that the
symbol may be used to unambiguously refer to the thing
within a given context” (Campbell 2007).
44. UUID QR codes for all museum objects at NHM-UiO would provide:
• Machine-readable labels that can be read with an ordinary smartphone (or PDA).
• New and efficient workflows for collection management.
• Stable identifiers appropriate for databasing.
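A minimal sketch of minting and checking such identifiers with Python's standard `uuid` module. The helper names are hypothetical, the choice of random (version 4) UUIDs is an assumption, and encoding the resulting string into a QR code would need a third-party library that is not shown here.

```python
import uuid


def mint_identifier():
    """Return a new random (version 4) UUID as a string."""
    return str(uuid.uuid4())


def as_urn(identifier):
    """Return the URN form of a UUID string, e.g. 'urn:uuid:...'."""
    return uuid.UUID(identifier).urn


def is_valid_uuid(text):
    """Check whether text parses as a UUID."""
    try:
        uuid.UUID(text)
    except ValueError:
        return False
    return True


object_id = mint_identifier()
print(object_id)
print(as_urn(object_id))
print(is_valid_uuid("NHM-12345"))  # a catalogue-style number, not a UUID
```

Because the identifier carries no meaning of its own, it can stay stable even if the catalogue number, collection or taxonomic determination of the object changes.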
49.
• Peer review option for biodiversity data.
• Authors get scientific credit for data publication.
• Meeting concerns over data quality.
• Meeting concerns over data citation mechanism.
• Metadata formats: Ecological Metadata Language (EML), Dublin Core, Darwin Core, Natural Collections Descriptions (NCD)…
• Towards → Each data set published through GBIF accompanied by a data paper…?
51. Why publish your data
• Citable publication
• Establish scientific priority
• Increase collaboration
• Link data to bigger network
• Re-use and multiply effect
• Respond to funding requirements
http://biodiversitydatajournal.com/
Smith V, Georgiev T, Stoev P, Biserkov J, Miller J, Livermore L,
Baker E, Mietchen D, Couvreur T, Mueller G, Dikow T, Helgen K,
Frank J, Agosti D, Roberts D, Penev L (2013) Beyond dead trees:
integrating the scientific process in the Biodiversity Data Journal.
Biodiversity Data Journal 1: e995. DOI: 10.3897/BDJ.1.e995
52. Data rescue activity:
Many species occurrence data are
“hidden” in reports and
documents produced by
universities, research institutes,
public agencies and the university
museums.
Project with Artsdatabanken
Photo by: Niklas Bildhauer
55. PCA analysis of 54 environmental variables across Norway versus the National Vegetation Atlas.
Figure panels: “PCA Norway” (PCA component 1, PCA component 2); Norwegian Vegetation Atlas (Moen 1999): sections and zones.
Bakkestuen, V., Erikstad, L., and Økland, R.H. (2008). Step-less models for regional environmental variation in Norway. J. Biogeography 35: 1906-1922.
Based on a slide by Vegar Bakkestuen
56. Modeling Norwegian fungi
• 83 fungi species.
• 10,500 occurrences from the GBIF portal.
• Predictive modeling of species distribution.
Species shown: Amanita phalloides, Catathelasma imperiale, Hygrocybe vitellina, Marasmius siccus
Wollan, A. K., Bakkestuen, V., Kauserud, H., Gulden, G. and Halvorsen, R. 2008. Modelling and predicting fungal distribution patterns using herbarium data. J. Biogeography 35: 2298-2310.
Slide by Vegar Bakkestuen
57. Node Personnel
Dag Endresen, Node Manager
Christian Svindseth, Database manager
Fridtjof Mehlum, Research Director
Einar Timdal, Associate Professor
Vegar Bakkestuen, Researcher
Geir Søli, Associate Professor
Nils Valland, Artsdatabanken
Wouter Koch, Artsdatabanken
58. Thanks for listening!
GBIF Norway
Dag Endresen
dag.endresen@nhm.uio.no
Christian Svindseth
christian.svindseth@nhm.uio.no