2. Overview
1. Background – the biodiversity informatics domain
• The problem (i.e. why are we here)
• Representations of the domain (data, infrastructures, projects…)
• Toward an integrated view (strategy)
2. Social challenges
• Openness
• Collaboration and communities
• Standards, identifiers & protocols
3. (Big) data challenges
• Mobilizing existing data (metadata, literature, collections)
• New forms of data ([meta]genomics & observatories)
4. Synthetic challenges
• Data Aggregation & linking
• Visualisation
• Modeling
5. Next steps (data infrastructures & funding)
• Lessons learned: new informatics opportunities in H2020
4. The problem – integrating biodiversity research
How do we join up these activities?
How do we use this as a tool?
• Species conservation & protected areas
• Impacts of human development
• Biodiversity & human health
• Impacts of climate change
• Food, farming & biofuels
• Invasive alien species
What infrastructures do we need? (technologies, tools, standards…)
What processes do we need? (Modelling, workflows…)
What data do we need? (Genes, localities…)
5. Natural History – the foundation
“It is interesting to contemplate a tangled bank, clothed with many plants of many kinds, …, so different from each other, and dependent upon each other in so complex a manner, have all been produced by laws acting around us.”
C. Darwin, “On the Origin of Species”, 1859
Darwin’s “tangled bank”…
Systematics, a foundational “law”
7. A granular understanding of biodiversity
[Figure: levels of biodiversity. Genes (e.g. GCGC, GTAC, CTAG; GenBank), individuals (i-vi), local populations, species (A-F) and global biodiversity, with a +/- matrix of interactions between species A-F representing biological networks.]
8. Key problems
• Landscape is complex, fragmented & hard to navigate
• Many audiences (policy makers, scientists, amateurs, citizen scientists)
• Many scales (global solutions to local problems)
[Figure: "An informatician's view of biodiversity", adapted from Peterson et al 2010. Layers: Products, Data, Systems. Labels: Genotype, Phenotype, Biotic Interactions, Environment, Human Effects, Niche & Pop. Ecology, Biodiversity Loss, Phylogenetic Trees, Taxonomy, Geographic Distributions, Range Maps, Forecasts of Change, Conservation & management; GenBank, MorphBank, Interactions, Geospatial, Census, Pop. data, Extent of Occurrence; IUCN, TreeBase, IPNI, ZooBank, GBIF, AquaMaps.]
9. A project centric view of biodiversity
Nomenclators
Index Fungorum
ZooBank
IPNI
(Kew/AUS/Harvard)
ING
AFD/APC/APUI
NZOR
CoL (Sp2000 & ITIS)
ZooRecord
PESI:
ERMS
Fauna Europaea
Euro+Med Plantbase
ORBIS
WoRMS
Flora Europaea
Checklists
Phylogenetic
Tree of Life
TreeBase
CIPRES
Molecular
Databases
NCBI/EMBL/DDBJ
CBoL
Barcode of Life
Initiative
Biodiversity
ALA
CONABIO
CRIA (Brazil)
IUCN
SEEK
OPAL
DAISIE
iNaturalist
uBio
PLAZI
Inotaxa
BHL
eFloras
Scan / Mark-up
Identification
Key2Nature
IdentifyLife
Inter-Institutional
Synthesis
BCI
BioCASE
GeoCASE
MaNIS
Institutional
EMu (=MOA)
Recorder
TDWG
LifeWatch
GBIF
CDM
GNA (NameBank) IPNI
Google Scholar
Connotea
ViTaL
ISI
Bibliographic
Descriptive /
classification
EoL
Scratchpads
CATE
MorphoBank
Wikipedia
A snapshot from 2009, “the dance of the initiatives”
10. The strategic view: community informatics challenges
GBIF GBIC Report (Coming soon)
EU Biodiversity Strategy (2011)
Biodiv. Inf. Challenges (2013)
Grand Challenges for Biodiversity Informatics (integrating activities for H2020)
11. 2. Social challenges
- Openness
- Collaboration and communities
- Standards, identifiers & links
12. Openness in biodiversity informatics
E. Archambault et al., Proportion of Open Access Peer-Reviewed Papers at the European and World Levels: 2004-2011, June 2013, Science-Metrix Inc.
“One-half of all papers are now freely available within a year or two of publication”
“A piece of data or content is open if anyone is free to use, reuse, and redistribute it - subject, at most, to the requirement to attribute and/or share-alike.”
http://opendefinition.org/
Many kinds of openness:
• Open Access
• Open Data
• Open Science
• Open Source
• Sharing data is a foundation for our activities
• Normal practice in some communities (molecular)
• Mandated by some funders & governments
13. Openness in biodiversity informatics
Many kinds of openness:
• Open Access
• Open Data
• Open Science
• Open Source
Need to continue to incentivise openness
“A piece of data or content is open if anyone is free to use, reuse, and redistribute it - subject, at most, to the requirement to attribute and/or share-alike.”
http://opendefinition.org/
• Sharing data is a foundation for our activities
• Normal practice in some communities (molecular)
• Mandated by some funders & governments
Incentivise through credit via citation (e.g. BDJ)
14. What are Scratchpads? (http://scratchpads.eu)
Taxa, Projects, Regions, Societies
544 Scratchpad Communities by 6,644 active registered users covering 91,631 taxa in 535,317 pages.
81 paper citations in 2012
In total more than 1,300,000 visitors
e.g., Scratchpad Virtual Research Communities
Collaboration & communities
Making taxonomy a team sport
Our infrastructures need to facilitate collaboration
15. Standards, identifiers & protocols
Standards can’t be developed in isolation – they must be used
Key requirements:
• Need to be inclusive, practical & extensible
• Readable by humans & machines
• Widely used
Good examples:
• Darwin Core
• CrossRef & DataCite DOIs
• ORCID author identifiers
Gaps / Problems
• Reuse & persistence of identifiers
• Vocabularies & ontologies (time consuming / little reward)
Potential solutions
• Build them into our credit systems
• Show semantic reasoning potential (LOD & RDF demonstrators)
A foundation for integration
Facilitating data sharing across communities
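To make "readable by humans & machines" concrete, here is a minimal sketch of an occurrence record keyed by Darwin Core term names and serialised to CSV. The term names are real Darwin Core terms; the record values, the `urn:example:` identifier, the helper names and the choice of which terms to treat as required are assumptions made for illustration only.

```python
import csv
import io

# Darwin Core term names used as column headers. Treating all six as
# "required" is an assumption for this sketch, not a rule of the standard.
DWC_FIELDS = [
    "occurrenceID",
    "basisOfRecord",
    "scientificName",
    "eventDate",
    "decimalLatitude",
    "decimalLongitude",
]


def missing_terms(record, required=DWC_FIELDS):
    """Return the required terms that are absent or empty in a record."""
    return [t for t in required if not str(record.get(t, "")).strip()]


def to_csv(records, fields=DWC_FIELDS):
    """Serialise records to CSV text with Darwin Core terms as the header."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=fields, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Invented example record (hypothetical identifier and coordinates).
example = {
    "occurrenceID": "urn:example:occurrence:1",
    "basisOfRecord": "PreservedSpecimen",
    "scientificName": "Bellis perennis L.",
    "eventDate": "2013-06-03",
    "decimalLatitude": "51.4967",
    "decimalLongitude": "-0.1764",
}

if __name__ == "__main__":
    print(missing_terms(example))
    print(to_csv([example]))
```

Because the header row uses shared term names rather than local column labels, a person can read the file in a spreadsheet and a second community's software can parse it without a bespoke mapping.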
16. 3. (Big) data challenges
- Mobilising existing data
- New forms of data
17. Mobilising existing data
Collections
• 1.5-3B specimens in collections worldwide
• Fragmented efforts / heterogeneity of process
• Needs ambition (NHM: 20M in 5 yrs.) & coordination
Literature
• >300M pages of biodiversity literature
• BHL (41M pp.) an example of what can be done
• Needs sustainability & article metadata
Metadata registries
• Data about data (cheaper & scalable)
• e.g. bibliographic data, dataset portals
Informatics challenges
• Storage & persistence
• Automation & annotation
• Incentives to digitise & fitness for use
Collections, literature & metadata
How can we quickly, efficiently and cost effectively mobilise biological data at scale?
[Images: Bibliography of Life (RefFinder & RefBank); BHL literature; NHM Digitisation]
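The scale of the collections problem can be read off the figures on this slide with simple arithmetic. The sketch below uses only the slide's numbers (1.5-3B specimens worldwide; the NHM ambition of 20M in 5 years). Applying the NHM rate to the worldwide total, and the 10-year horizon, are hypothetical thought experiments rather than anything the slide proposes.

```python
import math

# Figures taken from the slide.
WORLD_SPECIMENS_LOW = 1.5e9
WORLD_SPECIMENS_HIGH = 3e9
NHM_RATE_PER_YEAR = 20_000_000 / 5  # 20M specimens in 5 years


def years_to_digitise(total_specimens, specimens_per_year):
    """Years needed to digitise a backlog at a constant annual rate."""
    return total_specimens / specimens_per_year


def programmes_needed(total_specimens, specimens_per_year, horizon_years):
    """Parallel programmes of equal rate needed to finish within a horizon."""
    return math.ceil(total_specimens / (specimens_per_year * horizon_years))


if __name__ == "__main__":
    HORIZON = 10  # hypothetical target, in years
    for total in (WORLD_SPECIMENS_LOW, WORLD_SPECIMENS_HIGH):
        print(
            f"{total:.1e} specimens: "
            f"{years_to_digitise(total, NHM_RATE_PER_YEAR):.0f} years at one "
            f"NHM-scale rate, or "
            f"{programmes_needed(total, NHM_RATE_PER_YEAR, HORIZON)} parallel "
            f"programmes to finish in {HORIZON} years"
        )
```

Under these assumptions a single programme at the NHM's target rate would need centuries to cover the worldwide total, which is one way of reading the slide's call for both ambition and coordination.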
18. Mobilising & managing new forms of data
New molecular approaches
• Molecular detection & monitoring of organisms is routine
• Metagenomics (env. sequencing) commonplace
• Becoming the primary route to understanding biodiversity
Ecological observatories
• Automated biodiversity detection
• Remote sensing (e.g. satellite & acoustic data, drones, camera traps)
• Monitoring conspicuous, rare or invasive spp. (algal blooms, palms)
• Monitoring human activity
Informatics challenges
• Very large quantities of data (2.5-10TB per researcher per yr.)
• Doesn’t map well to existing data infrastructures
• Challenges current networking & storage capacity
• Digital and physical collections become equally important?
3-4 June 2013, NHM
22 July, 2013
Metagenomics & ecological observatories
These new data types do not depend on traditional taxonomy & systematics
20. Aggregation & linking
Portals bringing together distributed & diverse forms of data
Giving consistent and comprehensive access to all biological data
Several approaches, with different advantages
• Tightly coupled to a few data sources (e.g. eMonocot, CDM)
• Loosely coupled to many sources (e.g. BioNames, Wikipedia)
• Hybrid forms (e.g. Canadensys, EOL, GBIF)
Informatics challenges
• Portals are hard to sustain
• New methods of data discovery & access
• Create new windows (views) on content
• New data structures, new types of database
BioNames: scalable but less accurate (3M taxon names, 93k phylogenies & 28k articles)
eMonocot: selective & accurate but hard to scale (276k taxa, 8k images, 13 keys & 3 phylogenies)
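The trade-off between loosely and tightly coupled aggregation can be illustrated with a toy example. The sketch below links records from several sources purely by a normalised name string, which is one simple way a loosely coupled aggregator might work; it is not a description of how BioNames or any other portal named here is actually built. The source labels, records and function names are hypothetical.

```python
from collections import defaultdict


def normalise_name(name):
    """Crude matching key: lower-case and collapse runs of whitespace."""
    return " ".join(name.lower().split())


def link_by_name(sources):
    """Group records from many sources under a shared name key.

    `sources` maps a source label to a list of records, each a dict with
    at least a "name" field. No source needs to know about any other.
    """
    linked = defaultdict(list)
    for source, records in sources.items():
        for record in records:
            linked[normalise_name(record["name"])].append((source, record))
    return dict(linked)


# Hypothetical sources and records, invented for illustration.
example_sources = {
    "source_a": [{"name": "Quercus robur", "kind": "article"}],
    "source_b": [{"name": "Quercus  Robur", "kind": "image"}],
    "source_c": [{"name": "Quercus robur L.", "kind": "specimen"}],
}

if __name__ == "__main__":
    for key, hits in link_by_name(example_sources).items():
        print(key, [source for source, _ in hits])
```

String matching scales to any number of sources, but the record carrying an authorship suffix ("L.") is left unlinked, and synonyms or homonyms would be handled no better. Resolving those cases needs curated, source-specific knowledge, which is the accuracy that tight coupling buys at the cost of scale.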
21. Visualisation
Visually synthesizing large, linked biodiversity datasets
Making biodiversity data accessible & understandable
NHM specimen records: http://data.nhm.ac.uk/globe/
Research opportunities
• Tools integration (e.g. GeoCat, CartoDB)
• Span multiple audiences
Outreach opportunities
• Visually compelling story telling
• Crowdsourcing tools (e.g. Notes From Nature)
Exploiting new technologies
• Touch screens
• Mobile
• Location awareness
Informatics challenges
• Very specific to individual use cases
• Sustainability issues
22. Modeling the biosphere: a (the) 30 year goal?
Conceptually has many potential uses
• Identifying trends
• Explaining patterns
• Making predictions
• Real time alerts - when data contradicts current knowledge
• The ultimate policy tool
Major informatics challenges
• Technically very difficult (many years off)
• Needs effective prototypes & platforms
• Some first steps e.g. OBOE, LEFT
Nature 2013, doi:10.1038/493295a
Reasoning across large, linked biodiversity datasets
A clear, singular, long-term vision, which biodiversity data can contribute to
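The "real time alerts" idea, flagging data that contradicts current knowledge, can be sketched in a very reduced form. Here "current knowledge" is assumed to be a latitude/longitude bounding box per taxon, and an alert is any observation outside that box or for an unknown taxon. A real biosphere model would be far richer; the class, function names, taxa and coordinates are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KnownRange:
    """Bounding box standing in for what is currently known about a taxon."""

    min_lat: float
    max_lat: float
    min_lon: float
    max_lon: float

    def contains(self, lat, lon):
        return (
            self.min_lat <= lat <= self.max_lat
            and self.min_lon <= lon <= self.max_lon
        )


def alerts(observations, known_ranges):
    """Return observations that contradict the recorded ranges.

    An observation is flagged if its taxon has no recorded range or if
    its coordinates fall outside that range.
    """
    flagged = []
    for obs in observations:
        known = known_ranges.get(obs["taxon"])
        if known is None or not known.contains(obs["lat"], obs["lon"]):
            flagged.append(obs)
    return flagged


# Invented ranges and observations, for illustration only.
example_ranges = {"Taxon A": KnownRange(40.0, 60.0, -10.0, 30.0)}
example_observations = [
    {"taxon": "Taxon A", "lat": 51.5, "lon": -0.2},   # inside known range
    {"taxon": "Taxon A", "lat": 65.0, "lon": 20.0},   # north of known range
    {"taxon": "Taxon B", "lat": 10.0, "lon": 10.0},   # no recorded range
]

if __name__ == "__main__":
    for obs in alerts(example_observations, example_ranges):
        print("ALERT:", obs)
```

Even this toy version depends on the earlier themes: observations have to arrive in a shared format, and taxon names have to resolve consistently, before any contradiction can be detected automatically.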
24. Lessons learned: new opportunities in H2020
PATHWAYS TO INTEGRATION (by addressing these social, data & synthetic challenges)
• Break out of the discipline, technical & project centric activities (it is unsustainable, inefficient & bad for science)
• Integrate & build on existing programmes where possible (LifeWatch is a potential umbrella for these activities)
• Bridge the disconnect between informaticians & users (make the users informaticians & the informaticians users)
• Our products are well suited to address these challenges
• Use H2020 as a mechanism to achieve integration
How do we join up these activities?
26. Possible biodiversity informatics design principles*
1. Start with needs - focus on real user needs (not just the ‘official process’)
2. Do less - if someone else is doing it, link to it or use it
3. Design with data - prototype and test with real users on the live website
4. Do the hard work to make it simple - let the computer take the strain
5. Iterate. Then iterate again. - iteration reduces risk & is more sustainable
6. Build for inclusion – it’s easier in the long run
7. Understand context - we are designing for people, not a screen or a brand
8. Build digital services, not websites - there is life beyond the website
9. Be consistent, not uniform - every circumstance is different
10. Make things open: it makes things better - it’s more sustainable
= experience from 7-years with the Scratchpads
= lessons for infrastructures in H2020?
*https://www.gov.uk/designprinciples
27. Mobilising existing data: how to prioritise
Nick Poole, UK Collections Trust
[Figure: CONTENT and METADATA, each ranging from A LITTLE to A LOT]
• Digitise a few things & invest in depth, description & promotion
• Digitise lots of things, put little effort into description & promotion
Labels: FUN, OUTREACH, LEARNING, RESEARCH, AGGREGATION, DATA MINING, COLLECTIONS MANAGEMENT
28. Collaboration & communities
• Very few recent single author papers
• Most (fundable) science is cross-disciplinary
• Need to incentivise data curation & annotation
• Need mechanisms to share annotations
Our infrastructures need to facilitate collaboration
[Figure, Joppa et al, 2011: Average dates when increasing numbers of taxonomists were involved in describing species. Groups: cone snails, birds, mammals, amphibians, spiders, plants]
Making taxonomy a team sport