Combining Data in Species Distribution Models

Bob O'Hara (1), Petr Keil (2), Walter Jetz (2)
(1) BiK-F, Biodiversity and Climate Change Research Centre, Frankfurt am Main, Germany
bobohara
(2) Department of Ecology and Evolutionary Biology, Yale University, New Haven, CT, USA

Motivation
Map Of Life
www.mol.org/

The Problem
Different data sources
GBIF
expert range maps
eBird and similar citizen science efforts
organised surveys (BBS, BMSs)

Point Process Models
Point process representation of actual distribution
Continuous space models
Build different sampling models on top

Point Processes: Model
Intensity ρ(ξ) at point ξ. Assume covariates (features?) X(ξ), and
a random field ν(ξ):
\[ \log \rho(\xi) = \eta(\xi) = \beta X(\xi) + \nu(\xi) \]
then, for an area A,
\[ P(N(A) = r) = \frac{\lambda(A)^{r} \, e^{-\lambda(A)}}{r!} \]
where
\[ \lambda(A) = \int_A e^{\eta(s)} \, ds \]
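As a rough illustration of the two formulas above, the following R sketch evaluates λ(A) and P(N(A) = r) on a one-dimensional toy region. The covariate `x_fun`, the coefficients `beta0` and `beta1`, and the interval [0, 10] are all invented for illustration, and the random field ν is set to zero.

```r
# Toy 1-D example; every value here is an illustrative assumption.
beta0 <- -1
beta1 <- 0.5
x_fun <- function(s) sin(s)                   # hypothetical covariate X(s)
eta   <- function(s) beta0 + beta1 * x_fun(s) # linear predictor with nu = 0

# lambda(A): integral of exp(eta(s)) over A = [0, 10]
lambda_A <- integrate(function(s) exp(eta(s)), lower = 0, upper = 10)$value

# P(N(A) = r) for r = 0, ..., 5 under the Poisson model
p_counts <- dpois(0:5, lambda = lambda_A)
```

`p_counts[1]` is the probability that the area holds no individuals, which equals `exp(-lambda_A)`.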

In practice...
Constrained refined Delaunay triangulation
Approximate λ(A) numerically:
select some integration points,
and sum over those
\[ \lambda(A) \approx \sum_{s=1}^{N} |A(s)| \, e^{\eta(s)} \]
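The weighted sum can be sketched in one dimension, with a regular grid standing in for the triangulation mesh. This is only a sketch under assumptions: the predictor `eta`, the interval [0, 10] and the number of integration points `N` are made up, and the weights |A(s)| are simply cell widths.

```r
# Illustrative assumptions: 1-D region [0, 10] and an arbitrary predictor.
eta <- function(s) -1 + 0.5 * sin(s)

# Integration points are the midpoints of N equal cells;
# the weights |A(s)| are the cell widths.
N      <- 200
breaks <- seq(0, 10, length.out = N + 1)
pts    <- (breaks[-1] + breaks[-(N + 1)]) / 2
w      <- diff(breaks)

lambda_approx <- sum(w * exp(eta(pts)))   # sum_s |A(s)| exp(eta(s))
lambda_exact  <- integrate(function(s) exp(eta(s)), 0, 10)$value
rel_error     <- abs(lambda_approx - lambda_exact) / lambda_exact
```

Increasing `N` shrinks `rel_error`, at the cost of more points to evaluate in every likelihood call.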

Some Data Types
Abundance
e.g. Point counts
Presence/absence
surveys, areal lists
Point observations
museum archives, citizen science observations
Expert range maps

Abundance
Assume a small area A, so that η(ξ) is constant, and observation
for a time t; then \( n(A, t) \sim \mathrm{Po}(e^{\mu_A(A, t)}) \) with
\[ \mu_A(A, t) = \eta(A) + \log(|A|) + \log(t) + \log(p) \]
where p is the probability of observing each individual.
We don't know all of |A|, t and p, so estimate an intercept
Can also add a sampling model to log(p)
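A small simulation shows how the unknown area, time and detection terms collapse into a single intercept. It is a sketch only: the values of `area`, `t_obs`, `p` and `beta` are invented, η contains just one hypothetical covariate `x`, and a plain Poisson GLM stands in for the full model.

```r
set.seed(1)
n_sites <- 500
x     <- rnorm(n_sites)   # hypothetical covariate
beta  <- 0.7              # hypothetical slope in eta
area  <- 2                # |A|, unknown to the analyst in practice
t_obs <- 3                # observation time, also unknown
p     <- 0.4              # probability of observing each individual

# mu_A = eta + log|A| + log(t) + log(p), with eta = beta * x
mu <- beta * x + log(area) + log(t_obs) + log(p)
n  <- rpois(n_sites, exp(mu))

# Poisson GLM with a free intercept and no offsets
fit <- glm(n ~ x, family = poisson(link = "log"))
est <- coef(fit)
```

The fitted intercept estimates log(|A|) + log(t) + log(p) as one quantity. The three terms cannot be separated from these data alone.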

Presence/Absence for ’points’
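As \( n(A, t) \sim \mathrm{Po}(e^{\mu(A, t)}) \),
\[ \operatorname{cloglog}\big(\Pr(n(A, t) > 0)\big) = \mu_I(A, t) \]
with \( \mu_I(A, t) \) of the same form as \( \mu_A(A, t) \) before
Again, can make log(|A|) + log(t) + log(p) an intercept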
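The link between the Poisson count model and the cloglog scale can be checked numerically. In this sketch the values of `mu` are arbitrary, and `cloglog` is a helper defined here for illustration.

```r
# Complementary log-log transform, defined here for illustration
cloglog <- function(pr) log(-log(1 - pr))

mu <- c(-2, -0.5, 0, 1)             # arbitrary values of the linear predictor

# Under n ~ Po(exp(mu)), Pr(n > 0) = 1 - Pr(n = 0) = 1 - exp(-exp(mu))
pr_present <- 1 - exp(-exp(mu))

# Applying cloglog recovers mu
mu_back <- cloglog(pr_present)
```

This is why presence/absence data can share the linear predictor of the abundance model: only the link function changes.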

Presence only: point process
log Gaussian Cox Process
Likelihood is a Poisson GLM (but with non-integer response)

Areal Presence/absence
If an area is large enough, we can't assume constant covariates, so
\[ \Pr(n(A) > 0) = 1 - \exp\left( -\int_A e^{\eta(\xi)} \, d\xi \right) \]
in practice this is calculated as
\[ 1 - \exp\left( -\sum_{s} |A(s)| \, e^{\eta(s)} \right) \]
which causes problems with the fitting
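To see why the constant-covariate shortcut breaks down for large areas, the sketch below compares the summed version with a version that fixes η at the centre of the area. The predictor `eta` and the interval [0, 10] are assumptions made for illustration.

```r
# Hypothetical predictor that varies strongly across a 1-D "area" [0, 10]
eta <- function(s) -5 + 0.4 * s

# Integration points (cell midpoints) and weights (cell widths)
N      <- 100
breaks <- seq(0, 10, length.out = N + 1)
pts    <- (breaks[-1] + breaks[-(N + 1)]) / 2
w      <- diff(breaks)

# Areal presence probability from the weighted sum
pr_areal <- 1 - exp(-sum(w * exp(eta(pts))))

# Shortcut that treats eta as constant at the centre of the area
pr_const <- 1 - exp(-10 * exp(eta(5)))
```

With these made-up numbers the two probabilities differ noticeably, because exp(η) is convex and the shortcut misses the high-intensity end of the area.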

Expert Range Maps
Not the same as areal presence.
Instead, use distance to range as a covariate
within range, this is 0.
Have to estimate the slope for outside the range
Use informative priors to force the slope to be negative
[Figure: intensity (0 to 1) against space (1d, 0 to 100), with the species' range marked]
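The distance-to-range covariate can be sketched in one dimension. The range limits `range_lo` and `range_hi` and the value of `slope` are invented. In the model itself the slope is estimated, with a prior pushing it negative.

```r
s <- 0:100                          # 1-D space, as on the figure axis
range_lo <- 40                      # hypothetical range limits
range_hi <- 60

# Distance to the range: 0 inside, growing linearly outside
dist_to_range <- pmax(0, range_lo - s, s - range_hi)

slope <- -0.1                       # hypothetical negative slope
rel_intensity <- exp(slope * dist_to_range)  # intensity relative to inside the range
```

Inside the range `rel_intensity` is 1. Outside it decays with distance, so the range map informs the model without acting as a hard boundary.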

Put these together with INLA
Quicker than MCMC
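# Fit the joint model with INLA. SolTim.formula and stk.all are built
# elsewhere (not shown on this slide).
library(INLA)
SolTim.res <- inla(SolTim.formula,
                   # Poisson and binomial likelihoods, with log and cloglog links
                   family = c('poisson', 'binomial'),
                   data = inla.stack.data(stk.all),
                   control.family = list(list(link = "log"),
                                         list(link = "cloglog")),
                   # A matrix taken from the data stack
                   control.predictor = list(A = inla.stack.A(stk.all)),
                   Ntrials = 1, E = inla.stack.data(stk.all)$e, verbose = FALSE)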

The Solitary Tinamou
Photo credit: Francesco Veronesi on Flickr
(https://www.flickr.com/photos/francesco_veronesi/12797666343)

Data
expert range
2 point processes (49 points)
28 parks
[Figure: map of the data. Legend: Whole Region, Expert range, Park (absent), Park (present), eBird, GBIF]

A Fitted Model
               mean    sd    mode
Intercept     -0.30  0.09   -0.30
b.PP           1.37  0.40    1.37
b.GBIF         1.43  0.26    1.43
Forest        -0.03  0.04   -0.03
NPP            0.15  0.05    0.15
Altitude      -0.02  0.04   -0.02
DistToRange   -0.01  0.02   -0.01

Predicted Distribution
[Figure: map of the predicted distribution, colour scale from −0.10 to 0.25. Legend: Whole Region, Expert range, Park (absent), Park (present), eBird, GBIF]

Individual Data Types
[Figure: predicted distribution maps, one panel per data type, with colour scale ranges:
Expert Range (−10 to 0), GBIF (−0.060 to −0.048), eBird (−0.060 to −0.048), Parks (−10 to 0), all data (−0.10 to 0.25)]

Summary
Parks and expert range seem to drive distribution
NPP is main covariate, not forest or altitude

What Next
Multiple species
already being done elsewhere
estimate sampling biases
More Data
Point counts (have it working)
Can we estimate absolute probability of presence?
Distance sampling?
Mark-recapture?
scaling issues (in time and space)

Not the final answer...
http://www.gocomics.com/nonsequitur/2014/06/24
