Machine Learning for Understanding and Managing Ecosystems

Machine Learning for
Understanding and Managing
Ecosystems
Tom Dietterich
Oregon State University
In collaboration with
Postdocs: Dan Sheldon (now at UMass, Amherst), Mark Crowley (now at U. Waterloo)
Graduate Students: Majid Taleghan, Kim Hall, Liping Liu, Akshat Kumar, Tao Sun, Rachel Houtman, Sean McGregor, Hailey Buckingham
Economists: H. Jo Albers, Claire Montgomery
Cornell Lab of Ornithology: Steve Kelling, Daniel Fink, Andrew Farnsworth, Wes Hochachka, Benjamin Van Doren, Kevin Webb
1
IBM Cognitive Computing

The World Faces Many
Sustainability Challenges
Species Extinctions
Invasive Species
Effects of Climate Change on these
2

Computational Sustainability
The study of computational methods that can contribute to the sustainable management of the earth’s ecosystems
Data → Models → Policies
[Figure: pipeline with stages Data Acquisition, Data Integration, Data Interpretation, Model Fitting, Policy Optimization, Policy Execution]
3

Outline:
Three Projects at Oregon State
Models of Bird Migration
 Collective Graphical Models
Policy Optimization
 Controlling Invasive Species
 Managing Wildland Fire
4

BirdCast Project
Understanding Bird Migration
Goal:
 Develop a scientific model of bird migration
 Produce 24- and 48-hour bird migration forecasts
Understanding bird decision making
 Absolute timing (e.g., based on day length)
 Temperature
 Wind speed and direction
 Relative humidity
 Food availability
5

Data (1): www.ebird.org
Volunteer Bird Watchers
 Stationary Count
 Travelling Count
Time, place, duration, distance travelled
Checklist of species seen
8,000-12,000 checklists uploaded per day
6

Data (2): Doppler Weather Radar
 Radar detects
 weather (remove)
 smoke, dust, and insects (remove)
 birds and bats
7

Data (3): Acoustic monitoring
Night flight calls
People can identify species or species groups from these calls
8

Modeling Goal:
Spatial Hidden Markov Model
 Define a grid over the US
 Consider a single bird
 We say the bird is in state $i$ on day $t$ if it is located inside cell $i$ on that day
 Let $P_t(i \to j)$ be the probability that the bird will fly from cell $i$ to cell $j$ on the night from day $t$ to day $t + 1$
 We will represent this probability in terms of variables such as
 wind speed and direction
 distance from $i$ to $j$
 air temperature
 relative humidity
 day of the year
 etc.
 Let $\Theta$ be the coefficients of the probability model.
9
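A minimal sketch of how such a transition model could be parameterized, assuming a log-linear (softmax) form over destination cells. The slides do not state the functional form, so the form itself, the `transition_probs` helper, the feature names, and the toy numbers are all hypothetical.

```python
import math

def transition_probs(features, theta):
    """Return P_t(i -> j) for every destination cell j.

    features: dict mapping destination cell j to a dict of covariate values
              for the move i -> j on night t (hypothetical feature names).
    theta:    dict mapping covariate name to its coefficient.
    Assumes a log-linear form: P(i -> j) is proportional to exp(theta . f_ij).
    """
    scores = {
        j: sum(theta[name] * value for name, value in f.items())
        for j, f in features.items()
    }
    # Subtract the maximum score before exponentiating for numerical stability.
    top = max(scores.values())
    weights = {j: math.exp(s - top) for j, s in scores.items()}
    total = sum(weights.values())
    return {j: w / total for j, w in weights.items()}

# Toy example: a bird in cell 4 choosing among three destination cells.
theta = {"distance": -0.5, "tailwind": 1.0}
features = {
    4: {"distance": 0.0, "tailwind": 0.0},   # stay put
    5: {"distance": 1.0, "tailwind": 2.0},   # favorable wind
    8: {"distance": 1.0, "tailwind": -1.0},  # headwind
}
probs = transition_probs(features, theta)
```

With these made-up coefficients the cell with the favorable wind receives the largest probability and the headwind cell the smallest.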

Simulating the Migration of a
Single Bird
 Assume we know the value of $\Theta$
 The bird starts in cell 4 at time $t = 1$
 $n_1(4) = 1$
 Simulate the first night by drawing a cell $j$ according to $P_t(4 \to j)$
 “rolling a die”
 Repeat this for $T$ time steps
 If we had enough bird watchers, we could map out the trajectory of the bird
 Then we could match that against our simulated trajectory and adjust $\Theta$ until the simulations matched the observed behavior
10
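The simulation steps above can be sketched as follows. This is an illustration only: `simulate_bird` and `toy_probs` are hypothetical names, and the toy transition model (a line of six cells with fixed probabilities) stands in for the fitted model.

```python
import random

def simulate_bird(start_cell, num_nights, transition_probs, rng):
    """Simulate one bird; return the cells occupied on days 1..T+1.

    transition_probs(t, i) must return a dict {j: P_t(i -> j)}.
    """
    trajectory = [start_cell]
    for t in range(1, num_nights + 1):
        probs = transition_probs(t, trajectory[-1])
        cells = list(probs)
        weights = [probs[j] for j in cells]
        # "Roll the die": draw the next cell according to P_t(i -> j).
        next_cell = rng.choices(cells, weights=weights, k=1)[0]
        trajectory.append(next_cell)
    return trajectory

def toy_probs(t, i):
    # Hypothetical model on a line of cells 1..6: stay with probability 0.3,
    # otherwise move one cell north. The northernmost cell is absorbing.
    if i == 6:
        return {6: 1.0}
    return {i: 0.3, i + 1: 0.7}

rng = random.Random(0)
path = simulate_bird(start_cell=4, num_nights=5, transition_probs=toy_probs, rng=rng)
```

Passing the random generator in explicitly keeps runs reproducible, which matters when simulated trajectories are compared against observations.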


Population of Birds
Consider a population of $M$ birds
The state of this population is a vector $\mathbf{n}_t$ such that $\mathbf{n}_t(i)$ is the number of birds in cell $i$ on day $t$
We can simulate each of these birds moving simultaneously
 each bird “rolls a die” every night to decide where to go
If we have enough bird watchers, we can get a good estimate of $\mathbf{n}_t$ every day
We can compare our simulations against the observations and adjust $\Theta$ until they match
13
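A sketch of the brute-force population simulation described above, which is the slow baseline that motivates the next slide. The names `simulate_population`, `toy_probs`, and `squared_error` are hypothetical, and the squared-error mismatch is only one possible way to compare simulated and observed counts.

```python
import random
from collections import Counter

def simulate_population(n1, num_nights, transition_probs, rng):
    """Simulate every bird individually; return count vectors n_1..n_{T+1}.

    n1: dict {cell: number of birds} on day 1.
    transition_probs(t, i) returns a dict {j: P_t(i -> j)}.
    """
    history = [Counter(n1)]
    for t in range(1, num_nights + 1):
        current, nxt = history[-1], Counter()
        for i, count in current.items():
            probs = transition_probs(t, i)
            cells = list(probs)
            weights = [probs[j] for j in cells]
            # Each of the `count` birds in cell i rolls its own die.
            for j in rng.choices(cells, weights=weights, k=count):
                nxt[j] += 1
        history.append(nxt)
    return history

def squared_error(simulated, observed):
    """Sum over days and cells of squared count differences."""
    total = 0
    for sim_day, obs_day in zip(simulated, observed):
        for cell in set(sim_day) | set(obs_day):
            total += (sim_day.get(cell, 0) - obs_day.get(cell, 0)) ** 2
    return total

def toy_probs(t, i):
    # Hypothetical stand-in for the fitted model: cells 1..6 on a line.
    if i == 6:
        return {6: 1.0}
    return {i: 0.3, i + 1: 0.7}

history = simulate_population({4: 100}, num_nights=5,
                              transition_probs=toy_probs, rng=random.Random(0))
```

The cost grows with the number of birds times the number of nights for every candidate $\Theta$, which is why fitting by repeated simulation is impractical at continental scale.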

This is very slow
Computer Science to the rescue
Formulate the problem mathematically
Formalism is called the “Collective Graphical Model” (CGM)
Develop algorithms for probabilistic inference
Use these algorithms to fit the model to the observations
14

Probabilistic Inference for CGMs
[Figure: result panels for 16 grid cells and 49 grid cells; one panel is labeled “No Data”]
Gibbs sampler + Markov basis [Sheldon, Dietterich, NIPS 2011]
Convex optimization [Sheldon, Sun, Kumar, ICML 2013]
Asymptotic Gaussian approximation [Liu, Sheldon, Dietterich, ICML 2014]
Non-linear belief propagation [Sun, Sheldon, Kumar, ICML 2015]
Proximal algorithm [Vilnis, Belanger, Sheldon, McCallum, UAI 2015]
20

Initial Results:
Ruby-throated Hummingbird
21

Need to Constrain the Model
Problem: The migration model tends to “store” birds in Canada
 There are no observations there, so the model is not constrained by the data
Solution: Constrain the model
 Specify the times and places where the CGM is allowed to have birds
22

Constrained Results:
Ruby-throated Hummingbird
23

Fitted Transition Parameters Θ
Distance and direction traveled:
northness: −0.4808
distance: 0.1895
stayput: 3.5058
time: 0.5217
temperature: −0.1556
wind profit: 0.2754
24
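One way to read these coefficients, offered as a sketch under an assumption the slides do not confirm: if the transition model were log-linear, a unit increase in a feature would multiply the unnormalized weight of a destination cell by the exponential of its coefficient. The helper name `weight_multiplier` is hypothetical, and the feature scaling is unknown.

```python
import math

# Fitted coefficients as reported on the slide.
theta = {
    "northness": -0.4808,
    "distance": 0.1895,
    "stayput": 3.5058,
    "time": 0.5217,
    "temperature": -0.1556,
    "wind profit": 0.2754,
}

def weight_multiplier(coefficient, feature_change=1.0):
    """Multiplicative change in unnormalized weight (log-linear assumption)."""
    return math.exp(coefficient * feature_change)

stay_multiplier = weight_multiplier(theta["stayput"])
wind_multiplier = weight_multiplier(theta["wind profit"])
temperature_multiplier = weight_multiplier(theta["temperature"])
```

Under that assumption the large positive `stayput` coefficient would make remaining in the current cell roughly 33 times more heavily weighted than an otherwise identical move, consistent with most birds staying in place on most nights.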

Next Steps: Integrating Multiple
Data Sources
25
[Figure: graphical model combining the three data sources. Birds: $\mathbf{n}_t^s$ and $\mathbf{n}_{t,t+1}^s$ for $s = 1, \dots, S$, with $\mathbf{x}_t$ and $\mathbf{x}_{t,t+1}$. eBird: $e_t^s(i, o)$ for $i = 1, \dots, L$ and $o = 1, \dots, O(i, t)$. Acoustic: $m_{t,t+1}^s(k)$ and $y_{t,t+1}^s(k)$ for $k = 1, \dots, K$. Radar: $r_{t,t+1}(v)$ and $z_{t,t+1}(v)$ for $v = 1, \dots, V$.]


Invasive Species Management in
River Networks
Tamarisk: invasive tree from the Middle East
 Out-competes native vegetation for water
 Reduces biodiversity
What is the best way to manage a spatially-spreading organism?
27

Mathematical Model
Tree-structured river network
 Each segment $e$ has $H$ “sites” where a tree can grow.
 Each site can be {empty, occupied by native, occupied by invasive}
Management actions
 Each segment: {do nothing, eradicate, restore, eradicate+restore}
[Figure: river network with segments $e_1, \dots, e_5$]
28

Dynamics and Objective
Dynamics:
 In each time period
 Natural death
 Seed production
 Seed dispersal (preferentially downstream)
 Seed competition to become established
 Couples all edges because of spatial spread
 Inference is intractable
Objective:
 Minimize expected discounted costs (sum of cost of invasion plus cost of management)
 Subject to annual budget constraint
[Figure: river network with segments $e_1, \dots, e_5$ and per-site labels “t” and “n”]
35
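One way the objective could be written out, as a sketch. The symbols $c_{\text{inv}}$ (cost of invasion), $c_{\text{man}}$ (cost of management), $B$ (annual budget), $\gamma$ (discount factor), and $\pi$ (policy) are notation introduced here for illustration and do not appear on the slide.

```latex
\min_{\pi} \;
\mathbb{E}\!\left[ \sum_{t=0}^{\infty} \gamma^{t}
  \bigl( c_{\text{inv}}(s_t) + c_{\text{man}}(a_t) \bigr) \right]
\quad \text{subject to} \quad
a_t = \pi(s_t), \qquad
c_{\text{man}}(a_t) \le B \ \text{ for every year } t .
```

Here $s_t$ is the joint state of all sites in the network and $a_t$ assigns one management action to each segment.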

Finding the Optimal Management
Policy
Formalize as a Markov Decision Process
Solve by Stochastic Dynamic Programming
SDP requires transition matrix $T[i, j, a] = P(j \mid i, a)$
We don’t know $T$
Solution:
 Write a simulator
 Draw Monte Carlo samples from simulator to estimate $T[i, j, a]$
36
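A minimal sketch of the Monte Carlo estimate of $T[i, j, a]$: call the simulator repeatedly for each state-action pair and use the empirical frequencies. The `toy_simulator` below is a hypothetical three-state stand-in, not the tamarisk simulator, and its probabilities are invented.

```python
import random
from collections import defaultdict

def estimate_transitions(simulator, states, actions, samples_per_pair, rng):
    """Estimate T[i, j, a] = P(j | i, a) by counting simulator outcomes."""
    T = {}
    for i in states:
        for a in actions:
            counts = defaultdict(int)
            for _ in range(samples_per_pair):
                j, _cost = simulator(i, a, rng)
                counts[j] += 1
            for j, c in counts.items():
                T[(i, j, a)] = c / samples_per_pair
    return T

def toy_simulator(i, a, rng):
    """Hypothetical one-segment model: state = number of invaded sites (0..2)."""
    if a == "eradicate":
        j = max(i - 1, 0) if rng.random() < 0.8 else i
        cost = 1.0 + j
    else:  # "do nothing"
        j = min(i + 1, 2) if rng.random() < 0.5 else i
        cost = float(j)
    return j, cost

states = [0, 1, 2]
actions = ["do nothing", "eradicate"]
T = estimate_transitions(toy_simulator, states, actions,
                         samples_per_pair=500, rng=random.Random(0))
```

Sampling every pair equally, as done here, is the naive baseline; the real state space is far too large for this, which is why the choice of which pairs to sample matters.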

Solving the Tamarisk MDP using
Monte Carlo Samples
Repeat
 Use the current policy to choose a state $i$ and management action $a$
 Invoke the simulator
 $(i, a) \to (j, c)$
 $j$ is the resulting state
 $c$ is the cost of the action and the resulting state
 Update our model of $T$
 Apply stochastic dynamic programming to compute an improved policy
Until the policy has converged
Key question: What $(i, a)$ should we choose?
Our answer: The DDV heuristic
37
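The loop above can be sketched as follows. This is not the DDV method: the slides do not specify the heuristic, so the sketch picks $(i, a)$ uniformly at random as a placeholder. The function names, the toy simulator, and the optimistic zero cost for untried pairs are all assumptions made for illustration.

```python
import random
from collections import defaultdict

def value_iteration(states, actions, counts, cost_sums, gamma, sweeps=200):
    """Stochastic dynamic programming on the estimated model (minimizes cost)."""
    V = {i: 0.0 for i in states}

    def q(i, a):
        n = sum(counts[(i, a)].values())
        if n == 0:
            return 0.0  # optimistic default for untried pairs (an assumption)
        expected_next = sum(c / n * V[j] for j, c in counts[(i, a)].items())
        return cost_sums[(i, a)] / n + gamma * expected_next

    for _ in range(sweeps):
        V = {i: min(q(i, a) for a in actions) for i in states}
    policy = {i: min(actions, key=lambda a: q(i, a)) for i in states}
    return V, policy

def plan(simulator, states, actions, gamma, batch, max_rounds, rng):
    counts = defaultdict(lambda: defaultdict(int))   # counts[(i, a)][j]
    cost_sums = defaultdict(float)                   # summed cost per (i, a)
    policy, V = None, None
    for _ in range(max_rounds):
        for _ in range(batch):
            # Placeholder for the DDV heuristic: choose (i, a) uniformly.
            i, a = rng.choice(states), rng.choice(actions)
            j, c = simulator(i, a, rng)
            counts[(i, a)][j] += 1                   # update the model of T
            cost_sums[(i, a)] += c
        V, new_policy = value_iteration(states, actions, counts, cost_sums, gamma)
        if new_policy == policy:                     # policy has converged
            break
        policy = new_policy
    return policy, V

def toy_simulator(i, a, rng):
    """Hypothetical single-segment model; state = invaded sites (0..2)."""
    if a == "eradicate":
        j = max(i - 1, 0) if rng.random() < 0.8 else i
        return j, 1.0 + 5.0 * j
    # "do nothing": the invasion may grow, but an empty segment stays empty.
    j = min(i + 1, 2) if (i > 0 and rng.random() < 0.5) else i
    return j, 5.0 * j

policy, V = plan(toy_simulator, [0, 1, 2], ["do nothing", "eradicate"],
                 gamma=0.9, batch=300, max_rounds=20, rng=random.Random(0))
```

Stopping when the policy is unchanged between rounds is a crude convergence test; it carries no guarantee of the kind a PAC analysis would provide.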

Comparison against best previous
Monte Carlo MDP planning method
38
[Figure: chart of Number of Samples (ticks 1.E+05, 1.E+06, 1.E+07) by MDP, with series DDV and Fiechter]

Published Rule of Thumb Policies
for Invasive Species Management
Triage Policy
 Treat most-invaded edge first
 Break ties by treating upstream first
Leading edge
 Eradicate along the leading edge of invasion
Chades, et al.
 Treat most-upstream invaded edge first
 Break ties by amount of invasion
DDV
 Our PAC solution
39
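Two of the rules of thumb above reduce to sort orders over the invaded edges, sketched below. The `Edge` fields are hypothetical: "upstream" is represented here by `depth` (hops from the river mouth), and breaking ties "by amount of invasion" is assumed to mean most-invaded first. The leading-edge rule is omitted because it needs a definition of the invasion front that the slide does not give.

```python
from dataclasses import dataclass

@dataclass
class Edge:
    name: str
    invaded_sites: int  # number of sites occupied by the invasive
    depth: int          # hops from the river mouth; larger = further upstream

def triage_order(edges):
    """Most-invaded edge first; ties broken by treating upstream first."""
    invaded = [e for e in edges if e.invaded_sites > 0]
    return sorted(invaded, key=lambda e: (-e.invaded_sites, -e.depth))

def chades_order(edges):
    """Most-upstream invaded edge first; ties broken by amount of invasion."""
    invaded = [e for e in edges if e.invaded_sites > 0]
    return sorted(invaded, key=lambda e: (-e.depth, -e.invaded_sites))

# Toy five-segment network with made-up invasion levels.
network = [
    Edge("e1", invaded_sites=3, depth=2),
    Edge("e2", invaded_sites=1, depth=2),
    Edge("e3", invaded_sites=3, depth=1),
    Edge("e4", invaded_sites=0, depth=1),
    Edge("e5", invaded_sites=2, depth=0),
]
triage = [e.name for e in triage_order(network)]
chades = [e.name for e in chades_order(network)]
```

In use, a manager would treat edges in the returned order until the annual budget is exhausted.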

Cost Comparisons:
Rule of Thumb Policies vs. DDV
[Figure: bar chart of total costs for the “Large pop, up to down” scenario. y-axis: Total Costs, ticks from 0 to 450; bars labeled Triage, Chades, Leading Edge, DDV, and Optimal]
40


Managing Wildfire in Eastern
Oregon
 Natural state:
 Large Ponderosa Pine trees with open understory
 Frequent “ground fires” that remove understory plants (grasses, shrubs) but do not damage trees
 Fires have been suppressed since the 1920s
 Heavy accumulation of fuels in understory
 Large catastrophic fires that kill all trees and damage soils
 Huge firefighting costs and lives lost
42

Study Area: Deschutes National
Forest
Goal: Return the landscape
to its “natural” fire regime
Management Question:
 LET-BURN: When lightning
ignites a fire, should we let it
burn?
43

Formulating LETBURN as a Markov
Decision Process $\langle S, A, R, T, \gamma \rangle$
 State space: $S$
 4000 management units; each unit is in one of 25 local states
 Weather
 Ignition site
 Action space: $A$
 At fire ignition time $t$, $a_t \in \{\text{LETBURN}, \text{SUPPRESS}\}$
 Reward function: $R(s, \ell, a)$
 Cost of lost timber value
 Cost of lost species habitat
 Cost of fire suppression
44
[Figure: one decision step. $s_t$ (ignition), $a_t$ (action), $\ell_t$ (fire outcome), $r_t$, $s_{t+1}$ (new ignition); transitions are labeled “fire simulator” and “lightning simulator”]
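To see why this state space rules out tabular methods, a quick calculation of the number of joint landscape configurations helps. It counts only the management units and ignores weather and ignition site, so it is a lower bound; the variable names are illustrative.

```python
import math

num_units = 4000     # management units (from the slide)
local_states = 25    # local states per unit (from the slide)

# The number of joint configurations is local_states ** num_units.
# Work in log10 to get the order of magnitude without printing the integer.
log10_configs = num_units * math.log10(local_states)
```

The result is on the order of $10^{5591}$ configurations, so the transition matrix can never be enumerated and the MDP has to be approached through simulation.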

The Simulator is Very Expensive
Simulating one fire can take from 5 to 60 minutes (depending on the size of the fire)
 FARSITE
 Forest Vegetation Simulator (FVS)
 Lightning Strike model
 Weather Simulator
Monte Carlo methods require at least $10^6$ simulator calls
What can we do?
45
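A back-of-the-envelope calculation makes the problem concrete, assuming the simulator calls run one after another on a single machine (parallel hardware would divide these figures).

```python
calls = 10**6                        # minimum simulator calls (from the slide)
minutes_per_year = 60 * 24 * 365

# Serial compute time at the fastest and slowest per-fire simulation times.
years_low = calls * 5 / minutes_per_year
years_high = calls * 60 / minutes_per_year
```

That is roughly 9.5 years of serial compute at 5 minutes per fire and about 114 years at 60 minutes per fire.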

Current Strategy:
Policy Search using a Surrogate
Model
Define a parameterized space of policies: $\pi_\theta(s) = a$
Simulate an initial set of 100-year trajectories under a variety of policies
Apply Bayesian Optimization (SMAC; Hutter, et al., 2011) to find the optimal value of $\theta$
To simulate $\pi_{\theta'}$ for some new $\theta'$, apply the Model-Free Monte Carlo algorithm (Fonteneau, et al., 2013)
46
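A simplified sketch in the spirit of Model-Free Monte Carlo: a new policy is evaluated without calling the expensive simulator by stitching together stored one-step transitions, each used at most once, choosing at every step the unused transition nearest to the current state and action. The one-dimensional "fuel load" state, the toy transition database, the distance function, and the threshold policy are all hypothetical, and the Bayesian optimization step is not shown.

```python
import random

def mfmc_estimate(transitions, policy, start_state, horizon,
                  num_trajectories, distance):
    """Estimate the expected return of `policy` from stored transitions.

    transitions: list of one-step samples (s, a, r, s_next) gathered earlier.
    distance:    function ((s, a), (s2, a2)) -> non-negative float.
    """
    if horizon * num_trajectories > len(transitions):
        raise ValueError("not enough stored transitions to build the trajectories")
    unused = list(range(len(transitions)))
    returns = []
    for _ in range(num_trajectories):
        s, total = start_state, 0.0
        for _ in range(horizon):
            a = policy(s)
            # Pick the unused stored transition closest to (s, a).
            k = min(unused, key=lambda idx: distance((s, a), transitions[idx][:2]))
            unused.remove(k)
            _s, _a, r, s_next = transitions[k]
            total += r
            s = s_next   # continue from the stored successor state
        returns.append(total)
    return sum(returns) / len(returns)

def toy_distance(query, stored):
    (s, a), (s2, a2) = query, stored
    # A large penalty keeps transitions with a different action far away.
    return abs(s - s2) + (0.0 if a == a2 else 1000.0)

def threshold_policy(theta):
    return lambda fuel: "LETBURN" if fuel < theta else "SUPPRESS"

# Hypothetical database standing in for the initial 100-year trajectories.
rng = random.Random(0)
database = []
for _ in range(200):
    fuel = rng.random()
    action = rng.choice(["LETBURN", "SUPPRESS"])
    if action == "LETBURN":
        reward, next_fuel = -fuel, 0.2 * fuel       # burn now, fuel is cleared
    else:
        reward, next_fuel = -0.3, min(1.0, fuel + 0.1)
    database.append((fuel, action, reward, next_fuel))

estimate = mfmc_estimate(database, threshold_policy(0.5), start_state=0.4,
                         horizon=10, num_trajectories=5, distance=toy_distance)
```

An optimizer over $\theta$ would call `mfmc_estimate` once per candidate value and run the real simulator only to enlarge the database.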

A Simpler Problem:
LETBURN one year
Is there any benefit to allowing fires to burn for just
one year?
Year 1: LETBURN
Years 2-100: SUPPRESS ALL
Evaluate via Monte Carlo trials
47

Expected Benefit of LETBURN
(Suppress all fires after year 1)
[Figure: histogram of expected benefit. x-axis: Expected Benefit (x $100,000), ticks from -2 to 60; y-axis: Frequency, ticks from 0 to 35]
mean = $2.47 million
median = $2.74 million
48
[Houtman, Montgomery, Gagnon, Calkin, Dietterich, McGregor, Crowley 2013]

Summary
[Figure: pipeline with stages Data Acquisition, Data Integration, Data Interpretation, Model Fitting, Policy Optimization, Policy Execution]
49

Common Threads
Spatially-spreading processes
 Bird migration
 Invasive species
 Fire spread
Dynamical model
 CGM: Spatial HMM with clever inference
 Simulator of seed spread
 Simulator of fire spread
Computational challenges
 Efficient probabilistic inference
 Minimize calls to expensive simulators
 Value of information heuristics + PAC guarantees
 Bayesian optimization
50

Thank-you
 Dan Sheldon, Akshat Kumar, Tao Sun: Collective Graphical Models
 Steve Kelling, Andrew Farnsworth, Wes Hochachka, Daniel Fink:
BirdCast
 H. Jo Albers, Kim Hall, Majid Taleghan, Mark Crowley: Tamarisk
 Claire Montgomery, Sean McGregor, Mark Crowley, Rachel Houtman
 Carla Gomes for spearheading the Institute for Computational
Sustainability
 National Science Foundation Grants 0832804 (CompSust), 1331932
(CyberSEES), 1125228 (Birdcast), 1521687 (CompSustNet)
51

