Curses, tradeoffs, and scalable management: advancing evolutionary direct policy search to improve water reservoir operations
1. Curses, tradeoffs, and scalable management: advancing evolutionary
direct policy search to improve water reservoir operations
M. Giuliani1, A. Castelletti1, F. Pianosi2, E. Mason1, and P.M. Reed3
1. Dept. Electronics, Information, and Bioengineering - Hydroinformatics Lab, Politecnico di Milano
2. Dept. Civil Engineering, Bristol University
3. School of Civil and Environmental Engineering, Cornell University
2. Is it still worth studying reservoir operations?
Rippl, W. (1883). The capacity of storage
reservoirs for water supply. Minutes of the
Proceedings, Institution of Civil Engineers,
Vol. 71, Thomas Telford. 270-278.
Maass, et al. [1962]. Design of water-
resource systems. Harvard University
Press Cambridge, Mass.
Loucks et al. [2005]. Water Resources
Systems Planning and Management: An
Introduction to Methods, Models and
Applications. UNESCO, Paris, France.
4. Is it still worth studying reservoir operations?
Climate change is affecting hydrologic regimes
source: Euro CORDEX
Socio-economic change is impacting energy demand
source: EUROSTAT 2012
† 79 percent in the Middle East and North Africa
† 78 percent in Europe and Central Asia
† 75 percent in South Asia
† 62 percent in Latin America and the Caribbean*
As a matter of scale, if Africa were to develop the same share of hydropower potential as Canada, it would realize an eight-fold increase in electricity supply and, with complementary investments in transmission and distribution, bring electricity to the entire continent with multiple additional benefits for water management and regional integration.
These estimates cover potential new (greenfield) site developments only. Significant additional amounts of energy and capacity are available from rehabilitation of existing energy and water assets, from redesign of infrastructure to meet emerging demands and opportunities, and from modification of water allocations and management (reoperation) for a different set of outcomes.
Notwithstanding the strong development rationale, the enormous technical potential, and the improved understanding of good practices, scaling up hydropower faces important constraints and barriers:
The amount of untapped hydropower in the developing world is tremendous: nearly four times the capacity currently installed in Europe and North America.
Future potential for hydropower
source: World Bank
[Figure: economically feasible hydropower potential and production by hydro plants in 2004-5 (GWh/year), by World Bank region: EAP (without China), China, ECA, High Non-OECD, LCR, MENA, OECD, SAR, Africa. Data from the International Journal on Hydropower and Dams, World Atlas 2006, and various national statistics.]
5. Classical approach: Stochastic Dynamic Programming
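The long-term optimal operation of water reservoir systems can be formulated as a
q-objective stochastic optimal control problem:
$$\min_{p} \; \mathbf{J} = |J_1, J_2, \ldots, J_q|$$
subject to
$$x_{t+1} = f_t(x_t, u_t, \varepsilon_{t+1}), \qquad \varepsilon_{t+1} \sim \phi_t(\cdot), \qquad u_t = p(x_t)$$
[Figure: feedback control scheme. The policy p returns the decision u_t; the water reservoir system, driven by the disturbance ε_{t+1}, returns the next state x_{t+1}, which is fed back to the policy through a delay block.]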
SDP provides an optimal solution under the following assumptions:
1. Discrete variable domain
2. Objectives and constraints time-separable
3. Disturbance process time-independent
Richard Bellman
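The backward recursion behind SDP can be illustrated on a toy problem. The sketch below is not the formulation used in this work: it assumes a single reservoir with integer storage, release, and inflow grids, an inflow distribution that is independent in time, and a hypothetical squared-deficit step cost. All names (`sdp_policy`, `step_cost`, `demand`) are illustrative.

```python
def sdp_policy(storages, releases, inflows, probs, horizon, step_cost):
    """Finite-horizon backward Bellman recursion on discrete grids (toy sketch)."""
    s_max = max(storages)
    value = {s: 0.0 for s in storages}  # terminal cost-to-go is zero
    policy = []
    for _ in range(horizon):
        new_value, rule = {}, {}
        for s in storages:
            best_u, best_q = None, float("inf")
            for u in releases:
                q = 0.0
                for e, p in zip(inflows, probs):
                    r = min(u, s + e)               # cannot release more than available
                    s_next = min(s + e - r, s_max)  # water above capacity is spilled
                    q += p * (step_cost(s, r, e) + value[s_next])
                if q < best_q:
                    best_u, best_q = u, q
            new_value[s], rule[s] = best_q, best_u
        value = new_value
        policy.append(rule)
    policy.reverse()  # policy[t][s] is the release decision at stage t, storage s
    return policy, value


# Hypothetical data: 5 storage levels, 4 release levels, 3 equally spaced inflows.
storages = list(range(5))
releases = [0, 1, 2, 3]
inflows = [0, 1, 2]
probs = [0.3, 0.4, 0.3]
demand = 1


def step_cost(s, r, e):
    # Squared water-supply deficit (an assumed objective, for illustration only).
    return max(demand - r, 0) ** 2


policy, value = sdp_policy(storages, releases, inflows, probs, 10, step_cost)
print(policy[0])
```

The three nested loops over storage, release, and inflow values are where the three assumptions listed above enter: the domains are discrete, the cost is accumulated stage by stage, and the inflow distribution does not depend on past inflows.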
6. SDP and the 3 curses
Bellman [1957]. Dynamic Programming. Princeton University Press.
1. Curse of dimensionality: computational cost grows exponentially with state, control
and disturbance dimension [Bellman, 1957]
[Figure: computational effort vs. number of reservoirs (1 to 5)]
Computational effort: $(N_x)^{n_x} \cdot (N_u)^{n_u} \cdot (N_\varepsilon)^{n_\varepsilon}$
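A short calculation makes the exponential growth concrete. The sketch below evaluates the computational effort expression under the assumption that each reservoir adds one state, one control, and one disturbance; the grid sizes (20, 10, 5) are hypothetical.

```python
def sdp_evaluations(n_x, n_u, n_e, N_x, N_u, N_e):
    """Evaluations per time step: (N_x)^n_x * (N_u)^n_u * (N_e)^n_e."""
    return (N_x ** n_x) * (N_u ** n_u) * (N_e ** n_e)


# Assumed: one state, one control, one disturbance per reservoir.
for n_reservoirs in range(1, 6):
    effort = sdp_evaluations(n_reservoirs, n_reservoirs, n_reservoirs, 20, 10, 5)
    print(n_reservoirs, effort)
```

With these grid sizes each additional reservoir multiplies the effort by 1,000.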
7. SDP and the 3 curses
Bertsekas and Tsitsiklis [1996]. Neuro-dynamic programming. Athena Scientific.
2. Curse of modelling: any variable considered among the operating policy arguments
has to be described by a dynamic model to fully predict the one-step ahead
model transition for the computation of the value function [Bertsekas and
Tsitsiklis, 1996]
How to represent exogenous
information (e.g., inflow,
precipitation, SWE)?
• identify a dynamic model which
adds to the curse of dimensionality
• stochastic disturbance with an
associated pdf
8. SDP and the 3 curses
Powell [2007] Approximate Dynamic Programming: Solving the curses of dimensionality. Wiley.
3. Curse of multiple objectives: computational cost grows factorially with the
number of objectives considered [Powell, 2007]
[Figure: number of sub-problems (0 to 1200) vs. number of objectives (1 to 10); examples: min f1, min |f1, f2|, min |f1, f2, f3|]
Number of sub-problems for k objectives:
$$\sum_{i=1}^{k} \frac{k!}{i!\,(k-i)!} + k$$
9. Direct Policy Search
Assume the operating rule belongs to a given family of functions and search for the
optimal solution in the space of policy parameters.
[Figure: operating rule giving release as a function of storage, parameterized by θ1, θ2, θ3, θ4]
Traditional rules defined empirically
for small systems and a single
objective [Oliveira and Loucks,
1997; Lund and Guzman, 1999]:
• New York City Rule
• Space Rule
• Standard Operating Policy
Can this approach be generalized to manage large-scale systems, accounting
for multiple competing objectives under uncertainty?
Oliveira and Loucks [1997]. Water Resources Research, 33(4), 839-852.
Lund and Guzman [1999]. Journal of Water Resources Planning and Management, 125(3). 143-153.
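The idea of searching in the policy parameter space can be sketched on a single reservoir. Everything below is an assumption made for illustration: the four-parameter piecewise-linear rule, the squared-deficit objective, the synthetic inflows, and the plain random search, which stands in for a proper optimization algorithm.

```python
import random


def rule(storage, theta):
    """Hypothetical piecewise-linear rule with parameters (t1, t2, t3, t4)."""
    t1, t2, t3, t4 = theta
    if storage < t1:
        return t3 * storage / t1       # hedging: ramp up to the target release
    if storage <= t2:
        return t3                      # release the target
    return t3 + t4 * (storage - t2)    # draw down when storage is high


def simulate(theta, inflows, demand, s0=50.0, capacity=100.0):
    """Mean squared supply deficit obtained by simulating the rule."""
    s, total = s0, 0.0
    for q in inflows:
        r = min(max(rule(s, theta), 0.0), s + q)  # physically feasible release
        s = min(s + q - r, capacity)              # mass balance with spill
        total += max(demand - r, 0.0) ** 2
    return total / len(inflows)


def random_search(inflows, demand, n_samples=2000, seed=1):
    """Naive search over the parameter space (placeholder for an optimizer)."""
    rng = random.Random(seed)
    best_theta, best_j = None, float("inf")
    for _ in range(n_samples):
        t1 = rng.uniform(1.0, 50.0)
        t2 = rng.uniform(t1, 100.0)
        theta = (t1, t2, rng.uniform(0.0, 20.0), rng.uniform(0.0, 2.0))
        j = simulate(theta, inflows, demand)
        if j < best_j:
            best_theta, best_j = theta, j
    return best_theta, best_j


# Synthetic inflow series, for illustration only.
inflows = [5.0, 15.0, 2.0, 0.0, 20.0, 8.0, 1.0, 0.0, 12.0, 6.0] * 10
best_theta, best_j = random_search(inflows, demand=8.0)
print(best_theta, best_j)
```

The objective is evaluated purely by simulation, so no discretization of the state and no explicit disturbance model are needed; the price is that the result can only be as good as the chosen family of rules allows.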
10. Universal approximators, such as nonlinear approximating networks, can be used to
increase the flexibility of the operating policy
How to select the policy approximation?
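[Figure: two network diagrams mapping inputs x1, x2, x3 to output u1, one for each policy approximation]
Number of policy parameters:
Artificial Neural Networks: $n_\theta = n_u(N(n_x + 2) + 1)$
Gaussian Radial Basis Functions: $n_\theta = N(2n_x + n_u)$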
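To show where the parameter counts come from, here is a minimal sketch of a Gaussian RBF policy with N basis functions, each carrying n_x centers, n_x radii, and n_u output weights. It is one common formulation and may differ in detail (input normalization, weight constraints) from the one used in this work; the function names are illustrative.

```python
import math


def rbf_policy(x, centers, radii, weights):
    """u_k = sum_i w[i][k] * exp(-sum_j ((x_j - c[i][j]) / b[i][j])^2)."""
    n_u = len(weights[0])
    u = [0.0] * n_u
    for c, b, w in zip(centers, radii, weights):
        phi = math.exp(-sum(((xj - cj) / bj) ** 2 for xj, cj, bj in zip(x, c, b)))
        for k in range(n_u):
            u[k] += w[k] * phi
    return u


def n_params_rbf(N, n_x, n_u):
    # N basis functions, each with n_x centers, n_x radii, n_u weights.
    return N * (2 * n_x + n_u)


def n_params_ann(N, n_x, n_u):
    # Per output: N neurons with n_x input weights, a bias, and an output
    # weight, plus one output bias.
    return n_u * (N * (n_x + 2) + 1)


# Hypothetical policy with 3 inputs, 1 output, and 2 basis functions.
centers = [[0.2, 0.5, 0.1], [0.8, 0.4, 0.9]]
radii = [[0.5, 0.5, 0.5], [0.3, 0.3, 0.3]]
weights = [[0.7], [0.3]]
print(rbf_policy([0.2, 0.5, 0.1], centers, radii, weights))
print(n_params_rbf(2, 3, 1), n_params_ann(2, 3, 1))
```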
11. Selecting the optimization algorithm
Challenges associated with complex policy approximations
• High-dimensional search spaces (rich parameterizations)
• Complex search spaces (many local minima)
• Sensitivity to algorithm initialization (no preconditioning)
• Non-differentiable objective functions
• Multiple objectives
• Sensitivity to noise
12. Selecting the optimization algorithm
We used the Borg MOEA [Hadka and Reed, 2013],
which has been shown to be highly robust across
a diverse suite of challenging multi-objective
problems.
Hadka and Reed (2013). Evolutionary Computation, 21(2), 231-259.
13. EMODPS diagnostic framework
The best policy approximation is identified by evaluating the following criteria:
• Pareto approximate front quality
• Search process reliability
• Performance over validation horizon
• Policy analysis
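Pareto front quality is commonly quantified with metrics such as generational distance and the additive ε-indicator. The sketch below gives one standard variant of each, assuming all objectives are minimized and that a reference set is available; the function names are illustrative.

```python
import math


def generational_distance(front, reference):
    """Mean Euclidean distance from each front point to its nearest reference point."""
    return sum(min(math.dist(a, r) for r in reference) for a in front) / len(front)


def additive_epsilon(front, reference):
    """Smallest shift eps such that every reference point is weakly dominated
    by some front point translated by eps (minimization)."""
    return max(
        min(max(a_i - r_i for a_i, r_i in zip(a, r)) for a in front)
        for r in reference
    )


# Hypothetical two-objective sets.
reference = [(0.0, 1.0), (1.0, 0.0)]
front = [(0.5, 1.0), (1.0, 0.5)]
print(generational_distance(front, reference))
print(additive_epsilon(front, reference))
```

Both metrics are zero when the approximate front coincides with the reference set and grow as the front moves away from it.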
14. The Red River system - Vietnam
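[Figure, panels (a) and (b): maps locating the Red River system in Vietnam and neighboring countries (China, Laos, Cambodia, Thailand), with the Da, Thao, and Lo rivers, the Hoa Binh reservoir, and Hanoi. Legend: modeled reservoir; recently constructed reservoirs.]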
15. The Hoa Binh reservoir
Main characteristics
Catchment area: 52,000 km²
Active capacity: 6 × 10^9 m³
8 penstocks: 2,360 m³/s (240 MW)
12 bottom gates: 22,000 m³/s
6 spillways: 14,000 m³/s
Primary operating objectives:
1. hydropower production
2. flood control
[Figure: system schematic. Labels: flooding point, hydropower plant, HoaBinh, Hanoi, Da River, Thao River, Lo River.]
16. Experiment setting
• 7 ANN and 7 RBF policy approximations
• NFE = 500,000 per replication with default Borg MOEA parameterization
• 20 replications to avoid dependence on randomness
• Optimization over historical horizon 1962-1969, which comprises normal, wet and dry years
• Validation via simulation over historical horizon 1995-2004
17. ANN vs RBF policy performance
[Figure: (a) Policy performance with different ANN and RBF architectures in the objective space, J^flo [cm²/day] vs. J^hyd [kWh/day] (×10^7); color indicates the number of neurons/basis functions (4 to 16), and the overall reference set is shown. (b) Generational distance. (c) Additive ε-indicator. (d) Hypervolume. Legend: ANNs, RBFs.]
23. EMODPS policy analysis
Multivariate representation of the RBF policy approximated via Latin Hypercube Sampling of the
policy input domains.
[Figure annotations: monsoon season; small lateral flow; medium-high inflow]
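Latin Hypercube Sampling of the policy input domains can be sketched as follows. This is a basic implementation written for illustration, not the code used in this work; the input bounds in the usage example are hypothetical.

```python
import random


def latin_hypercube(n_samples, bounds, seed=0):
    """Return n_samples points; each input range is split into n_samples
    equal strata and every stratum is used exactly once per input."""
    rng = random.Random(seed)
    columns = []
    for low, high in bounds:
        # One uniformly drawn point inside each stratum of this input.
        pts = [low + (high - low) * (i + rng.random()) / n_samples
               for i in range(n_samples)]
        rng.shuffle(pts)  # decouple the strata across inputs
        columns.append(pts)
    return [tuple(col[i] for col in columns) for i in range(n_samples)]


# Hypothetical input domains: day of the year and a storage level.
samples = latin_hypercube(5, [(1.0, 365.0), (0.0, 100.0)])
for point in samples:
    print(point)
```

Evaluating the policy at each sampled input vector gives a set of input-decision pairs that can then be plotted in a multivariate representation.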
24. Conclusions and further research
• RBF and ANN solutions seem to perform relatively well when evaluated in terms of
policy performance.
• RBFs outperform ANNs in terms of quality of the Pareto approximate front and
reliability of the policy design process.
• EMODPS shows the potential to overcome most of the limitations of DP family
solutions.
25. Scalability - system dimensionality
[Figure: schematic of the Red River system. Labels: SonLa reservoir, HoaBinh reservoir, TuyenQuang reservoir, Da River, Thao River, Lo River, Gam River, Chay River, SonTay, Hanoi, Day diversion, delta, sea, tide.]
Problem complexity:
• hourly regulation of 3 reservoirs
• 5 state variables
• 5 sub-catchments
• 117 decision variables (policy parameters)
• 3 competing objectives
IMRR research project: Integrated and sustainable Management of the Red-Thai Binh River system in a changing climate
26. Scalability - use of information
1. Quantifying the Expected Value of
Perfect Information (EVPI)
a) Design the Basic Operating Policy (BOP);
b) Design the Perfect Operating Policy (POP);
c) Determine EVPI as the difference in the
systemʼs performance between BOP and POP.
2. Information selection
a) Build a set of exogenous variables;
b) Build a sample data set including variables at
point 2.a, time, and current system conditions;
c) Automatically select the variables that are most
valuable for explaining the sequence of optimal
release decisions associated with the POP.
3. Design the Improved Operating Policy
(IOP)
a) Design an operating policy conditioned upon
the information selected at step 2.
4. Assessing the Expected Value of Sample
Information (EVSI)
a) Contrasting the systemʼs performance of IOP
and the references of BOP and POP.
b) Contrasting IOP, BOP and POP using the
metrics for the assessment of the value of
information.
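For a single objective, the bookkeeping in steps 1 and 4 reduces to simple differences. The sketch below assumes J is a cost to be minimized, so the POP has the lowest value; the numbers and function names are hypothetical.

```python
def evpi(j_bop, j_pop):
    """Expected Value of Perfect Information: gap between BOP and POP."""
    return j_bop - j_pop


def evsi(j_bop, j_iop):
    """Expected Value of Sample Information: improvement of IOP over BOP."""
    return j_bop - j_iop


def gap_closed(j_bop, j_iop, j_pop):
    """Share of the BOP-POP gap recovered by the IOP."""
    total = evpi(j_bop, j_pop)
    return evsi(j_bop, j_iop) / total if total > 0 else 0.0


# Hypothetical costs for the three policies.
j_bop, j_iop, j_pop = 100.0, 70.0, 50.0
print(evpi(j_bop, j_pop), evsi(j_bop, j_iop), gap_closed(j_bop, j_iop, j_pop))
```

In the multi-objective case these scalar differences are replaced by metrics comparing Pareto fronts.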
[Figure: flowchart of the procedure, with blocks POP and EVPI, the test "Is EVPI big enough?" (with the alternative "use BOP"), most valuable variables, IOP, EVSI, termination test, and best IOP. Insets illustrate EVPI and EVSI between POP, IOP, and BOP for the single-objective case (axis J) and the multi-objective case (axes J1, J2).]
[Figure: (a) Comparison of Perfect, Basic, and Improved Operating Policies performance in the objective space, J^flo [cm²/day] vs. J^hyd [kWh/day] (×10^7); legend: POP, BOP(t), IOP(t, s_t), IOP(t, s_t, q_t^VQ), IOP(t, s_t, q_t^VQ, q_t^TB), target solution. (b) Hypervolume indicator. (c) Minimum distance from target. (d) Average distance from target.]
Giuliani et al. (2015). Water Resources Research (under review)
Performance of operating policies using increasing information
27. Scalability - number of objectives
Giuliani et al. (2014). Water Resources Research. doi:10.1002/2013WR014700