Tools for Discrete-Time Control; Application to Power Systems

Three main algorithms from the state of the art:
- Model Predictive Control
- Stochastic Dynamic Programming
- Direct Policy Search
==> and our proposal, a modified Direct Policy Search termed Direct Value Search
3. I do not speak Chinese!!!
● And my English is extremely French (when native English speakers listen to my English, they sometimes believe that they suddenly, by miracle, understand French)
● For the moment, if I gave a talk in Chinese it would be boring, with only:
  ● hse-hse
  ● nirao
  ● pukachi
● Interrupt me as much as you want to facilitate understanding :-)
4. High-Scale Power Systems:
Simulation & Optimization
Olivier Teytaud + Inria-Tao + Artelys
TAO project-team
INRIA Saclay Île-de-France
O. Teytaud, Research Fellow,
olivier.teytaud@inria.fr
http://www.lri.fr/~teytaud/
5. Ilab METIS
www.lri.fr/~teytaud/metis.html
● Metis = Tao + Artelys
● TAO tao.lri.fr, Machine Learning & Optimization
  ● Joint INRIA / CNRS / Univ. Paris-Sud team
  ● 12 researchers, 17 PhDs, 3 post-docs, 3 engineers
● Artelys www.artelys.com, an SME
  - France / US / Canada
  - 50 persons
  ==> collaboration through a common platform
● Activities
  ● Optimization (uncertainties, sequential)
  ● Application to power systems
O. Teytaud, Research Fellow, olivier.teytaud@inria.fr
http://www.lri.fr/~teytaud/
6. Importantly, it is not a lie.
● It is a tradition, in research institutes, to claim some links with industry.
● I don't claim that having such links is necessary, or always a great achievement in itself.
● But I do claim that in my case it is true that I have links with industry.
● My four students here in Taiwan, and others in France, all have real salaries based on industrial funding.
7. All in one slide
Consider an electric system. Decisions =
● Strategic decisions (a few time steps):
  ● building a nuclear power plant
  ● building a Spain-Morocco connection
  ● building a wind farm
● Tactical decisions (many time steps):
  ● switching on hydroelectricity plant #7
  ● switching on thermal plant #4
  ● ...
Strategic decisions are evaluated based on simulations of the tactical level; the tactical level depends on the strategic level.
8. A bit more precisely: the strategic level
Brute-force approach for the strategic level:
● I simulate
  ● each possible strategic decision (e.g. 20,000);
  ● 1,000 times;
  ● each of them with optimal tactical decisions
  ==> 20,000 optimizations, 1,000 simulations each
● I choose the best one.
Better: more simulations on the best strategic decisions.
However, this talk will not focus on that part.
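Before moving on, here is a minimal sketch of this brute-force strategic loop. The simulator and the option set are toy stand-ins (`simulate_tactical` is a hypothetical placeholder for a real unit-commitment optimizer), not the talk's actual setup:

```python
import random

def simulate_tactical(decision, seed):
    """Stand-in for one simulation of the tactical level under optimal
    tactical decisions (a real implementation would run a
    unit-commitment optimizer). Here: a noisy toy cost."""
    rng = random.Random(seed)
    return decision ** 2 + rng.gauss(0.0, 1.0)

def evaluate_strategic(decision, n_simulations=1000):
    """Estimate the expected cost of one strategic decision by
    averaging n_simulations tactical simulations."""
    costs = [simulate_tactical(decision, s) for s in range(n_simulations)]
    return sum(costs) / len(costs)

def brute_force_strategic(options):
    """Simulate every strategic option; keep the cheapest one."""
    return min(options, key=evaluate_strategic)

best = brute_force_strategic([-2.0, -1.0, 0.0, 1.0, 2.0])
```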
9. A bit more precisely: the tactical level
Brute-force approach for the tactical level:
● Simplify:
  ● replace each random process by its expectation;
  ● optimize decisions deterministically.
● But reality is stochastic:
  ● water inflows
  ● wind farms
Better: optimizing a policy (i.e. reactive, closed-loop).
10. Specialization on Power Systems
● Planning/control (tactical level)
  ● Pluriannual planning: evaluate marginal costs of hydroelectricity
  ● Taking into account stochasticity and uncertainties
  ==> IOMCA (ANR)
● High-scale investment studies (e.g. Europe + North Africa)
  ● Long term (2030-2050)
  ● Huge (non-stochastic) uncertainties
  ● Investments: interconnections, storage, smart grids, power plants...
  ==> POST (ADEME)
● Moderate scale (cities, factories) (tactical level simpler)
  ● Master plan optimization
  ● Stochastic uncertainties
  ==> Citines project (FP7)
12. The POST project – supergrids: simulation and optimization
Mature technology: HVDC links (high-voltage direct current)
European subregions:
- Case 1: electric corridor France / Spain / Morocco
- Case 2: south-west (France / Spain / Italy / Tunisia / Morocco)
- Case 3: Maghreb – Central West Europe
==> towards a European supergrid
(Related ideas exist in Asia.)
13. Tactical level: unit commitment at the scale of a country: looks like a game
● Many time steps.
● Many power plants.
● Some of them have stocks (hydroelectricity).
● Many constraints (rules).
● Uncertainties (water inflows, temperature, ...)
==> make decisions:
● When should I switch on? (for each power plant)
● At which power?
14. Investment decisions through simulations
● Issues
  – Demand varying in time, limited predictability
  – Transportation introduces constraints
  – Renewables ==> variability ++
● Methods
  – Markovian assumptions ==> wrong
  – Simplified models ==> model error >> optimization error
● Our approach
  – Machine learning on top of mathematical programming
15. Hybridization: reinforcement learning / mathematical programming
● Math programming (mathematicians doing discrete-time control)
  – Nearly exact solutions for a simplified problem
  – High-dimensional constrained action space
  – But small state space & not anytime
● Reinforcement learning (artificially intelligent people doing discrete-time control :-) )
  – Unstable
  – Small model bias
  – Small / simple action space
  – But high-dimensional state space & anytime
20. Now the technical part
Model Predictive Control,
Stochastic Dynamic Programming,
Direct Policy Search,
and Direct Value Search (new),
combining Direct Policy Search and Stochastic Dynamic Programming.
(3/4 of this talk is about the state of the art, only 1/4 our work.)
21. Many optimization tools (SDP, MPC):
● Strong constraints on forecasts
● Strong constraints on model structure.
Direct Policy Search:
● Arbitrary forecasts, arbitrary structure
● But not scalable in the number of decision variables.
→ merge: Direct Value Search
Jean-Joseph.Christophe@inria.fr
Jeremie.Decock@inria.fr
Pierre.Defreminville@artelys.com
Olivier.Teytaud@inria.fr
22. Stochastic Dynamic Optimization
● Classical solutions: Bellman (old & new)
  ● Markov chains
  ● Overfitting
  ● Anticipativity (dirty solution)
  ● SDP, SDDP
● Alternate solution: Direct Policy Search
  ● No problem with anticipativity
  ● But scalability issue
● The best of both worlds: Direct Value Search
23. Stochastic Control
[Diagram: a random process feeds random values into the system; a controller with memory observes the system state and issues commands; the system returns a cost.]
● For an optimal representation, you need access to the whole archive,
● or to forecasts (generative model / probabilistic forecasts).
(Astrom 1965)
25. Anticipative solutions:
● Maximum over strategic decisions
● of average over random processes
● of optimized decisions, given random processes & strategic decisions.
Pros/Cons:
● Much simpler (deterministic optimization)
● But in real life you cannot guess November rains in January
● Rather optimistic decisions
26. MODEL PREDICTIVE CONTROL
Anticipative solutions:
● Maximum over strategic decisions
● of pessimistic forecasts (e.g. a quantile)
● of optimized decisions, given forecasts & strategic decisions.
Pros/Cons:
● Much simpler (deterministic optimization)
● But in real life you cannot guess November rains in January
● Not so optimistic, convenient, simple
27. Ok, we have done one of the four targets: model predictive control.
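To make the procedure concrete, here is a minimal sketch of a receding-horizon MPC loop under the pessimistic-forecast convention just described. The one-reservoir model, uniform inflows, and trivial deterministic planner are all illustrative assumptions, not the talk's actual problem:

```python
import random

rng = random.Random(0)

# Toy one-reservoir problem: at each step, a demand of 1.0 must be
# met from stored water (free) or thermal generation (cost 1 per unit).

def pessimistic_forecast(horizon, quantile=0.2):
    """Pessimistic inflow forecast: a low quantile of the inflow
    distribution (inflows ~ Uniform[0, 1], so the 20% quantile is 0.2)."""
    return [quantile] * horizon

def plan_deterministic(stock, inflows):
    """Deterministic 'optimization' given the forecast inflows:
    release as much water as demand allows at each step."""
    plan = []
    for inflow in inflows:
        stock = min(stock + inflow, 2.0)   # reservoir capacity 2.0
        release = min(stock, 1.0)          # demand is 1.0
        plan.append(release)
        stock -= release
    return plan

def mpc_run(n_steps=10, horizon=3):
    """Receding horizon: forecast, plan, apply the first decision, repeat."""
    stock, total_cost = 1.0, 0.0
    for _ in range(n_steps):
        plan = plan_deterministic(stock, pessimistic_forecast(horizon))
        stock = min(stock + rng.random(), 2.0)   # the true, random inflow
        release = min(plan[0], stock)
        stock -= release
        total_cost += 1.0 - release              # thermal fills the gap
    return total_cost

print(mpc_run())
```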
29. Markov solution
Representation as a Markov process (a tree): this is the representation of the random process.
Let us see how to represent the rest.
30. How to solve, simple case, binary stock, one day
[Figure: decision node "It is December 30th and I have water"; the edge "I use water (cost = 0)" leads to "No more water, December 31st"; the edge "I do not use" leads to "I have water, December 31st".]
31. How to solve, simple case, binary stock, one day
[Same figure, now annotated with "Future Cost = 0".]
32. How to solve, simple case, binary stock, 3 days, no random process
[Figure: a three-day binary-stock decision tree with a cost on each edge.]
33. How to solve, simple case, binary stock, 3 days, no random process
[Same tree; costs-to-go appear at the last-day nodes.]
34. How to solve, simple case, binary stock, 3 days, no random process
[Same tree; costs-to-go propagated backward to the root, giving the optimal total cost.]
36. How to solve, simple case, binary stock, 3 days, random parts
[Figure: the same three-day tree duplicated under a chance node, with probability 1/3 for one random branch and probability 2/3 for the other; the backward computation now averages over these branches.]
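A minimal sketch of the backward induction these figures illustrate, on a generic scenario tree (the node structure and costs below are illustrative, not the ones from the figures):

```python
def cost_to_go(node):
    """Backward induction on a tree. A node is either
    a decision node {'decisions': [(edge_cost, child), ...]} (take the min),
    a chance node {'chances': [(probability, child), ...]} (average),
    or a leaf {} with cost-to-go 0."""
    if 'decisions' in node:
        return min(cost + cost_to_go(child) for cost, child in node['decisions'])
    if 'chances' in node:
        return sum(p * cost_to_go(child) for p, child in node['chances'])
    return 0.0

# Toy binary-stock instance: using water today costs 0 but empties the
# stock; keeping it means paying thermal (cost 2) today.
leaf = {}
with_water = {'decisions': [(0.0, leaf), (2.0, leaf)]}    # tomorrow, water left
no_water = {'decisions': [(2.0, leaf)]}                   # tomorrow, thermal only
wet = {'decisions': [(0.0, no_water), (2.0, with_water)]}
root = {'chances': [(1.0 / 3.0, wet), (2.0 / 3.0, wet)]}  # random branch, as in the figure
print(cost_to_go(root))  # prints 2.0
```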
37. Markov solution: ok, you have understood stochastic dynamic programming (Bellman)
Representation as a Markov process (a tree): this is the representation of the random process.
In each node, there are the state-nodes with decision-edges.
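In standard notation (generic symbols, not taken from the slides), the backward recursion that these trees implement is the Bellman equation:

```latex
V_T(s) = \min_{d} c_T(s, d), \qquad
V_t(s) = \min_{d} \Big[ c_t(s, d)
  + \mathbb{E}_{w_t}\big[ V_{t+1}\big(f_t(s, d, w_t)\big) \big] \Big],
```

where $s$ is the state, $d$ the decision, $w_t$ the random input (e.g. inflows), $f_t$ the transition function; the decision taken in state $s$ at time $t$ is the minimizer.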
38. Ok, we have done the 2nd of the four targets: stochastic dynamic programming.
39. Markov solution
Representation as a Markov process (a tree):
● Optimize decisions for each state. This means you are not cheating.
● But difficult to use: the strategy is optimized for very specific forecasting models.
● Might be ok for your problem?
41. Overfitting
Representation as a Markov process (a tree): how do you actually make decisions when the random values are not exactly those observed? (heuristics...)
● Check on random realizations which have not been used for building the tree.
● Does it work correctly?
● Overfitting = when it works only on scenarios used in the optimization process.
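A minimal sketch of that held-out check, under the assumed setting where a one-parameter policy is fitted on training scenarios and then evaluated on fresh ones (the toy cost function is illustrative; the point is the validation protocol):

```python
import random, statistics

rng = random.Random(1)

def evaluate(threshold, scenario):
    """Toy per-scenario cost of a one-parameter policy."""
    return sum(abs(x - threshold) for x in scenario)

scenarios = [[rng.random() for _ in range(10)] for _ in range(200)]
train, test = scenarios[:100], scenarios[100:]

# Optimize the policy parameter on the training scenarios only.
grid = [i / 100.0 for i in range(101)]
best = min(grid, key=lambda th: sum(evaluate(th, s) for s in train))

train_cost = statistics.mean(evaluate(best, s) for s in train)
test_cost = statistics.mean(evaluate(best, s) for s in test)
# Overfitting = test_cost much worse than train_cost: the policy only
# works on the scenarios used in the optimization process.
```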
49. SDP / SDDP
Stochastic (Dual) Dynamic Programming
● Representation of the controller with linear programming (value function as piecewise linear)
● → ok for 100,000 decision variables per time step (tens of time steps, hundreds of plants, several decisions each)
● but solving by expensive SDP/SDDP (curse of dimensionality: exponential in the number of state variables)
● Constraints:
  ● Needs an LP approximation: ok for you?
  ● SDDP requires convex Bellman values: ok for you?
  ● Needs Markov random processes: ok for you? (possibly after some random process extension...)
● Goal: keep scalability but get rid of SDP/SDDP solving
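In the usual cut notation (generic symbols, not from the slides), SDDP represents each Bellman value function as a maximum of affine cuts, so the value stays piecewise linear and the one-step problem stays an LP:

```latex
V_t(x) \;\approx\; \max_{k = 1, \dots, K} \big( \alpha_k + \beta_k^\top x \big).
```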
50. Summary
● Most classical solution = SDP and variants
● Or MPC (model predictive control), replacing the stochastic parts by deterministic pessimistic forecasts
● Statistical modeling is "cast" into a tree model & (probabilistic) forecasting modules are essentially lost
52. Direct Policy Search
● Requires a parametric controller
● Principle: optimize the parameters on simulations
● Unusual in large-scale power systems (we will see why)
● Usual in other areas (finance, evolutionary robotics)
53. Stochastic Control
[Diagram: as before, a random process feeds random values into the system; a controller with memory observes the state and issues commands; the system returns a cost.]
Optimize the controller thanks to a simulator:
● Command = Controller(w, state, forecasts)
● Simulate(w) = stochastic loss with parameter w
● w* = argmin [Simulate(w)]
54. Ok, we have done the 3rd of the four targets: direct policy search.
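A minimal sketch of this loop, reusing the toy one-reservoir simulator from the MPC example and plain random search as the noisy black-box optimizer (both stand-ins, not the talk's actual setup):

```python
import math, random

rng = random.Random(0)

def controller(w, stock):
    """Tiny parametric policy: release = sigmoid(w0 * stock + w1)."""
    return 1.0 / (1.0 + math.exp(-(w[0] * stock + w[1])))

def simulate(w, n_steps=50, n_runs=20):
    """Simulate(w): average stochastic cost of running the policy."""
    total = 0.0
    for _ in range(n_runs):
        stock = 1.0
        for _ in range(n_steps):
            stock = min(stock + rng.random(), 2.0)       # random inflow
            release = min(controller(w, stock), stock)
            stock -= release
            total += max(0.0, 1.0 - release)             # thermal cost
    return total / n_runs

# Noisy black-box optimization of w; random search is a placeholder
# for the real optimizers (evolution strategies etc., see later).
best_w, best_cost = None, float('inf')
for _ in range(200):
    w = [rng.gauss(0.0, 2.0), rng.gauss(0.0, 2.0)]
    cost = simulate(w)
    if cost < best_cost:
        best_w, best_cost = w, cost
```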
60. Direct Policy Search (DPS)
● Requires a parametric controller, e.g. a neural network:
  Controller(w,x) = W3 + W2.tanh(W1.x + W0)
● Noisy black-box optimization
● Advantages: non-linear ok, forecasts included (the strategy is optimized given the real forecasting module you have; forecasts are inputs)
● Issue: too slow; hundreds of parameters for even 20 decision variables (depends on structure)
● Idea: a special structure for DPS (inspired from SDP)
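For concreteness, a minimal numpy sketch of exactly this one-hidden-layer controller (the dimensions are arbitrary illustration choices):

```python
import numpy as np

def controller(params, x):
    """Controller(w, x) = W3 + W2 . tanh(W1 . x + W0), as on the slide."""
    W1, W0, W2, W3 = params
    return W3 + W2 @ np.tanh(W1 @ x + W0)

state_dim, hidden, n_decisions = 4, 8, 3
rng = np.random.default_rng(0)
params = (rng.normal(size=(hidden, state_dim)),    # W1
          rng.normal(size=hidden),                 # W0
          rng.normal(size=(n_decisions, hidden)),  # W2
          rng.normal(size=n_decisions))            # W3
decision = controller(params, rng.normal(size=state_dim))
# The flat vector w optimized by DPS concatenates all these entries;
# the count grows quickly with the state and decision dimensions,
# hence hundreds of parameters for a few tens of decision variables.
```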
63. Direct Value Search
SDP representation in DPS:
● Controller(state) = argmin Cost(decision) + V(next state)   [an LP]
● V(nextState) = alpha × NextState (or a more sophisticated LP)
● alpha = NeuralNetwork(w, state)   [not LP]
==> given w, decision making is solved as an LP
==> a non-linear mapping chooses the parameters of the LP from the current state
64. Drawback: requires the optimization of w (= a noisy black-box optimization problem).
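A minimal sketch of one DVS decision step, assuming a linear toy model: scipy's linprog solves the inner LP, and a tiny numpy network produces alpha. All dimensions, dynamics, and costs below are illustrative assumptions:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
n_state, n_dec, hidden = 3, 5, 8

# Non-linear part: alpha = NeuralNetwork(w, state).
W1, W0 = rng.normal(size=(hidden, n_state)), rng.normal(size=hidden)
W2, W3 = rng.normal(size=(n_state, hidden)), rng.normal(size=n_state)

def alpha_of(x):
    return W3 + W2 @ np.tanh(W1 @ x + W0)

# Linear part: nextState = A x + B u, Cost(decision) = c . u.
A = rng.normal(size=(n_state, n_state))
B = rng.normal(size=(n_state, n_dec))
c = rng.uniform(0.5, 1.5, size=n_dec)

def controller(x):
    """Decision = argmin_u  c.u + alpha.(A x + B u),  0 <= u <= 1.
    Once alpha is fixed by the network, this is a plain LP in u
    (the constant alpha.(A x) term does not change the argmin)."""
    alpha = alpha_of(x)
    res = linprog(c + B.T @ alpha, bounds=[(0.0, 1.0)] * n_dec)
    return res.x

u = controller(rng.normal(size=n_state))
```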
65. Summary: the best of both worlds
● Controller(w, state): the structure of the controller (fast, scalable by structure)
  ● V(w, state, .) is non-linear
  ● optimizing Cost(dec) + V(w, state, nextState) is an LP
● Simul(w): a simulator (you can put anything you want in it, even if it is not linear, nothing Markovian...)
  ● do a simulation with w
  ● return the cost
● DirectValueSearch: the optimization (will do its best, given the simulator and the structure)
  ● optimize w* = argmin Simul(w)
  ● return the Controller with w*
66. Same picture; three optimizers for w:
● SAES (self-adaptive evolution strategy)
● Fabian: gradient descent with redundant finite differences
● a Newton version
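As an illustration of the first of these, a minimal (mu, lambda) self-adaptive evolution strategy for a noisy objective standing in for Simul(w); this is a generic textbook sketch, not the exact implementation used in the talk:

```python
import math, random

rng = random.Random(0)

def saes(simulate, dim, mu=5, lam=20, iterations=50):
    """(mu, lam) self-adaptive ES: each offspring mutates its own
    step size log-normally, then its parameters; the mu best
    offspring (on the noisy objective) become the next parents."""
    tau = 1.0 / math.sqrt(2.0 * dim)
    parents = [([rng.gauss(0, 1) for _ in range(dim)], 1.0) for _ in range(mu)]
    for _ in range(iterations):
        offspring = []
        for _ in range(lam):
            w, sigma = parents[rng.randrange(mu)]
            sigma = sigma * math.exp(tau * rng.gauss(0, 1))   # mutate step size
            w = [wi + sigma * rng.gauss(0, 1) for wi in w]    # mutate parameters
            offspring.append((simulate(w), (w, sigma)))
        offspring.sort(key=lambda t: t[0])
        parents = [ind for _, ind in offspring[:mu]]
    return parents[0][0]

# Usage on a toy noisy objective standing in for Simul(w):
best_w = saes(lambda w: sum(x * x for x in w) + rng.gauss(0, 0.1), dim=4)
```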
68. Ok, we have done the 4th of the four targets: direct value search.
69. State of the art in discrete-time control, a few tools:
● Model Predictive Control: for making a decision in a given state,
  (i) do forecasts;
  (ii) replace random processes by pessimistic forecasts;
  (iii) optimize as if the problem were deterministic.
● Stochastic Dynamic Programming:
  ● Markov model
  ● compute the "cost to go" backwards
● Direct Policy Search:
  ● parametric controller
  ● optimized on simulations
70. Conclusion
● Still rather preliminary (less tested than MPC or SDDP) but promising:
  ● forecasts naturally included in the optimization
  ● anytime algorithm (the user immediately gets approximate results)
  ● no convexity constraints
  ● room for detailed simulations (e.g. with a very small time scale, for volatility)
  ● no random process constraints (not necessarily Markov)
  ● can handle large state spaces (as DPS)
  ● can handle large action spaces (as SDP)
==> can work on the "real" problem, without "cast"
71. Bibliography
● D. Bertsekas. Dynamic Programming and Suboptimal Control: A Survey from ADP to MPC. 2005. (MPC = deterministic forecasts)
● K. J. Astrom. 1965.
● P. Pinson. Renewable energy forecasts ought to be probabilistic! 2013 (WIPFOR talk).
● Y. Bengio. Training a neural network with a financial criterion rather than a prediction criterion. 1997. (a quite practical application of direct policy search, with convincing experiments)
74. SDP / SDDP
Stochastic (Dual) Dynamic Programming
● Representation of the controller:
  decision(current state) = argmin Cost(decision) + Bellman(next state)
● This is a linear program (LP) if:
  – for a given current state, next state = LP(decision)
  – Cost(decision) = LP(decision)
● → 100,000 decision variables per time step
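Written out (generic symbols, assuming linear dynamics $x' = Ax + Bu$ and Bellman values represented by cuts as in the earlier sketch), the one-step problem is the LP:

```latex
\min_{u,\, \theta} \; c^\top u + \theta
\quad \text{s.t.} \quad
\theta \ge \alpha_k + \beta_k^\top (A x + B u), \; k = 1, \dots, K,
\qquad u \in U,
```

where $U$ is a polyhedral feasible set; since the decision $u$ only enters linearly, this is how LP-based representations can reach 100,000 decision variables per time step.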