1. DigiWorld
Distributed decision making: partially observable dynamic games and multiobjective policy optimization
Olivier.Teytaud@inria.fr + too many people to cite them all. Includes Inria, Cnrs, Univ. Paris-Sud, LRI
TAO, Inria-Saclay IDF, Cnrs 8623, Lri, Univ. Paris-Sud, Digiteo Labs, Pascal Network of Excellence.
DigiWorld, September 2012.
In a nutshell: We optimize strategies, with parallel machines, and we test on games, and we apply to energy.
2. Intro: so many words...
Distributed
Decision making
Partially observable
Dynamic
Games
Multiobjective
Policy
Optimization
3. Let's explain
Decision making + policy optimization
Dynamic games
Partially Observable
Distributed
Multiobjective
Decision making: it's all about making decisions. Humans in the loop, or not.
4. Let's explain
Decision making + policy optimization
Dynamic games
Partially Observable
Distributed
Multiobjective
Policy: we provide policies. It's not graphical interfaces or data visualization, it's providing strategies.
5. Let's explain
Decision making + policy optimization
Dynamic games
Partially Observable
Distributed
Multiobjective
Optimization: it's numerical. We have objective functions, and we optimize. It's science, not astrology.
6. Let's explain
Decision making + policy optimization
Dynamic games
Partially Observable
Distributed
Multiobjective
Games: we have rules, a system evolves according to these rules:
- Games: Chess, game of Go, draughts (roughly useless, but convincing and easy to experiment)
11. Let's explain
Decision making + policy optimization
Dynamic games
Partially Observable
Distributed
Multiobjective
Games: we have rules, a system evolves according to these rules:
- Games: Chess, game of Go, draughts (roughly useless, but convincing and easy to experiment)
Yes, MineSweeper is really important.
12. Let's explain
Decision making + policy optimization
Dynamic games
Partially Observable
Distributed
Multiobjective
Games: we have rules, a system evolves according to these rules:
- Games: Chess, game of Go, draughts (roughly useless, but convincing and easy to experiment)
Nearly nobody trusts an industrial experiment (in particular if effects are supposed to be a reduction of risk for horizon 50 years...).
13. Let's explain
Decision making + policy optimization
Dynamic games
Partially Observable
Distributed
Multiobjective
Games: we have rules, a system evolves according to these rules:
- Games: Chess, game of Go, draughts (roughly useless, but convincing and easy to experiment)
Nearly nobody trusts an industrial experiment (in particular if effects are supposed to be a reduction of risk for horizon 50 years...).
But many people trust an experiment on games.
14. Let's explain
Games: we have rules, a system evolves according to these rules:
- Games: Chess, game of Go, draughts (roughly useless, but convincing and easy to experiment)
First wins against professional players for the game of Go
==> opened various doors for us
(we are very grateful to strong pros like Kim Myung-Wang!)
15. Let's explain
Decision making + policy optimization
Dynamic games
Partially Observable
Distributed
Multiobjective
Games: we have rules, a system evolves according to these rules:
- Games: Chess, game of Go, draughts (roughly useless, but convincing and easy to experiment)
- Industrial stuff: group of power plants
16. Let's explain
Decision making + policy optimization
Dynamic games
Partially Observable
Distributed
Multiobjective
Games: we have rules, a system evolves according to these rules:
- Games: Chess, game of Go, draughts (roughly useless, but convincing and easy to experiment)
- Industrial stuff: group of power plants
Renewable energy
17. Let's explain
Decision making + policy optimization
Dynamic games
Partially Observable
Distributed
Multiobjective
Games: we have rules, a system evolves according to these rules:
- Games: Chess, game of Go, draughts (roughly useless, but convincing and easy to experiment)
- Industrial stuff: group of power plants
Nuclear power plant
18. Let's explain
Decision making + policy optimization
Dynamic games
Partially Observable
Distributed
Multiobjective
Games: we have rules, a system evolves according to these rules:
- Games: Chess, game of Go, draughts (roughly useless, but convincing and easy to experiment)
- Industrial stuff: group of power plants
Coal
19. Let's explain
Decision making + policy optimization
Dynamic games
Partially Observable
Distributed
Multiobjective
Games: we have rules, a system evolves according to these rules:
- Games: Chess, game of Go, draughts (roughly useless, but convincing and easy to experiment)
- Industrial stuff: group of power plants
Hydroelectric power plant
20. Let's explain
Decision making + policy optimization
Dynamic games
Partially Observable
Distributed
Multiobjective
Games: we have rules, a system evolves according to these rules:
- Games: Chess, game of Go, draughts (roughly useless, but convincing and easy to experiment)
- Industrial stuff: group of power plants
Hydroelectric power plant: involves state variables (stock levels)
21. Let's explain
Decision making + policy optimization
Dynamic games
Partially Observable
Distributed
Multiobjective
Games: we have rules, a system evolves according to these rules:
- Games: Chess, game of Go, draughts (roughly useless, but convincing and easy to experiment)
- Industrial stuff: group of power plants
+ electricity consumers
Depends on weather, economy, ...
22. Let's explain
Decision making + policy optimization
Dynamic games
Partially Observable
Distributed
Multiobjective
Games: we have rules, a system evolves according to these rules:
- Games: Chess, game of Go, draughts (roughly useless, but convincing and easy to experiment)
- Industrial stuff: group of power plants
+ electricity consumers
+ electric network
Capacity of lines
Demand = Production (certainly not just production >= demand!)
23. Let's explain
Decision making + policy optimization
Dynamic games
Partially Observable
Distributed
Multiobjective
Games: we have rules, a system evolves according to these rules:
- Games: Chess, game of Go, draughts (roughly useless, but convincing and easy to experiment)
- Industrial stuff: group of power plants
+ electricity consumers
+ electric network
Capacity of lines
Demand = Production (certainly not just production >= demand!)
So we have state variables, uncertainties, time steps, long term effects...
==> this is termed a dynamic game
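To make "state variables, uncertainties, time steps, long term effects" concrete, here is a minimal sketch of such a dynamic system in Python. Everything in it is a toy assumption made for illustration (one reservoir, uniform random inflow and demand, a quadratic thermal cost, the names `simulate`, `greedy`, `cautious`); it is not the model used in the actual energy studies.

```python
import random

def simulate(policy, horizon=52, seed=0):
    """Simulate one trajectory of a toy one-reservoir system; return the total cost."""
    rng = random.Random(seed)
    stock = 50.0                              # state variable: water in the reservoir
    capacity = 100.0
    total_cost = 0.0
    for t in range(horizon):                  # time steps
        inflow = rng.uniform(0.0, 10.0)       # uncertainty (stochastic in this sketch)
        demand = rng.uniform(5.0, 15.0)
        stock = min(capacity, stock + inflow)
        # Decision: hydro release, clipped to what is physically feasible.
        hydro = max(0.0, min(policy(t, stock, demand), stock, demand))
        thermal = demand - hydro              # production = demand, exactly
        stock -= hydro                        # long term effect: less water later
        total_cost += thermal ** 2            # assumed convex thermal cost
    return total_cost

def greedy(t, stock, demand):
    """Serve as much demand as possible with hydro."""
    return demand

def cautious(t, stock, demand):
    """Serve only half of the demand with hydro, keep water for later."""
    return 0.5 * demand
```

A policy is just a function from (time, stock, demand) to a decision; comparing `simulate(greedy)` and `simulate(cautious)` over many seeds is the kind of experiment the rest of the talk is about.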
24. Let's explain
Decision making + policy optimization
Dynamic games
Partially Observable
Distributed
Multiobjective
Games: we have rules, a system evolves according to these rules:
- Games: Chess, game of Go, draughts (roughly useless, but convincing and easy to experiment)
- Industrial stuff: group of power plants
+ electricity consumers
+ electric network
+ weather
Can be modeled by a probability distribution
==> not adversarial uncertainty
25. Let's explain
Decision making + policy optimization
Dynamic games
Partially Observable
Distributed
Multiobjective
Games: we have rules, a system evolves according to these rules:
- Games: Chess, game of Go, draughts (roughly useless, but convincing and easy to experiment)
- Industrial stuff: group of power plants
+ electricity consumers
+ electric network
+ weather + economy
Modeled by a probability distribution?
26. Let's explain
Decision making + policy optimization
Dynamic games
Partially Observable
Distributed
Multiobjective
Games: we have rules, a system evolves according to these rules:
- Games: Chess, game of Go, draughts (roughly useless, but convincing and easy to experiment)
- Industrial stuff: group of power plants
+ electricity consumers
+ electric network
+ weather + economy
+ technical innovations
In particular, smart grids!
Worst case maybe better than probabilistic models; adversarial uncertainty.
27. Climate change, peak oil,
pollution, nuclear waste...
Important problems.
We want to work numerically
on this.
28. Let's be simple, 1
We want electricity.
We prefer no nuclear waste.
We prefer no CO2.
So why don't we
just build plenty
of wind farms ?
29. Let's be simple, 1
So why don't we just build plenty of
wind farms ?
Because we need
production = demand
Always. And we cannot give orders
to the wind.
30. Let's be simple, 2
“Because we need
production = demand”
Why not production >= demand ?
Because otherwise, we destroy
both production tools and electric
appliances.
31. Let's be simple, 2
In case
production > demand,
an artificial demand from
useless motors / heaters / … ?
Maybe... wasting energy for
producing winds :-)
But it's better to do storage
E.g. because sometimes there's no
wind, no sun.
32. Let's be simple 3: so we solve
everything with storage ?
Hydroelectricity:
- Pumping water from bottom to top.
- Compressed air
==> but limited
Future: electric vehicles
33. Other solutions than storage ?
Devices which can be more or less
switched on/off on demand (e.g.
electric vehicles, air conditioning,
fridges, heaters...)
==> smart grids
Also: long distance connections
(sharing resources, smoothing
production and demand).
34. What will the future look like ?
Maybe much more electricity
demand (electric vehicles ?)
Hopefully less coal (CO2 pollution)
Shale gas, methane clathrate ? Be careful :-)
Wind farms ++
Concentrated solar power plants
Photovoltaic units ?
Long distance connections
Nuclear or not ?
35. Let's explain
Decision making + policy optimization
Dynamic games
Partially Observable
Distributed
Multiobjective
“Games”: we have rules, a system evolves according to these rules.
Uncertainties:
- randomness
- adversarial (worst case)
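The two kinds of uncertainty can lead to different decisions. A minimal sketch in Python, on a made-up cost table (decisions "A" and "B", three scenarios, uniform probabilities; all names and numbers are illustrative assumptions):

```python
def expected_cost(costs, probabilities):
    """Stochastic uncertainty: average the cost over a probability distribution."""
    return sum(c * p for c, p in zip(costs, probabilities))

def worst_case_cost(costs):
    """Adversarial uncertainty: assume the worst scenario always happens."""
    return max(costs)

def best_decision(cost_table, criterion):
    """Pick the decision whose cost vector minimizes the given criterion."""
    return min(cost_table, key=lambda d: criterion(cost_table[d]))

# Hypothetical costs of two decisions under three scenarios.
cost_table = {"A": [10.0, 10.0, 10.0], "B": [0.0, 0.0, 25.0]}
probabilities = [1 / 3, 1 / 3, 1 / 3]

stochastic_choice = best_decision(cost_table, lambda c: expected_cost(c, probabilities))
adversarial_choice = best_decision(cost_table, worst_case_cost)
```

With these numbers the expected-cost criterion prefers "B" (cheap on average) while the worst-case criterion prefers "A" (never catastrophic).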
36. Let's explain
Decision making + policy optimization
Dynamic games
Partially Observable
Distributed
Multiobjective
Weather = maybe theoretically a stochastic system, but not all variables are observed.
From restricted variables, weather is partially observable
37. Outline
● Complexity and ATM
● Complexity and games (incl. planning)
● Bounded horizon games
38. Classical complexity classes,
including non-determinism
P ⊆ NP ⊆ PSPACE ⊆ EXPTIME ⊆ NEXPTIME ⊆ EXPSPACE
Proved:
PSPACE ≠ EXPSPACE, P ≠ EXPTIME, NP ≠ NEXPTIME
Believed, not proved:
P ≠ NP, EXPTIME ≠ NEXPTIME, NEXPTIME ≠ EXPSPACE
39. Complexity and alternating
Turing machines
● Turing machine (TM) = abstract computer
● Non-deterministic Turing Machine (NTM) = TM with “exists” states (i.e. several transitions, accepts if at least one transition accepts)
● Co-NTM: TM with “for all” states (i.e. several transitions, accepts if all transitions accept)
● ATM: TM with both “exists” and “for all” states.
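The acceptance rule of an ATM can be sketched on a finite tree of configurations. This is only an illustration in Python: the `Config` class and the `accepts` function are hypothetical names, and a real ATM has a tape and a transition function rather than an explicit tree. The link with games is that “exists” nodes behave like our moves and “for all” nodes like the opponent's moves.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Config:
    kind: str                                   # "exists", "forall", "accept" or "reject"
    successors: List["Config"] = field(default_factory=list)

def accepts(c: Config) -> bool:
    """Evaluate acceptance bottom-up on a finite configuration tree."""
    if c.kind == "accept":
        return True
    if c.kind == "reject":
        return False
    results = (accepts(s) for s in c.successors)
    # "exists": at least one transition accepts; "forall": all transitions accept.
    return any(results) if c.kind == "exists" else all(results)

ACCEPT, REJECT = Config("accept"), Config("reject")
ntm_like = Config("exists", [REJECT, ACCEPT])       # only "exists" states
co_ntm_like = Config("forall", [REJECT, ACCEPT])    # only "for all" states
atm_like = Config("exists", [Config("forall", [ACCEPT, ACCEPT]), REJECT])
```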
44. Outline
● Complexity and ATM
● Complexity and games (incl.
planning)
● Bounded horizon games
45. Computational complexity:
framework
Discrete time, uncertainty.
Uncertainty can be stochastic or adversarial.
Succinct representation or flat representations.
Which representation is more natural ?
Probably succinct (one of the succinct...), but
it's not always so easy...
46. Complexity, partial observation,
infinite horizon
● 1P+random, unobservable: undecidable
(Madani et al)
● 1P+random, P(win=1),
or equivalently 2P, P(win=1):
[Rintanen and refs therein]
– Fully observable: EXP [Littman94]
– Unobservable: EXPSPACE [Haslum et al 2000]
– Partial observability: 2EXP
Rmk: “2P, P(win=1)” is not “2P”!
47. Complexity, partial observation,
infinite horizon
● 2P vs 1P: undecidable! [Hearn, Demaine]
● 2P (random or not):
– Existence of sure win: equiv. to 1P+random !
● EXP full-observable (e.g. Go, Robson 1984)
● EXPSPACE unobservable
● 2EXP partially observable
– Existence of sure win, same state forbidden:
EXPSPACE-complete (Go with Chinese rules ?
rather conjectured EXPTIME or PSPACE...)
– General case (optimal play): undecidable
(Auger, Teytaud) (what about phantom-Go ?)
48. Complexity, partial observation
Remarks:
● Continuous case ?
● Purely epistemic (we gather information, we
don't change the state) ? [Sabbadin et al]
● Restrictions on the policy, on the set of
actions...
● Discounted reward
● DEC-POMDP, POSG : many players,
same/opposite/different reward functions...
49. Let's explain
Decision making + policy optimization
Dynamic games
Partially Observable
Distributed
Multiobjective
Distributed:
If you work on a problem with budget ~ 500 billion euros, a cluster is not that expensive.
Moreover, the problem is naturally multi-level:
- High level = investments
- Low level = management (horizon ~ 3 years, 2 weeks, 1 day, 1 minute)
50. Distributed nature of the
problem
High level: optimization of the investments
(horizon = 50 years)
Lower level: simulation of the system,
given investment strategies
(lower level = parallelized)
(real case a bit more
complicated than that)
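A minimal sketch of this two-level structure in Python: the high level searches over investments, the lower level simulates scenarios in parallel. The cost model, the candidate values and the function names are illustrative assumptions; a thread pool stands in for the cluster, and a real deployment would more likely use processes or many machines.

```python
from concurrent.futures import ThreadPoolExecutor
import random

def simulate_management(investment, seed):
    """Lower level: cost of one simulated scenario for a given investment (toy model)."""
    rng = random.Random(seed)
    demand = rng.uniform(80.0, 120.0)
    shortfall = max(0.0, demand - investment)
    return investment * 1.0 + shortfall * 5.0    # assumed build cost + shortage penalty

def evaluate(investment, n_scenarios=200, workers=4):
    """Average cost of an investment over scenarios simulated in parallel."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        costs = list(pool.map(lambda s: simulate_management(investment, s),
                              range(n_scenarios)))
    return sum(costs) / len(costs)

def optimize_investment(candidates):
    """High level: keep the candidate investment with the lowest average cost."""
    return min(candidates, key=evaluate)

candidates = [60.0, 80.0, 100.0, 120.0, 140.0]
best = optimize_investment(candidates)
```

The scenarios are independent given the investment, which is why the lower level parallelizes so easily.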
51. Let's explain
Decision making + policy optimization
Dynamic games
Partially Observable
Distributed
Multiobjective
One objective for each of several scenarios (climate change, fossil fuels, technologies...)
52. Let's explain
Decision making + policy optimization
Dynamic games
Partially Observable
Distributed
Multiobjective
One objective for each of several risk levels (median, 5% worst, 1% worst, ...)
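A minimal sketch of how one set of simulated costs yields several objectives, one per risk level. It assumes that "5% worst" means the average of the 5% highest costs; reading it as a plain quantile would be just as plausible. The function name and the dictionary keys are hypothetical.

```python
import statistics

def risk_objectives(costs):
    """Return one objective per risk level from a sample of simulated costs."""
    ordered = sorted(costs)

    def worst_mean(fraction):
        # Average of the highest `fraction` of the costs (at least one value).
        k = max(1, int(round(fraction * len(ordered))))
        return statistics.mean(ordered[-k:])

    return {
        "median": statistics.median(ordered),
        "worst_5pct": worst_mean(0.05),
        "worst_1pct": worst_mean(0.01),
    }

objectives = risk_objectives(list(range(1, 101)))   # toy costs 1, 2, ..., 100
```

A multiobjective optimizer then compares policies on the whole vector of objectives instead of a single number.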
53. Research philosophy
Too industrial for Inria / Paris-Sud ?
In my humble opinion, no.
Industrial research is good if:
- it is widely applicable
(it is!)
- or it is visible and easy to operate
(it is not... “games” are!)
- or it is very important
(would you like it if there was nobody from academia
working numerically on this ?
==> we are **the** neutral people...)
54. What are the approaches ?
– Dynamic programming (Massé – Bellman 50's) (still
the main approach in industry), alpha-beta, retrograde analysis
– Reinforcement learning
– MCTS (R. Coulom. Efficient Selectivity and Backup
Operators in Monte-Carlo Tree Search. In
Proceedings of the 5th International Conference on
Computers and Games, Turin, Italy, 2006)
– Scripts + Tuning / Direct Policy Search
– Coevolution
55. What are the approaches ?
– Dynamic programming (Massé – Bellman 50's) (still
the main approach in industry), alpha-beta, retrograde analysis
– Reinforcement learning
– MCTS (R. Coulom. Efficient Selectivity and Backup
Operators in Monte-Carlo Tree Search. In
Proceedings of the 5th International Conference on
Computers and Games, Turin, Italy, 2006)
– Scripts + Tuning / Direct Policy Search
– Coevolution
==> remove non-anytime tools
56. What are the approaches ?
– Dynamic programming (Massé – Bellman 50's) (still
the main approach in industry), alpha-beta, retrograde analysis
– Reinforcement learning
– MCTS (R. Coulom. Efficient Selectivity and Backup
Operators in Monte-Carlo Tree Search. In
Proceedings of the 5th International Conference on
Computers and Games, Turin, Italy, 2006)
– Scripts + Tuning / Direct Policy Search
– Coevolution
==> remove unstable tools
58. What do we use ?
MCTS =
- start with a MC (random simulator)
- online optimize the simulations
depending on statistics (updates the
near future)
DPS = optimize a random simulator so that
decisions become better (far future
effects correctly handled)
Currently, we use MCTS with
DPS as a MC tool.
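A much-simplified sketch of this combination in Python, under loud assumptions: the reservoir model, the action set and all function names are made up for illustration, the search keeps statistics only at the root (a UCB1 bandit, not a full tree as in real MCTS), and DPS is a plain random local search on a single parameter.

```python
import math
import random

ACTIONS = (0.0, 0.5, 1.0)    # hypothetical actions: share of demand served by hydro

def step(stock, share, rng):
    """One time step of a toy reservoir; returns (new_stock, cost)."""
    demand = rng.uniform(5.0, 15.0)
    hydro = min(share * demand, stock)
    thermal = demand - hydro                     # production = demand
    inflow = rng.uniform(0.0, 10.0)
    return min(100.0, stock - hydro + inflow), thermal ** 2

def rollout(stock, theta, steps, rng):
    """MC simulation driven by the parametric policy 'serve a share theta of demand'."""
    total = 0.0
    for _ in range(steps):
        stock, cost = step(stock, theta, rng)
        total += cost
    return total

def dps(stock, steps, rng, iterations=50, n_eval=20):
    """DPS: random local search on theta, scored on a fixed set of seeds."""
    def score(theta):
        runs = (rollout(stock, theta, steps, random.Random(s)) for s in range(n_eval))
        return sum(runs) / n_eval
    best, best_score = 0.5, score(0.5)
    for _ in range(iterations):
        candidate = min(1.0, max(0.0, best + rng.gauss(0.0, 0.2)))
        candidate_score = score(candidate)
        if candidate_score <= best_score:
            best, best_score = candidate, candidate_score
    return best

def search(stock, theta, steps, rng, budget=300):
    """UCB1 on the first decision; each simulation continues with the DPS policy."""
    counts = [0] * len(ACTIONS)
    rewards = [0.0] * len(ACTIONS)
    max_cost = 225.0 * (steps + 1)               # thermal <= 15, so cost <= 225 per step

    def ucb(a, n):
        if counts[a] == 0:
            return float("inf")                  # try every action at least once
        return rewards[a] / counts[a] + math.sqrt(2.0 * math.log(n) / counts[a])

    for n in range(1, budget + 1):
        a = max(range(len(ACTIONS)), key=lambda i: ucb(i, n))
        next_stock, cost = step(stock, ACTIONS[a], rng)
        cost += rollout(next_stock, theta, steps, rng)
        counts[a] += 1
        rewards[a] += 1.0 - cost / max_cost      # reward rescaled to [0, 1]
    # Anytime: whatever the budget, return the most simulated action.
    return ACTIONS[max(range(len(ACTIONS)), key=lambda i: counts[i])]
```

Usage would look like `theta = dps(50.0, 10, random.Random(0))` followed by `search(50.0, theta, 10, random.Random(1))`: the statistics handle the near future, the tuned policy handles the far future.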
59. Conclusions
Nice big problems in energy. Require
collaborations (many models, data).
● Our role is not to conclude “(don't) use
shale gas” or “(don't) use methane
clathrate”
● Better: “if you use quantity XXX of
clathrate and YYY of shale gas in
conditions ZZZ then the distribution of
economic and ecological costs switches
to ...”
60. Conclusions
Nice big problems in energy. Require
collaborations. By the way, if you want to
collaborate, people working numerically on this kind of
stuff are more than welcome :-)
Anytime algorithms are necessary, mixing
between MCTS / DPS.
There are still natural questions which are
undecidable ==> decidability matters.
Madani et al (1 player against random, no observability), extended here to
2 players with no randomness
61. Open problems & targets
Phantom-Go undecidable ?
Complexity of Go with Chinese rules ?
(conjectured: PSPACE or EXPTIME;
proved PSPACE-hard + EXPSPACE)
A stable large-scale anytime platform for
our energy management problems
==> if you like experimenting join us :-)