Labex2012g

DigiWorld

Distributed decision making: partially
observable dynamic games and
multiobjective policy optimization

Olivier.Teytaud@inria.fr + too many people to cite them all. Includes Inria, Cnrs, Univ.
Paris-Sud, LRI

TAO, Inria-Saclay IDF, Cnrs 8623,
Lri, Univ. Paris-Sud,
Digiteo Labs, Pascal
Network of Excellence.

DigiWorld
September 2012.

In a nutshell:
We optimize strategies,
with parallel machines,
and we test on games,
and we apply to energy.

Intro: so many words...

Distributed
Decision making
Partially observable
Dynamic
Games
Multiobjective
Policy
Optimization

Let's explain

Decision making:
it's all about making decisions.
Humans in the loop, or not.

Let's explain

Policy:
we provide policies.
It's not graphical interfaces
or data visualization, it's
providing strategies.

Let's explain

Optimization:
it's numerical.
We have objective functions,
and we optimize. It's science,
not astrology.

Let's explain
Games: we have rules, a system
evolves according to these
rules:
- Games:
Chess, game of Go,
draughts (roughly useless, but
convincing and easy to experiment)

Let's explain
Yes, MineSweeper is really important.

Let's explain
Nearly nobody trusts an
industrial experiment
(in particular if effects are supposed
to be a reduction of risk
for horizon 50 years...).

But many people trust an
experiment on games.

Let's explain
First wins against professional players
for the game of Go

==> opened various doors for us
(we are very grateful to strong pros like
Kim Myung-Wang!)

Let's explain
Rules:
- Games:
Chess, game of Go,
draughts (roughly useless, but
convincing and easy to experiment)
- Industrial stuff:
group of power plants

Let's explain

Industrial stuff, group of power plants:
- Renewable energy
- Nuclear power plant
- Coal
- Hydroelectric power plant:
involves state variables
(stock levels)

Let's explain
- Industrial stuff:
+ electricity consumers
Depends on weather,
economy, ...

Let's explain
- Industrial stuff:
+ electric network
Capacity of lines
Demand = Production
(certainly not just production >= demand!)

Let's explain
So we have state variables,
uncertainties, time steps,
long term effects...
==> this is termed a dynamic game
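
A minimal sketch of what "state variables, uncertainties, time steps, long term effects" can look like in code, assuming a single hypothetical water reservoir. Everything here is an illustrative assumption (the names `threshold_policy` and `simulate`, the capacity, the inflow distribution, one unit of water = one unit of energy); it is not the real energy model.

```python
import random

def threshold_policy(stock, threshold=30.0, release=10.0):
    """Hypothetical policy: release water only when the stock is above a threshold."""
    return release if stock > threshold else 0.0

def simulate(policy, steps=52, capacity=100.0, initial_stock=50.0, seed=0):
    """Simulate one scenario and return (final_stock, total_energy).

    State variable: the stock level.
    Uncertainty: a random inflow at each time step (stochastic, not adversarial).
    Long term effect: water released now is no longer available later.
    """
    rng = random.Random(seed)
    stock = initial_stock
    total_energy = 0.0
    for _ in range(steps):
        inflow = rng.uniform(0.0, 8.0)                    # assumed inflow distribution
        decision = min(policy(stock), stock)              # cannot release more than the stock
        stock = min(capacity, stock - decision + inflow)  # overflow is spilled
        total_energy += decision                          # assumed: 1 unit of water = 1 unit of energy
    return stock, total_energy

final_stock, energy = simulate(threshold_policy)
print(f"final stock = {final_stock:.1f}, energy produced = {energy:.1f}")
```

The policy only sees the current stock; a different policy (or a different seed, i.e. a different scenario) gives a different trajectory, which is what gets optimized later.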

Let's explain
- Industrial stuff:
+ electricity consumers
+ electric network
+ weather
Can be modeled by a probability distribution
==> not adversarial uncertainty

Let's explain
- Industrial stuff:
group of power plants
+ electric network
+ weather + economy
Modeled by a probability distribution?

Let's explain
- Industrial stuff:
+ electric network
+ weather + economy
+ technical innovations
In particular, smart grids!
Worst case maybe better than
probabilistic models;
adversarial uncertainty.

Climate change, peak oil,
pollution, nuclear wastes...

Important problems.

We want to work numerically
on this.

Let's be simple, 1

We want electricity.
We prefer no nuclear waste.
We prefer no CO2.

So why don't we
just build plenty
of wind farms ?

Because we need
production = demand

Always. And we cannot give orders
to the wind.

Let's be simple, 2

“Because we need
production = demand”

Why not production >= demand ?

Because otherwise, we destroy
both production tools and electric
appliances.

Let's be simple, 2

In case
production > demand,
an artificial demand from
useless motors / heaters / … ?
Maybe... wasting energy for
producing winds :-)
But it's better to do storage,
e.g. because sometimes there's no
wind, no sun.

Let's be simple 3: so we solve
everything with storage ?

Hydroelectricity:
- Pumping water from bottom to top.
- Compressed air
==> but limited

Future: electric vehicles

Other solutions than storage ?

Devices which can be more or less
switched on/off on demand (e.g.
electric vehicles, air conditioning,
fridges, heaters...)
==> smart grids

Also: long distance connections
(sharing resources, smoothing
production and demand).

How is the future ?
Maybe much more electricity
demand (electric vehicles ?)
Hopefully less coal (CO2 pollution)
Shale gas, methane clathrate ? Be careful :-)
Wind farms ++
Concentration solar plants
Photovoltaic units ?
Long distance connections
Nuclear or not ?

Let's explain

“Games”: we have rules, a system
evolves according to these
rules.
Uncertainties:
- randomness
- adversarial (worst case)

Let's explain
Weather = maybe theoretically
a stochastic system,
but not all variables are observed.

From restricted variables,
weather is partially observable

Outline

● Complexity and ATM

● Complexity and games (incl. planning)

● Bounded horizon games

Classical complexity classes,
including non-determinism
P ⊆ NP ⊆ PSPACE ⊆ EXPTIME ⊆ NEXPTIME ⊆ EXPSPACE

Proved:
PSPACE ≠ EXPSPACE
P ≠ EXPTIME
NP ≠ NEXPTIME

Believed, not proved:
P ≠ NP
EXPTIME ≠ NEXPTIME
NEXPTIME ≠ EXPSPACE

Complexity and alternating
Turing machines
● Turing machine (TM) = abstract computer
● Non-deterministic Turing Machine (NTM)
= TM with “exists” states (i.e. several
transitions, accepts if at least one
transition accepts)
● Co-NTM: TM with “for all” states (i.e.
several transitions, accepts if all
transitions accept)
● ATM: TM with both “exists” and “for all”
states.
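
The "exists" / "for all" acceptance rules can be sketched on an explicit, finite computation tree. This is only an illustration under stated assumptions: a real Turing machine is not given as a tree, and the function name `accepts` and the tuple encoding are hypothetical choices made for this example.

```python
def accepts(node):
    """Evaluate acceptance on an explicit computation tree.

    A node is either a bool (halting configuration: True = accept)
    or a tuple (kind, children) with kind in {"exists", "forall"}.
    """
    if isinstance(node, bool):
        return node
    kind, children = node
    if kind == "exists":
        # accepts if at least one transition accepts
        return any(accepts(child) for child in children)
    if kind == "forall":
        # accepts if all transitions accept
        return all(accepts(child) for child in children)
    raise ValueError(f"unknown state kind: {kind}")

# NTM-like tree: only "exists" states.
ntm_tree = ("exists", [False, ("exists", [False, True])])
# Co-NTM-like tree: only "forall" states.
co_ntm_tree = ("forall", [True, ("forall", [True, False])])
# ATM-like tree, which reads like a two-player game:
# "I have a move such that, for all replies, I still have a winning move."
atm_tree = ("exists", [
    ("forall", [("exists", [False, True]), True]),
    ("forall", [False, True]),
])

print(accepts(ntm_tree), accepts(co_ntm_tree), accepts(atm_tree))  # True False True
```

The alternation of "exists" and "for all" is what makes ATMs a natural model for games: one kind of state per player.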

Outline

● Complexity and ATM

● Complexity and games (incl.
planning)

● Bounded horizon games

Computational complexity:
framework
Discrete time, uncertainty.
Uncertainty can be stochastic or adversarial.

Succinct representation or flat representations.

Which representation is more natural ?
Probably succinct (one of the succinct...), but
it's not always so easy...

Complexity, partial observation,
infinite horizon

● 1P+random, unobservable: undecidable
(Madani et al)
● 1P+random, P(win=1),
or equivalently 2P, P(win=1):
[Rintanen and refs therein]
– Fully observable: EXP [Littman94]
– Unobservable: EXPSPACE [Haslum et al 2000]
– Partial observability: 2EXP

Rmk: “2P, P(win=1)” is not “2P”!

Complexity, partial observation,
infinite horizon

● 2P vs 1P: undecidable! [Hearn, Demaine]
● 2P (random or not):
– Existence of sure win: equiv. to 1P+random !
● EXP fully observable (e.g. Go, Robson 1984)
● EXPSPACE unobservable
● 2EXP partially observable
– Existence of sure win, same state forbidden:
EXPSPACE-complete (Go with Chinese rules ?
rather conjectured EXPTIME or PSPACE...)
– General case (optimal play): undecidable
(Auger, Teytaud) (what about phantom-Go ?)

Complexity, partial observation

Remarks:
● Continuous case ?
● Purely epistemic (we gather information, we
don't change the state) ? [Sabbadin et al]
● Restrictions on the policy, on the set of
actions...
● Discounted reward
● DEC-POMDP, POSG : many players,
same/opposite/different reward functions...

Let's explain
Distributed:
If you work on a problem with
budget ~ 500 billion euros,
a cluster is not that expensive.
Moreover, the problem is
naturally multi-level:
- High level = investments
- Low level = management
(horizon ~ 3 years, 2 weeks,
1 day, 1 minute)

Distributed nature of the
problem

High level: optimization of the investments
(horizon = 50 years)

Lower level: simulation of the system,
given investment strategies
(lower level = parallelized)

(real case a bit more
complicated than that)

Let's explain

Multiobjective:
One objective for each
of several scenarios
(climate change,
fossil fuels,
technologies...)

Let's explain

Multiobjective:
One objective for each
of several risk levels
(median, 5% worst,
1% worst, ...)
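
A minimal sketch of "one objective per risk level", computed from simulated costs. Assumptions, loudly: the cost distribution is made up, the function name `risk_objectives` is hypothetical, and "5% worst" is read here as the average cost over the 5% worst scenarios (it could equally be read as a quantile).

```python
import random
import statistics

def risk_objectives(costs):
    """Return one objective per risk level from simulated costs (higher = worse)."""
    ordered = sorted(costs)
    n = len(ordered)

    def mean_of_worst(fraction):
        k = max(1, int(round(fraction * n)))  # number of worst scenarios kept
        return statistics.fmean(ordered[-k:])

    return {
        "median": statistics.median(ordered),
        "5% worst": mean_of_worst(0.05),
        "1% worst": mean_of_worst(0.01),
    }

rng = random.Random(0)
costs = [rng.gauss(100.0, 15.0) for _ in range(1000)]  # assumed cost distribution
objectives = risk_objectives(costs)
for name, value in objectives.items():
    print(f"{name}: {value:.1f}")
```

Two policies can then be compared on the whole vector of objectives rather than on a single average, which is where the multiobjective aspect comes from.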

Research philosophy

Too industrial for Inria / Paris-Sud ?
In my humble opinion, no.
Industrial research is good if:
- it is widely applicable
(it is!)
- or it is visible and easy to operate
(it is not... “games” are!)
- or it is very important
(would you like it if there was nobody from academia
working numerically on this ?
==> we are **the** neutral people...)

What are the approaches ?

– Dynamic programming (Massé – Bellman 50's) (still
the main approach in industry), alpha-beta, retrograde analysis
– Reinforcement learning
– MCTS (R. Coulom. Efficient Selectivity and Backup
Operators in Monte-Carlo Tree Search. In
Proceedings of the 5th International Conference on
Computers and Games, Turin, Italy, 2006)
– Scripts + Tuning / Direct Policy Search
– Coevolution


==> remove non-anytime tools

==> remove unstable tools

What do we use ?

MCTS =
- start with a MC (random simulator)
- online optimize the simulations
depending on statistics (updates the
near future)

DPS = optimize a random simulator so that
decisions become better (far future
effects correctly handled)

Currently, we use MCTS with
DPS as a MC tool.
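
A minimal sketch of the DPS idea: tune the parameter of a policy by running it in a random simulator. Assumptions, loudly: the simulator `simulate`, its cost model and the single `threshold` parameter are hypothetical, and the optimizer is a plain random local search (a (1+1)-style evolution strategy with a fixed step size), which is one possible choice and not necessarily the one we use. The MCTS part is not shown.

```python
import random

def simulate(threshold, seed, steps=52, capacity=100.0):
    """Hypothetical simulator: cost of a threshold policy on one random scenario."""
    rng = random.Random(seed)
    stock, cost = 50.0, 0.0
    for _ in range(steps):
        inflow = rng.uniform(0.0, 8.0)
        demand = rng.uniform(2.0, 6.0)
        # Policy: serve demand with hydro only when the stock is above the threshold.
        hydro = min(demand, stock) if stock > threshold else 0.0
        cost += demand - hydro                    # assumed: unmet demand is bought at price 1
        stock = min(capacity, stock - hydro + inflow)
    return cost + max(0.0, 50.0 - stock)          # assumed penalty for ending with a low stock

def average_cost(threshold, seeds):
    """Average cost of the policy over a fixed set of scenarios."""
    return sum(simulate(threshold, s) for s in seeds) / len(seeds)

def direct_policy_search(iterations=200, sigma=10.0, seed=0):
    """Random local search on the policy parameter; returns (best, best_cost)."""
    rng = random.Random(seed)
    seeds = range(30)  # same scenarios for every candidate (common random numbers)
    best = 50.0
    best_cost = average_cost(best, seeds)
    for _ in range(iterations):
        candidate = min(100.0, max(0.0, best + rng.gauss(0.0, sigma)))
        cost = average_cost(candidate, seeds)
        if cost <= best_cost:                     # keep the candidate if it is not worse
            best, best_cost = candidate, cost
    return best, best_cost

best_threshold, best_cost = direct_policy_search()
print(f"best threshold = {best_threshold:.1f}, average cost = {best_cost:.1f}")
```

The search is anytime: it can be stopped after any iteration and still returns the best policy found so far.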

Conclusions

Nice big problems in energy. Require
collaborations (many models, data).
● Our role is not to conclude “(don't) use
shale gas” or “(don't) use methane
clathrate”
● Better: “if you use quantity XXX of
clathrate and YYY of shale gas in
conditions ZZZ then the distribution of
economic and ecological costs switches
to ...”

Conclusions

Nice big problems in energy. Require
collaborations. By the way, if you want to
collaborate, people working numerically on this kind of
stuff are more than welcome :-)

Anytime algorithms are necessary, mixing
between MCTS / DPS.

There are still natural questions which are
undecidable ==> decidability matters.
Madani et al (1 player against random, no observability), extended here to
2 players with no randomness

Open problems & targets

Phantom-Go undecidable ?

Complexity of Go with Chinese rules ?
(conjectured: PSPACE or EXPTIME;
proved: PSPACE-hard and in EXPSPACE)

A stable high-scale anytime platform for
our energy management problems
==> if you like experimenting join us :-)
