Theory of games

Bandit-based Monte-Carlo planning: the game
of Go and beyond

Games

Olivier.Teytaud@inria.fr + too many people to cite them all. Includes Inria, Cnrs, Univ.
Paris-Sud, LRI, CMAP, Univ. Amsterdam, Taiwan universities (including NUTN)

TAO, Inria-Saclay IDF, Cnrs 8623, Lri, Univ. Paris-Sud,
Digiteo Labs, Pascal Network of Excellence.

Tao,
January 2010+updated 2012.

Games

Introduction

Complexity measures

Computational complexity

Partial observability

Zoology

Introduction to games

Partially or fully observable
(“phantom” games)
Randomized or not
Iterated or not (reputation)
1,2,3,... players
Decentralized or not (rengo)
Continuous or not
Infinite time or not

Complexity measures
(not always well defined)

State-space complexity
Game-tree size
Decision complexity
Game-tree complexity
Perfect-play complexity
State of the art level

Complexity measures

State-space complexity = number of possible states
Game-tree size = number of leaves
Decision complexity = min # of leaves of a tree showing perfect play
Game-tree complexity = # of leaves for perfect play with constant depth
Computational complexity (= complexity classes, later)
Perfect-play complexity (complexity of perfect algorithm)
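The first two measures can be checked by brute force on a tiny game. The sketch below enumerates Tic-Tac-Toe, assuming play stops as soon as a line is completed; the board encoding (a 9-character string) and the function names are illustrative choices, not taken from the slides.

```python
# Brute-force enumeration of Tic-Tac-Toe (assumption: the game stops
# as soon as one player completes a line, or when the board is full).
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]


def winner(board):
    """Return 'X' or 'O' if that player owns a full line, else None."""
    for a, b, c in LINES:
        if board[a] != '.' and board[a] == board[b] == board[c]:
            return board[a]
    return None


def explore(board, player, seen):
    """Walk the whole game tree; record states, return the leaf count."""
    seen.add(board)
    if winner(board) is not None or '.' not in board:
        return 1  # terminal position = one leaf of the game tree
    other = 'O' if player == 'X' else 'X'
    leaves = 0
    for i, cell in enumerate(board):
        if cell == '.':
            child = board[:i] + player + board[i + 1:]
            leaves += explore(child, other, seen)
    return leaves


states = set()
game_tree_size = explore('.' * 9, 'X', states)  # number of leaves
state_space = len(states)                       # number of reachable states
print("state-space complexity:", state_space)
print("game-tree size:", game_tree_size)
```

The same position reached by different move orders counts once in the state space but once per path in the game tree, which is why the second number is much larger.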


State of the art level

Very weak solving
- Means that we know who should win
- Typically proved by strategy-stealing
- E.g.: Hex (first player wins), Hex + swap (second player wins)
Weak solving
- Perfect play reached with reasonable computation time
- Biggest success: draughts (tens of years of computation on tens of machines)
Strong solving
- Perfect play from any situation in reasonable time (variants of Tic-Tac-Toe)
Best results so far
- Shi-Fu-Mi: humans lose
- English draughts: humans + machines reach perfect play
- Chess: nobody can compete with machines
- 9x9 Go: MoGoTW won with the disadvantageous side against a top player

Computational complexity:
Main reasons for this measure ?

Good feeling of understanding
(disagree if you want :-) )
Explicit families of problems
(extracted by reduction)
Fun
Connections
with classical complexity measures
Much better for looking clever
(when you speak about NP-complete
problems you look clever)

Computational complexity:
Drawbacks

Not clearly related to human/computer
comparisons

Trivial games can be very complex (this
measure is a worst case over situations that might never
occur from the start of the game; many solutions are
based on openings restricting the game)

Often based on incredibly long games


Known: P ⊆ NP ⊆ PSPACE ⊆ EXPTIME ⊆ EXPSPACE.

Conjectured: strict inclusions everywhere.

Higher classes include undecidable cases.


Given a class X, a problem q can be
in X
or at least as hard as any problem in X (X-hard)
or both (X-complete)
or neither
(Diagram: NP, NP-hard, and their intersection NP-complete.)

Complexity quiz

NP means non-polynomial ?
Assume P≠ NP. NP=NP-complete U P ?
Are there problems solvable in
exponential time but not in polynomial
time ?
quadratic time but not in linear time ?
Are there problems which can't be
solved, even with infinite time and
space ?

Complexity quiz: answers

NP means non-polynomial ?
No.
It means (roughly) “polynomial with a machine which can run several branches simultaneously”.
It means (very roughly) “polynomial with a machine which just has to verify a proof”.
The existence of exponential problems is known.
Maybe P = NP is not so interesting as a question :-)

Assume P≠ NP. NP=NP-complete U P ?
No.
There are intermediate problems (if P≠ NP).
Yet, many important NP-problems are either in P or NP-complete.

The three remaining questions ?
YES, YES, and YES.

Are there problems solvable in exponential time but not in polynomial time ? In quadratic time but not in linear time ?
YES !
Not all P problems have the same complexity.

Are there problems which can't be solved, even with infinite time and space ?
YES ! Undecidable problems.
E.g.: Is there a seg-fault ?


For evaluating the complexity of your game:

1. Generalize your game to any size
(non trivial for chess)
2. Consider the problem:
- here is a board
- is the situation a win in perfect play ?


How to show X-completeness

The problem is in X: show that you can
solve it with resources allowed in class X.

The problem is complete: show that you
can encode an X-complete problem in your
problem.



==> cast into a decision problem (binary question)

==> can be used for choosing optimal move
(but not necessary)

==> trivial games can be EXPTIME-hard

==> no clear correlation with the fact that a game is difficult
for a computer (when compared to humans)


A PSPACE-complete pb: planar
generalized geography
- A graph (directed, planar) is given.
- Each player follows an edge (in turn).
- Repetition is not allowed.
- The first player who can't play loses.

==> Is there a winning strategy for the first player ?
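The rules above can be turned into a small exhaustive solver. This is only a sketch under assumptions: "repetition" is read as "a vertex may not be visited twice" (the usual reading of generalized geography), the graph is a plain dictionary, and planarity is not checked. The names are illustrative.

```python
def first_player_wins(graph, current, visited=None):
    """True if the player about to move from `current` can force a win.

    graph: dict mapping a vertex to the list of its successors.
    Assumption: a vertex cannot be visited twice.
    """
    if visited is None:
        visited = {current}
    for nxt in graph.get(current, ()):
        if nxt in visited:
            continue
        visited.add(nxt)
        opponent_wins = first_player_wins(graph, nxt, visited)
        visited.remove(nxt)  # undo the move: one shared set is enough
        if not opponent_wins:
            return True      # this move leaves the opponent in a lost position
    return False             # no move, or every move loses


# Hypothetical example graphs.
graph = {"a": ["b", "c"], "b": ["d"], "c": [], "d": []}
chain = {"a": ["b"], "b": ["c"], "c": []}
print(first_player_wins(graph, "a"))  # moving a -> c leaves the opponent stuck
print(first_player_wins(chain, "a"))
```

The recursion depth never exceeds the number of vertices, so the memory used is polynomial even though the running time can be exponential: this is the flavour of a PSPACE upper bound.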

Another PSPACE-complete pb:
quantified boolean formula

True or false ?
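Deciding a quantified boolean formula can be sketched as a recursive evaluation, one quantifier at a time. The representation below (a list of `(quantifier, variable)` pairs plus a Python function for the quantifier-free part) is an assumption made for the example, not the slide's notation.

```python
def eval_qbf(prefix, matrix, assignment=None):
    """Evaluate a closed quantified boolean formula.

    prefix: list of ("forall" | "exists", variable_name), outermost first.
    matrix: function taking a dict {variable_name: bool} and returning a bool.
    """
    assignment = assignment or {}
    if not prefix:
        return matrix(assignment)
    quantifier, var = prefix[0]
    # Lazily try both truth values of the outermost variable.
    branches = (
        eval_qbf(prefix[1:], matrix, {**assignment, var: value})
        for value in (False, True)
    )
    if quantifier == "exists":
        return any(branches)
    if quantifier == "forall":
        return all(branches)
    raise ValueError(f"unknown quantifier: {quantifier!r}")


def differ(a):
    """Quantifier-free part: x and y take different values."""
    return a["x"] != a["y"]


# For all x there exists y with x != y: true.
print(eval_qbf([("forall", "x"), ("exists", "y")], differ))
# There exists y such that for all x, x != y: false.
print(eval_qbf([("exists", "y"), ("forall", "x")], differ))
```

Only one branch of assignments is kept in memory at a time, so the space used grows with the number of variables while the time can grow exponentially.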

An EXPTIME-complete pb: does a
Turing machine halt in n steps ?

- A program is given.
- A number n is given.
- Will the program halt in n time steps ?

Best solution: simulate.
Cost: n (which is exponential in log(n), the size of n written in binary!)
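A minimal sketch of "simulate for n steps", with a loud assumption: instead of a Turing machine, a program is modelled here as a Python generator function that yields once per step. The helper names are hypothetical.

```python
def halts_within(program, n):
    """Run `program` for at most n steps; True if it halts in that budget.

    Assumption: `program` is a generator function; each call to next()
    is one step, and the program halts when the generator is exhausted.
    """
    run = program()
    for _ in range(n):
        try:
            next(run)
        except StopIteration:
            return True
    return False


def countdown(k):
    """Build a program that performs k steps and then halts."""
    def program():
        remaining = k
        while remaining > 0:
            remaining -= 1
            yield
    return program


def loop_forever():
    while True:
        yield


n = 1_000_000
print(halts_within(countdown(5), n))
print(halts_within(loop_forever, n))
# The simulation may need n steps, but n itself is written with only
# n.bit_length() bits: the cost is exponential in the size of the input.
print(n.bit_length(), "bits encode a budget of", n, "steps")
```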


Here discussed in the compact case
(i.e. representation by formula)

Usually, compact ==> bigger cost
(e.g. P ==> PSPACE)

Partial observability (structured)
(more difficult than an opponent)
P(success)>c is undecidable
(proba+opponent)
(if no time limit.)
==> analyzing P(success)=1 (no proba).

Phantom-games >>> POMDP

See Rintanen 03 (case with formulae).

Phantom-games & POMDP
with infinite horizon

Madani et al: infinite time POMDP are
undecidable.
Auger, Teytaud: finite time deterministic
games are undecidable.

Undecidability of phantom-Go ?

PSPACE vs EXPTIME

==> many important games are either PSPACE or EXPTIME

Theorem: If playing = filling a location
for eternity, then the game is in PSPACE.
(not necessarily PSPACE-complete!)

Proof: Depth-first search.
Applications: Hex, Havannah, Tic-Tac-Toe,
Atari-Go...
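The depth-first search argument can be illustrated on Tic-Tac-Toe. The sketch below is an assumption-laden toy (board as a list of 9 characters, plain negamax, no transposition table): since each move fills a cell forever, the depth is at most the number of cells, and a single board modified in place is all the memory needed.

```python
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]


def winner(board):
    """Return 'X' or 'O' if that player owns a full line, else None."""
    for a, b, c in LINES:
        if board[a] != '.' and board[a] == board[b] == board[c]:
            return board[a]
    return None


def solve(board, player):
    """Value for `player` (to move): +1 win, 0 draw, -1 loss, perfect play."""
    w = winner(board)
    if w is not None:
        return 1 if w == player else -1
    if '.' not in board:
        return 0
    other = 'O' if player == 'X' else 'X'
    best = -1
    for i in range(9):
        if board[i] == '.':
            board[i] = player            # play: fill the cell
            value = -solve(board, other)
            board[i] = '.'               # undo: the same board is reused
            best = max(best, value)
            if best == 1:
                break                    # a winning move is enough
    return best


print(solve(list('.' * 9), 'X'))     # value of the empty board
print(solve(list('XX.OO....'), 'X'))  # X to move can complete the top row
```

No table of visited positions is stored, so the time can be exponential, but the space stays linear in the board size.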

Go: from 29 to 6 stones
1998: loss against amateur (6d) 19x19 H29
2008: win against a pro (8p) 19x19, H9 MoGo
2008: win against a pro (4p) 19x19, H8 CrazyStone
2008: win against a pro (4p) 19x19, H7 CrazyStone

2007: win against a pro (5p) 9x9 (blitz) MoGo
2008: win against a pro (5p) 9x9 white MoGo
2009: win against a pro (5p) 9x9 black MoGo
2009: win against a pro (9p) 9x9 white Fuego
2009: win against a pro (9p) 9x9 black MoGoTW

==> still 6 stones at least!

Game of Go: counting territories
(white has 7.5 “bonus” as black starts)

Game of Go: the rules
Black plays at the blue circle: the
white group dies (it is removed)

It's impossible to kill white (two “eyes”).

“Ko” rules: we don't come back to the same situation.

(without ko: “PSPACE hard”
with ko: “EXPTIME-complete”)

At the end, we count territories
==> black starts, so +7.5 for white.
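Counting at the end of the game can be sketched as a flood fill. Assumptions, stated loudly: this uses area scoring (stones plus empty regions bordered by a single colour), dead stones are assumed already removed, and the board encoding ('X' black, 'O' white, '.' empty) and function name are choices made for the example.

```python
def score(board, komi=7.5):
    """Return (black_points, white_points), komi included for white.

    board: list of equal-length strings, 'X' = black, 'O' = white, '.' = empty.
    Assumption: area scoring, with all dead stones already removed.
    """
    rows, cols = len(board), len(board[0])
    points = {"X": 0, "O": 0}
    seen = set()
    for r in range(rows):
        for c in range(cols):
            cell = board[r][c]
            if cell in points:
                points[cell] += 1        # a stone counts for its owner
            elif (r, c) not in seen:
                # Flood-fill one empty region, noting which colours touch it.
                size, borders, stack = 0, set(), [(r, c)]
                seen.add((r, c))
                while stack:
                    y, x = stack.pop()
                    size += 1
                    for yy, xx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if 0 <= yy < rows and 0 <= xx < cols:
                            neighbour = board[yy][xx]
                            if neighbour == '.':
                                if (yy, xx) not in seen:
                                    seen.add((yy, xx))
                                    stack.append((yy, xx))
                            else:
                                borders.add(neighbour)
                if len(borders) == 1:    # territory of a single colour
                    points[borders.pop()] += size
    return points["X"], points["O"] + komi


# Hypothetical 5x5 final position.
board = ["..XO.",
         "..XO.",
         "..XO.",
         "..XO.",
         "..XO."]
black, white = score(board)
print("black:", black, "white:", white)
```

In this toy position black has more points on the board, yet white wins once the 7.5 bonus is added; the half point also rules out draws.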

NP / PSPACE / EXPTIME in Go
Tsumegos with no ko, forced moves only for W, 2 moves for B, polynomial length: NP-complete
Atari Go: PSPACE
Go without ko: PSPACE-hard
Go with ko + Japanese rules: EXPTIME-complete
Go with ko + superko: unknown (EXPSPACE?)
Some phantom-rengo undecidable ?

If Go with ko > Go without ko, then
PSPACE ≠ EXPTIME

NP / PSPACE / EXPTIME in Go

Encoding the formula in a ladder:

Appendix 2: what is difficult for
computers ? Visual things ?


Easy for computers ... because
human knowledge easy to encode.


Difficult for computers
(very difficult for computers)

A trivial semeai

Plenty of equivalent situations!
They are randomly sampled, with no generalization.
50% of estimated win probability!

It does not work. Why ?

50% of estimated win probability!

In the first node:
- The first simulations give ~ 50%
- The next simulations go to 100% or 0% (depending on the chosen move)
- But, then, we switch to another node (~ 8! x 8! such nodes)
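A rough check of the order of magnitude, assuming the 8! x 8! figure counts the orders in which two groups with 8 liberties each can be filled (this reading, and the simulation budget used below, are assumptions for illustration):

```python
import math

# Assumption: each side has 8 liberties, filled in any order.
liberties_per_group = 8
orderings = math.factorial(liberties_per_group) ** 2
print("equivalent nodes:", orderings)

# Hypothetical budget of one million simulations spread over these nodes.
simulations = 1_000_000
print("simulations per node:", simulations / orderings)
```

With far fewer simulations than nodes, most nodes are never visited twice, so nothing learned in one of them is reused in the others.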

And the humans ?

50% of estimated win probability!

In the first node:
- The first simulations give ~ 50%
- The next simulations go to 100% or 0% (depending on the chosen move)
- But, then, we DON'T switch to another node

Semeais

Should white play in the semeai (G1) or capture (J15) ?

Semeais

Should black play the semeai ?

Useless!

Difficult games: Havannah

Very difficult
for computers.

Conclusions + other
elements
Go complexity:
superko ?
Ishi-no-shita (captures / recaptures) ?
(more generally: characterizing strength /weakness of programs ?)

Huge complexity classes for
structured games
partially observable games (what about phantom-games ?)
decentralized games

Great results for MCTS in GGP + difficult games. Next MCTS-challenges:
Partially observable cases & large horizon : cf Cazenave, Rolet
Solve main weaknesses of MCTS
(learning the MC ? Meta-actions ? Nested MC ?
Mixing with value-function as in Amazons ?)

Biblio
Complexity: Robson, Tromp, Taylor, Crasmaru, ...
Bandits: Lai, Robbins, Auer, Cesa-Bianchi...
UCT: Kocsis, Szepesvari, Coquelin, Munos...
MCTS (Go): Coulom, Chaslot, Fiter, Gelly, Hoock, Silver, Muller,
Pérez, Rimmel, Wang...
Tree + DP for industrial applications: Péret, Garcia...
Bandits with infinitely many arms:
Audibert, Coulom, Munos, Wang...
Applications far from Go: Rolet,
Teytaud (F), Rimmel, De Mesmay
...
Links with “macro-actions” ?
Parallelization, mixing with offline
learning, bias...

Contributors
Paul Veyssière
Hassen Doghmen
Amine Bourki
Matthieu Coulm
Colleagues from NUTN and CJCU
