1. Monte-Carlo Tree Search
Games with partial observation
Olivier.Teytaud@inria.fr + David Auger
+Hervé Fournier + Fabien Teytaud + Sébastien Flory
+ JY Audibert+ S. Bubeck + R. Munos + ...
Includes Inria, Cnrs, Univ. Paris-Sud, LRI, CMAP,
Taiwan universities, Lille, Paris, Boostr...
TAO, Inria-Saclay IDF, Cnrs 8623,
Lri, Univ. Paris-Sud,
Digiteo Labs, Pascal
Network of Excellence.
Grenoble
June 2011
Games with simultaneous actions 1 Grenoble, June 19th, 2011.
2. Monte-Carlo Tree Search
1. Games (a bit of formalism)
2. Hidden information <==> SA
3. Decidability / complexity
4. Real implementation
==> application to Urban Rivals
5. A game is a directed graph with actions and players
[Figure: directed graph of numbered nodes, each labelled with the player to move (White or Black).]
6. A game is a directed graph with actions and players and observations
[Figure: the same graph, with an observation (Bob, Bear or Bee) attached to nodes.]
7. A game is a directed graph with actions and players and observations and rewards
[Figure: the same graph, with observations (Bob, Bear, Bee) and rewards (+1, 0).]
Rewards on leaves only!
8. A game is a directed graph +actions +players +observations +rewards +loops
[Figure: the same graph, with observations (Bob, Bear, Bee), rewards (+1, 0) and loops.]
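The definition built up over the last slides (directed graph, actions, players, observations, rewards on leaves, loops) can be sketched as a small data structure. This is only an illustration: the class `Node`, its field names and the four-node example graph are assumptions, not taken from the talk.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Node:
    player: str                                          # who moves here: "White" or "Black"
    observation: str                                     # what is shown here, e.g. "Bee"
    moves: Dict[str, int] = field(default_factory=dict)  # action label -> successor node id
    reward: Optional[float] = None                       # set on leaves only


def is_leaf(node):
    """A leaf has no outgoing action."""
    return not node.moves


def has_loop(graph, start):
    """Return True if a cycle is reachable from `start` (depth-first search)."""
    ON_PATH, DONE = 1, 2
    state = {}

    def visit(n):
        state[n] = ON_PATH
        for succ in graph[n].moves.values():
            if state.get(succ) == ON_PATH:
                return True          # back edge: we returned to the current path
            if succ not in state and visit(succ):
                return True
        state[n] = DONE
        return False

    return visit(start)


# Hypothetical example graph: node 3 can move back to node 1, which creates a loop.
graph = {
    1: Node("White", "Bob", {"a": 2, "b": 3}),
    2: Node("Black", "Bee", {"a": 4}),
    3: Node("Black", "Bear", {"a": 4, "b": 1}),
    4: Node("White", "Bee", reward=1.0),
}

print(has_loop(graph, 1))   # True: 1 -> 3 -> 1
print(has_loop(graph, 2))   # False: 2 -> 4 ends on a leaf
```

With loops allowed, a play may never reach a rewarded leaf, which is why the horizon matters in the complexity results later in the talk.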
9. Monte-Carlo Tree Search
1. Games (a bit of formalism)
2. Hidden information <==> SA
3. Decidability / complexity
4. Real implementation
10. A game is a directed graph +actions +players +observations +rewards +loops
[Figure: the game graph from the previous slides.]
Consider games as follows:
Turn 1
Turn 2
…
Turn K: all information is revealed.
Turn K+1
Turn K+2
…
Turn 2K: all information is revealed
…
Turn NK: all information is revealed
11. A game is a directed graph +actions +players +observations +rewards +loops
[Figure: the game graph from the previous slides.]
Rewrite it as follows:
Turn 1: player 1 chooses (privately) his strategy until turn K
Turn 2: player 2 chooses (privately) his strategy until turn K
Intermediate turns removed!
Turn K: all information is revealed.
Turn K+1
Turn K+2
…
Turn 2K: all information is revealed
…
Turn NK: all information is revealed
12. A game is a directed graph +actions +players +observations +rewards +loops
[Figure: the game graph from the previous slides.]
Rewrite it as follows:
Turn 1: player 1 chooses (privately) his strategy until turn K
Turn 2: player 2 chooses (privately) his strategy until turn K
==> Equivalent to simultaneous actions
Intermediate turns removed!
Turn K: all information is revealed.
Turn K+1
Turn K+2
…
Turn 2K: all information is revealed
…
Turn NK: all information is revealed
13. A game is a directed graph +actions +players +observations +rewards +loops
[Figure: the game graph from the previous slides.]
Now it's a game with simultaneous actions and no hidden information.
Simultaneous actions = (almost) short term hidden information.
14. Monte-Carlo Tree Search
1. Games (a bit of formalism)
2. Hidden information <== SA
(and sometimes <==>)
3. Decidability / complexity
4. Real implementation
15. Compact representation ?
Succinct representation (in short, without tedious details):
- graph of size N represented in size O(log N) ;
- usually not better in terms of complexity;
- keep this in mind when considering complexity.
16. Complexity question ?
Instance = position.
Question = Is there a strategy which wins whatever the decisions of the opponent ?
= natural question if full observability.
Answering this question then allows perfect play.
17. Complexity question ? (UD)
Instance = position.
Question = Is there a strategy which wins whatever the decisions of the opponent ?
= natural question if full observability.
Answering this question then allows perfect play.
18. Complexity question for matrix game ?
100000
010000
001000
000100
000010
000001
Good for column-player !
==> but no sure win.
==> the “UD” question is not relevant here!
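A quick way to see why this matrix is good for the column player and still offers no sure win. The sketch assumes that an entry 1 means "the row player wins" (a convention the slide does not state) and that the column player mixes over columns; the function name is illustrative.

```python
from fractions import Fraction


def worst_case_for_column(matrix, column_mix):
    """Best win probability a pure row can reach against the column player's mix."""
    return max(
        sum(p * row[j] for j, p in enumerate(column_mix))
        for row in matrix
    )


n = 6
identity = [[1 if i == j else 0 for j in range(n)] for i in range(n)]

uniform = [Fraction(1, n)] * n                      # play every column equally often
pure_first = [Fraction(1)] + [Fraction(0)] * (n - 1)  # always play the first column

print(worst_case_for_column(identity, uniform))     # 1/6
print(worst_case_for_column(identity, pure_first))  # 1
```

Any deterministic choice of column can be beaten with certainty, while the uniform mix holds the row player to 1/6. The interesting quantity is therefore a winning probability, not the existence of a sure win.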
19. Complexity question for phantom-games ?
Joint work with F. Teytaud
[Figure: a phantom-go position.]
This is phantom-go.
Good for black: wins with proba 1-1/(8!)
Here, there's no move which ensures a win.
But some moves are much better than others!
20. It becomes complicated
Isn't it possible to consider a better question ?
21. Complexity (2P, no random)
X = proba(winning) that we look for
                                Unbounded                      Exponential          Polynomial
                                horizon                        horizon              horizon
Full observability              EXP                            EXP                  PSPACE
No obs (X=100%)                 EXPSPACE (Haslum et al, 2000)  NEXP
Partially observable (X=100%)   2EXP (Rintanen)                EXPSPACE (Mundhenk)
Simult. actions                 ? EXPSPACE ?                   <<<= EXP             <<<= EXP
No obs                          undecidable
Partially observable            undecidable
(Teytaud, Auger, IJFCS, accepted)
22. State of the art
EXPTIME-complete in the general fully-observable case
23. EXPTIME-complete fully observable games
- Chess (for some nxn generalization)
- Go (with no superko)
- Draughts (international or English)
- Chinese checkers
- Shogi
24. PSPACE-complete fully observable games
- Amazons
- Hex
- Go-moku
- Connect-6
- Qubic
- Reversi
- Tic-Tac-Toe
Many games in which each cell is filled once and only once
25. EXPSPACE-complete unobservable games (Haslum & Jonsson)
The two-player unobservable case is EXPSPACE-complete (games in succinct form).
26. EXPSPACE-complete unobservable games (Haslum & Jonsson)
The two-player unobservable case is EXPSPACE-complete (games in succinct form).
PROOF:
(I) First note that strategies are just sequences of actions (no observability!) + UD=>opponent can see the state!
(II) It is in EXPSPACE=NEXPSPACE, because of the following algorithm:
(a) Non-deterministically choose the sequence of actions
(b) Check the result against all possible strategies
(III) We have to check the hardness only.
28. EXPSPACE-complete unobservable games (Haslum & Jonsson)
The two-player unobservable case is EXPSPACE-complete (games in succinct form).
PROOF:
(I) First note that strategies are just sequences of actions (no observability!)
(II) It is in EXPSPACE=NEXPSPACE, because of the following algorithm:
(a) Non-deterministically choose the sequence of actions
(b) Check the result against all possible strategies
(III) We have to check the hardness only.
29. EXPSPACE-complete unobservable games (Haslum & Jonsson)
The two-player unobservable case is EXPSPACE-complete (games in succinct form).
PROOF of the hardness:
Reduction from: is my TM with exponential tape going to halt ?
Consider a TM with tape of size N=2^n.
We must find a game
- with size n ( n= log2(N) )
- such that the first player has a winning strategy iff the TM halts.
30. EXPSPACE-complete unobservable games (Haslum & Jonsson)
Encoding a Turing machine with a tape of size N as a game with state O(log(N))
Player 1 chooses the sequence of configurations of the tape (N=4):
x(0,1),x(0,2),x(0,3),x(0,4) ==> initial state
x(1,1),x(1,2),x(1,3),x(1,4)
x(2,1),x(2,2),x(2,3),x(2,4)
x(3,1),x(3,2),x(3,3),x(3,4)
.....................................
31. EXPSPACE-complete unobservable games (Haslum & Jonsson)
Encoding a Turing machine with a tape of size N as a game with state O(log(N))
Player 1 chooses the sequence of configurations of the tape (N=4):
x(0,1),x(0,2),x(0,3),x(0,4) ==> initial state
x(1,1),x(1,2),x(1,3),x(1,4)
x(2,1),x(2,2),x(2,3),x(2,4)
x(3,1),x(3,2),x(3,3),x(3,4)
.....................................
x(N,1), x(N,2), x(N,3), x(N,4)
Wins by final state !
32. EXPSPACE-complete unobservable games (Haslum & Jonsson)
Encoding a Turing machine with a tape of size N as a game with state O(log(N))
Player 1 chooses the sequence of configurations of the tape (N=4):
x(0,1),x(0,2),x(0,3),x(0,4) ==> initial state
x(1,1),x(1,2),x(1,3),x(1,4)
x(2,1),x(2,2),x(2,3),x(2,4)
x(3,1),x(3,2),x(3,3),x(3,4)
.....................................
x(N,1), x(N,2), x(N,3), x(N,4)
Wins by final state !
Except if P2 finds an illegal transition!
==> P2 can check the consistency of one 3-tuple per line
==> requests space log(N) ( = position of the 3-tuple)
33. EXPSPACE-complete PO games
The one-player PO case is
EXPSPACE-complete
(games in succinct form).
34. 2EXPTIME-complete PO games
The two-player PO case is
2EXP-complete
(games in succinct form).
35. Undecidable games (B. Hearn)
The three-player PO case is
undecidable. (two players against one,
not allowed to communicate)
36. Complexity (2P, no random)
                                Unbounded                      Exponential          Polynomial
                                horizon                        horizon              horizon
Full observability              EXP                            EXP                  PSPACE
No obs (X=100%)                 EXPSPACE (Haslum et al, 2000)  NEXP
Partially observable (X=100%)   2EXP (Rintanen 97)             EXPSPACE
Simult. actions                 ? EXPSPACE ?                   <<<= EXP             <<<= EXP
No obs                          undecidable
Partially observable            undecidable
Reduction to 1P + random (Madani et al)
37. Another formalization
==> much more satisfactory
38. Madani et al.
1 player + random = undecidable.
39. Madani et al.
1 player + random = undecidable.
We extend to two players with no random.
Problem: rewrite random nodes, thanks to an additional player.
40. A random node to be rewritten
41. A random node to be rewritten
42. A random node to be rewritten
Rewritten as follows:
Player 1 chooses a in [[0,N-1]]
Player 2 chooses b in [[0,N-1]]
c=(a+b) modulo N
Go to t_c
Each player can force the game to be equivalent to the initial one (by playing uniformly)
==> the proba of winning for player 1 (in case of perfect play) is the same as for the initial game
==> undecidability!
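Why playing uniformly forces the equivalence: if either a or b is uniform on [[0,N-1]], then c=(a+b) modulo N is uniform whatever the other player does. A small exact check, assuming the original random node picks its N successors with equal probability; the function name and the skewed example distribution are illustrative.

```python
from fractions import Fraction


def distribution_of_c(p_a, p_b):
    """Exact distribution of c = (a + b) mod N for independent choices a ~ p_a, b ~ p_b."""
    n = len(p_a)
    dist = [Fraction(0)] * n
    for a, pa in enumerate(p_a):
        for b, pb in enumerate(p_b):
            dist[(a + b) % n] += pa * pb
    return dist


n = 5
uniform = [Fraction(1, n)] * n
# Player 1 tries to bias the outcome towards small values of a.
skewed = [Fraction(7, 10), Fraction(3, 10), Fraction(0), Fraction(0), Fraction(0)]

print(distribution_of_c(skewed, uniform) == uniform)   # True
print(distribution_of_c(uniform, skewed) == uniform)   # True
print(distribution_of_c(skewed, skewed) == uniform)    # False
```

Neither player can do better than the random node would allow as long as the opponent randomises uniformly, so the value of the rewritten game matches the original one.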
43. Important remark
Existence of a strategy for winning with proba > 0.5
==> also undecidable for the restriction to games in which the proba is >0.6 or <0.4
==> not just a subtle precision trouble.
44. Monte-Carlo Tree Search
1. Games (a bit of formalism)
2. Hidden information <==> SA
3. Decidability / complexity
4. Real implementation
45. Real implementation for simultaneous action ?
MCTS principle
But with EXP3 in nodes.
56. ... or exploration ?
SCORE = 0/2 + k.sqrt( log(10)/2 )
Replace it with EXP3 / INF
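What replacing the UCB-style score with EXP3 could look like: a minimal sketch of the textbook EXP3 update, assuming rewards in [0,1]. The class name `Exp3`, the parameter `gamma` and its default value are illustrative choices, not taken from the talk. In a node with simultaneous actions one would presumably keep one such bandit per player and let both draw independently.

```python
import math
import random


class Exp3:
    """Textbook EXP3 bandit (sketch). Rewards are assumed to lie in [0, 1]."""

    def __init__(self, n_arms, gamma=0.1, rng=None):
        self.n = n_arms
        self.gamma = gamma
        self.weights = [1.0] * n_arms
        self.rng = rng or random.Random()

    def probabilities(self):
        """Mix the weight-proportional distribution with uniform exploration."""
        total = sum(self.weights)
        return [(1 - self.gamma) * w / total + self.gamma / self.n
                for w in self.weights]

    def draw(self):
        """Sample an arm; also return the probability it was drawn with."""
        probs = self.probabilities()
        arm = self.rng.choices(range(self.n), weights=probs)[0]
        return arm, probs[arm]

    def update(self, arm, prob, reward):
        """Importance-weighted update of the drawn arm only."""
        estimate = reward / prob
        self.weights[arm] *= math.exp(self.gamma * estimate / self.n)
        # Rescaling leaves the probabilities unchanged and avoids overflow.
        top = max(self.weights)
        self.weights = [w / top for w in self.weights]


bandit = Exp3(3, rng=random.Random(0))
for _ in range(200):
    arm, prob = bandit.draw()
    bandit.update(arm, prob, 1.0 if arm == 0 else 0.0)  # only arm 0 pays
print(bandit.probabilities())
```

Unlike the deterministic SCORE above, the choice is sampled from a distribution, which is what a mixed (Nash) strategy requires.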
57. The game of Go is a part of AI.
Computers are ridiculous in front of children.
Easy situation.
Termed “semeai”.
Requires a little bit
of abstraction.
58. The game of Go is a part of AI.
Computers are ridiculous in front of children.
800 cores, 4.7 GHz,
top level program.
Plays a stupid move.
59. The game of Go is a part of AI.
Computers are ridiculous in front of children.
8 years old;
little training;
finds the good move
60. MoGo(TW): games vs pros
in the game of Go
First win in 9x9
First draw (a few days ago!) over 6 games
First win over 4 games in 9x9 blind Go
First win with H2.5 in 13x13 Go
First win with H6 in 19x19 Go in 2009 (also done by Zen)
First win with H7 in 19x19 Go vs top pro in 2009 (also done by Pachi in 2011)
61. Monte-Carlo Tree Search
1. Games (a bit of formalism)
2. Hidden information <==> SA
3. Decidability / complexity
4. Real implementation
==> Dark Chess endgames
==> application to Urban Rivals
62. Let's have fun with Urban Rivals (4 cards)
Each player has
- four cards (each one can be used once)
- 12 pilz (each one can be used once)
- 12 life points
Each card has:
- one attack level
- one damage
- special effects (forget it for the moment)
Four turns:
- P1 attacks P2
- P2 attacks P1
- P1 attacks P2
- P2 attacks P1
63. Let's have fun with Urban Rivals
First, attacker plays:
- chooses a card
- chooses ( PRIVATELY ) a number of pilz
Attack level = attack(card) x (1+nb of pilz)
Then, defender plays:
- chooses a card
- chooses a number of pilz
Defense level = attack(card) x (1+nb of pilz)
Result:
If attack > defense
Defender loses damage(attacker's card) life points
Else
Attacker loses damage(defender's card) life points
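The resolution rule of one round can be sketched as follows. This is a simplified reading of the slide: the life points lost are taken to be the winning card's damage value, special effects are ignored, and the names `Card` and `resolve_round` and the card values are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Card:
    attack: int   # attack level of the card
    damage: int   # life points the opponent loses if this card wins


def resolve_round(attacker_card, attacker_pilz, defender_card, defender_pilz):
    """Return (who loses life points, how many) for one round."""
    attack_level = attacker_card.attack * (1 + attacker_pilz)
    defense_level = defender_card.attack * (1 + defender_pilz)
    if attack_level > defense_level:
        return ("defender", attacker_card.damage)
    # Ties go to the defender, as in the "Else" branch of the slide.
    return ("attacker", defender_card.damage)


# Hypothetical cards.
print(resolve_round(Card(6, 4), 2, Card(7, 3), 1))   # ('defender', 4): 18 > 14
print(resolve_round(Card(6, 4), 0, Card(6, 2), 0))   # ('attacker', 2): 6 is not > 6
```

Because the attacker's number of pilz is chosen privately, the defender faces a hidden-information decision inside every round, which is where the simultaneous-action machinery applies.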
64. Let's have fun with Urban Rivals
==> The MCTS-based AI is now at the best human level.
Experimental (only) remarks on EXP3:
- discard strategies with small number of sims = better approx of the Nash
- also an improvement by taking into account the other bandit
- not yet compared to INF
- virtual simulations (inspired by Kummer)
65. Conclusions
New stuff:
Undecidability of optimal play for 2-player games with hidden information
Transformation “PO periodically revealed ==> simultaneous action game
with full observation”
Open problems
Complexity: simultaneous action and infinite horizon (in progress)
Complexity with PO: same information for both players ?
Nash of matrix games with strong dominance
Mathematical validation of variants of Exp3 / Inf
Consistent “realistic” approaches for PO games (H finite)
72. When is MCTS relevant ?
Robust in front of:
High dimension;
Non-convexity of Bellman values;
Complex models
Delayed reward
Simultaneous actions
More difficult for
High values of H;
Highly unobservable cases (Monte-Carlo, but not
Monte-Carlo Tree Search, see Cazenave et al.)
Lack of reasonable baseline for the MC
73. When is MCTS relevant ?
Robust in front of:
High dimension;
Non-convexity of Bellman values;
Complex models
Delayed reward
Simultaneous actions
More difficult for
High values of H;
Highly unobservable cases (Monte-Carlo, but not Monte-Carlo Tree Search, see Cazenave et al.)
Lack of reasonable baseline for the MC
Further work !
We should test INF and justify mathematically our improvements
Some undecidability results
74. When is MCTS relevant ?
Convenient. Easy to check.
How to apply it:
Implement the transition (a function action x state → state)
Design a Monte-Carlo part (a random simulation) (a heuristic in one-player games; difficult if two opponents)
==> at this point you can simulate...
Implement UCT (just a bias in the simulator - no real optimizer)
Possibly parallelize (Gelly et al)
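The recipe above (transition, Monte-Carlo part, UCT as a bias in the simulator) can be sketched on a toy game. Everything specific here is an assumption made for the example and not taken from the talk: the game (players alternately remove 1 to 3 stones, whoever takes the last stone wins), the function names, the constant `k` and the number of simulations.

```python
import math
import random

# A state is (stones_left, player_to_move), with players 0 and 1.


def legal_actions(state):
    return [n for n in (1, 2, 3) if n <= state[0]]


def transition(state, action):
    """The transition function: action x state -> state."""
    stones, player = state
    return (stones - action, 1 - player)


def winner(state):
    """On a terminal state, the winner is the player who just moved."""
    return 1 - state[1]


def rollout(state, rng):
    """The Monte-Carlo part: uniformly random moves until the game ends."""
    while state[0] > 0:
        state = transition(state, rng.choice(legal_actions(state)))
    return winner(state)


class Node:
    def __init__(self, state):
        self.state = state
        self.children = {}   # action -> Node
        self.visits = 0
        self.wins = 0        # wins of the player who moved into this node

    def update(self, result):
        self.visits += 1
        if result == 1 - self.state[1]:
            self.wins += 1


def uct_child(node, k):
    """UCT: empirical mean plus an exploration bias."""
    def score(child):
        bias = k * math.sqrt(math.log(node.visits) / child.visits)
        return child.wins / child.visits + bias
    return max(node.children.values(), key=score)


def simulate(node, rng, k):
    """Run one simulation through the tree and return the winner."""
    if node.state[0] == 0:
        result = winner(node.state)
    else:
        untried = [a for a in legal_actions(node.state) if a not in node.children]
        if untried:
            action = rng.choice(untried)
            child = Node(transition(node.state, action))
            node.children[action] = child
            result = rollout(child.state, rng)
            child.update(result)
        else:
            result = simulate(uct_child(node, k), rng, k)
    node.update(result)
    return result


def best_action(state, n_simulations=3000, k=1.0, seed=0):
    """Return the most simulated action from a non-terminal state."""
    rng = random.Random(seed)
    root = Node(state)
    for _ in range(n_simulations):
        simulate(root, rng, k)
    return max(root.children, key=lambda a: root.children[a].visits)


print(best_action((5, 0)))
```

From 5 stones the winning move is to take 1 and leave a multiple of 4, and with a few thousand simulations the most visited action should be that one. Only `legal_actions`, `transition` and `winner` are game-specific, which is the sense in which the method is convenient to apply.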
75. PO problems, approx.
Nash ==> mailing list
Challenge: outperform humans
in “Urban Rivals”
- free game
- fast games (~ 1 minute)
- 11M registered players