Monte-Carlo Tree Search



Games with partial
observation
 Olivier.Teytaud@inria.fr + David Auger
+Hervé Fournier + Fabien Teytaud + Sébastien Flory
+ JY Audibert+ S. Bubeck + R. Munos + ...
Includes Inria, Cnrs, Univ. Paris-Sud, LRI, CMAP,
Taiwan universities, Lille, Paris, Boostr...

TAO, Inria-Saclay IDF, Cnrs 8623,
Lri, Univ. Paris-Sud,
Digiteo Labs, Pascal
Network of Excellence.


Grenoble
June 2011
  Games with simultaneous actions        1           Grenoble, June 19th, 2011.
Monte-Carlo Tree Search




1. Games (a bit of formalism)

2. Hidden information <==> SA

3. Decidability / complexity

4. Real implementation
         ==> appli to UrbanRivals


A game is a directed graph

A game is a directed graph with actions
[Figure: directed graph with action labels 1, 2, 3, ...]

A game is a directed graph with actions
                        and players
[Figure: same graph with player labels White / Black]

A game is a directed graph with actions
                and players and observations
[Figure: same graph with observation labels Bob, Bear, Bee]

A game is a directed graph with actions
        and players and observations and rewards
[Figure: same graph with rewards +1 and 0. Rewards on leaves only!]

A game is a directed graph +actions
          +players +observations +rewards +loops
[Figure: same graph, with loops]
Monte-Carlo Tree Search




1. Games (a bit of formalism)

2. Hidden information <==> SA

3. Decidability / complexity

4. Real implementation




A game is a directed graph +actions
         +players +observations +rewards +loops

         Consider games as follows:
      Turn 1
      Turn 2
      …
      Turn K: all information is revealed.
      Turn K+1
      Turn K+2
      …
      Turn 2K: all information is revealed.
      …
      Turn NK: all information is revealed.
A game is a directed graph +actions
          +players +observations +rewards +loops

        Rewrite it as follows:
         Turn 1: player 1 chooses (privately) his strategy until turn K
         Turn 2: player 2 chooses (privately) his strategy until turn K+1
         Intermediate turns removed!
         Turn K: all information is revealed.
         Turn K+1
         Turn K+2
         …
         Turn 2K: all information is revealed.
         …
         Turn NK: all information is revealed.

         ==> Equivalent to simultaneous actions
A game is a directed graph +actions
          +players +observations +rewards +loops

        Now it's a game with simultaneous actions
        and no hidden information.

                 Simultaneous actions
                         = (almost)
             short term hidden information.
Monte-Carlo Tree Search




1. Games (a bit of formalism)

2. Hidden information <== SA
   (and sometimes <==>)

3. Decidability / complexity

4. Real implementation


Compact representation ?



  Succinct representation (in short, without tedious details):
  - graph of size N represented in size O(log N) ;
  - usually not better in terms of complexity;
  - keep this in mind when considering complexity.




Complexity question ?                                     (UD)


                 Instance = position.

          Question = Is there a strategy
                    which wins whatever
                     the opponent's
                      decisions are ?
 = natural question under full observability.
 Answering this question then allows perfect
 play.
Complexity question for matrix
 game ?

  1 0 0 0 0 0
  0 1 0 0 0 0
  0 0 1 0 0 0
  0 0 0 1 0 0
  0 0 0 0 1 0
  0 0 0 0 0 1

  Good for column-player !
  ==> but no sure win.
  ==> the “UD” question is not relevant here!
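The matrix example can be checked with a few lines of Python. This is only a sketch, under the assumption (not stated on the slide) that an entry 1 means the row player wins and 0 means the column player wins; the function names are illustrative.

```python
from fractions import Fraction

# Assumption: entry 1 = row player wins, entry 0 = column player wins.
N = 6
M = [[1 if i == j else 0 for j in range(N)] for i in range(N)]

def row_has_sure_win(matrix):
    """UD question for the row player: does some row win against every column?"""
    return any(all(v == 1 for v in row) for row in matrix)

def column_has_sure_win(matrix):
    """UD question for the column player: does some column win against every row?"""
    n_cols = len(matrix[0])
    return any(all(row[j] == 0 for row in matrix) for j in range(n_cols))

def row_win_proba_vs_uniform_column(matrix):
    """Win probability of each pure row when the column player plays uniformly."""
    n_cols = len(matrix[0])
    return [Fraction(sum(row), n_cols) for row in matrix]

sure_row = row_has_sure_win(M)
sure_col = column_has_sure_win(M)
probas = row_win_proba_vs_uniform_column(M)
```

Under this assumption neither player has a sure win, yet a uniformly random column holds every row to a win probability of 1/6: the position is good for the column player although the UD answer is "no" for both sides.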
Complexity question for
 phantom-games ?                        (Joint work with F. Teytaud)

 [Figure: a phantom-go position]

 This is phantom-go.

 Good for black: wins with proba 1-1/(8!)

 Here, there's no move which ensures a win.

 But some moves are much better than others!
It becomes complicated




            Isn't it possible to
                  consider
            a better question ?


Complexity (2P, no random)
     X = proba(winning) that we look for

                        Unbounded              Exponential      Polynomial
                        horizon                horizon          horizon
Full observability      EXP                    EXP              PSPACE
No obs (X=100%)         EXPSPACE               NEXP
                        (Haslum et al, 2000)
Partially observable    2EXP                   EXPSPACE
(X=100%)                (Rintanen)             (Mundhenk)
Simult. actions         ? EXPSPACE ?           <<<= EXP         <<<= EXP
No obs                  undecidable
Partially observable    undecidable

Teytaud, Auger, IJFCS (accepted)
State of the art

 EXPTIME-complete in the general
   fully-observable case
EXPTIME-complete fully
observable games


       - Chess (for some nxn generalization)

       - Go (with no superko)

       - Draughts (international or English)

       - Chinese checkers

       - Shogi
PSPACE-complete fully
observable games

            - Amazons
            - Hex
            - Go-moku
            - Connect-6
            - Qubic
            - Reversi
            - Tic-Tac-Toe


                 Many games with filling of each cell once and only once
EXPSPACE-complete
unobservable games                        (Haslum & Jonsson)

                  The two-player unobservable case is
                  EXPSPACE-complete
                  (games in succinct form).

EXPSPACE-complete
unobservable games (Haslum & Jonsson)
      The two-player unobservable case is
      EXPSPACE-complete
      (games in succinct form).


PROOF:
 (I) First note that strategies are just sequences of actions
    (no observability!) + UD ==> opponent can see the state!
 (II) It is in EXPSPACE=NEXPSPACE, because of the
   following algorithm:
   (a) Non-deterministically choose the sequence of
       actions
   (b) Check the result against all possible strategies
 (III) We have to check the hardness only.
EXPSPACE-complete
unobservable games (Haslum & Jonsson)
      The two-player unobservable case is
      EXPSPACE-complete
      (games in succinct form).
    PROOF of the hardness:
     Reduction from: is my TM with exponential tape
      going to halt ?

    Consider a TM with tape of size N=2^n.

    We must find a game
    - with size n              ( n= log2(N) )
    - such that the first player has a winning
           strategy iff the TM halts.
EXPSPACE-complete
unobservable games (Haslum & Jonsson)
  Encoding a Turing machine with a tape of size N
           as a game with state O(log(N))


                Player 1 chooses the sequence of
                configurations of the tape (N=4):

                 x(0,1),x(0,2),x(0,3),x(0,4) ==> initial state
                 x(1,1),x(1,2),x(1,3),x(1,4)
                 x(2,1),x(2,2),x(2,3),x(2,4)
                 x(3,1),x(3,2),x(3,3),x(3,4)
                  .....................................
                  x(N,1), x(N,2), x(N,3), x(N,4) ==> wins by final state !

                Except if P2 finds an illegal transition!
                 ==> P2 can check the consistency of one 3-tuple per line
                 ==> requires space log(N) ( = position of the 3-tuple)
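The consistency check left to P2 can be sketched in Python. This is an illustration under strong assumptions: the machine is modelled by a hypothetical local rule `next_cell(left, center, right)` that gives a cell's next content from the 3-tuple above it, and a toy "shift right" rule stands in for a real Turing machine transition table. In the game, P2 only has to remember one position (log(N) bits) and watch that 3-tuple on each line; the loop below enumerates the positions P2 could pick.

```python
def find_illegal_transition(rows, next_cell, blank=0):
    """Return (line, position) of the first cell violating the local rule, else None."""
    for t in range(len(rows) - 1):
        cur, nxt = rows[t], rows[t + 1]
        for i in range(len(cur)):
            # The 3-tuple above cell i; cells outside the tape count as blank.
            left = cur[i - 1] if i > 0 else blank
            right = cur[i + 1] if i + 1 < len(cur) else blank
            if nxt[i] != next_cell(left, cur[i], right):
                return (t, i)
    return None

def shift_right(left, center, right):
    """Toy local rule (an assumption): each cell copies its left neighbour."""
    return left

# Sequences of tape configurations proposed by player 1 (N=4).
legal = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
cheat = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 1, 0, 0]]

legal_result = find_illegal_transition(legal, shift_right)
cheat_result = find_illegal_transition(cheat, shift_right)
```

Here `legal_result` is `None`, while `cheat_result` points at line 1, position 1, where the proposed sequence breaks the rule.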
EXPSPACE-complete PO games


                  The one-player PO case is
                  EXPSPACE-complete
                  (games in succinct form).




2EXPTIME-complete PO games


                  The two-player PO case is
                  2EXP-complete
                  (games in succinct form).




Undecidable games                                 (B. Hearn)




                The three-player PO case is
                undecidable. (two players against one,
                not allowed to communicate)




Complexity (2P, no random)

                        Unbounded              Exponential      Polynomial
                        horizon                horizon          horizon
Full observability      EXP                    EXP              PSPACE
No obs (X=100%)         EXPSPACE               NEXP
                        (Haslum et al, 2000)
Partially observable    2EXP                   EXPSPACE
(X=100%)                (Rintanen 97)
Simult. actions         ? EXPSPACE ?           <<<= EXP         <<<= EXP
No obs                  undecidable
Partially observable    undecidable

Reduction to 1P + random (Madani et al)
Another formalization

 [Figure not recovered]

    ==> much more satisfactory
Madani et al.

1 player + random = undecidable.

We extend to two players with no random.
Problem: rewrite random nodes, using an additional
player.
A random node to be rewritten

 [Figure: a random node]

 Rewritten as follows:
 Player 1 chooses a in [[0,N-1]]
 Player 2 chooses b in [[0,N-1]]
 c=(a+b) modulo N
 Go to t_c
Each player can force the game to be equivalent to
the initial one (by playing uniformly)
==> the proba of winning for player 1 (in case of perfect play)
   is the same as for the initial game
==> undecidability!
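The claim that one player playing uniformly is enough to restore the random node can be verified exactly on a small case. The sketch below is an illustration: `outcome_distribution` and the biased strategy are hypothetical names and values, not taken from the slides.

```python
from collections import Counter
from fractions import Fraction

def outcome_distribution(n, p1_strategy):
    """Distribution of c = (a + b) mod n when player 2 draws b uniformly.

    p1_strategy maps each a in 0..n-1 to its probability (any mixed strategy).
    """
    dist = Counter()
    for a, pa in p1_strategy.items():
        for b in range(n):
            dist[(a + b) % n] += pa * Fraction(1, n)
    return dict(dist)

N = 5
# Hypothetical biased strategy for player 1: always choose a = 3.
biased = {a: Fraction(int(a == 3)) for a in range(N)}
dist = outcome_distribution(N, biased)
```

Whatever player 1 does, `dist` is uniform over the N outcomes as long as player 2 draws b uniformly; by symmetry the same holds with the roles exchanged.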
Important remark

 Existence of a strategy for winning with
 proba > 0.5
 ==> also undecidable for the
       restriction to games in which the proba
       is >0.6 or <0.4
 ==> not just a subtle
       precision issue.

Monte-Carlo Tree Search




1. Games (a bit of formalism)

2. Hidden information <==> SA

3. Decidability / complexity

4. Real implementation



Real implementation for
 simultaneous action ?



 MCTS principle

 But with EXP3 in nodes.



UCT (Upper Confidence Trees)




Coulom (06)
Chaslot, Saito & Bouzy (06)
Kocsis & Szepesvari (06)
UCT
[Figures: UCT illustrated over several slides]
Exploitation ...
            SCORE =
               5/7
             + k.sqrt( log(10)/7 )
... or exploration ?
              SCORE =
                 0/2
               + k.sqrt( log(10)/2 )
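The two scores above can be computed directly. The sketch below follows the formula on the slides; the value of k and the handling of unvisited moves are assumptions, since the slides leave them unspecified.

```python
import math

def ucb_score(wins, visits, parent_visits, k=1.0):
    """Mean reward plus exploration bonus: wins/visits + k.sqrt(log(t)/visits)."""
    if visits == 0:
        return math.inf  # assumption: unvisited moves are tried first
    return wins / visits + k * math.sqrt(math.log(parent_visits) / visits)

# The two moves from the slides: 5 wins out of 7 and 0 wins out of 2,
# with 10 simulations through the parent node.
k = 1.0  # hypothetical value
exploit = ucb_score(5, 7, 10, k)
explore = ucb_score(0, 2, 10, k)
```

With k = 1 the well-performing move (5/7) keeps the higher score; for larger k (above roughly 1.43 with these counts) the rarely tried move wins and gets explored.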




                  Replace it
                    with
                 EXP3 / INF
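As an illustration of what "EXP3 in nodes" can look like, here is a minimal sketch of the textbook EXP3 bandit with one bandit per player in a simultaneous-action node. It is not the authors' implementation: the exploration rate `gamma`, the weight rescaling and the matching-pennies test matrix are assumptions, and INF is not covered.

```python
import math
import random

class Exp3:
    """Textbook EXP3 bandit (rewards in [0, 1]); gamma is the exploration rate."""

    def __init__(self, n_arms, gamma=0.1, rng=None):
        self.n = n_arms
        self.gamma = gamma
        self.weights = [1.0] * n_arms
        self.counts = [0] * n_arms  # number of simulations per arm
        self.rng = rng or random.Random()

    def probabilities(self):
        total = sum(self.weights)
        return [(1 - self.gamma) * w / total + self.gamma / self.n
                for w in self.weights]

    def draw(self):
        arm = self.rng.choices(range(self.n), weights=self.probabilities())[0]
        self.counts[arm] += 1
        return arm

    def update(self, arm, reward):
        p = self.probabilities()[arm]
        # Importance-weighted reward estimate keeps the update unbiased.
        self.weights[arm] *= math.exp(self.gamma * (reward / p) / self.n)
        # Rescale so that weights cannot overflow on long runs.
        top = max(self.weights)
        self.weights = [w / top for w in self.weights]

def play_matrix_game(matrix, n_rounds, seed=0):
    """Both players pick simultaneously, each with its own EXP3 bandit."""
    rng = random.Random(seed)
    row, col = Exp3(len(matrix), rng=rng), Exp3(len(matrix[0]), rng=rng)
    for _ in range(n_rounds):
        i, j = row.draw(), col.draw()
        r = matrix[i][j]  # reward of the row player; the column player gets 1 - r
        row.update(i, r)
        col.update(j, 1 - r)
    return row, col

# Matching pennies: the row player wins (1) when both choices match.
PENNIES = [[1, 0], [0, 1]]
row, col = play_matrix_game(PENNIES, 2000)
row_freq = [c / sum(row.counts) for c in row.counts]
```

The empirical frequencies `row_freq` (rather than the last probabilities) are what is usually read off as the approximate mixed strategy of the node.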
The game of Go is a part of AI.
Computers are ridiculous compared to children.

  Easy situation. Termed “semeai”.
  Requires a little bit of abstraction.

  800 cores, 4.7 GHz, top level program:
  plays a stupid move.

  8 years old, little training:
  finds the good move.
MoGo(TW): games vs pros
     in the game of Go
First win in 9x9

First draw (a few days ago!) over 6 games

First win over 4 games in 9x9 blind Go

First win with H2.5 in 13x13 Go

First win with H6 in 19x19 Go in 2009 (also done by Zen)

First win with H7 in 19x19 Go vs top pro in 2009 (also
   done by Pachi in 2011)
Monte-Carlo Tree Search




1. Games (a bit of formalism)

2. Hidden information <==> SA

3. Decidability / complexity

4. Real implementation
    ==> Dark Chess endgames
   ==> appli to UrbanRivals


Let's have fun with Urban Rivals (4 cards)
 Each player has
  - four cards (each one can be used once)
  - 12 pilz (each one can be used once)
  - 12 life points

 Each card has:
  - one attack level
  - one damage
  - special effects (forget it for the moment)

 Four turns:
  - P1 attacks P2
  - P2 attacks P1
  - P1 attacks P2
  - P2 attacks P1

Let's have fun with Urban Rivals
First, attacker plays:
- chooses a card
- chooses ( PRIVATELY ) a number of pilz
 Attack level = attack(card) x (1+nb of pilz)

Then, defender plays:
 - chooses a card
 - chooses a number of pilz
 Defense level = attack(card) x (1+nb of pilz)

Result:
 If attack > defense
     Defender loses Damage(attacker's card)
  Else
     Attacker loses Damage(defender's card)

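The resolution of one round can be written as a small Python function. This is a sketch of the simplified rules above (special effects ignored), assuming that the life points lost are the winning card's damage value; the `Card` class and the sample cards are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Card:
    attack: int   # attack level of the card
    damage: int   # life points removed from the loser of the round

def resolve_round(att_card, att_pilz, def_card, def_pilz):
    """Return (life lost by attacker, life lost by defender) for one round."""
    attack = att_card.attack * (1 + att_pilz)
    defense = def_card.attack * (1 + def_pilz)
    if attack > defense:
        return 0, att_card.damage
    return def_card.damage, 0  # ties go to the defender, as in the rule above

# Hypothetical cards, for illustration only.
a = Card(attack=6, damage=4)
d = Card(attack=7, damage=3)
hit_on_attacker, hit_on_defender = resolve_round(a, 2, d, 1)
```

With 2 pilz against 1, the attacker reaches 18 against 14 and the defender loses 4 life points; with no pilz on either side the defender's card wins the round.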
Let's have fun with Urban Rivals
==> The MCTS-based AI is now at the best human level.

Experimental (only) remarks on EXP3:

- discard strategies with small number of sims = better approx
   of the Nash

- also an improvement by taking into
  account the other bandit

- not yet compared to INF

- virtual simulations (inspired by Kummer)
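The first remark, discarding strategies with a small number of simulations, can be sketched as a post-processing step on the simulation counts of a node. The threshold `min_share` and the fallback rule are assumptions; the slides give neither.

```python
from fractions import Fraction

def truncated_strategy(counts, min_share=Fraction(1, 20)):
    """Empirical mixed strategy with rarely simulated arms discarded.

    counts[i] = number of simulations of arm i (at least one simulation overall);
    min_share is a hypothetical threshold on the share of simulations.
    """
    total = sum(counts)
    kept = [c if Fraction(c, total) >= min_share else 0 for c in counts]
    kept_total = sum(kept)
    if kept_total == 0:  # nothing survives: fall back to the raw frequencies
        kept, kept_total = list(counts), total
    return [Fraction(c, kept_total) for c in kept]

strategy = truncated_strategy([480, 500, 15, 5])
```

In this example the two arms with 15 and 5 simulations are dropped and the remaining mass is renormalized over the first two arms.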
Conclusions
 New stuff:
Undecidability of optimal play for 2-player games with hidden information
Transformation “PO periodically revealed ==> simultaneous action game
with full observation”

Open problems
Complexity: simultaneous action and infinite horizon (in progress)
Complexity with PO: same information for both players ?
Nash of matrix games with strong dominance
Mathematical validation of variants of Exp3 / Inf
Consistent “realistic” approaches for PO games (H finite)
When is MCTS relevant ?

 Robust in front of:
High dimension;
Non-convexity of Bellman values;
Complex models
Delayed reward
Simultaneous actions

More difficult for
High values of H;
Highly unobservable cases (Monte-Carlo, but not
Monte-Carlo Tree Search, see Cazenave et al.)
Lack of reasonable baseline for the MC

Further work !
We should test INF and justify mathematically our improvements
Some undecidability results
When is MCTS relevant ?
How to apply it:
Implement the transition
                (a function action x state → state )
                Convenient. Easy to check.

Design a Monte-Carlo part (a random simulation)
                     (a heuristic in one-player games;
                              difficult if two opponents)

         ==> at this point you can simulate...

Implement UCT (just a bias in the simulator – no real optimizer)

Possibly parallelize (Gelly et al)
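The first two steps of this recipe (a transition function and a random simulation) can be sketched on a toy game. The game used here, a single-heap Nim where each move removes 1 to 3 stones and taking the last stone wins, is an assumption chosen for brevity; it does not come from the slides.

```python
import random

def transition(state, action):
    """state = (stones left, player to move); action = stones taken (1..3)."""
    stones, player = state
    return (stones - action, 1 - player)

def legal_actions(state):
    return [a for a in (1, 2, 3) if a <= state[0]]

def rollout(state, rng):
    """Play uniformly random moves to the end; return the winner (0 or 1)."""
    while state[0] > 0:
        state = transition(state, rng.choice(legal_actions(state)))
    return 1 - state[1]  # the player who took the last stone

def monte_carlo_value(state, n_sims=1000, seed=0):
    """Estimated win frequency of player 0 from `state` under random play."""
    rng = random.Random(seed)
    return sum(rollout(state, rng) == 0 for _ in range(n_sims)) / n_sims

estimate = monte_carlo_value((10, 0), n_sims=200)
```

At this point the game can be simulated; UCT (or EXP3 for simultaneous moves) then only biases which action `rollout` picks in the states already stored in the tree.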
PO problems, approx.
 Nash ==> mailing list

Challenge: outperform humans
in “Urban Rivals”
- free game
- fast games (~ 1 minute)
- 11M registered players

Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesrafiqahmad00786416
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024The Digital Insurer
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusZilliz
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfOverkill Security
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Victor Rentea
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native ApplicationsWSO2
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Victor Rentea
 

Dernier (20)

Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 

Grenoble

  • 1. Monte-Carlo Tree Search: games with partial observation. Olivier.Teytaud@inria.fr + David Auger + Hervé Fournier + Fabien Teytaud + Sébastien Flory + JY Audibert + S. Bubeck + R. Munos + ... Includes Inria, Cnrs, Univ. Paris-Sud, LRI, CMAP, Taiwan universities, Lille, Paris, Boostr... TAO, Inria-Saclay IDF, Cnrs 8623, Lri, Univ. Paris-Sud, Digiteo Labs, Pascal Network of Excellence. Games with simultaneous actions. Grenoble, June 19th, 2011.
  • 2. Monte-Carlo Tree Search. 1. Games (a bit of formalism) 2. Hidden information <==> SA (simultaneous actions) 3. Decidability / complexity 4. Real implementation ==> application to UrbanRivals
  • 3. A game is a directed graph
  • 4. A game is a directed graph with actions [figure: graph whose edges carry action numbers]
  • 5. A game is a directed graph with actions and players [figure: nodes labelled with the player to move, White or Black]
  • 6. A game is a directed graph with actions and players and observations [figure: nodes also labelled with observations: Bob, Bear, Bee]
  • 7. A game is a directed graph with actions and players and observations and rewards [figure: rewards +1 and 0]. Rewards on leaves only!
  • 8. A game is a directed graph + actions + players + observations + rewards + loops [figure: the same graph, now with loops]
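The definition built up over slides 3 to 8 can be written down as a small data structure. The following is only a minimal sketch: the class name `Node`, its field names, the toy graph and the helper `play` are illustrative assumptions, not taken from the slides.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Node:
    """One position of the game graph (all names here are illustrative)."""
    player: Optional[str]            # who moves here ("White"/"Black"), None on a leaf
    observation: str                 # what is observed in this node
    reward: Optional[float] = None   # rewards are set on leaves only
    edges: Dict[int, str] = field(default_factory=dict)  # action -> successor id


# A toy game with a loop: action 3 in node "b" leads back to the root.
game = {
    "root": Node("White", "Bee", edges={1: "a", 2: "b"}),
    "a": Node(None, "Bear", reward=1.0),
    "b": Node("Black", "Bee", edges={1: "c", 3: "root"}),
    "c": Node(None, "Bob", reward=0.0),
}


def play(game, start, actions):
    """Follow a sequence of actions; return the reward reached (None off-leaf)."""
    node = game[start]
    for a in actions:
        node = game[node.edges[a]]
    return node.reward


print(play(game, "root", [1]))           # leaf "a"
print(play(game, "root", [2, 3, 2, 1]))  # goes once around the loop, ends in "c"
```

Note that two nodes share the observation "Bee": a player who only sees observations cannot tell them apart, which is where partial observation enters.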
  • 9. Monte-Carlo Tree Search. 1. Games (a bit of formalism) 2. Hidden information <==> SA 3. Decidability / complexity 4. Real implementation
  • 10. A game is a directed graph + actions + players + observations + rewards + loops. Consider games as follows: Turn 1; Turn 2; …; Turn K: all information is revealed. Turn K+1; Turn K+2; …; Turn 2K: all information is revealed. …; Turn NK: all information is revealed.
  • 11. A game is a directed graph + actions + players + observations + rewards + loops. Rewrite it as follows: Turn 1: player 1 chooses (privately) his strategy until turn K. Turn 2: player 2 chooses (privately) his strategy until turn K. Intermediate turns removed! Turn K: all information is revealed. Turn K+1; Turn K+2; …; Turn 2K: all information is revealed. …; Turn NK: all information is revealed.
  • 12. A game is a directed graph + actions + players + observations + rewards + loops. Same rewriting as on the previous slide: the two private choices (turn 1 for player 1, turn 2 for player 2) are equivalent to simultaneous actions.
  • 13. A game is a directed graph + actions + players + observations + rewards + loops. Now it's a game with simultaneous actions and no hidden information. Simultaneous actions = (almost) short term hidden information.
  • 14. Monte-Carlo Tree Search. 1. Games (a bit of formalism) 2. Hidden information <== SA (and sometimes <==>) 3. Decidability / complexity 4. Real implementation
  • 15. Compact representation? Succinct representation (in short, without tedious details): - graph of size N represented in size O(log N); - usually not better in terms of complexity; - keep this in mind when considering complexity.
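To make "graph of size N represented in size O(log N)" concrete, here is a small sketch of an implicit graph: the nodes are the n-bit integers, and the edges are given by a short rule instead of a stored edge list. The rule itself (add 1 modulo N, or flip the lowest bit) is a made-up example, not something from the slides.

```python
from typing import List


def successors(state: int, n: int) -> List[int]:
    """Successors of an n-bit state in an implicit graph with N = 2**n nodes.

    Hypothetical rule: a move either adds 1 (mod N) or flips the lowest bit.
    The description has size O(n) = O(log N); the graph itself has N nodes.
    """
    size = 1 << n
    return sorted({(state + 1) % size, state ^ 1})


n = 20
print(1 << n)            # number of nodes, never stored explicitly
print(successors(5, n))  # neighbours are computed on demand
```

An algorithm that is polynomial in the number of nodes N is exponential in the size n of such a description, which is why the complexity classes quoted for succinct games look so high.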
  • 16. Complexity question? Instance = position. Question = is there a strategy which wins whatever the decisions of the opponent? This is the natural question under full observability: answering it allows perfect play.
  • 17. Complexity question? (UD) The question of the previous slide is referred to as “UD” in the following slides.
  • 18. Complexity question for matrix game? Payoff matrix:
      1 0 0 0 0 0
      0 1 0 0 0 0
      0 0 1 0 0 0
      0 0 0 1 0 0
      0 0 0 0 1 0
      0 0 0 0 0 1
    Good for column-player! ==> but no sure win. ==> the “UD” question is not relevant here!
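The point of this slide can be checked numerically. The sketch below assumes that an entry 1 means "the row player wins" (the slide does not state the convention explicitly) and compares what each player can guarantee with pure strategies and with the uniform mixed strategy.

```python
from fractions import Fraction

N = 6
# Assumption: M[i][j] = 1 if the row player wins when row plays i and column plays j.
M = [[1 if i == j else 0 for j in range(N)] for i in range(N)]

# Pure strategies: what each side can guarantee without randomizing.
row_pure = max(min(row) for row in M)                             # row's best worst case
col_pure = min(max(M[i][j] for i in range(N)) for j in range(N))  # column's best worst case

# Uniform mixed strategies: expected payoff against every pure reply.
u = Fraction(1, N)
row_mixed = min(sum(u * M[i][j] for i in range(N)) for j in range(N))
col_mixed = max(sum(u * M[i][j] for j in range(N)) for i in range(N))

print(row_pure, col_pure)    # 0 1
print(row_mixed, col_mixed)  # 1/6 1/6
```

No pure strategy is a sure win for either side, so the answer to the sure-win question is "no" for both players, yet with uniform play the column player wins with probability 5/6. The sure-win question therefore misses what matters in such games.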
  • 19. Complexity question for phantom-games? (Joint work with F. Teytaud.) This is phantom-go [figure: a phantom-go position]. Good for black: wins with proba 1-1/(8!). Here, there's no move which ensures a win. But some moves are much better than others!
  • 20. It becomes complicated. Isn't it possible to consider a better question?
  • 21. Complexity (2P, no random). X = proba(winning) that we look for. Columns: unbounded horizon / exponential horizon / polynomial horizon.
      Full observability: EXP / EXP / PSPACE
      No obs (X=100%): EXPSPACE (Haslum et al., 2000) / NEXP
      Partially observable (X=100%): 2EXP (Rintanen) / EXPSPACE (Mundhenk)
      Simult. actions: ? EXPSPACE ? / <<<= EXP / <<<= EXP
      No obs: undecidable (Teytaud, Auger, IJFCS, accepted)
      Partially observable: undecidable
  • 22. State of the art: EXPTIME-complete in the general fully-observable case
  • 23. EXPTIME-complete fully observable games: - Chess (for some nxn generalization) - Go (with no superko) - Draughts (international or English) - Chinese checkers - Shogi
  • 24. PSPACE-complete fully observable games: - Amazons - Hex - Go-moku - Connect-6 - Qubic - Reversi - Tic-Tac-Toe. Many games in which each cell is filled once and only once.
  • 25. EXPSPACE-complete unobservable games (Haslum & Jonsson). The two-player unobservable case is EXPSPACE-complete (games in succinct form).
  • 26. EXPSPACE-complete unobservable games (Haslum & Jonsson). The two-player unobservable case is EXPSPACE-complete (games in succinct form). PROOF: (I) First note that strategies are just sequences of actions (no observability!) + UD ==> opponent can see the state! (II) It is in EXPSPACE=NEXPSPACE, because of the following algorithm: (a) non-deterministically choose the sequence of actions; (b) check the result against all possible strategies. (III) We have to check the hardness only.
  • 29. EXPSPACE-complete unobservable games (Haslum & Jonsson). The two-player unobservable case is EXPSPACE-complete (games in succinct form). PROOF of the hardness: reduction from the question "is my TM with exponential tape going to halt?" Consider a TM with tape of size N=2^n. We must find a game - with size n (n = log2(N)) - such that the first player has a winning strategy iff the TM halts.
  • 30. EXPSPACE-complete unobservable games (Haslum & Jonsson). Encoding a Turing machine with a tape of size N as a game with state O(log(N)). Player 1 chooses the sequence of configurations of the tape (N=4):
      x(0,1),x(0,2),x(0,3),x(0,4) ==> initial state
      x(1,1),x(1,2),x(1,3),x(1,4)
      x(2,1),x(2,2),x(2,3),x(2,4)
      x(3,1),x(3,2),x(3,3),x(3,4)
      .....................................
  • 31. (Same encoding, continued.) Last line: x(N,1),x(N,2),x(N,3),x(N,4) ==> player 1 wins by final state!
  • 32. (Same encoding, continued.) Except if P2 finds an illegal transition! ==> P2 can check the consistency of one 3-tuple per line ==> requires space log(N) (= position of the 3-tuple).
  • 33. EXPSPACE-complete PO games. The one-player PO case is EXPSPACE-complete (games in succinct form).
  • 34. 2EXPTIME-complete PO games. The two-player PO case is 2EXP-complete (games in succinct form).
  • 35. Undecidable games (B. Hearn). The three-player PO case is undecidable (two players against one, not allowed to communicate).
  • 36. Complexity (2P, no random). Columns: unbounded horizon / exponential horizon / polynomial horizon.
      Full observability: EXP / EXP / PSPACE
      No obs (X=100%): EXPSPACE (Haslum et al., 2000) / NEXP
      Partially observable (X=100%): 2EXP (Rintanen 97) / EXPSPACE
      Simult. actions: ? EXPSPACE ? / <<<= EXP / <<<= EXP
      No obs: undecidable
      Partially observable: undecidable
    Callout: reduction to 1P + random (Madani et al).
  • 37. Another formalization ==> much more satisfactory
  • 38. Madani et al.: 1 player + random = undecidable.
  • 39. Madani et al.: 1 player + random = undecidable. We extend to two players with no random. Problem: rewrite random nodes, thanks to an additional player.
  • 40. A random node to be rewritten [figure: a random node]
  • 42. A random node to be rewritten. Rewritten as follows: Player 1 chooses a in [[0,N-1]]; Player 2 chooses b in [[0,N-1]]; c = (a+b) modulo N; go to t_c. Each player can force the game to be equivalent to the initial one (by playing uniformly) ==> the proba of winning for player 1 (in case of perfect play) is the same as for the initial game ==> undecidability!
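The key claim, that either player can force c to be uniform by playing uniformly, is a one-line computation: if a is uniform and independent of b, then P(c = t) = sum_b q(b) · 1/N = 1/N for every t. The sketch below verifies it with exact fractions; the function name `distribution_of_c` and the biased opponent are illustrative choices.

```python
from fractions import Fraction


def distribution_of_c(p, q):
    """Distribution of c = (a + b) mod N when a ~ p and b ~ q are independent."""
    n = len(p)
    out = [Fraction(0)] * n
    for a in range(n):
        for b in range(n):
            out[(a + b) % n] += p[a] * q[b]
    return out


N = 4
uniform = [Fraction(1, N)] * N
# An arbitrary, heavily biased opponent (made-up numbers).
biased = [Fraction(7, 10), Fraction(1, 10), Fraction(1, 10), Fraction(1, 10)]

print(distribution_of_c(uniform, biased))  # player 1 uniform: c is uniform
print(distribution_of_c(biased, uniform))  # player 2 uniform: c is uniform
```

Since neither player can bias c against an opponent who plays uniformly, the rewritten node behaves like the original uniform random node under perfect play.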
  • 43. Important remark: existence of a strategy for winning with proba > 0.5 ==> also undecidable for the restriction to games in which the proba is >0.6 or <0.4 ==> not just a subtle precision trouble.
  • 44. Monte-Carlo Tree Search. 1. Games (a bit of formalism) 2. Hidden information <==> SA 3. Decidability / complexity 4. Real implementation
  • 45. Real implementation for simultaneous action? MCTS principle, but with EXP3 in nodes.
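As an illustration of what "EXP3 in nodes" could look like, here is a textbook EXP3 bandit and a single simultaneous-move node in which each player runs its own bandit. This is a sketch under assumptions: the class name `Exp3`, the value `gamma=0.1` and the 2x2 matching game are illustrative, and the experimental variants mentioned later in the talk are not reproduced.

```python
import math
import random


class Exp3:
    """Textbook EXP3 for rewards in [0, 1]; gamma is the exploration rate."""

    def __init__(self, n_arms, gamma=0.1):
        self.n = n_arms
        self.gamma = gamma
        self.log_w = [0.0] * n_arms  # log-weights, to avoid overflow

    def probabilities(self):
        m = max(self.log_w)
        w = [math.exp(x - m) for x in self.log_w]
        total = sum(w)
        return [(1 - self.gamma) * wi / total + self.gamma / self.n for wi in w]

    def draw(self, rng):
        p = self.probabilities()
        arm = rng.choices(range(self.n), weights=p)[0]
        return arm, p[arm]

    def update(self, arm, prob, reward):
        # Importance-weighted estimate: only the arm actually played is updated.
        self.log_w[arm] += self.gamma * (reward / prob) / self.n


# One simultaneous-move node: each player has its own bandit.
rng = random.Random(0)
row, col = Exp3(2), Exp3(2)
counts = [0, 0]
for _ in range(20000):
    (i, pi), (j, pj) = row.draw(rng), col.draw(rng)
    r = 1.0 if i == j else 0.0   # row wins when both choices match
    row.update(i, pi, r)
    col.update(j, pj, 1.0 - r)
    counts[i] += 1
print([c / sum(counts) for c in counts])  # empirical frequencies of the row player
```

In a tree search, the move finally recommended would typically be drawn from these empirical frequencies rather than from the last probability vector, since it is the average play that approximates a Nash strategy in zero-sum games.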
  • 46. UCT (Upper Confidence Trees): Coulom (06); Chaslot, Saito & Bouzy (06); Kocsis & Szepesvari (06)
  • 47. UCT
  • 51. UCT: Kocsis & Szepesvari (06)
  • 53. Exploitation ... SCORE = 5/7 + k.sqrt( log(10)/7 )
  • 56. ... or exploration? SCORE = 0/2 + k.sqrt( log(10)/2 ). Replace it with EXP3 / INF.
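The two scores on these slides can be evaluated directly. The sketch below assumes a natural logarithm and treats k as a free constant (the slides give no value for it); the function name `ucb_score` is illustrative.

```python
import math


def ucb_score(wins, visits, parent_visits, k=1.0):
    """Average reward plus exploration bonus: wins/visits + k*sqrt(log(parent)/visits)."""
    return wins / visits + k * math.sqrt(math.log(parent_visits) / visits)


for k in (1.0, 2.0):  # two arbitrary values of the exploration constant
    exploit = ucb_score(5, 7, 10, k)  # move with 5 wins out of 7 simulations
    explore = ucb_score(0, 2, 10, k)  # move with 0 wins out of 2 simulations
    print(k, round(exploit, 3), round(explore, 3))
```

With a small k the well-performing move keeps the higher score (exploitation); with a larger k the rarely tried move overtakes it (exploration), even though it has never won.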
  • 57. The game of Go is a part of AI. Computers look ridiculous next to children. Easy situation, termed “semeai”. Requires a little bit of abstraction.
  • 58. The game of Go is a part of AI. Computers look ridiculous next to children. 800 cores, 4.7 GHz, top-level program: plays a stupid move.
  • 59. The game of Go is a part of AI. Computers look ridiculous next to children. 8 years old, little training: finds the right move.
  • 60. MoGo(TW): games vs pros in the game of Go. First win in 9x9; first draw (a few days ago!) over 6 games; first win over 4 games in 9x9 blind Go; first win with H2.5 in 13x13 Go; first win with H6 in 19x19 Go in 2009 (also done by Zen); first win with H7 in 19x19 Go vs top pro in 2009 (also done by Pachi in 2011).
  • 61. Monte-Carlo Tree Search. 1. Games (a bit of formalism) 2. Hidden information <==> SA 3. Decidability / complexity 4. Real implementation ==> Dark Chess endgames ==> application to UrbanRivals
  • 62. Let's have fun with Urban Rivals (4 cards). Each player has: - four cards (each one can be used once) - 12 pilz (each one can be used once) - 12 life points. Each card has: - one attack level - one damage - special effects (forget them for the moment). Four turns: - P1 attacks P2 - P2 attacks P1 - P1 attacks P2 - P2 attacks P1.
  • 63. Let's have fun with Urban Rivals. First, attacker plays: - chooses a card - chooses (PRIVATELY) a number of pilz. Attack level = attack(card) x (1 + nb of pilz). Then, defender plays: - chooses a card - chooses a number of pilz. Defense level = attack(card) x (1 + nb of pilz). Result: if attack > defense, defender loses Power(attacker's card); else attacker loses Power(defender's card).
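The round resolution described on this slide fits in a few lines. This is a sketch of the simplified rules as stated here (no special effects); the class `Card`, the function `resolve_round` and the card values are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Card:
    attack: int
    power: int  # life points removed from the opponent when this card wins


def resolve_round(att_card, att_pilz, def_card, def_pilz):
    """Return (attacker_life_loss, defender_life_loss) for one round."""
    attack_level = att_card.attack * (1 + att_pilz)
    defense_level = def_card.attack * (1 + def_pilz)
    if attack_level > defense_level:
        return 0, att_card.power
    return def_card.power, 0  # a tie falls under the slide's "else" branch


# Made-up cards: the attacker spends 2 pilz, the defender 1.
print(resolve_round(Card(attack=6, power=3), 2, Card(attack=7, power=4), 1))
```

Because the attacker's number of pilz is chosen privately, the defender picks a card and a number of pilz without knowing the attack level, which is exactly the short-term hidden information that makes each round behave like a simultaneous-action matrix game.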
  • 64. Let's have fun with Urban Rivals ==> the MCTS-based AI is now at the best human level. Experimental (only) remarks on EXP3: - discard strategies with small number of sims = better approx of the Nash; - also an improvement by taking into account the other bandit; - not yet compared to INF; - virtual simulations (inspired by Kummer).
  • 65. Conclusions. New stuff: undecidability of optimal play for 2-player games with hidden information; transformation “PO periodically revealed ==> simultaneous action game with full observation”. Open problems: complexity: simultaneous action and infinite horizon (in progress); complexity with PO: same information for both players?; Nash of matrix games with strong dominance; mathematical validation of variants of Exp3 / Inf; consistent “realistic” approaches for PO games (H finite).
  • 72. When is MCTS relevant? Robust in front of: high dimension; non-convexity of Bellman values; complex models; delayed reward; simultaneous actions. More difficult for: high values of H; highly unobservable cases (Monte-Carlo, but not Monte-Carlo Tree Search, see Cazenave et al.); lack of reasonable baseline for the MC.
  • 73. When is MCTS relevant? Same lists as on the previous slide, with callouts: we should test INF and justify mathematically our improvements; some undecidability results; further work!
  • 74. When is MCTS relevant? Convenient; easy to check. How to apply it: implement the transition (a function action x state → state); design a Monte-Carlo part (a random simulation) (a heuristic in one-player games; difficult if two opponents) ==> at this point you can simulate...; implement UCT (just a bias in the simulator, no real optimizer); possibly parallelize (Gelly et al).
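The recipe of this slide (transition, Monte-Carlo part, UCT) can be followed on a toy game. The sketch below uses a Nim variant chosen purely for illustration (take 1 to 3 stones, whoever takes the last stone wins); the function names, the constant k=1.0 and the number of simulations are assumptions, and parallelization is left out.

```python
import math
import random


def legal_actions(stones):
    return [a for a in (1, 2, 3) if a <= stones]


def transition(stones, action):
    """Step 1: the transition, a function (state, action) -> state."""
    return stones - action


def playout(stones, rng):
    """Step 2: uniformly random simulation.
    Returns 1.0 if the player to move at `stones` takes the last stone."""
    mover, last_mover = 0, 1  # with no stones left, the opponent took the last one
    while stones > 0:
        stones = transition(stones, rng.choice(legal_actions(stones)))
        last_mover, mover = mover, 1 - mover
    return 1.0 if last_mover == 0 else 0.0


class TreeNode:
    def __init__(self, stones):
        self.stones = stones
        self.visits = 0
        self.wins = 0.0     # counted for the player who moved INTO this node
        self.children = {}  # action -> TreeNode


def simulate(node, rng, k):
    """Step 3: one UCT simulation; returns the reward of the player to move."""
    if node.stones == 0:
        reward = 0.0                        # the player to move has lost
    elif node.visits == 0:
        reward = playout(node.stones, rng)  # new node: Monte-Carlo part
    else:
        for a in legal_actions(node.stones):
            if a not in node.children:
                node.children[a] = TreeNode(transition(node.stones, a))

        def score(child):
            if child.visits == 0:
                return float("inf")         # try every move at least once
            bonus = k * math.sqrt(math.log(node.visits) / child.visits)
            return child.wins / child.visits + bonus

        best = max(node.children.values(), key=score)
        reward = 1.0 - simulate(best, rng, k)  # zero-sum game
    node.visits += 1
    node.wins += 1.0 - reward
    return reward


def uct_best_action(stones, n_sims=2000, k=1.0, seed=0):
    rng = random.Random(seed)
    root = TreeNode(stones)
    for _ in range(n_sims):
        simulate(root, rng, k)
    return max(root.children, key=lambda a: root.children[a].visits)


print(uct_best_action(3))  # taking all 3 stones wins immediately
```

Note how little of this is game-specific: only `legal_actions` and `transition` know the rules, which is the sense in which UCT is "just a bias in the simulator".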
  • 75. PO problems, approx. Nash ==> mailing list. Challenge: outperform humans in “Urban Rivals”: - free game; - fast games (~ 1 minute); - 11M registered players.