Introduction
        Monte Carlo Go
             My Work




Machine Learning applied to Go

         Dmitry Kamenetsky

         Supervisor: Nic Schraudolph


                March 2007




     Dmitry Kamenetsky   Machine Learning applied to Go




1   Introduction

2   Monte Carlo Go

3   My Work







          What is Go?


                         Two-player deterministic
                         board game

                         Originated in ancient China.
                         Today very popular in China,
                         Japan and Korea

                         19x19 grid, also 9x9 grid for
                         beginners

                         Simple rules, but very
                         complex strategy





                        Why study Go?




If there are sentient beings on other planets, then they play Go
– Emanuel Lasker, former chess world champion

Go is one of the grand challenges of AI
– Ron Rivest, professor of Computer Science at MIT

Go is like life and life is like Go
– Chinese proverb




2   Monte Carlo Go
        Monte Carlo
        Bandit Problem
        MoGo



                        Monte Carlo


Often used for problems that have no closed-form solution, e.g.
in computational physics

Sample instances from some large population

Use samples to approximate some common property of the
population

In search, select the next state based on some fixed
distribution (usually uniform)

Recently, Monte Carlo methods have been very successful in Go,
causing a mini-revolution
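
As a minimal sketch of the sampling idea above (not taken from the slides), the following Python snippet approximates the mean of a large population from a uniform random sample. The population, the sample size and the function name `monte_carlo_mean` are illustrative assumptions.

```python
import random


def monte_carlo_mean(population, num_samples, rng):
    """Approximate the population mean from uniformly drawn samples."""
    # Draw with replacement so that each sample is independent of the others.
    samples = [rng.choice(population) for _ in range(num_samples)]
    return sum(samples) / num_samples


# Hypothetical population: the integers 0..999,999.
population = range(1_000_000)
rng = random.Random(0)  # fixed seed so that the run is repeatable
estimate = monte_carlo_mean(population, 10_000, rng)
true_mean = sum(population) / len(population)
print(f"estimate={estimate:.1f} true={true_mean:.1f}")
```

Only 1% of the population is inspected, yet the estimate should land close to the true mean; the error shrinks roughly with the square root of the number of samples.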




               K-armed bandit problem

Slot machine with K arms. Each arm provides a reward
based on some unknown, but fixed distribution

Goal: to choose arms to play such that the total reward is
maximized

How should the gambler play at any given moment?
    Choose arm with highest average reward seen so far
    (exploitation)

    Choose a sub-optimal arm in the hope that it will lead to
    a greater reward (exploration)

    Neither - need a combination of exploitation and
    exploration



              Upper Confidence Bounds


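Successive plays of arm $i$ give rewards $X_{i,1}, X_{i,2}, \ldots$ which
are i.i.d. with unknown mean $E(X_i) = \mu_i$

Let $T_i(n)$ be the number of times arm $i$ has been played
during the first $n$ plays of the machine

Upper Confidence Bounds - UCB (Auer et al. 2002)

    Initialization: Play each arm once

    Loop: Play the arm $i$ that maximizes $\hat{\mu}_i + \sqrt{\frac{2 \log n}{T_i(n)}}$,
    where $\hat{\mu}_i$ is the average reward observed from arm $i$ so far

Proven to achieve logarithmic regret, which is optimal up to a
constant factor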
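
The UCB rule above can be sketched in Python as follows. This is an illustration under stated assumptions rather than code from the talk: the arms are simulated Bernoulli rewards with made-up success probabilities, the unknown mean of each arm is replaced by the empirical mean reward seen so far, and the names `ucb1` and `pull` are hypothetical.

```python
import math
import random


def ucb1(arm_probs, num_plays, rng):
    """Run UCB on simulated Bernoulli arms; return play counts and mean rewards."""
    k = len(arm_probs)
    counts = [0] * k    # T_i(n): number of times arm i has been played
    means = [0.0] * k   # empirical mean reward of arm i

    def pull(i):
        reward = 1.0 if rng.random() < arm_probs[i] else 0.0
        counts[i] += 1
        means[i] += (reward - means[i]) / counts[i]  # incremental mean update

    # Initialization: play each arm once.
    for i in range(k):
        pull(i)

    # Loop: play the arm with the largest upper confidence bound,
    # where n is the number of plays made so far.
    for n in range(k, num_plays):
        scores = [means[i] + math.sqrt(2.0 * math.log(n) / counts[i])
                  for i in range(k)]
        pull(scores.index(max(scores)))
    return counts, means


# Hypothetical success probabilities for three arms.
counts, means = ucb1([0.2, 0.5, 0.8], 5000, random.Random(1))
print(counts, [round(m, 2) for m in means])
```

The bonus term shrinks as an arm is played more often, so weak arms are still tried occasionally while most plays go to the arm that looks best.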




                             UCT


UCB for minimax tree search (Kocsis and Szepesvári 2006)

Start at the current board position p

For i = 1 to 100,000 (number of simulations)
    p' ← p
    until stopping criterion is reached (e.g. end of game)
         p' ← p' + move given by UCB
         create node p'
    Evaluate leaf: value ← winner of p'
    Update all the visited nodes with value

At p play move with highest winning percentage
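
To make the loop above concrete, here is a small Python sketch of UCT. It is not MoGo's implementation, and several choices are assumptions made for illustration: the game is a toy subtraction game (remove 1 to 3 stones, taking the last stone wins) instead of Go, the descent stops at the first position that is not yet stored, the leaf is evaluated by a uniformly random playout, and all names (`Node`, `uct_search`, `random_playout`) are hypothetical.

```python
import math
import random

MOVES = (1, 2, 3)  # toy game: remove 1-3 stones; taking the last stone wins


class Node:
    """Statistics for one position, seen from the player who moved into it."""

    def __init__(self):
        self.visits = 0
        self.wins = 0


def legal_moves(stones):
    return [m for m in MOVES if m <= stones]


def random_playout(stones, player, rng):
    """Play uniformly random moves until the game ends; return the winner."""
    while True:
        stones -= rng.choice(legal_moves(stones))
        if stones == 0:
            return player  # this player took the last stone
        player = 1 - player


def ucb_score(child, parent_visits):
    bonus = math.sqrt(2.0 * math.log(parent_visits) / child.visits)
    return child.wins / child.visits + bonus


def uct_search(stones, player, num_simulations, rng):
    root = (stones, player)  # position = (stones left, player to move)
    tree = {root: Node()}
    for _ in range(num_simulations):
        state, path = root, [root]
        # Descend with UCB until the game ends or a new node is created.
        while state[0] > 0:
            s, p = state
            children = [(s - m, 1 - p) for m in legal_moves(s)]
            unseen = [c for c in children if c not in tree]
            if unseen:
                state = rng.choice(unseen)
                tree[state] = Node()
                path.append(state)
                break
            parent_visits = tree[state].visits
            state = max(children, key=lambda c: ucb_score(tree[c], parent_visits))
            path.append(state)
        # Evaluate the leaf: exact result at the end of the game,
        # otherwise the winner of a random playout.
        s, p = state
        winner = 1 - p if s == 0 else random_playout(s, p, rng)
        # Update all visited nodes with the result.
        for visited in path:
            node = tree[visited]
            node.visits += 1
            if winner != visited[1]:  # the player who moved into this node won
                node.wins += 1
    # At the root, play the move with the highest winning percentage.
    explored = [m for m in legal_moves(stones) if (stones - m, 1 - player) in tree]
    return max(explored, key=lambda m: tree[(stones - m, 1 - player)].wins
               / tree[(stones - m, 1 - player)].visits)


best = uct_search(10, 0, 10_000, random.Random(0))
print("move chosen from 10 stones:", best)
```

In this toy game the winning strategy is to leave the opponent a multiple of four stones, so the search can be checked against a known answer. Positions are keyed by (stones left, player to move), which means that statistics are shared between different move orders reaching the same position.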




                            MoGo

First Go program to use UCT (Gelly et al. 2006)

Store nodes and their statistics in a tree data structure

Stopping criterion is a node that is not yet in the tree

Leaf node evaluation:

    Play out the position randomly until no moves remain. The final
    position is trivial to score

    Enhanced through the use of patterns and playing near the
    previous move

Pruning techniques, smart ordering of unexplored moves



                    MoGo’s success




Ranked first on 9x9 Computer Go Server since August
2006

Won the two most recent tournaments on 9x9 and 13x13

Expected to reach the level of a human professional on the 9x9
board




3   My Work
        Beta Distributions
        Cooperative Scorer
        Zobrist Hash
        Conditional Random Fields


                     Replacing UCB

UCT is ad hoc. It lacks theoretical analysis, because the
random variables (rewards $X_i$) are not i.i.d.

Instead, use Beta distributions to model the random variables

The Beta distribution is the conjugate prior of the binomial
distribution (game result)

Here α = wins from node, β = losses from node

Let p be the parent's winning percentage and 0 < a < 1 a
parameter

Pick the move that is most likely to have a winning
percentage greater than (1 − a)p + a
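
The selection rule above can be sketched as follows. This is one possible reading, not the implementation from the talk: the probability that a move's winning percentage exceeds the target is estimated by sampling from the Beta distribution with Python's standard `random.betavariate`, one is added to both wins and losses (a uniform prior, an assumption made here so that both parameters stay positive for unvisited moves), and the move statistics and function names are hypothetical.

```python
import random


def prob_exceeds(wins, losses, threshold, rng, num_draws=20_000):
    """Monte Carlo estimate of P(winning percentage > threshold)."""
    # Assumption: uniform Beta(1, 1) prior, so alpha = wins + 1 and
    # beta = losses + 1 rather than the raw counts.
    alpha, beta = wins + 1, losses + 1
    hits = sum(rng.betavariate(alpha, beta) > threshold for _ in range(num_draws))
    return hits / num_draws


def pick_move(children, parent_rate, a, rng):
    """Pick the move most likely to beat the target (1 - a) * p + a."""
    target = (1 - a) * parent_rate + a
    return max(children, key=lambda m: prob_exceeds(*children[m], target, rng))


# Hypothetical statistics: move -> (wins, losses) observed below that move.
children = {"A": (60, 40), "B": (7, 3), "C": (20, 30)}
choice = pick_move(children, parent_rate=0.55, a=0.2, rng=random.Random(0))
print(choice)
```

With these numbers the target is 0.64. Move A has the most games but an average just under 0.6, so little of its probability mass lies above the target, whereas the less explored move B still has a good chance of exceeding it. A larger value of a raises the target and so favors moves whose outcome is still uncertain.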


              Improving node evaluation


MoGo’s node evaluation is fast, but not very meaningful

Instead, use our cooperative scorer:

    Initialization: Statically fill neutral territory with stones

    Loop: players cooperate to make moves that do not affect
    the score

Accurately predicts score: 96.3% on 9x9 and 89.2% on
19x19

Only 15 times slower than pure random playouts


Beta Distributions
                    Introduction
                                   Cooperative Scorer
                  Monte Carlo Go
                                   Zobrist Hash
                       My Work
                                   Conditional Random Fields


              Improving node evaluation


MoGo’s node evaluation is fast, but not so meaningful

Instead, use our cooperative scorer:

    Initialization: Statically fill neutral territory with stones

    Loop: players cooperate to make moves that do not affect
    the score

Accurately predicts score: 96.3% on 9x9 and 89.2% on
19x19

Only 15 times slower than pure random


               Dmitry Kamenetsky   Machine Learning applied to Go
Beta Distributions
     Introduction
                    Cooperative Scorer
   Monte Carlo Go
                    Zobrist Hash
        My Work
                    Conditional Random Fields


       Scorer Example (figures not reproduced)






          Improving memory management



Tree data structure is memory-inefficient

Instead, use a hash table:

    For each visited position p, key = ZobristHash(p)

    Store statistics of p: hashTable[key] = (#wins, #runs, depth)

    Collision handling

Use a small hash table with more information for
frequently visited nodes
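
The hash-table idea above can be sketched as follows. This is a minimal illustration, not the author's implementation: the 9x9 board size, the flat-list board encoding, the 64-bit key width and the names `zobrist_hash`, `update_hash` and `record_playout` are all assumptions. The sketch ignores side to move and ko, and it does not resolve the case where two different positions share the same key, because the slide does not say how collisions are handled.

```python
import random

BOARD_SIZE = 9                 # assumption: 9x9 board
EMPTY, BLACK, WHITE = 0, 1, 2  # assumption: flat list of these values

rng = random.Random(0)         # fixed seed so the keys are reproducible
# One random 64-bit number per (point, colour) pair.
ZOBRIST = [[rng.getrandbits(64) for _ in (BLACK, WHITE)]
           for _ in range(BOARD_SIZE * BOARD_SIZE)]


def zobrist_hash(board):
    """Hash a whole position: XOR the numbers of all occupied points."""
    key = 0
    for point, colour in enumerate(board):
        if colour != EMPTY:
            key ^= ZOBRIST[point][colour - 1]
    return key


def update_hash(key, point, colour):
    """Add or remove one stone incrementally (XOR is its own inverse)."""
    return key ^ ZOBRIST[point][colour - 1]


hash_table = {}  # key -> [wins, runs, depth]


def record_playout(key, won, depth):
    """Update the statistics stored for the position with this key."""
    stats = hash_table.setdefault(key, [0, 0, depth])
    stats[0] += int(won)
    stats[1] += 1


board = [EMPTY] * (BOARD_SIZE * BOARD_SIZE)
key = zobrist_hash(board)
key = update_hash(key, 40, BLACK)   # black plays the centre point
record_playout(key, won=True, depth=1)
```

Because placing or removing a stone is a single XOR, the key of a child position can be derived from its parent's key without rehashing the whole board.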




            Learning evaluation function


Convert the board position into a graph:

    Collapse regions of the same colour into one node

    Create edges between adjacent regions

Use Conditional Random Fields (CRFs) to learn from this
graph:

    Learn final territory assignment

    Predict the next move

The learned model can be used with the scorer or for move generation
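
The board-to-graph conversion can be sketched as a flood fill followed by an adjacency pass. This is only an illustration under stated assumptions: the board is a list of strings using 'X', 'O' and '.', empty points are treated as a third "colour" so that empty regions also become nodes, and the function name `board_to_graph` is hypothetical. The CRF itself is not shown.

```python
def board_to_graph(board):
    """Collapse same-colour regions into nodes and link adjacent regions.

    Returns (labels, colours, edges): labels[r][c] is the node id of a
    point, colours[n] is the colour of node n, and edges is a set of
    (smaller_id, larger_id) pairs.
    """
    rows, cols = len(board), len(board[0])
    labels = [[None] * cols for _ in range(rows)]
    colours = []
    for r in range(rows):
        for c in range(cols):
            if labels[r][c] is not None:
                continue
            # Start a new region and flood-fill it.
            node = len(colours)
            colours.append(board[r][c])
            labels[r][c] = node
            stack = [(r, c)]
            while stack:
                y, x = stack.pop()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and labels[ny][nx] is None
                            and board[ny][nx] == board[r][c]):
                        labels[ny][nx] = node
                        stack.append((ny, nx))
    # Two regions are adjacent if any of their points are neighbours.
    edges = set()
    for r in range(rows):
        for c in range(cols):
            for ny, nx in ((r + 1, c), (r, c + 1)):
                if ny < rows and nx < cols and labels[ny][nx] != labels[r][c]:
                    a, b = labels[r][c], labels[ny][nx]
                    edges.add((min(a, b), max(a, b)))
    return labels, colours, edges


example = ["XX.",
           "X.O",
           ".OO"]
labels, colours, edges = board_to_graph(example)
```

On this 3x3 example the nine points collapse into five nodes: one black region, one white region and three separate empty regions.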



Graph conversion example (figures not reproduced)






                        Questions?




You never ever know if you never ever GO!





Machine Learning applied to Go

  • 1. Introduction Monte Carlo Go My Work Machine Learning applied to Go Dmitry Kamenetsky Supervisor: Nic Schraudolph March 2007 Dmitry Kamenetsky Machine Learning applied to Go
  • 2. Introduction Monte Carlo Go My Work 1 Introduction 2 Monte Carlo Go 3 My Work Dmitry Kamenetsky Machine Learning applied to Go
  • 3. Introduction Monte Carlo Go My Work What is Go? Two-player deterministic board game Originated in ancient China. Today very popular in China, Japan and Korea 19x19 grid, also 9x9 grid for beginners Simple rules, but very complex strategy Dmitry Kamenetsky Machine Learning applied to Go
  • 4. Introduction Monte Carlo Go My Work Why study Go? If there are sentient beings on other planets, then they play Go – Emanuel Lasker, former chess world champion Go is one of the grand challenges of AI – Ron Rivest, professor of Computer Science at MIT Go is like life and life is like Go - Chinese proverb Dmitry Kamenetsky Machine Learning applied to Go
  • 5. Introduction Monte Carlo Go My Work Why study Go? If there are sentient beings on other planets, then they play Go – Emanuel Lasker, former chess world champion Go is one of the grand challenges of AI – Ron Rivest, professor of Computer Science at MIT Go is like life and life is like Go - Chinese proverb Dmitry Kamenetsky Machine Learning applied to Go
  • 6. Introduction Monte Carlo Go My Work Why study Go? If there are sentient beings on other planets, then they play Go – Emanuel Lasker, former chess world champion Go is one of the grand challenges of AI – Ron Rivest, professor of Computer Science at MIT Go is like life and life is like Go - Chinese proverb Dmitry Kamenetsky Machine Learning applied to Go
  • 7. Introduction Monte Carlo Monte Carlo Go Bandit Problem My Work MoGo Monte Carlo Often used for problems that have no closed solution, e.g. computational physics Sample instances from some large population Use samples to approximate some common property of the population In search, select the next state based on some fixed distribution (usually uniform) Recently, have been very successful in Go, causing a mini-revolution Dmitry Kamenetsky Machine Learning applied to Go
  • 8. Introduction Monte Carlo Monte Carlo Go Bandit Problem My Work MoGo Monte Carlo Often used for problems that have no closed solution, e.g. computational physics Sample instances from some large population Use samples to approximate some common property of the population In search, select the next state based on some fixed distribution (usually uniform) Recently, have been very successful in Go, causing a mini-revolution Dmitry Kamenetsky Machine Learning applied to Go
  • 9. Introduction Monte Carlo Monte Carlo Go Bandit Problem My Work MoGo Monte Carlo Often used for problems that have no closed solution, e.g. computational physics Sample instances from some large population Use samples to approximate some common property of the population In search, select the next state based on some fixed distribution (usually uniform) Recently, have been very successful in Go, causing a mini-revolution Dmitry Kamenetsky Machine Learning applied to Go
  • 10. Introduction Monte Carlo Monte Carlo Go Bandit Problem My Work MoGo Monte Carlo Often used for problems that have no closed solution, e.g. computational physics Sample instances from some large population Use samples to approximate some common property of the population In search, select the next state based on some fixed distribution (usually uniform) Recently, have been very successful in Go, causing a mini-revolution Dmitry Kamenetsky Machine Learning applied to Go
  • 11. Introduction Monte Carlo Monte Carlo Go Bandit Problem My Work MoGo Monte Carlo Often used for problems that have no closed solution, e.g. computational physics Sample instances from some large population Use samples to approximate some common property of the population In search, select the next state based on some fixed distribution (usually uniform) Recently, have been very successful in Go, causing a mini-revolution Dmitry Kamenetsky Machine Learning applied to Go
  • 12. Introduction Monte Carlo Monte Carlo Go Bandit Problem My Work MoGo K-armed bandit problem Slot machine with K arms. Each arm provides a reward based on some unknown, but fixed distribution Goal: to choose arms to play such that the total reward is maximized How should the gambler play at any given moment? Choose arm with highest average reward seen so far (exploitation) Choose a sub-optimal arm in the hope that it it will lead to a greater reward (exploration) Neither - need a combination of exploitation and exploration Dmitry Kamenetsky Machine Learning applied to Go
  • 13. Introduction Monte Carlo Monte Carlo Go Bandit Problem My Work MoGo K-armed bandit problem Slot machine with K arms. Each arm provides a reward based on some unknown, but fixed distribution Goal: to choose arms to play such that the total reward is maximized How should the gambler play at any given moment? Choose arm with highest average reward seen so far (exploitation) Choose a sub-optimal arm in the hope that it it will lead to a greater reward (exploration) Neither - need a combination of exploitation and exploration Dmitry Kamenetsky Machine Learning applied to Go
  • 14. Introduction Monte Carlo Monte Carlo Go Bandit Problem My Work MoGo K-armed bandit problem Slot machine with K arms. Each arm provides a reward based on some unknown, but fixed distribution Goal: to choose arms to play such that the total reward is maximized How should the gambler play at any given moment? Choose arm with highest average reward seen so far (exploitation) Choose a sub-optimal arm in the hope that it it will lead to a greater reward (exploration) Neither - need a combination of exploitation and exploration Dmitry Kamenetsky Machine Learning applied to Go
  • 15. Introduction Monte Carlo Monte Carlo Go Bandit Problem My Work MoGo K-armed bandit problem Slot machine with K arms. Each arm provides a reward based on some unknown, but fixed distribution Goal: to choose arms to play such that the total reward is maximized How should the gambler play at any given moment? Choose arm with highest average reward seen so far (exploitation) Choose a sub-optimal arm in the hope that it it will lead to a greater reward (exploration) Neither - need a combination of exploitation and exploration Dmitry Kamenetsky Machine Learning applied to Go
  • 16. Introduction Monte Carlo Monte Carlo Go Bandit Problem My Work MoGo K-armed bandit problem Slot machine with K arms. Each arm provides a reward based on some unknown, but fixed distribution Goal: to choose arms to play such that the total reward is maximized How should the gambler play at any given moment? Choose arm with highest average reward seen so far (exploitation) Choose a sub-optimal arm in the hope that it it will lead to a greater reward (exploration) Neither - need a combination of exploitation and exploration Dmitry Kamenetsky Machine Learning applied to Go
  • 17. Introduction Monte Carlo Monte Carlo Go Bandit Problem My Work MoGo K-armed bandit problem Slot machine with K arms. Each arm provides a reward based on some unknown, but fixed distribution Goal: to choose arms to play such that the total reward is maximized How should the gambler play at any given moment? Choose arm with highest average reward seen so far (exploitation) Choose a sub-optimal arm in the hope that it it will lead to a greater reward (exploration) Neither - need a combination of exploitation and exploration Dmitry Kamenetsky Machine Learning applied to Go
  • 18. Introduction Monte Carlo Monte Carlo Go Bandit Problem My Work MoGo Upper Confidence Bounds Successive plays of arm i give rewards Xi,1 , Xi,2 , ... which are i.i.d. with unknown E(Xi ) = µi Let Ti (n) be the number of times arm i has been played during the first n plays of machine Upper Confidence Bounds - UCB (Auer et al. 2002) Initialization: Play each arm once 2 log n Loop: Play arm i that maximizes µi + Ti (n) Proven to achieve optimal regret Dmitry Kamenetsky Machine Learning applied to Go
  • 19. Introduction Monte Carlo Monte Carlo Go Bandit Problem My Work MoGo Upper Confidence Bounds Successive plays of arm i give rewards Xi,1 , Xi,2 , ... which are i.i.d. with unknown E(Xi ) = µi Let Ti (n) be the number of times arm i has been played during the first n plays of machine Upper Confidence Bounds - UCB (Auer et al. 2002) Initialization: Play each arm once 2 log n Loop: Play arm i that maximizes µi + Ti (n) Proven to achieve optimal regret Dmitry Kamenetsky Machine Learning applied to Go
  • 20. Introduction Monte Carlo Monte Carlo Go Bandit Problem My Work MoGo Upper Confidence Bounds Successive plays of arm i give rewards Xi,1 , Xi,2 , ... which are i.i.d. with unknown E(Xi ) = µi Let Ti (n) be the number of times arm i has been played during the first n plays of machine Upper Confidence Bounds - UCB (Auer et al. 2002) Initialization: Play each arm once 2 log n Loop: Play arm i that maximizes µi + Ti (n) Proven to achieve optimal regret Dmitry Kamenetsky Machine Learning applied to Go
  • 21. Introduction Monte Carlo Monte Carlo Go Bandit Problem My Work MoGo Upper Confidence Bounds Successive plays of arm i give rewards Xi,1 , Xi,2 , ... which are i.i.d. with unknown E(Xi ) = µi Let Ti (n) be the number of times arm i has been played during the first n plays of machine Upper Confidence Bounds - UCB (Auer et al. 2002) Initialization: Play each arm once 2 log n Loop: Play arm i that maximizes µi + Ti (n) Proven to achieve optimal regret Dmitry Kamenetsky Machine Learning applied to Go
  • 22. Introduction Monte Carlo Monte Carlo Go Bandit Problem My Work MoGo Upper Confidence Bounds Successive plays of arm i give rewards Xi,1 , Xi,2 , ... which are i.i.d. with unknown E(Xi ) = µi Let Ti (n) be the number of times arm i has been played during the first n plays of machine Upper Confidence Bounds - UCB (Auer et al. 2002) Initialization: Play each arm once 2 log n Loop: Play arm i that maximizes µi + Ti (n) Proven to achieve optimal regret Dmitry Kamenetsky Machine Learning applied to Go
  • 23. Introduction Monte Carlo Monte Carlo Go Bandit Problem My Work MoGo Upper Confidence Bounds Successive plays of arm i give rewards Xi,1 , Xi,2 , ... which are i.i.d. with unknown E(Xi ) = µi Let Ti (n) be the number of times arm i has been played during the first n plays of machine Upper Confidence Bounds - UCB (Auer et al. 2002) Initialization: Play each arm once 2 log n Loop: Play arm i that maximizes µi + Ti (n) Proven to achieve optimal regret Dmitry Kamenetsky Machine Learning applied to Go
  • 24. Introduction Monte Carlo Monte Carlo Go Bandit Problem My Work MoGo UCT UCB for minimax tree search (Kocsis and Szepesvari 2006) Start at the current board position p For i = 1 to 100,000 (number of simulations) p ←p until stopping criterion is reached (e.g. end of game) p ← p + move given by UCB create node p Evaluate leaf: value ← winner of p Update all the visited nodes with value At p play move with highest winning percentage Dmitry Kamenetsky Machine Learning applied to Go
  • 25. Introduction Monte Carlo Monte Carlo Go Bandit Problem My Work MoGo UCT UCB for minimax tree search (Kocsis and Szepesvari 2006) Start at the current board position p For i = 1 to 100,000 (number of simulations) p ←p until stopping criterion is reached (e.g. end of game) p ← p + move given by UCB create node p Evaluate leaf: value ← winner of p Update all the visited nodes with value At p play move with highest winning percentage Dmitry Kamenetsky Machine Learning applied to Go
  • 26. Introduction Monte Carlo Monte Carlo Go Bandit Problem My Work MoGo UCT UCB for minimax tree search (Kocsis and Szepesvari 2006) Start at the current board position p For i = 1 to 100,000 (number of simulations) p ←p until stopping criterion is reached (e.g. end of game) p ← p + move given by UCB create node p Evaluate leaf: value ← winner of p Update all the visited nodes with value At p play move with highest winning percentage Dmitry Kamenetsky Machine Learning applied to Go
  • 27. Introduction Monte Carlo Monte Carlo Go Bandit Problem My Work MoGo UCT UCB for minimax tree search (Kocsis and Szepesvari 2006) Start at the current board position p For i = 1 to 100,000 (number of simulations) p ←p until stopping criterion is reached (e.g. end of game) p ← p + move given by UCB create node p Evaluate leaf: value ← winner of p Update all the visited nodes with value At p play move with highest winning percentage Dmitry Kamenetsky Machine Learning applied to Go
  • 34. MoGo: the first Go program to use UCT (Gelly et al. 2006). Stores nodes and their statistics in a tree data structure. The stopping criterion is a node that is not yet in the tree. Leaf node evaluation: play out the position randomly until no moves remain; the final position is trivial to score. Playouts are enhanced through the use of patterns and playing near the previous move. Also uses pruning techniques and smart ordering of unexplored moves.
  • 41. MoGo's success: ranked first on the 9x9 Computer Go Server since August 2006. Won the two most recent tournaments on 9x9 and 13x13. Expected to reach the level of a human professional on the 9x9 board.
  • 44. Replacing UCB: UCT is ad hoc. It lacks theoretical analysis, because the random variables (rewards X_i) are not i.i.d. Instead, use Beta distributions to model the random variables. The Beta distribution is a conjugate prior to the binomial distribution (game result). Here α = wins from node, β = losses from node. Let p be the parent's winning percentage and 0 < a < 1 a parameter. Pick the move that is most likely to have a winning percentage greater than (1 − a)p + a.
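The selection rule above can be made concrete with a small sketch. It is a hedged illustration, not the presenter's implementation: it assumes a uniform prior, so a node with w wins and l losses is modelled as Beta(w + 1, l + 1) rather than Beta(w, l) (which is undefined for an unvisited node). For integer parameters the Beta tail probability equals a binomial sum, so no external library is needed. The function names and the example statistics are hypothetical.

```python
from math import comb

def prob_rate_exceeds(wins, losses, threshold):
    """P(X > threshold) for X ~ Beta(wins + 1, losses + 1).

    Uses the identity, valid for integer parameters,
    P(Beta(a, b) > t) = P(Binomial(a + b - 1, t) <= a - 1).
    The +1 on each count is an assumed uniform prior.
    """
    alpha, beta = wins + 1, losses + 1
    n = alpha + beta - 1
    return sum(comb(n, j) * threshold**j * (1 - threshold)**(n - j)
               for j in range(alpha))

def pick_move(children, parent_rate, a):
    """Choose the move most likely to beat the target (1 - a) * p + a.

    children maps a move label to its (wins, losses) counts.
    """
    threshold = (1 - a) * parent_rate + a
    return max(children,
               key=lambda m: prob_rate_exceeds(*children[m], threshold))

# Hypothetical statistics: move -> (wins, losses)
children = {"A": (8, 2), "B": (1, 0), "C": (3, 7)}
choice = pick_move(children, parent_rate=0.5, a=0.2)  # target win rate 0.6
```

With a close to 0 the target is near the parent's own win rate; with a close to 1 the target approaches certainty, which favours moves with little data and therefore wide distributions.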
  • 50. Improving node evaluation: MoGo's node evaluation is fast, but not so meaningful. Instead, use our cooperative scorer. Initialization: statically fill neutral territory with stones. Loop: players cooperate to make moves that do not affect the score. Accurately predicts the score: 96.3% on 9x9 and 89.2% on 19x19. Only 15 times slower than pure random.
  • 56. Scorer Example
  • 59. Improving memory management: the tree data structure is memory-inefficient. Instead, use a hash table: for each visited position p, key = ZobristHash(p); store the statistics of p: hashTable[key] = (#wins, #runs, depth). Collision handling. Use a small hash table with more information for frequently visited nodes.
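A minimal sketch of the hash-table idea follows, under stated assumptions: a 9x9 board stored as a flat list, 64-bit Zobrist keys, and a plain Python dict standing in for the hash table. The helper names (`zobrist_hash`, `toggle`, `record`) are hypothetical, and the sketch ignores side to move, ko state and collision handling, all of which a real program would have to address.

```python
import random

SIZE = 9
EMPTY, BLACK, WHITE = 0, 1, 2

# One random 64-bit number per (point, colour); fixed seed for repeatability.
_rng = random.Random(2007)
ZOBRIST = [[_rng.getrandbits(64) for _ in (BLACK, WHITE)]
           for _ in range(SIZE * SIZE)]

def zobrist_hash(board):
    """XOR together the keys of every stone on the board."""
    h = 0
    for point, colour in enumerate(board):
        if colour != EMPTY:
            h ^= ZOBRIST[point][colour - 1]
    return h

def toggle(h, point, colour):
    """Incremental update: placing or removing a stone is a single XOR."""
    return h ^ ZOBRIST[point][colour - 1]

def record(table, key, won, depth):
    """Update hashTable[key] = (#wins, #runs, depth) after one simulation."""
    wins, runs, first_depth = table.get(key, (0, 0, depth))
    table[key] = (wins + int(won), runs + 1, first_depth)

# Usage: two stones placed incrementally give the same key as a full rehash.
board = [EMPTY] * (SIZE * SIZE)
board[40], board[41] = BLACK, WHITE
key = toggle(toggle(0, 40, BLACK), 41, WHITE)

table = {}
record(table, key, won=True, depth=2)
record(table, key, won=False, depth=2)
```

Because the key depends only on the stones present, two move orders that reach the same position share one entry, which a tree of separate nodes would not do.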
  • 65. Learning evaluation function: convert the board position into a graph: collapse regions of the same colour into one node and create edges between adjacent regions. Use Conditional Random Fields (CRF) to learn from this graph: learn the final territory assignment and predict the next move. Can use this with the scorer or for move generation.
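The graph conversion step can be illustrated with a flood fill. This is a sketch under assumptions, not the presenter's code: the board is given as strings over 'X' (black), 'O' (white) and '.' (empty), empty points are treated as a third "colour" and collapsed in the same way, and the function name `board_to_graph` is hypothetical. The CRF learning itself is not shown.

```python
from collections import deque

def board_to_graph(rows):
    """Collapse 4-connected same-colour regions into nodes.

    Returns (colours, edges): colours[i] is the colour of region i,
    edges is a set of (i, j) pairs, i < j, for adjacent regions.
    """
    h, w = len(rows), len(rows[0])
    region = [[-1] * w for _ in range(h)]
    colours = []
    for r in range(h):
        for c in range(w):
            if region[r][c] != -1:
                continue
            rid = len(colours)              # start a new region here
            colours.append(rows[r][c])
            region[r][c] = rid
            queue = deque([(r, c)])
            while queue:                    # breadth-first flood fill
                y, x = queue.popleft()
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and region[ny][nx] == -1
                            and rows[ny][nx] == rows[r][c]):
                        region[ny][nx] = rid
                        queue.append((ny, nx))
    edges = set()
    for r in range(h):
        for c in range(w):
            for ny, nx in ((r + 1, c), (r, c + 1)):   # each neighbour pair once
                if ny < h and nx < w and region[ny][nx] != region[r][c]:
                    edges.add(tuple(sorted((region[r][c], region[ny][nx]))))
    return colours, edges

# Usage on a 3x3 corner of a board.
colours, edges = board_to_graph(["XX.",
                                 "X.O",
                                 ".OO"])
```

On this small position the nine points collapse into five nodes, which is the kind of size reduction that makes learning on the graph cheaper than learning on the raw grid.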
  • 72. Graph conversion example
  • 74. Questions? You never ever know if you never ever GO!