Optimistic Heuristics &
Application to MineSweeper

O. Buffet, W. Lin, O. Teytaud
A great challenge: MineSweeper.

- looks easy
- in fact, not easy:
    many myopic (one-step-ahead)
    approaches.
- partially observable
1. Rules of MineSweeper

    2. State of the art

  3. The CSP approach

  4. The UCT approach

5. The best of both worlds
RULES



At the beginning, all locations
are covered (unknown).
I play
here!
Good news!

 No mine in
     the
neighborhood!

 I can “click”
    all the
 neighbours.
I have 3 covered neighbors,
and I have 3 mines in the
neighborhood
 ==> 3 flags!
I know
  it's a
 mine,
so I put
 a flag!
No info !
I play here and I lose...
The most successful game ever!
Who in this room has never
played MineSweeper?
2. State of the art
Do you
 think it's
  easy ?
 (10 mines)

MineSweeper
is not simple.
What is the optimal move?

Remark: the question makes sense without knowing the history.
You don't need the history to play optimally.
 ==> (this fact is mathematically non-trivial!)
What is the optimal move?

This one is easy.

Both remaining locations win with probability 50%.
More
difficult!
 Which
move is
optimal ?

Here, the
classical
approach
  (CSP)
is wrong.
Probability of a mine?
- Top: 33%
- Middle: 33%
- Bottom: 33%

==> so are all moves equivalent?
==> NOOOOO!!!

Top or bottom: 66% chance of winning!
Middle: 33%!
The myopic (one-step-ahead) approach plays randomly.

The middle is a bad move!

Even with the same probability of a mine,
some moves are better than others!
State of the art:
- solved in 4x4
- NP-complete
- Constraint Satisfaction Problem approach:
    = Find the location which is least likely
        to be a mine, play there.
  ==> 80% success “beginner” (9x9, 10 mines)
  ==> 45% success “intermediate” (16x16, 40 mines)
  ==> 34% success “expert” (30x40, 99 mines)
3. The CSP approach (and other old known methods)
- Exact MDP (Markov Decision Process): very expensive. 4x4 solved.
- Single Point Strategy (SPS): simple local solving
- CSP (Constraint Satisfaction Problem): the main approach.
    - (unknown) state:
          x(i) = 1 if there is a mine at location i
    - each visible location is a constraint:
           If location 15 is 4, then the constraint is
             x(04)+x(05)+x(06)
            +x(14)      +x(16)
            +x(24)+x(25)+x(26) = 4.
    - find all solutions x1, x2, x3,...,xN
    - P(mine in j) = (sum_i Xij) / N, where Xij = 1 if solution xi
      has a mine at location j <== this is mathematically proved!
    - play j such that P(mine in j) is minimal
    - if several such j, randomly break ties.
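
The counting procedure above can be sketched in Python on a toy instance. The cell names, the two constraints and the helper names (`solve_csp`, `mine_probabilities`) are assumptions made for illustration. The sketch enumerates every 0/1 assignment by brute force, which is only practical for a handful of cells, and treats all solutions as equally likely, as the formula on the slide does.

```python
from fractions import Fraction
from itertools import product

def solve_csp(unknown, constraints):
    """Enumerate every 0/1 assignment of the unknown cells that satisfies
    all constraints. Each constraint is a pair (cells, total) meaning
    sum(x[c] for c in cells) == total."""
    solutions = []
    for bits in product((0, 1), repeat=len(unknown)):
        x = dict(zip(unknown, bits))
        if all(sum(x[c] for c in cells) == total for cells, total in constraints):
            solutions.append(x)
    return solutions

def mine_probabilities(unknown, solutions):
    """P(mine in j) = (number of solutions with a mine at j) / N.
    Assumes at least one solution exists (N > 0)."""
    n = len(solutions)
    return {j: Fraction(sum(x[j] for x in solutions), n) for j in unknown}

# Hypothetical toy instance: three covered cells a, b, c and two visible numbers.
unknown = ["a", "b", "c"]
constraints = [(["a", "b", "c"], 1),   # a visible "1" touching a, b and c
               (["b", "c"], 1)]        # a visible "1" touching b and c only

solutions = solve_csp(unknown, constraints)
probs = mine_probabilities(unknown, solutions)

# Myopic choice: the cells with minimal mine probability (ties are left to the caller).
best = min(probs.values())
candidates = [j for j in unknown if probs[j] == best]
print(probs, candidates)
```

In this toy instance the two constraints force cell `a` to be safe, so it is the only candidate; when several cells share the minimum, the tie-breaking rule decides.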
CSP as modified by Legendre et al, 2012:

   - (unknown) state:
         x(i) = 1 if there is a mine at location i
   - each visible location is a constraint:
          If location 15 is 4, then the constraint is
            x(04)+x(05)+x(06)
           +x(14)      +x(16)
           +x(24)+x(25)+x(26) = 4.
   - find all solutions x1, x2, x3,...,xN
   - P(mine in j) = (sum_i Xij) / N, where Xij = 1 if solution xi
     has a mine at location j <== this is mathematically proved!
   - play j such that P(mine in j) is minimal
   - if several such j, choose one “closest to the frontier”
                        (proposed by Legendre et al)
   - if still several such j, randomly break ties.
CSP
- is very fast
- but it's not optimal
- because of positions like this one:

[Figure: board position]

Here CSP plays randomly!
Also for the initial move: don't play
 the first move randomly!   (sometimes opening book)
4. The UCT approach
Why not UCT ?
- looks like a stupid idea at first sight
- cannot compete with CSP in terms of speed
- But at least UCT is consistent: if given
  sufficient time, it will play optimally.
- Tested in Couetoux and Teytaud, 2011
UCT (Upper Confidence Trees)




Coulom (06)
Chaslot, Saito & Bouzy (06)
Kocsis & Szepesvari (06)
Exploitation ...
            SCORE = 5/7 + k.sqrt( log(10)/7 )
... or exploration ?
            SCORE = 0/2 + k.sqrt( log(10)/2 )
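
The two scores above can be computed with a few lines of Python. This is a minimal sketch: the function name `ucb_score`, the use of the natural logarithm, and the infinite score for an unvisited child are assumptions, and the values of k are chosen only to show both outcomes.

```python
import math

def ucb_score(wins, visits, parent_visits, k):
    """Average reward plus an exploration bonus that shrinks as the child
    is visited more often. Unvisited children get an infinite score so
    that they are tried at least once (an assumed convention)."""
    if visits == 0:
        return math.inf
    return wins / visits + k * math.sqrt(math.log(parent_visits) / visits)

# The two children from the slide: 5 wins out of 7, and 0 wins out of 2,
# with 10 simulations at the parent node.
for k in (1.0, 3.0):
    exploit = ucb_score(5, 7, 10, k)
    explore = ucb_score(0, 2, 10, k)
    choice = "exploitation" if exploit > explore else "exploration"
    print(f"k={k}: {exploit:.3f} vs {explore:.3f} -> {choice}")
```

With a small k the well-performing child keeps the higher score; with a larger k the rarely tried child overtakes it, which is the trade-off the slide illustrates.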
UCT in one slide

We use the CSP by Legendre et al 2012
for expansion and simulation.
Applying UCT here ?
•   Might look like a hammer for a
    drosophila (overkill)
•   But in many cases CSP is suboptimal
•   We have seen an example of suboptimal
    move by CSP a few slides ago
•   Let's see two additional examples
An example showing that the initial
move matters (UCT finds it, not CSP):

3x3, 7 mines:
the optimal move is anything but the center.
Optimal winning rate: 25%.
Optimal winning rate if the initial move
is uniformly random: 17/72.

(yes, we get a 1/72 improvement!)
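
The figures on this slide can be checked by exhaustive enumeration. The sketch below assumes that the first click is never a mine; this rule is not stated on the slide, but it is the assumption under which the enumeration gives 25% and 17/72. With 7 mines on 9 cells there is exactly one other safe cell, so after the first click the game reduces to guessing that cell from the number shown. The name `win_rate` is hypothetical.

```python
from fractions import Fraction

SIZE = 3
CELLS = [(r, c) for r in range(SIZE) for c in range(SIZE)]

def neighbors(cell):
    r, c = cell
    return [(r + dr, c + dc)
            for dr in (-1, 0, 1) for dc in (-1, 0, 1)
            if (dr, dc) != (0, 0) and 0 <= r + dr < SIZE and 0 <= c + dc < SIZE]

def win_rate(first):
    """Winning probability of optimal play after clicking `first`,
    assuming the first click is safe and the other safe cell is uniform
    over the 8 remaining cells."""
    others = [cell for cell in CELLS if cell != first]
    # Group the possible positions of the other safe cell by the number
    # that the first click would display.
    groups = {}
    for safe in others:
        shown = sum(1 for n in neighbors(first) if n != safe)
        groups.setdefault(shown, []).append(safe)
    # Inside a group all candidates are equally likely, so the best
    # second click succeeds with probability 1 / len(group).
    return sum(Fraction(len(g), len(others)) * Fraction(1, len(g))
               for g in groups.values())

center = win_rate((1, 1))
corner = win_rate((0, 0))
edge = win_rate((0, 1))
uniform_first_move = sum(win_rate(cell) for cell in CELLS) / len(CELLS)
print(center, corner, edge, uniform_first_move)
```

The center shows the same number wherever the other safe cell is, so it gives no information; any other first move splits the candidates into two groups, which is where the extra 1/72 comes from.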
Second such example:
15 mines on 5x5 board with GnoMine rule
(i.e. initial move is a 0, i.e. no mine in the neighborhood)
Optimal success rate = 100%!!!!!
Play the center, and you win (well, you have to work...)
The myopic CSP approach does not find it.
5. The best of both worlds
Summary
I have two approaches:
•   CSP:

     •     Fast

     •     Suboptimal (myopic, only 1-step ahead)

•   UCT:

     •     needs a generative model (probability of
           next states, given my action),

     •     Asymptotically optimal
The best of both worlds ?

•   CSP:

     •     Fast

     •     Suboptimal (myopic, only 1-step ahead)

•   UCT:

     •     needs a generative model, provided by CSP,

     •     Asymptotically optimal
What do I need for implementing UCT ?
A complete generative model.
Given a state and an action,
I must be able to simulate possible transitions.
State S, Action a:
(S,a) ==> S'
Example: given the state below, and the action “top left”, what
are the possible next states ?
We published a version of UCT
for MineSweeper in which this was
implemented using
the rejection method only.
Rejection algorithm:
      1- randomly draw the mines
      2- if it's ok (consistent with what is already observed),
         return the new observation
      3- otherwise, go back to 1.
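
The rejection method described above can be sketched as follows. The state representation (a dict from revealed cells to the number they show), the function names, and the `max_tries` safeguard are assumptions made for illustration, not the published implementation. Mines are drawn only among covered cells, since revealed cells are known to be safe.

```python
import random

def neighbors(cell, rows, cols):
    r, c = cell
    return [(r + dr, c + dc)
            for dr in (-1, 0, 1) for dc in (-1, 0, 1)
            if (dr, dc) != (0, 0) and 0 <= r + dr < rows and 0 <= c + dc < cols]

def count_mines(cell, mines, rows, cols):
    return sum(1 for n in neighbors(cell, rows, cols) if n in mines)

def sample_observation(revealed, action, rows, cols, n_mines,
                       rng=random, max_tries=100_000):
    """Sample what clicking `action` would show, given the revealed numbers.
    Returns "mine" or the number displayed on the clicked cell."""
    hidden = [(r, c) for r in range(rows) for c in range(cols)
              if (r, c) not in revealed]
    for _ in range(max_tries):
        # Step 1: randomly draw the mines.
        mines = set(rng.sample(hidden, n_mines))
        # Step 2: accept the draw only if it matches every revealed number.
        if all(count_mines(cell, mines, rows, cols) == shown
               for cell, shown in revealed.items()):
            if action in mines:
                return "mine"
            return count_mines(action, mines, rows, cols)
        # Step 3: otherwise, go back to step 1.
    raise RuntimeError("no consistent mine layout found")

# Tiny 1x3 board with one mine: the left cell shows 1, so the mine
# must be in the middle cell.
state = {(0, 0): 1}
print(sample_observation(state, (0, 1), 1, 3, 1))
print(sample_observation(state, (0, 2), 1, 3, 1))
```

On larger boards most random draws contradict the revealed numbers, so the acceptance rate is low and the loop runs for a long time before a draw is accepted.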
It is mathematically ok, but it is too slow.
Then, we used a weak CSP implementation.
Still too slow.
Now a reasonably fast implementation, with
Legendre et al heuristic.
EXPERIMENTAL RESULTS

[Results table not recovered. Surviving annotations:
 10 000 UCT-simulations per move; huge computation time;
 our results (total = a few days).]
CONCLUSIONS: a
       methodology for sequential
           decision making

- When you have a myopic solver
  (i.e. which neglects long term
  effects, as too often in industry!)
     ==> improve it with heuristics (as
            Legendre et al)
     ==> combine with UCT (as we did)
     ==> significant improvements

- We have similar experiments on
   industrial testbeds
Thanks for your
attention!

    9 Mines.
  What is the
optimal move ?

Combining UCT and Constraint Satisfaction Problems for Minesweeper

  • 1. Optimistic Heuristics & Application to MineSweeper O. Buffet, W. Lin, O. Teytaud
  • 2. A great challenge: MineSweeper. - looks easy - in fact, not easy: many myopic (one- step-ahead) approaches. - partially observable
  • 3. 1. Rules of MineSweeper 2. State of the art 3. The CSP approach 4. The UCT approach 5. The best of both worlds
  • 4. RULES At the beginning, all locations are Covered (unkwown).
  • 6. Good news! No mine in the neighborhood! I can “click” all the neighbours.
  • 7. I have 3 uncovered neighbors, and I have 3 mines in the neighborhood ==> 3 flags!
  • 8.
  • 9. I know it's a mine, so I put a flag!
  • 11. I play here and I lose...
  • 12. The most successful game ever! Who in this room never played Mine- Sweeper ?
  • 13. 1. Rules of MineSweeper 2. State of the art 3. The CSP approach 4. The UCT approach 5. The best of both worlds
  • 14. Do you think it's easy ? (10 mines) MineSweeper is not simple.
  • 16. What is the optimal move ? Remark: the question makes sense, without Knowing the history. You don't need the history for playing optimaly. ==> (this fact is mathematically non trivial!)
  • 17. What is the optimal move ? This one is easy. Both remaining locations win with proba 50%.
  • 18. More difficult! Which move is optimal ? Here, the classical approach (CSP) is wrong.
  • 19. Probability of a mine ? - Top: - Middle: - Bottom:
  • 20. Probability of a mine ? - Top: 33% - Middle: - Bottom:
  • 21. Probability of a mine ? - Top: 33% - Middle: 33% - Bottom:
  • 22. Probability of a mine ? - Top: 33% - Middle: 33% - Bottom: 33%
  • 23. Probability of a mine ? - Top: 33% - Middle: 33% - Bottom: 33% ==> so all moves equivalent ?
  • 24. Probability of a mine ? - Top: 33% - Middle: 33% - Bottom: 33% ==> so all moves equivalent ? ==> NOOOOO!!!
  • 25. Probability of a mine ? - Top: 33% - Middle: 33% - Bottom: 33% Top or bottom: 66% of win! Middle: 33%!
  • 26. The myopic (one-step ahead) approach plays randomly. The middle is a bad move! Even with same proba of mine, some moves are better than others!
  • 27. State of the art: - solved in 4x4 - NP-complete - Constraint Satisfaction Problem approach: = Find the location which is less likely to be a mine, play there. ==> 80% success “beginner” (9x9, 10 mines) ==> 45% success “intermediate” (16x16, 40 mines) ==> 34% success “expert” (30x40, 99 mines)
  • 28. 1. Rules of MineSweeper 2. State of the art 3. The CSP approach (and other old known methods) 4. The UCT approach 5. The best of both worlds
  • 29. - Exact MDP: very expensive. 4x4 solved. - Single Point Strategy (SPS): simple local solving - CSP (constraint satisf. problem): the main approach. - (unknown) state: x(i) = 1 if there is a mine at location i - each visible location is a constraint: If location 15 is 4, then the constraint is x(04)+x(05)+x(06) +x(14)+ x(16) +x(24)+x(25)+x(26) = 4. - find all solutions x1, x2, x3,...,xN - P(mine in j) = (sumi Xij ) / N <== this is math. proved! - play j such that P(mine in j) minimal - if several such j, randomly break ties. MDP= Markov Decision Process CSP = Constraint Satisfaction Problem
  • 30. CSP as modified by Legendre et al, 2012: - (unknown) state: x(i) = 1 if there is a mine at location i - each visible location is a constraint: If location 15 is 4, then the constraint is x(04)+x(05)+x(06) +x(14)+ x(16) +x(24)+x(25)+x(26) = 4. - find all solutions x1, x2, x3,...,xN - P(mine in j) = (sumi Xij ) / N <== this is math. proved! - play j such that P(mine in j) minimal - if several such j, choose one “closest to the frontier” (proposed by Legendre et al) - if several such j, randomly break ties.
  • 31. CSP - is very fast - but it's not optimal - because of Here CSP plays randomly! Also for the initial move: don't play randomly the first move! (sometimes opening book)
  • 32. 1. Rules of MineSweeper 2. State of the art 3. The CSP approach 4. The UCT approach 5. The best of both worlds
  • 33. Why not UCT ? - looks like a stupid idea at first sight - cannot compete with CSP in terms of speed - But at least UCT is consistent: if given sufficient time, it will play optimally. - Tested in Couetoux and Teytaud, 2011
  • 34. UCT (Upper Confidence Trees) Coulom (06) Chaslot, Saito & Bouzy (06) Kocsis & Szepesvari (06)
  • 35. UCT
  • 39. UCT Kocsis & Szepesvari (06)
  • 41. Exploitation ... SCORE = 5/7 + k.sqrt( log(10)/7 )
  • 44. ... or exploration ? SCORE = 0/2 + k.sqrt( log(10)/2 )
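The two scores on these slides can be compared numerically. The sketch below assumes that 5/7 and 0/2 are wins over visits for two candidate moves, that 10 is the visit count of the parent node, and that log is the natural logarithm; the function name and the values of k are illustrative choices.

```python
import math


def ucb_score(wins, visits, parent_visits, k):
    """Average reward plus an exploration bonus weighted by k."""
    return wins / visits + k * math.sqrt(math.log(parent_visits) / visits)


for k in (1.0, 2.0):
    exploit = ucb_score(5, 7, 10, k)   # move with a good average, well explored
    explore = ucb_score(0, 2, 10, k)   # move with a bad average, barely explored
    choice = "exploitation" if exploit > explore else "exploration"
    print(f"k={k}: {exploit:.3f} vs {explore:.3f} -> {choice}")
```

With k = 1 the well-explored move keeps the higher score, while with k = 2 the bonus of the rarely tried move dominates: the constant k sets the exploration/exploitation trade-off.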
  • 45. UCT in one slide
  • 46. UCT in one slide. We use the CSP by Legendre et al 2012 for expansion and simulation.
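A generic UCT loop can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it uses uniformly random rollouts where the slide uses the CSP heuristic, and the function names, the `actions`/`step` interface, and the two-step toy problem are all assumptions made for the example.

```python
import math
import random
from collections import defaultdict


def uct_search(root, actions, step, n_sims=2000, k=1.0, seed=0):
    """Return the most visited action at `root`.

    actions(state) -> list of legal actions (empty list if terminal)
    step(state, action, rng) -> (next_state, reward), sampled from the model
    """
    rng = random.Random(seed)
    visits = defaultdict(int)         # N(s, a)
    total = defaultdict(float)        # sum of returns observed after (s, a)
    state_visits = defaultdict(int)   # N(s)
    in_tree = {root}

    def score(s, a):
        if visits[(s, a)] == 0:
            return math.inf           # try every action at least once
        mean = total[(s, a)] / visits[(s, a)]
        return mean + k * math.sqrt(math.log(state_visits[s]) / visits[(s, a)])

    for _ in range(n_sims):
        s, path, ret, expanded = root, [], 0.0, False
        while actions(s):
            if s in in_tree:          # selection with the exploration bonus
                a = max(actions(s), key=lambda act: score(s, act))
                path.append((s, a))
            elif not expanded:        # expansion: one new node per simulation
                in_tree.add(s)
                expanded = True
                a = rng.choice(actions(s))
                path.append((s, a))
            else:                     # rollout (uniform random here)
                a = rng.choice(actions(s))
            s, r = step(s, a, rng)
            ret += r
        for ps, pa in path:           # backpropagation
            state_visits[ps] += 1
            visits[(ps, pa)] += 1
            total[(ps, pa)] += ret
    return max(actions(root), key=lambda act: visits[(root, act)])


# Hypothetical toy problem: choose two bits, win with probability P_WIN.
P_WIN = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.9, (1, 1): 0.3}


def toy_actions(state):
    return [0, 1] if len(state) < 2 else []


def toy_step(state, action, rng):
    nxt = state + (action,)
    if len(nxt) == 2:
        return nxt, 1.0 if rng.random() < P_WIN[nxt] else 0.0
    return nxt, 0.0


best_first_action = uct_search((), toy_actions, toy_step)
```

Nodes are keyed by state, so a stochastic `step` (as in MineSweeper, where the revealed number is random) fits the same loop, provided states are hashable.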
  • 47. Applying UCT here ? • Might look like a hammer for a drosophila • But in many cases CSP is suboptimal • We have seen an example of a suboptimal move by CSP a few slides ago • Let's see two additional examples
  • 48. An example showing that the initial move matters (UCT finds it, not CSP). 3x3, 7 mines: the optimal move is anything but the center. Optimal winning rate: 25%. Winning rate with a uniformly random initial move (then optimal play): 17/72. (yes we get 1/72 improvement!)
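The 25% and 17/72 figures can be checked by exact enumeration. The sketch below assumes that the first click is guaranteed to be safe (the slide does not state this, but the assumption reproduces its numbers); the function names are illustrative. With 7 mines on the 8 remaining cells, exactly one other cell is safe, and the game is won if the second click finds it.

```python
from fractions import Fraction

CELLS = [(r, c) for r in range(3) for c in range(3)]


def adjacent(a, b):
    return a != b and abs(a[0] - b[0]) <= 1 and abs(a[1] - b[1]) <= 1


def win_rate(first):
    """Exact win rate of optimal play after clicking `first` (ASSUMED safe)."""
    others = [c for c in CELLS if c != first]
    # Group the 8 equally likely positions of the other safe cell
    # by the number that the first click would reveal.
    groups = {}
    for safe in others:
        shown = sum(adjacent(first, c) for c in others if c != safe)
        groups.setdefault(shown, []).append(safe)
    # Positions in one group are indistinguishable, so the second click
    # is a uniform guess inside the group: success with probability 1/len(g).
    return sum(Fraction(len(g), len(others)) * Fraction(1, len(g))
               for g in groups.values())


rates = {cell: win_rate(cell) for cell in CELLS}
average = sum(rates.values()) / len(CELLS)
```

A corner or edge click reveals whether the other safe cell is adjacent, which gives 1/4; the center always shows 7 and gives only 1/8, so a uniformly random first click averages (8 x 1/4 + 1/8) / 9 = 17/72.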
  • 49. Second such example: 15 mines on 5x5 board with GnoMine rule (i.e. initial move is a 0, i.e. no mine in the neighborhood) Optimal success rate = 100%!!!!! Play the center, and you win (well, you have to work...) The myopic CSP approach does not find it.
  • 50. 1. Rules of MineSweeper 2. State of the art 3. The CSP approach 4. The UCT approach 5. The best of both worlds
  • 51. Summary I have two approaches: • CSP: • Fast • Suboptimal (myopic, only 1-step ahead) • UCT: • needs a generative model (probability of next states, given my action), • Asymptotically optimal
  • 52. The best of both worlds ? • CSP: • Fast • Suboptimal (myopic, only 1-step ahead) • UCT: • needs a generative model by CSP, • Asymptotically optimal
  • 53. What do I need for implementing UCT ? A complete generative model. Given a state and an action, I must be able to simulate possible transitions. State S, Action a: (S,a) ==> S' Example: given the state below, and the action “top left”, what are the possible next states ?
  • 58. What do I need for implementing UCT ? A complete generative model. We published a version of UCT for MineSweeper in which this was implemented using the rejection method only.
  • 59. What do I need for implementing UCT ? A complete generative model. Rejection algorithm: 1- randomly draw the mines 2- if it's ok (i.e. consistent with the current observations), return the new observation 3- otherwise, go back to 1.
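The rejection algorithm can be sketched as a sampler for the generative model (S, a) ==> S'. This is an illustration under assumptions, not the published code: the function names, the state encoding (a dictionary of revealed numbers), and the string "mine" for a losing click are choices made here, and mines are drawn only among covered cells since revealed cells are known to be safe.

```python
import random


def neighbors(cell, rows, cols):
    r, c = cell
    return [(r + dr, c + dc)
            for dr in (-1, 0, 1) for dc in (-1, 0, 1)
            if (dr, dc) != (0, 0) and 0 <= r + dr < rows and 0 <= c + dc < cols]


def sample_next_observation(rows, cols, n_mines, revealed, action, rng):
    """Sample what clicking `action` reveals: "mine" or the number shown.

    revealed: {cell: number shown} describes the current state S.
    """
    covered = [(r, c) for r in range(rows) for c in range(cols)
               if (r, c) not in revealed]

    def count(cell, mines):
        return sum(n in mines for n in neighbors(cell, rows, cols))

    while True:
        # 1- randomly draw the mines
        mines = set(rng.sample(covered, n_mines))
        # 2- if the draw matches every visible number, return the observation
        if all(count(cell, mines) == shown for cell, shown in revealed.items()):
            return "mine" if action in mines else count(action, mines)
        # 3- otherwise, go back to 1


rng = random.Random(0)
outcome = sample_next_observation(3, 3, 2, {(0, 0): 1}, (2, 2), rng)
```

Accepted draws are uniform over the consistent mine placements, which is why the method is correct; the cost is that the acceptance rate collapses as more numbers become visible.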
  • 60. What do I need for implementing UCT ? A complete generative model. The rejection method is mathematically ok, but it is too slow. Then, we used a weak CSP implementation. Still too slow. Now a reasonably fast implementation, with the Legendre et al heuristic.
  • 61. EXPERIMENTAL RESULTS - 10 000 UCT-simulations per move - Huge computation time (total = a few days) - Our results: (results table not preserved in this transcript)
  • 62. CONCLUSIONS: a methodology for sequential decision making - When you have a myopic solver (i.e. which neglects long term effects, as too often in industry!) ==> improve it with heuristics (as Legendre et al) ==> combine with UCT (as we did) ==> significant improvements - We have similar experiments on industrial testbeds
  • 63. Thanks for your attention! 9 Mines. What is the optimal move ?