2. Game Theory,
Optimal Decisions in Games,
Heuristic Alpha–Beta Tree Search,
Monte Carlo Tree Search,
Stochastic Games, Partially Observable
Games,
Limitations of Game Search Algorithms,
Constraint Satisfaction Problems (CSP),
Constraint Propagation: Inference in CSPs,
Backtracking Search for CSPs.
3. Many applications for AI
Computer vision, natural language processing,
speech recognition, search …
But games are some of the more interesting:
Opponents that are challenging, or allies that are helpful
Units that appear to be acting on their own
Human-level intelligence is too hard in general
But under narrow circumstances a computer can do pretty well
(ex: chess and Deep Blue)
Many games are heavily constrained (by the game rules)
4. We cover competitive environments, in
which the agents’ goals are in conflict,
giving rise to adversarial search
problems—often known as games.
5. MinMax is at the heart of almost every computer board
game
Applies to games where:
Players take turns
Players have perfect information
Chess, Checkers, Tactics
But it can also work for games with imperfect information
or chance
Poker, Monopoly, Dice
Can work in real time (i.e., not turn-based) with a timer
(iterative deepening, later)
6. Search tree
Squares represent decision states (ie- after a move)
Branches are decisions (ie- the move)
Start at root
Nodes at end are leaf nodes
Ex: Tic-Tac-Toe (symmetrical positions removed)
• Unlike binary trees, nodes can have any number of children
– Depends on the game situation
• Levels are usually called plies (a ply is one level)
– At each ply the "turn" switches to the other player
• Players called Min and Max (next)
7. Named MinMax after the algorithm behind the data
structure
Assign points to the outcome of a game
Ex: Tic-Tac-Toe: X wins has value +1; O wins has value −1.
Max (X) tries to maximize point value, while Min
(O) tries to minimize point value
Assume both players play to best of their ability
Always make a move to minimize or maximize
points
So, in choosing, Max will choose best move to
get highest points, assuming Min will choose best
move to get lowest points
8. With the full tree, we can determine the best possible move
However, the full tree is impossible to build for some games! Ex: Chess
At a given time, chess has ~35 legal moves. Exponential
growth:
35 at one ply, 35^2 = 1,225 at two plies … 35^6 ≈ 2 billion and 35^10 ≈ 2.8
quadrillion
Games can last 40 moves (or more), so 35^40 … far more than the
stars in the universe (~10^22)
For large games (Chess) we can’t see the end of the game. Must estimate
winning or losing from the top portion
Evaluate() function to guess the end given a board
Returns a numeric value with magnitude much smaller than victory (i.e.,
checkmate for Max will be one million, for Min minus one million)
So, a computer’s strength at chess comes from:
How deep it can search
How well it can evaluate a board position
(In some sense, like a human – a chess grand master can
evaluate a board better and can look further ahead)
9. How do we search this tree to find the optimal move?
10. Search – no adversary
Solution is (heuristic) method for finding goal
Heuristics and CSP techniques can find optimal solution
Evaluation function: estimate of cost from start to goal through given node
Examples: path planning, scheduling activities
Games – adversary
Solution is strategy
strategy specifies move for every possible opponent reply.
Time limits force an approximate solution
Evaluation function: evaluate “goodness” of game position
Examples: chess, checkers, Othello, backgammon
11. Two players: MAX and MIN
MAX moves first and they take turns until the game is over
Winner gets reward, loser gets penalty.
“Zero sum” means the sum of the reward and the penalty is a constant.
Formal definition as a search problem:
Initial state: Set-up specified by the rules, e.g., initial board configuration
of chess.
Player(s): Defines which player has the move in a state.
Actions(s): Returns the set of legal moves in a state.
Result(s,a): Transition model defines the result of a move.
(2nd ed.: Successor function: list of (move,state) pairs specifying legal
moves.)
Terminal-Test(s): Is the game finished? True if finished, false otherwise.
Utility function(s,p): Gives numerical value of terminal state s for player p.
E.g., win (+1), lose (-1), and draw (0) in tic-tac-toe.
E.g., win (+1), lose (0), and draw (1/2) in chess.
MAX uses search tree to determine next move.
12. Designed to find the optimal strategy for Max and find
best move:
1. Generate the whole game tree, down to the
leaves.
2. Apply utility (payoff) function to each leaf.
3. Back-up values from leaves through branch nodes:
a Max node computes the Max of its child values
a Min node computes the Min of its child values
4. At root: choose the move leading to the child of
highest value.
16. The minimax algorithm is a recursive, backtracking algorithm
used in decision-making and game theory.
It provides an optimal move for the player, assuming that the
opponent also plays optimally.
The minimax algorithm uses recursion to search through the
game tree.
Minimax is mostly used for game playing in AI,
such as Chess, Checkers, Tic-Tac-Toe, Go, and various two-
player games. The algorithm computes the minimax decision
for the current state.
17. In this algorithm two players play the game; one is called MAX
and the other is called MIN.
The two players compete: each tries to hold the opponent to the
minimum benefit while securing the maximum benefit for itself.
The players are opponents of each other: MAX
selects the maximized value and MIN selects the
minimized value.
The minimax algorithm performs a depth-first search
for the exploration of the complete game tree.
The minimax algorithm proceeds all the way down to the
terminal nodes of the tree, then backs values up the tree as the
recursion unwinds.
18. function MINIMAX-DECISION(state) returns an action
  return argmax a ∈ ACTIONS(state) MIN-VALUE(RESULT(state, a))

function MAX-VALUE(state) returns a utility value
  if TERMINAL-TEST(state) then return UTILITY(state)
  v ← −∞
  for each a in ACTIONS(state) do
    v ← MAX(v, MIN-VALUE(RESULT(state, a)))
  return v

function MIN-VALUE(state) returns a utility value
  if TERMINAL-TEST(state) then return UTILITY(state)
  v ← +∞
  for each a in ACTIONS(state) do
    v ← MIN(v, MAX-VALUE(RESULT(state, a)))
  return v
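A minimal runnable sketch of these functions, on a tiny hand-built tree. The encoding (leaves are utility values, internal nodes are lists of children, MAX to move at the root) is an illustrative assumption, not from the slides:

```python
# Leaves are utility values (ints); internal nodes are lists of children.
# MAX and MIN levels alternate, with MAX to move at the root.

def max_value(node):
    if isinstance(node, int):                 # terminal test
        return node
    return max(min_value(child) for child in node)

def min_value(node):
    if isinstance(node, int):
        return node
    return min(max_value(child) for child in node)

def minimax_decision(children):
    # Root is a MAX node: pick the move whose MIN-VALUE is largest.
    values = [min_value(c) for c in children]
    best = max(range(len(values)), key=lambda i: values[i])
    return best, values[best]

# Each root move leads to a MIN node over three leaves:
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
move, value = minimax_decision(tree)   # branch minima are 3, 2, 2
```

Since the branch minima are 3, 2 and 2, MAX picks the first move with backed-up value 3.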
19. The main drawback of the minimax algorithm is that
it gets really slow for complex games such as Chess,
Go, etc.
These games have a huge branching factor, so
the player has many choices to consider at each move.
This limitation of the minimax algorithm can be
addressed by alpha-beta pruning.
22. •Alpha-beta pruning is a modified version of the
minimax algorithm.
• It is an optimization technique for the minimax
algorithm.
•By cutting off branches of the game tree that cannot
affect the final decision, we can still compute the correct
minimax decision; this technique is called pruning.
23. The two parameters can be defined as:
Alpha: The best (highest-value) choice we
have found so far at any point along the
path of Maximizer. The initial value of
alpha is -∞.
Beta: The best (lowest-value) choice we
have found so far at any point along the
path of Minimizer. The initial value of beta
is +∞.
24. Depth first search
only considers nodes along a single path from root at any time
a = highest-value choice found at any choice point of path for MAX
(initially, a = −infinity)
b = lowest-value choice found at any choice point of path for MIN
(initially, b = +infinity)
Pass current values of a and b down to child nodes during
search.
Update values of a and b during search:
MAX updates a at MAX nodes
MIN updates b at MIN nodes
Prune remaining branches at a node when a ≥ b
25. Prune whenever a ≥ b.
Prune below a Max node whose alpha value becomes greater
than or equal to the beta value of its ancestors.
Max nodes update alpha based on children’s returned
values.
Prune below a Min node whose beta value becomes less than or
equal to the alpha value of its ancestors.
Min nodes update beta based on children’s returned values.
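The update-and-prune rules above can be sketched as a compact alpha-beta search over the same kind of tiny tree (leaves are ints, internal nodes are lists of children; the encoding is an assumption for illustration):

```python
import math

# Leaves are ints; internal nodes are lists of children. `maximizing`
# says whether the current node is a MAX node or a MIN node.
def alphabeta(node, alpha, beta, maximizing):
    if isinstance(node, int):
        return node
    if maximizing:
        v = -math.inf
        for child in node:
            v = max(v, alphabeta(child, alpha, beta, False))
            alpha = max(alpha, v)              # MAX updates alpha
            if alpha >= beta:                  # prune remaining children
                break
        return v
    v = math.inf
    for child in node:
        v = min(v, alphabeta(child, alpha, beta, True))
        beta = min(beta, v)                    # MIN updates beta
        if alpha >= beta:
            break
    return v

tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
root_value = alphabeta(tree, -math.inf, math.inf, True)
```

The result matches plain minimax, but e.g. after the first leaf (2) of the middle branch, beta = 2 ≤ alpha = 3 and the remaining siblings are pruned.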
26. Initial values: a = −∞, b = +∞
Do DF-search until the first leaf
The current values of a and b are passed down to the child nodes
39. Worst-Case
branches are ordered so that no pruning takes place. In this case
alpha-beta gives no improvement over exhaustive search
Best-Case
each player’s best move is the left-most child (i.e., evaluated first)
in practice, performance is closer to best rather than worst-case
E.g., sort moves by the remembered move values found last time.
E.g., expand captures first, then threats, then forward moves, etc.
E.g., run Iterative Deepening search, sort by value last iteration.
In practice often get O(b^(d/2)) rather than O(b^d)
this is the same as having a branching factor of sqrt(b),
since (sqrt(b))^d = b^(d/2), i.e., we effectively go from b to sqrt(b)
e.g., in chess go from b ~ 35 to b ~ 6
this permits much deeper search in the same amount of time
40. Pruning does not affect final results
Entire subtrees can be pruned.
Good move ordering improves effectiveness of pruning
Repeated states are again possible.
Store them in memory = transposition table
42. Monte Carlo Tree Search (MCTS) is a search
technique in the field of Artificial Intelligence
(AI).
It is a probabilistic, heuristic-driven search
algorithm that combines classic tree
search with the machine-learning
principles of reinforcement learning.
43. The MCTS algorithm is useful because it
periodically evaluates other alternatives
during the learning phase by
executing them, instead of only the currently
perceived optimal strategy. This is known as
the "exploration–exploitation trade-off".
The search can be broken down into four distinct steps, viz.,
1. selection,
2. expansion,
3. simulation, and
4. backpropagation.
45. •the MCTS algorithm traverses the current
tree from the root node using a specific
strategy.
•The strategy uses an evaluation function to
optimally select nodes with the highest
estimated value.
•MCTS uses the Upper Confidence Bound
(UCB) formula applied to trees as the
strategy in the selection process to traverse
the tree.
46. S_i = x_i + C * sqrt(ln t / n_i)
where:
S_i = value of node i
x_i = empirical mean value of node i
C = an exploration constant
t = total number of simulations
n_i = number of visits of node i
When traversing the tree during the selection
process, the child node that returns the greatest
value from the above equation is the one that
gets selected.
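A sketch of this selection rule, assuming each child is summarized by its empirical mean value and visit count (the names `ucb_score` and `select_child` are illustrative):

```python
import math

def ucb_score(mean_value, visits, total_visits, c=1.41):
    # Unvisited children score +inf, so each child is tried at least once.
    if visits == 0:
        return math.inf
    return mean_value + c * math.sqrt(math.log(total_visits) / visits)

def select_child(children, total_visits):
    # children: list of (empirical mean, visit count) pairs
    return max(range(len(children)),
               key=lambda i: ucb_score(*children[i], total_visits))

# A well-explored strong child vs. a barely explored weaker one:
# the exploration term makes the second child the one selected.
chosen = select_child([(0.7, 100), (0.5, 2)], total_visits=102)
```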
47. Expansion: In this process, a new child node is
added to the tree, under the node that was optimally
reached during the selection process.
Simulation: In this process, a simulation (playout) is
performed by choosing moves or strategies until a
result or predefined state is achieved.
Backpropagation: After determining the value of
the newly added node, the remaining tree must be
updated. So the backpropagation process is
performed, propagating from the new
node back to the root node.
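The four steps can be put together in a minimal MCTS sketch. The game used here (Nim: take 1-3 stones, whoever takes the last stone wins) and all names are illustrative assumptions, not from the slides:

```python
import math, random

class Node:
    def __init__(self, stones, parent=None, move=None):
        self.stones, self.parent, self.move = stones, parent, move
        self.children, self.wins, self.visits = [], 0, 0

    def untried_moves(self):
        tried = {c.move for c in self.children}
        return [m for m in (1, 2, 3) if m <= self.stones and m not in tried]

def uct(parent, child, c=1.41):
    # UCB applied to trees: exploitation term + exploration term.
    return (child.wins / child.visits
            + c * math.sqrt(math.log(parent.visits) / child.visits))

def mcts(root_stones, iterations=2000):
    random.seed(0)                 # deterministic for the example
    root = Node(root_stones)
    for _ in range(iterations):
        node = root
        # 1. Selection: descend while fully expanded and non-terminal.
        while node.stones > 0 and not node.untried_moves():
            node = max(node.children, key=lambda ch: uct(node, ch))
        # 2. Expansion: add one unexplored child.
        if node.stones > 0:
            m = random.choice(node.untried_moves())
            child = Node(node.stones - m, parent=node, move=m)
            node.children.append(child)
            node = child
        # 3. Simulation: random playout, tracking whose turn it is.
        stones, turn = node.stones, 0      # turn 0 = player to move at node
        while stones > 0:
            stones -= random.randint(1, min(3, stones))
            turn = 1 - turn
        won = (turn == 0)   # True iff the player who moved INTO node won
        # 4. Backpropagation: update stats up to the root, flipping
        #    the perspective at each level.
        while node is not None:
            node.visits += 1
            node.wins += won
            won = not won
            node = node.parent
    return max(root.children, key=lambda ch: ch.visits).move

best = mcts(5)   # from 5 stones, taking 1 leaves the opponent a lost position
```

With 5 stones, leaving a multiple of 4 loses for the player to move, so the search converges on taking one stone.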
50. These types of algorithms are particularly useful
in turn-based games where there is no element of
chance in the game mechanics, such as Tic-Tac-
Toe, Connect 4, Checkers, Chess, Go, etc.
53. Many games mirror this unpredictability by
including a random element, such as the
throwing of dice. We call these stochastic
games.
Backgammon is a typical game that combines luck and skill.
Dice are rolled at the beginning of a player’s turn to determine
the legal moves.
In backgammon, for example, White has rolled a 6–5 and has
four possible moves.
P(1,1) = 1/36 (there are 36 ways to roll two dice.)
The 15 distinct non-double rolls each have probability 1/18.
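Chance nodes like dice rolls are handled by taking a probability-weighted average over outcomes, the expectiminimax idea. A minimal sketch on a hand-built tree (the tuple encoding is an illustrative assumption):

```python
# Tree encoding (illustrative): a terminal is a number; otherwise a node
# is ("max", [children]), ("min", [children]), or
# ("chance", [(probability, child), ...]).

def expectiminimax(node):
    if isinstance(node, (int, float)):
        return node
    kind, children = node
    if kind == "max":
        return max(expectiminimax(c) for c in children)
    if kind == "min":
        return min(expectiminimax(c) for c in children)
    # chance node: probability-weighted average over outcomes
    return sum(p * expectiminimax(c) for p, c in children)

# MAX chooses between a sure 3 and a fair 50/50 gamble between 0 and 10:
tree = ("max", [3, ("chance", [(0.5, 0), (0.5, 10)])])
root_value = expectiminimax(tree)   # the gamble's expected value is 5
```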
56. Checkers:
Chinook ended the 40-year reign of human world champion Marion
Tinsley in 1994.
Chess:
Deep Blue defeated human world champion Garry Kasparov in a
six-game match in 1997.
Othello:
human champions refuse to compete against computers: they are
too good.
Go:
human champions refuse to compete against computers: they are
too bad
b > 300 (!)
See (e.g.) http://www.cs.ualberta.ca/~games/ for more information
58. 1957: Herbert Simon
“within 10 years a computer will beat the world chess
champion”
1997: Deep Blue beats Kasparov
Parallel machine with 30 processors for “software” and 480 VLSI
processors for “hardware search”
Searched 126 million nodes per second on average
Generated up to 30 billion positions per move
Reached depth 14 routinely
Uses iterative-deepening alpha-beta search with transposition tables
Can explore beyond the depth limit for interesting moves
60. Many problems in AI can be considered problems
of constraint satisfaction, in which the goal state
satisfies a given set of constraints.
Constraint satisfaction problems can be solved
using any of the search strategies.
A constraint satisfaction problem (CSP) is
a problem that requires its solution to be within
some limitations or conditions, also known
as constraints, consisting of a finite variable set, a
domain set, and a finite constraint set. The
solution must satisfy all constraints.
61.
Variables WA, NT, Q, NSW, V, SA, T
Domains Di = {red,green,blue}
Constraints: adjacent regions must have different colors
e.g., WA ≠ NT
62.
Solutions are complete and consistent
assignments, e.g., WA = red, NT = green,Q =
red,NSW = green,V = red,SA = blue,T = green
63.
Binary CSP: each constraint relates two variables
Constraint graph: nodes are variables, arcs are
constraints
68.
General-purpose methods can give huge
gains in speed:
Which variable should be assigned next?
In what order should its values be tried?
Can we detect inevitable failure early?
69.
Most constrained variable:
choose the variable with the fewest legal values
a.k.a. minimum remaining values (MRV)
heuristic
Picks a variable which will cause failure as
soon as possible, allowing the tree to be
pruned.
70.
Tie-breaker among most constrained
variables
Most constraining variable:
choose the variable with the most constraints on
remaining variables (most edges in graph)
71.
Given a variable, choose the least
constraining value:
the one that rules out the fewest values in the
remaining variables
Leaves maximal flexibility for a solution.
Combining these heuristics makes 1000
queens feasible
72. Idea (forward checking):
Keep track of remaining legal values for unassigned
variables
Terminate search when any variable has no legal values
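Backtracking search with the MRV heuristic and forward checking can be sketched for the map-colouring CSP above (names such as `backtrack` and `NEIGHBORS` are illustrative):

```python
NEIGHBORS = {
    "WA": ["NT", "SA"], "NT": ["WA", "SA", "Q"],
    "SA": ["WA", "NT", "Q", "NSW", "V"], "Q": ["NT", "SA", "NSW"],
    "NSW": ["Q", "SA", "V"], "V": ["SA", "NSW"], "T": [],
}

def backtrack(assignment, domains):
    if len(assignment) == len(NEIGHBORS):
        return assignment
    # MRV: choose the unassigned variable with the fewest legal values.
    var = min((v for v in NEIGHBORS if v not in assignment),
              key=lambda v: len(domains[v]))
    for value in domains[var]:
        # Forward checking: remove `value` from unassigned neighbours.
        pruned = {v: ([x for x in domains[v] if x != value]
                      if v in NEIGHBORS[var] and v not in assignment
                      else domains[v])
                  for v in domains}
        # Terminate early if any unassigned variable has no legal values.
        if all(pruned[v] for v in NEIGHBORS if v not in assignment and v != var):
            result = backtrack({**assignment, var: value},
                               {**pruned, var: [value]})
            if result:
                return result
    return None

domains = {v: ["red", "green", "blue"] for v in NEIGHBORS}
solution = backtrack({}, domains)
```

Because conflicting values are pruned from neighbours' domains at assignment time, no separate consistency check is needed when a value is chosen.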
76.
Forward checking propagates information from
assigned to unassigned variables, but doesn't
provide early detection for all failures:
NT and SA cannot both be blue!
Constraint propagation repeatedly enforces
constraints locally
78. Simplest form of propagation makes each arc
consistent
X → Y is consistent iff
for every value x of X there is some allowed value y of Y
If X loses a value, neighbors of X need to be
rechecked
Arc consistency detects failure earlier than forward
checking
Can be run as a preprocessor or after each assignment
Time complexity: O(n^2 d^3)
Constraint propagation repeatedly propagates arc consistency across the graph.
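A sketch of this propagation procedure (essentially AC-3) on a fragment of the map-colouring problem, where "allowed" simply means a different colour:

```python
from collections import deque

def ac3(domains, neighbors):
    queue = deque((x, y) for x in neighbors for y in neighbors[x])
    while queue:
        x, y = queue.popleft()
        # Revise: keep only values of x with some allowed value in y.
        revised = [vx for vx in domains[x]
                   if any(vx != vy for vy in domains[y])]
        if len(revised) < len(domains[x]):
            domains[x] = revised
            if not revised:
                return False           # empty domain: failure detected early
            # If X loses a value, neighbours of X need to be rechecked.
            for z in neighbors[x]:
                if z != y:
                    queue.append((z, x))
    return True

neighbors = {"WA": ["NT", "SA"], "NT": ["WA", "SA"], "SA": ["WA", "NT"]}
domains = {"WA": ["red"], "NT": ["red", "green"],
           "SA": ["red", "green", "blue"]}
ok = ac3(domains, neighbors)
```

Starting from WA = red, propagation alone forces NT = green and SA = blue.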
84. Note: The path to the solution is unimportant, so
we can apply local search!
To apply to CSPs:
allow states with unsatisfied constraints
operators reassign variable values
Variable selection: randomly select any
conflicted variable
Value selection by min-conflicts heuristic:
choose value that violates the fewest constraints
i.e., hill-climb with h(n) = total number of violated
constraints
85. A cryptarithmetic problem is a type of
constraint satisfaction problem where the
puzzle is about digits and their unique
replacement with letters or other
symbols. In a cryptarithmetic problem, the
digits (0–9) get substituted by
letters or symbols.
86. The rules or constraints of a cryptarithmetic
problem are as follows:
Each letter should be replaced by a unique digit
(no two letters share a digit).
The result should satisfy the predefined
arithmetic rules, i.e., 2 + 2 = 4, nothing else.
Digits should be from 0–9 only.
There should be only one carry forward while
performing the addition operation on a problem.
The problem can be solved from both sides,
i.e., the left-hand side (L.H.S.) or the right-hand side
(R.H.S.)
87. Given a cryptarithmetic problem, i.e.,
starting from the left-hand side (L.H.S.), the terms
are S and M. Assign digits that could give a
satisfactory result. Let’s assign S→9 and M→1.
88. Now, move ahead to the next
terms, E and O, to get N as the output.
Adding E and O would mean 5 + 0 = 0, which is not possible
because, according to the cryptarithmetic constraints, we cannot assign the
same digit to two letters. So, we need to think more and assign some
other value.
89. Further, adding the next two
terms, N and R, we get:
But we have already assigned E→5. Thus, the above result does not
satisfy the values.
90. …where 1 will be carried forward to the next
term.
Let’s move ahead.
Again, on adding the last two terms, i.e., the
rightmost terms D and E, we get Y as the
result.
94. We decided to look at the value of O again.
If O = 0, then R would also be 0 so that doesn’t work
and O can’t be 1 because F = 1.
If O = 2,
TW2
+TW2
−−−−−−−
12UR
then R = 4 and T = 6 and we also know that W < 5
because there can’t be anything carried to the
hundreds column. The only possible value of W that
hasn’t already been used is 3 but this would mean
that U is 6 which is the same as T.
95. If O = 3,
TW3
+TW3
−−−−−−−
13UR
then R = 6 and T = 6 which doesn’t work.
96. If O = 4,
TW4
+TW4
−−−−−−−
14UR
then R = 8 and T = 7 and we also know that W < 5
because there can’t be anything carried to the
hundreds column. So W could be 0, 2 or 3.
W can’t be 0 because then U would be 0 and it
can’t be 2 because U would be 4.
If W = 3, U = 6 which works: 734 + 734 = 1468.
97. If O = 5,
TW5
+TW5
−−−−−−−
15UR
then R = 0 and T = 7, and we also know that W ≥ 5
because there has to be a 1 carried to the
hundreds column.
W can’t be 5 because O = 5.
If W = 6, U = 3, which works: 765 + 765 = 1530.
98. So there are seven possible answers:
938+938=1876
928+928=1856
867+867=1734
846+846=1692
836+836=1672
765+765=1530
734+734=1468
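The case analysis can be cross-checked by brute force over all distinct-digit assignments (leading digits T and F nonzero):

```python
from itertools import permutations

# Assign distinct digits to T, W, O, F, U, R; leading digits nonzero.
solutions = []
for t, w, o, f, u, r in permutations(range(10), 6):
    if t == 0 or f == 0:
        continue
    two = 100 * t + 10 * w + o
    four = 1000 * f + 100 * o + 10 * u + r
    if two + two == four:
        solutions.append((two, four))
```

The enumeration finds exactly the seven sums listed above.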
99. Game playing is best modeled as a search problem
Game trees represent alternate computer/opponent moves
Evaluation functions estimate the quality of a given board
configuration for the Max player.
Minimax is a procedure which chooses moves by assuming that the
opponent will always choose the move which is best for them
Alpha-Beta is a procedure which can prune large parts of the search
tree and allow search to go deeper
For many well-known games, computer algorithms based on heuristic
search match or out-perform human world experts.
100. Comment on backtracking and look-ahead
(forward checking) strategies in constraint
satisfaction problems. [6]
Apply cryptarithmetic to solve the problem
and represent the state search space:
TWO + TWO = FOUR (OCT 2019)