Guest presentation given to a mixed-discipline group at the University of the West of Scotland Research Students Society @ UWoS, 3rd March 2010.
Topics covered: a high-level overview of work on AI for Poker and Ms. Pac-Man, my own research on the I2 system, concluding with some of my opinions on the current state of academic and industrial Game AI.
4. Firstly
• This image has been used to advertise this talk.
‣ To clarify, this is not me...
5. About Me
• Undergrad in AI and CS at Edinburgh
• MSc in Bioinformatics, also at Edinburgh
• MRes in “Automated Planning for Autonomous Systems” at Strathclyde
• RA/PhD Student attached to the “Strathclyde Planning Group” and the “Strathclyde AI in Games Group”
• Staff Writer for AIGameDev.com
6. Summary
• Intro to AI for Games
• AI for Poker
• AI for Ms Pac-Man
• The Integrated Influence Architecture
• The Future of AI in Games
9. What is AI?
• Any time a computer makes any sort of decision between a number of options, it can be thought of as acting “intelligently”.
• Whether or not those decisions are the right ones determines how “good” the intelligence is.
23. Why Games?
• Games provide AI research with some interesting properties:
‣ We find predicting the behaviour of other agents (e.g. players) difficult.
‣ We need to be able to automatically generate long-term strategies in order to win the game.
‣ By working with existing games, we’re not biasing our results by designing games suited to AI.
‣ Why do extra work to build simulations and demos?
24. Summary
• Intro to AI for Games
• AI for Poker
• AI for Ms Pac-Man
• The Integrated Influence Architecture
• The Future of AI in Games
25. Limit Texas Hold ‘em
• Texas Hold ‘em is a popular variant of poker.
• Players are dealt two cards each and share 5 communal cards.
• The “Limit” part restricts betting to fixed increments - typically only 3 raises per round of betting, with the value of each raise determined by the “level” of the table.
30. Why Limit?
• Limit makes life a lot easier for us as decision makers - a raise is a raise is a raise.
• At each decision point in the game, a player can call/check, bet/raise or fold.
• This means that at the k-th decision point, the space of possible states is approximated by 3^k.
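To make that growth concrete, here is a minimal sketch (illustrative only, not StrathPoker code) of how quickly the 3^k space expands:

```python
# With three actions (call/check, bet/raise, fold) available at every
# decision point, the number of possible action histories after k
# decisions grows as roughly 3^k.
def decision_space(k: int) -> int:
    """Approximate number of action histories after k decision points."""
    return 3 ** k

for k in (1, 5, 10, 20):
    print(k, decision_space(k))   # 20 decisions -> ~3.5 billion histories
```

Even this modest branching factor makes exhaustive search impractical after a few betting rounds, which is why simulation-based approaches are attractive.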
31-32. [Diagram: game tree - from the Current State the 1st Player can Call, Fold or Raise, and after each action the 2nd Player can in turn Call, Fold or Raise.]
33. StrathPoker
• StrathPoker is a system under development at Strathclyde based around Opponent Modelling (OM) and “Monte Carlo Simulation” (MC).
• The core idea is that if we can categorise players into archetypes, we can predict their actions more accurately, and push those predictions into the MC to get a better understanding of how the game will go.
37. Categorising Players
• We have sample data of around a million hands of poker taken from online poker sites.
• Each player is identified by a unique ID so we can see what the individuals are up to.
• The data is not complete: players who fold do not show their cards.
41. Players as Datapoints
• We do have a lot of info about the actions a player took, and we can generate aggregate stats such as how much a player has won (that we’ve seen).
• We can generate a datapoint representing a player in about 30 dimensions.
• Contrast this with professional (human) categorisation based on just 3 dimensions.
48. Human Categorisation
• Pro poker players use stats software to monitor the flow of the game and track individual players’ performance.
• They typically classify players based on three stats:
‣ VPiP - Voluntarily Put in Pot
‣ WSD - percentage of showdowns won
‣ PFR - Pre-Flop Raise
• The software overlays a HUD with this info on the online game.
52. Categorisation
• We throw as much data into our categorisation as possible.
• We run “Principal Component Analysis” to find a new set of basis vectors for the data, compressing ~30 dimensions to 8 with minimal loss.
• We then cluster the datapoints in those 8 dimensions using Fuzzy c-Means.
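A minimal sketch of that pipeline, assuming nothing about the actual StrathPoker tooling: PCA done by hand via SVD, a bare-bones fuzzy c-means, and random data standing in for the ~30-dimensional player statistics.

```python
import numpy as np

def pca_reduce(X, n_components=8):
    """Project X onto its top principal components (the new basis vectors)."""
    Xc = X - X.mean(axis=0)                      # centre the data first
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T              # coordinates in reduced basis

def fuzzy_c_means(X, c=4, m=2.0, iters=100, seed=0):
    """Minimal fuzzy c-means: returns cluster centres and a membership
    matrix U where U[i, j] is how strongly point i belongs to cluster j."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)            # memberships sum to 1
    for _ in range(iters):
        W = U ** m
        centres = (W.T @ X) / W.sum(axis=0)[:, None]
        dist = np.linalg.norm(X[:, None] - centres[None], axis=2) + 1e-12
        U = 1.0 / (dist ** (2 / (m - 1)))        # closer centre -> stronger
        U /= U.sum(axis=1, keepdims=True)        # renormalise memberships
    return centres, U

# Toy stand-in for the ~30-dimensional player statistics
players = np.random.default_rng(1).random((200, 30))
reduced = pca_reduce(players, 8)                 # ~30 -> 8 dimensions
centres, memberships = fuzzy_c_means(reduced, c=4)
print(reduced.shape, memberships.shape)
```

The fuzzy (rather than hard) memberships matter here: a player can be, say, 70% “tight-passive” and 30% “loose-aggressive”, which is a more honest model than forcing each player into exactly one archetype.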
56. Using Categorisation
• Monte Carlo plays out simulations of the game to estimate how well each of the currently possible decisions will turn out in the future.
• Our approach attempts to get a more accurate simulation by better modelling how each player acts.
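The shape of that idea can be sketched as follows. Everything here is a hypothetical stand-in - the archetype distributions, the payoff model and the play-out logic are illustrative, not the StrathPoker models:

```python
import random

# Each archetype is modelled as a probability distribution over actions;
# a candidate action is scored by averaging the payoff of many randomised
# play-outs in which opponents act according to their archetype.
ARCHETYPES = {
    "tight-passive": {"fold": 0.5, "call": 0.4, "raise": 0.1},
    "loose-aggressive": {"fold": 0.1, "call": 0.3, "raise": 0.6},
}

def sample_action(archetype, rng):
    actions, weights = zip(*ARCHETYPES[archetype].items())
    return rng.choices(actions, weights=weights)[0]

def simulate_hand(our_action, opponents, rng):
    """One randomised play-out; returns a toy payoff for our_action."""
    if our_action == "fold":
        return 0.0
    pot, stayers = 2.0, 0
    for archetype in opponents:
        action = sample_action(archetype, rng)
        if action != "fold":
            stayers += 1
            pot += 2.0 if action == "raise" else 1.0
    strength = rng.random()                        # proxy for our hand strength
    won = stayers == 0 or strength > 1 - 0.5 ** stayers
    stake = 2.0 if our_action == "raise" else 1.0  # raising risks more
    return pot if won else -stake

def evaluate(our_action, opponents, n=5000, seed=0):
    """Average payoff of our_action over n Monte Carlo play-outs."""
    rng = random.Random(seed)
    return sum(simulate_hand(our_action, opponents, rng) for _ in range(n)) / n

table = ["tight-passive", "loose-aggressive", "tight-passive"]
for act in ("fold", "call", "raise"):
    print(act, round(evaluate(act, table), 3))
```

The point of the opponent model is visible in `sample_action`: a table of tight-passive players produces very different play-outs, and therefore very different action values, than a table of loose-aggressive ones.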
61. Evaluation
• How do you evaluate an AI poker player?
• The problem is that random deals may or may not favour our player.
• This is true for real players too - how do they evaluate themselves?
• The trick lies in looking not at individual games, but at a total playing session.
65. Experiments
• We set up a sequence of different hands of Poker for which the deal is fixed.
‣ I.e. the deck of cards in hand 1 is set to:
- Ac, 10s, 2h, 4d ......
• We use 6-max games: 6 players gives us 12 cards dealt initially, then 5 for the table.
• We populate the table with bots that conform to the archetypes.
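The fixed-deal idea reduces to seeding the shuffle per hand, so every bot under test sees exactly the same cards in hand 1, hand 2, and so on. A small sketch (deck encoding and dealing order are illustrative):

```python
import random

RANKS = "23456789TJQKA"
SUITS = "chds"
DECK = [r + s for r in RANKS for s in SUITS]   # 52-card deck

def fixed_deal(hand_no, players=6):
    """Deterministic deal: the same hand_no always yields the same cards."""
    rng = random.Random(hand_no)               # per-hand fixed seed
    deck = DECK[:]
    rng.shuffle(deck)
    hole = [deck[2 * i:2 * i + 2] for i in range(players)]   # 12 hole cards
    board = deck[2 * players:2 * players + 5]                # 5 community cards
    return hole, board

h1, b1 = fixed_deal(1)
h2, b2 = fixed_deal(1)
assert (h1, b1) == (h2, b2)   # reproducible: deal 1 is always identical
```

Because the cards are held constant across runs, differences in a session's final result reflect differences between the bots rather than luck of the deal.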
70. Experiments (ii)
• This gives us the closest to a “controlled” environment we can get for comparing bots.
‣ Not entirely controlled, since opponents may still react differently to a different test-bot’s actions.
• We can compare bots by setting them loose at this “table” and comparing final results to see how well they each do.
• Experiments are running right now - no results yet.
71. Summary
• Intro to AI for Games
• AI for Poker
• AI for Ms Pac-Man
• The Integrated Influence Architecture
• The Future of AI in Games
75. Ms Pac-Man
• Pac-Man is a deterministic game.
‣ From the start of the game, if the player makes exactly the same set of moves, the result will always be the same.
• That’s not a very interesting problem for AI.
‣ Optimal solutions exist and can be found with enough horsepower and/or time.
• Ms Pac-Man is non-deterministic.
‣ The ghosts act in a reasonably unpredictable manner.
81. StrathPac
• StrathPac is the highly original name for our project to tackle writing AI systems to play Ms. Pac-Man.
• I’ve been working on this off and on, primarily with undergraduates, for a couple of years.
• The aim is to maximise score.
‣ Pill clearing is very much secondary.
• Different approaches to this have been tried, with limited success.
89. Why?
• Who cares if we make a good Ms. Pac-Man player?
• The aspects that make this game challenging make other things challenging too:
‣ Real-time operation
‣ Completing objectives with adversaries
‣ Contrasting objectives (e.g. staying alive vs killing ghosts)
• Good solutions to this will have other applications.
• Also, it’s part of many competition tracks!
92. How?
• The previous versions of StrathPac have been based on a “Screen Scraping” framework developed by Lucas (U. Essex).
• It’s an interesting challenge inasmuch as there is no interaction between the AI and the game except “seeing” and “acting”.
‣ This closely models the way actual intelligence is compartmentalised from the world.
98. One Approach
• Driven by three “motivations”:
‣ Hunger for pills
‣ Fear of ghosts
‣ Aggression towards blue ghosts
• These motivations generate “Influence Maps” that attract the agent towards, or repel it from, points of the game world.
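A minimal influence-map sketch of the three motivations (illustrative, not the StrathPac code): each motivation deposits influence that falls off with distance, the fields are summed, and the agent moves toward the neighbouring cell with the highest combined influence. The weights are exactly the kind of parameters tuned later by the GA.

```python
import numpy as np

def influence(grid_shape, sources, weight, falloff=1.0):
    """Sum of distance-decayed influence from each source cell."""
    h, w = grid_shape
    ys, xs = np.mgrid[0:h, 0:w]
    field = np.zeros(grid_shape)
    for (sy, sx) in sources:
        dist = np.abs(ys - sy) + np.abs(xs - sx)   # Manhattan distance
        field += weight / (1.0 + falloff * dist)
    return field

shape = (10, 10)
pills, ghosts, blue = [(2, 2), (7, 8)], [(5, 5)], []
combined = (influence(shape, pills, +1.0)      # hunger: attract
            + influence(shape, ghosts, -3.0)   # fear: repel
            + influence(shape, blue, +2.0))    # aggression: attract

# Move toward the best neighbouring cell (toroidal wrap for simplicity)
pos = (5, 2)
moves = [(-1, 0), (1, 0), (0, -1), (0, 1)]
best = max(moves, key=lambda d: combined[(pos[0] + d[0]) % 10,
                                         (pos[1] + d[1]) % 10])
print(best)
```

Note the agent never plans ahead: every step is a purely local reading of the landscape, which is precisely the weakness the later slides discuss.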
104. Balancing Motivations
• A lot of experimentation went into tuning the parameters governing how the influence is generated.
• Unsupervised learning using “Genetic Algorithms” - ideal for fiddling with multiple variables at once.
• 20 computers playing 20 games each per configuration, across 30 generations of evolution.
‣ About 12,000 games of Ms Pac-Man...
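The tuning loop can be sketched like this. Everything below is a hedged stand-in: a genome is one parameter vector for the influence maps, fitness averages a noisy score over several games, and `play_game` replaces running actual Ms Pac-Man with a toy function that rewards genomes near a hidden target so the loop is testable.

```python
import random

TARGET = [0.7, -0.3, 0.5]   # hidden "ideal" parameters (toy stand-in)

def play_game(genome, rng):
    """Stand-in for one game: noisy score, higher for genomes near TARGET."""
    err = sum((g - t) ** 2 for g, t in zip(genome, TARGET))
    return -err + rng.gauss(0, 0.01)

def fitness(genome, rng, games=20):
    """Average score over several games, smoothing out the noise."""
    return sum(play_game(genome, rng) for _ in range(games)) / games

def evolve(pop_size=20, generations=30, seed=0):
    rng = random.Random(seed)
    pop = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda g: fitness(g, rng), reverse=True)
        elite = pop[: pop_size // 4]           # keep the top quarter
        pop = elite + [
            [g + rng.gauss(0, 0.1) for g in rng.choice(elite)]  # mutate elites
            for _ in range(pop_size - len(elite))
        ]
    return max(pop, key=lambda g: fitness(g, rng))

best = evolve()
```

The appeal for this problem is exactly as the slide says: the GA treats the whole parameter vector as one unit, so correlated parameters (e.g. fear vs hunger weights) get tuned together rather than one axis at a time.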
108. Results
• Not great.
• The GA gave a 100% increase in score over the initial configuration, but the final score is still not particularly competitive.
• The principal takeaway is that naive solutions that don’t do reasoning don’t do very well.
109. Summary
• Intro to AI for Games
• AI for Poker
• AI for Ms Pac-Man
• The Integrated Influence Architecture
• The Future of AI in Games
117. Motivation
• Work with Ms. Pac-Man highlighted deficiencies in current AI techniques:
‣ Fast, flexible long-term planning is currently impossible
‣ Fast techniques are too stupid
‣ Smart techniques are too slow
• My research is aimed at bridging the gap.
• Additionally, I’m looking at “real” environments:
‣ Dynamic, multi-agent, real-time etc.
122. Core Premise
• Searching state or trajectory spaces is a slow process. Most Automated Planning domains are rich enough to describe PSPACE-complete problems.
‣ Though human-solvable problems tend towards NP-hard and below...
• Evaluating functions is trivial by comparison.
• This extends the notion of Influence Maps into a conceptual space we call “Influence Landscapes”.
125. Architecture
• Most similar systems either choose to respond reactively or deliberatively to specific aspects of the environment, or can act reactively within certain parameters of the deliberative system.
• The I2 system aims to be continually influenced by input from both a purely reactive evaluator and a deliberative evaluator at all decision points.
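The shape of that idea, reduced to a few lines: at every decision point each candidate action is scored by both evaluators, and the decision is a weighted blend of the two rather than a hand-off from one to the other. All functions, weights and the toy action table here are hypothetical stand-ins, not the actual I2 mechanism.

```python
def choose(actions, reactive_eval, deliberative_eval, w_reactive=0.4):
    """Pick the action maximising a blend of reactive and deliberative scores."""
    def blended(a):
        return (w_reactive * reactive_eval(a)
                + (1 - w_reactive) * deliberative_eval(a))
    return max(actions, key=blended)

# Toy usage: reactive favours immediate gain, deliberative long-term value
actions = {"grab_pill": (5, 1), "dodge_ghost": (1, 9)}
pick = choose(actions,
              reactive_eval=lambda a: actions[a][0],
              deliberative_eval=lambda a: actions[a][1])
print(pick)   # with w_reactive=0.4 the long-term option wins
```

The contrast with the “hand-off” architectures on this slide is that neither evaluator is ever switched off: both contribute to every single decision, with the weighting controlling how much each voice is heard.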
129. To Be Continued...
• An in-depth discussion of exactly how this works is beyond the scope of this presentation.
• Tune in next week, when I’ll be presenting this work as part of the Computer Science Departmental Seminar series.
130. Summary
• Intro to AI for Games
• AI for Poker
• AI for Ms Pac-Man
• The Integrated Influence Architecture
• The Future of AI in Games
138. Where are we going?
• Game AI research is broken into two different factions:
‣ People using games to frame “serious” AI questions.
‣ People using AI to make “better” games.
• There’s a major dichotomy, as a very good enemy AI:
‣ Is frustrating to play against
‣ Takes resources to create
‣ Does not give a good player experience
139. Where are we going?
Does NOT increase sales!
140. Where are we going?
• At the same time though, there’s an increasing drive to use AI in more and more interesting ways.
• This means AI is no longer restricted to “controlling the enemies”, and no longer bound to a human-like (or lower) level of intelligence in order to be beatable.
142. Left 4 Dead
• Left 4 Dead is a Survival Horror shooter game.
• As one of four uninfected survivors, you kill waves of zombies as you push to escape the quarantine zone.
• It introduced the concept of an “AI Director”, which controls the pacing of the game.
• The aim is to replicate the horror film cycle of: Calm => Build-up => Frenzy => Relax => Calm
143. Galactic Arms Race
• GAR is a game created at the University of Central Florida. The project is led by Ken Stanley (who you may know as the creator of NERO).
• It uses an algorithm to produce content within the game - Content-Generating NeuroEvolution of Augmenting Topologies.
• It evolves new and unique weapons, unsupervised.
145. Batman: Arkham Asylum
• The new Batman game is a great example of a game where improving the NPC AI would be detrimental.
• Chosen by some as action Game of the Year 2009, it relies on the use of “thugs” to generate its iconic feel.
• Making the enemies smarter, or even slightly less predictable, would destroy the KAPOW! aspect.
146. Cinematic Games
• Designers are often loath to put true AI into their NPCs because this could detract from the cinematic experience.
• They want to stage-manage how the AI acts, and what the player experiences.
• They don’t want AI smart enough to realise that standing next to the big red exploding barrel might be bad!
147. The Sandbox
• Sandbox games are where AI techniques can come into their own from an NPC perspective.
• Imagine MMOs where small player populations were masked by imperceptible bots.
• Imagine games where NPCs interact with the world on an equal footing to players.
‣ Unpredictable, sometimes novel, solutions to problems
148. Final Remarks
• A* still ridiculously over-represented in Game AI
• Lots of interesting things happening in Academia
‣ Lots of unanswered AI questions still exist to work on
‣ We’re a long way from having the kind of General AI that you see in movies.
• Lots of techniques being adopted by Industry
149. Final Remarks (ii)
• Graphics has been pushed about as far as it can go.
• There is increasing emphasis on other aspects:
‣ Physics
‣ Animation
‣ AI
• AI has a lot more to offer in terms of content creation, experience management etc.
150. Shameless Plugs
• Me
‣ http://lukedicken.com
‣ luke@cis.strath.ac.uk
‣ Next talk - Next week, here.
• Strathclyde AI in Games Group
‣ johnl@cis.strath.ac.uk
151. Shameless Plugs (ii)
• AIGameDev.com
‣ One stop shop for all things Games and AI
‣ Regular posts about current AI techniques
‣ Interviews with Industry figures
‣ Masterclasses explaining techniques
• Paris Game AI Conference
‣ Organised by AIGameDev.com
‣ Heavy Industry focus rather than Academic
152. Shameless Plugs (iii)
• IEEE Symposium on Computational Intelligence and Games 2010
‣ Call for papers out now
‣ Submission deadline March 15th
‣ Conference in Copenhagen in August
‣ Excellent competition track