2. Why Markov domains? (Crossing Traffic)
(1) Solutions are functions (policies) mapping states to actions.
(2) Given only an observation, behavior can appear stochastic: missing information and stochastic dynamics mean the next state cannot be predicted exactly, yet a better policy still obtains better expected rewards.
[Figure: crossing-traffic illustration of missing information and unpredictable next states.]
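The policy-as-function idea can be sketched in a few lines. The state encoding and action names below are illustrative only, not taken from the actual Crossing Traffic RDDL domain:

```python
# Minimal sketch: a solution is a policy, i.e., a function from states
# to actions. State encoding and action names are hypothetical.
def policy(state):
    _col, car_adjacent = state  # agent column, whether a car is next to us
    if car_adjacent:
        return "wait"           # stay put until the stochastic traffic clears
    return "move-right"         # otherwise head toward the goal

print(policy((0, True)))   # wait
print(policy((0, False)))  # move-right
```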
3. IPPC 2011: DOMAINS AND EVALUATION
• 8 domains
– Traffic Control: highly exogenous, concurrent
– Elevator Control: highly exogenous, concurrent
– Game of Life: highly combinatorial
– SysAdmin: highly exogenous, complex transitions
– Navigation: goal-oriented, determinization killer
– Crossing Traffic: goal-oriented, deterministic if move far left
– Skill Teaching: few exogenous events
– Reconnaissance: few exogenous events
• Conditions
– 24 hours for all runs
– 10 instances per domain, 30 runs per instance
4. Changes from IPPC 2008
- No longer goal-based.
- Large branching factors.
- Finite-horizon reward maximization instead of goal achievement.
- More realistic planning scenarios.
10. Parts of UCT
(1) Monte-Carlo Tree Search
(2) Performs rollouts in a tree of decision and chance nodes.
In decision nodes:
* Choose an unvisited successor uniformly at random if one exists.
* Otherwise choose the successor that maximizes the UCB1 score.
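This decision-node rule can be sketched as follows, assuming a toy representation where each child holds a visit count and an accumulated value; the exploration constant is a common default, not necessarily what the competitors used:

```python
import math
import random

def select_action(node_visits, children):
    """UCT decision-node selection: pick an unvisited successor at random
    if one exists, otherwise maximize the UCB1 score.
    `children` maps action -> (visit_count, total_value)."""
    unvisited = [a for a, (n, _) in children.items() if n == 0]
    if unvisited:
        return random.choice(unvisited)
    c = math.sqrt(2)  # common exploration constant; implementations vary
    return max(children,
               key=lambda a: children[a][1] / children[a][0]
               + c * math.sqrt(math.log(node_visits) / children[a][0]))

# "b" wins: higher mean value (0.8 vs 0.4) and a larger exploration bonus.
children = {"a": (10, 4.0), "b": (5, 4.0)}
print(select_action(15, children))  # b
```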
11. 1st MDP: PROST
Domain-independent probabilistic planning
based on UCT combined with additional
techniques:
- Reasonable Action Pruning
- Q-value initialization
- Search Depth Limitation
- Reward Lock Detection
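One of these ideas, search depth limitation, can be illustrated with a toy sketch: rollouts are cut off at a fixed depth and the remaining value is estimated heuristically. The `ChainMDP` and the heuristic below are hypothetical stand-ins, not PROST's actual machinery:

```python
import random

class ChainMDP:
    """Toy chain MDP: move right until state 5 (terminal), reward 1 per step."""
    def is_terminal(self, s): return s >= 5
    def actions(self, s): return ["right"]
    def step(self, s, a): return s + 1, 1.0

def limited_rollout(mdp, state, depth_limit, heuristic):
    """Random rollout truncated at depth_limit; the tail is estimated by a
    heuristic value function (an assumed, simplified interface)."""
    total = 0.0
    for _ in range(depth_limit):
        if mdp.is_terminal(state):
            return total
        action = random.choice(mdp.actions(state))
        state, reward = mdp.step(state, action)
        total += reward
    return total + heuristic(state)

# From state 0 with depth limit 3: 3.0 collected + 2.0 heuristic tail.
est = limited_rollout(ChainMDP(), 0, 3, lambda s: float(5 - s))
print(est)  # 5.0
```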
12. 2nd MDP: GLUTTON
LRTDP with reverse iterative deepening
• Subsampling the transition function
• Correlated transition function samples
• Caching
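The subsampling idea can be sketched as follows: instead of enumerating a huge successor set, the expectation over next-state values is approximated from a handful of transition-function samples. The interface here is an assumed simplification, not GLUTTON's actual backup:

```python
import random

def subsampled_expectation(sample_successor, value, n_samples=8, seed=0):
    """Approximate E[V(s')] with a few samples of the transition function
    rather than a full enumeration of successors."""
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    total = sum(value(sample_successor(rng)) for _ in range(n_samples))
    return total / n_samples

# Toy transition: successor is 1 with probability 0.5, else 0; value(s) = s.
est = subsampled_expectation(lambda rng: 1 if rng.random() < 0.5 else 0,
                             value=float)
print(est)  # close to the exact expectation 0.5
```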
13. POMDP Track: Challenges
- Agent acting under uncertainty.
- Stochastic sequential decision problems.
- Very large number of states.
- Compact representation needed.
14. 1st POMDP: SARSOP
Successive Approximations of the Reachable Space
under Optimal Policies
- Solves POMDPs by sampling the subset of the belief space reachable under near-optimal policies.
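Point-based solvers such as SARSOP work with sampled belief points; new reachable beliefs are generated by Bayesian belief updates. A single update step can be sketched with a toy dict representation (assumed for illustration, not SARSOP's data structures):

```python
def belief_update(belief, T, O, action, obs):
    """One Bayes filter step: b'(s') ∝ O[a][s'][o] * sum_s T[a][s][s'] * b[s]."""
    states = list(belief)
    unnorm = {}
    for s2 in states:
        pred = sum(T[action][s][s2] * belief[s] for s in states)
        unnorm[s2] = O[action][s2][obs] * pred
    z = sum(unnorm.values())  # normalizing constant P(o | b, a)
    return {s: p / z for s, p in unnorm.items()}

# Tiny tiger-style example: listening leaves the state unchanged but gives
# a noisy observation of where the tiger is.
T = {"listen": {"L": {"L": 1.0, "R": 0.0}, "R": {"L": 0.0, "R": 1.0}}}
O = {"listen": {"L": {"hear-L": 0.85, "hear-R": 0.15},
                "R": {"hear-L": 0.15, "hear-R": 0.85}}}
b = belief_update({"L": 0.5, "R": 0.5}, T, O, "listen", "hear-L")
print(round(b["L"], 2))  # 0.85
```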
15. 2nd POMDP: KAIST-AILAB
Uses symbolic heuristic search value iteration
(symbolic HSVI) for factored POMDPs
- Alpha vector masking method.
- Algebraic decision diagram (ADD) representation.
- Elimination of symmetric structures in the domains.
16. Thanks!
[1] T. Keller and P. Eyerich, “PROST: Probabilistic Planning Based on UCT,” in Proc. ICAPS, 2012.
[2] A. Kolobov, P. Dai, Mausam, and D. S. Weld, “Reverse Iterative Deepening for Finite-Horizon MDPs with Large Branching Factors,” in Proc. ICAPS, 2012.
[3] H. Kurniawati, D. Hsu, and W. S. Lee, “SARSOP: Efficient Point-Based POMDP Planning by Approximating Optimally Reachable Belief Spaces,” in Proc. Robotics: Science and Systems, 2008.
[4] H.-S. Sim, K.-E. Kim, J. H. Kim, D.-S. Chang, and M.-W. Koo, “Symbolic Heuristic Search Value Iteration for Factored POMDPs,” in Proc. AAAI, 2008, pp. 1088–1093.