2. Why Markov domains? (Crossing Traffic)
(1) Solutions are functions (policies) mapping states to actions.
(2) Given only an observation, behavior can appear stochastic: missing information and stochastic dynamics mean the next state cannot be predicted exactly, yet a better policy still obtains better expected rewards.
[Figure: crossing-traffic illustration of missing information and unpredictable next states.]
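The policy-as-function idea can be sketched in a few lines. The state encoding and action names below are illustrative only, not taken from the actual Crossing Traffic RDDL domain:

```python
# Minimal sketch: a solution is a policy, i.e., a function from states
# to actions. State encoding and action names are hypothetical.
def policy(state):
    _col, car_adjacent = state  # agent column, whether a car is next to us
    if car_adjacent:
        return "wait"           # stay put until the stochastic traffic clears
    return "move-right"         # otherwise head toward the goal

print(policy((0, True)))   # wait
print(policy((0, False)))  # move-right
```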
3. IPPC 2011: DOMAINS AND EVALUATION
• 8 domains
– Traffic Control: highly exogenous, concurrent
– Elevator Control: highly exogenous, concurrent
– Game of Life: highly combinatorial
– SysAdmin: highly exogenous, complex transitions
– Navigation: goal-oriented, determinization killer
– Crossing Traffic: goal-oriented, deterministic if move far left
– Skill Teaching: few exogenous events
– Reconnaissance: few exogenous events
• Conditions
– 24 hours for all runs
– 10 instances per domain, 30 runs per instance
4. Changes from IPPC 2008
- No longer goal-based.
- Large branching factors.
- Finite-horizon reward maximization instead of goal achievement.
- More realistic planning scenarios.
10. Parts of UCT
(1) Monte-Carlo Tree Search
(2) Performs rollouts in a tree of decision and chance nodes.
In decision nodes:
* Choose an unvisited successor uniformly at random if one exists.
* Otherwise choose the successor that maximizes the UCB1 score.
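This decision-node rule can be sketched as follows, assuming a toy representation where each child holds a visit count and an accumulated value; the exploration constant is a common default, not necessarily what the competitors used:

```python
import math
import random

def select_action(node_visits, children):
    """UCT decision-node selection: pick an unvisited successor at random
    if one exists, otherwise maximize the UCB1 score.
    `children` maps action -> (visit_count, total_value)."""
    unvisited = [a for a, (n, _) in children.items() if n == 0]
    if unvisited:
        return random.choice(unvisited)
    c = math.sqrt(2)  # common exploration constant; implementations vary
    return max(children,
               key=lambda a: children[a][1] / children[a][0]
               + c * math.sqrt(math.log(node_visits) / children[a][0]))

# "b" wins: higher mean value (0.8 vs 0.4) and a larger exploration bonus.
children = {"a": (10, 4.0), "b": (5, 4.0)}
print(select_action(15, children))  # b
```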
11. 1st MDP: PROST
Domain-independent probabilistic planning
based on UCT combined with additional
techniques:
- Reasonable Action Pruning
- Q-value initialization
- Search Depth Limitation
- Reward Lock Detection
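One of these ideas, search depth limitation, can be illustrated with a toy sketch: rollouts are cut off at a fixed depth and the remaining value is estimated heuristically. The `ChainMDP` and the heuristic below are hypothetical stand-ins, not PROST's actual machinery:

```python
import random

class ChainMDP:
    """Toy chain MDP: move right until state 5 (terminal), reward 1 per step."""
    def is_terminal(self, s): return s >= 5
    def actions(self, s): return ["right"]
    def step(self, s, a): return s + 1, 1.0

def limited_rollout(mdp, state, depth_limit, heuristic):
    """Random rollout truncated at depth_limit; the tail is estimated by a
    heuristic value function (an assumed, simplified interface)."""
    total = 0.0
    for _ in range(depth_limit):
        if mdp.is_terminal(state):
            return total
        action = random.choice(mdp.actions(state))
        state, reward = mdp.step(state, action)
        total += reward
    return total + heuristic(state)

# From state 0 with depth limit 3: 3.0 collected + 2.0 heuristic tail.
est = limited_rollout(ChainMDP(), 0, 3, lambda s: float(5 - s))
print(est)  # 5.0
```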
12. 2nd MDP: GLUTTON
LRTDP with reverse iterative deepening
• Subsampling the transition function
• Correlated transition function samples
• Caching
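The subsampling idea can be sketched as follows: instead of enumerating a huge successor set, the expectation over next-state values is approximated from a handful of transition-function samples. The interface here is an assumed simplification, not GLUTTON's actual backup:

```python
import random

def subsampled_expectation(sample_successor, value, n_samples=8, seed=0):
    """Approximate E[V(s')] with a few samples of the transition function
    rather than a full enumeration of successors."""
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    total = sum(value(sample_successor(rng)) for _ in range(n_samples))
    return total / n_samples

# Toy transition: successor is 1 with probability 0.5, else 0; value(s) = s.
est = subsampled_expectation(lambda rng: 1 if rng.random() < 0.5 else 0,
                             value=float)
print(est)  # close to the exact expectation 0.5
```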
13. POMDP Track: Challenges
- Agent acting under uncertainty.
- Stochastic sequential decision problems.
- Very large number of states.
- Compact representation needed.
14. 1st POMDP: SARSOP
Successive Approximations of the Reachable Space
under Optimal Policies
- Solves POMDPs by sampling the subset of the belief space reachable under near-optimal policies.
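Point-based solvers such as SARSOP work with sampled belief points; new reachable beliefs are generated by Bayesian belief updates. A single update step can be sketched with a toy dict representation (assumed for illustration, not SARSOP's data structures):

```python
def belief_update(belief, T, O, action, obs):
    """One Bayes filter step: b'(s') ∝ O[a][s'][o] * sum_s T[a][s][s'] * b[s]."""
    states = list(belief)
    unnorm = {}
    for s2 in states:
        pred = sum(T[action][s][s2] * belief[s] for s in states)
        unnorm[s2] = O[action][s2][obs] * pred
    z = sum(unnorm.values())  # normalizing constant P(o | b, a)
    return {s: p / z for s, p in unnorm.items()}

# Tiny tiger-style example: listening leaves the state unchanged but gives
# a noisy observation of where the tiger is.
T = {"listen": {"L": {"L": 1.0, "R": 0.0}, "R": {"L": 0.0, "R": 1.0}}}
O = {"listen": {"L": {"hear-L": 0.85, "hear-R": 0.15},
                "R": {"hear-L": 0.15, "hear-R": 0.85}}}
b = belief_update({"L": 0.5, "R": 0.5}, T, O, "listen", "hear-L")
print(round(b["L"], 2))  # 0.85
```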
15. 2nd POMDP: KAIST-AILAB
Uses symbolic heuristic search value iteration
(symbolic HSVI) for factored POMDPs
- Alpha vector masking method.
- Algebraic decision diagram (ADD) representation.
- Elimination of symmetric structures in the domains.
16. Thanks!
[1] T. Keller and P. Eyerich, “PROST: Probabilistic Planning Based on UCT,” in Proc. ICAPS, 2012.
[2] A. Kolobov, P. Dai, Mausam, and D. S. Weld, “Reverse Iterative Deepening for Finite-Horizon MDPs with Large Branching Factors,” in Proc. ICAPS, 2012.
[3] H. Kurniawati, D. Hsu, and W. S. Lee, “SARSOP: Efficient Point-Based POMDP Planning by Approximating Optimally Reachable Belief Spaces,” in Proc. Robotics: Science and Systems, 2008.
[4] H.-S. Sim, K.-E. Kim, J. H. Kim, D.-S. Chang, and M.-W. Koo, “Symbolic Heuristic Search Value Iteration for Factored POMDPs,” in Proc. AAAI, 2008, pp. 1088–1093.