Results from the International Probabilistic Planning Competition
IPPC 2011
@raimonbosch
Why Markov domains? (Crossing Traffic)
(1) Solutions are functions (policies) mapping states into actions.
(2) Given an observation, stochastic behaviors can emerge.

[Figure: Crossing Traffic example. Annotations: "Missing information / stochastic behavior: can't predict step n+1" and "We can obtain better rewards depending on the policy".]
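As a rough illustration of points (1) and (2), the sketch below treats a policy as a plain function from states to actions and runs it in a toy stochastic corridor. The domain, its probabilities and rewards, and the names `step`, `cautious`, `reckless`, and `average_return` are all invented for illustration; this is not the IPPC Crossing Traffic definition.

```python
import random

def step(state, action, rng):
    """Toy stochastic transition: returns (next_state, reward, done).

    States are cell indices 0..3 and cell 3 is the goal. Moving "fast"
    skips a cell but ends the episode with a penalty with probability 0.3.
    All numbers are made up for illustration.
    """
    if action == "fast":
        if rng.random() < 0.3:
            return state, -10.0, True      # collision: episode ends
        nxt = min(state + 2, 3)
    else:                                  # "slow"
        nxt = min(state + 1, 3)
    done = nxt == 3
    return nxt, (0.0 if done else -1.0), done

def cautious(state):
    """A policy: maps every state to the safe action."""
    return "slow"

def reckless(state):
    """Another policy: maps every state to the risky action."""
    return "fast"

def average_return(policy, episodes=5000, horizon=10, seed=0):
    """Estimate the expected finite-horizon return of a policy by simulation."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(episodes):
        state = 0
        for _ in range(horizon):
            state, reward, done = step(state, policy(state), rng)
            total += reward
            if done:
                break
    return total / episodes
```

In this toy setting a single outcome cannot be predicted in advance, but the two policies can still be compared by their average return, which is the sense in which rewards depend on the policy.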
IPPC 2011: DOMAINS AND EVALUATION

 • 8 domains
– Traffic Control: highly exogenous, concurrent
– Elevator Control: highly exogenous, concurrent
– Game of Life: highly combinatoric
– SysAdmin: highly exogenous, complex transitions
– Navigation: goal-oriented, determinization killer
– Crossing Traffic: goal-oriented, deterministic if move far left
– Skill Teaching: few exogenous events
– Reconnaissance: few exogenous events


• Conditions
– 24 hours for all runs
– 10 instances per domain, 30 runs per instance
Changes from IPPC 2008

- Not Goal Based.

- Large branching factors.

- Finite-horizon reward maximization.

- More realistic planning scenarios.
MDP winners
PROST (Eyerich, Keller; Uni. Freiburg)
UCT / Single-Outcome Determinization, Caching

Glutton (Kolobov, Dai, Mausam, Weld; UW)
Iterative Deepening RTDP, Caching
POMDP winners
POMDPX_NUS (Wu, WS Lee, D Hsu; NUS)
SARSOP / UCT (POMCP)

KAIST-AILAB (D Kim, K Lee, K-E Kim; KAIST)
Symbolic HSVI (ADDs), Symmetry Detection
Understanding UCT:
Monte Carlo tree search
Understanding UCT:
Multi-armed bandit problem
UCT Algorithm by Kocsis and Szepesvári (2006)
Parts of UCT
(1) Monte-Carlo Tree Search

(2) Performs rollouts in a tree of decision and chance nodes
  In decision nodes:
  * Choose any unvisited successor randomly if there is one
  * Choose the successor maximizing the UCB1 policy otherwise
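The decision-node rule above can be sketched as follows. This is a minimal illustration, not code from PROST or from the Kocsis and Szepesvári paper: the function name `select_action`, the dictionary-based bookkeeping, and the `exploration` scaling constant are assumptions made here. With `exploration=1.0` the score is the standard UCB1 index, mean value plus sqrt(2 ln N / n_a).

```python
import math
import random

def select_action(visits, values, exploration=1.0, rng=random):
    """Pick an action at a decision node.

    visits[a] -- number of times action a was tried from this node
    values[a] -- average rollout return observed after taking a
    """
    # Rule 1: if some successor has never been visited, pick one at random.
    unvisited = [a for a, n in visits.items() if n == 0]
    if unvisited:
        return rng.choice(unvisited)

    # Rule 2: otherwise maximize the UCB1 score (mean + exploration bonus).
    total = sum(visits.values())

    def ucb1(a):
        bonus = math.sqrt(2.0 * math.log(total) / visits[a])
        return values[a] + exploration * bonus

    return max(visits, key=ucb1)
```

The bonus term shrinks as an action is tried more often, so rarely tried actions are revisited occasionally even when their current average looks worse.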
1st MDP: PROST
Domain-independent probabilistic planning
based on UCT combined with additional
techniques:

- Reasonable Action Pruning
- Q-value initialization
- Search Depth Limitation
- Reward Lock Detection
2nd MDP: GLUTTON
LRTDP with reverse iterative deepening




• Subsampling transition function
• Correlated transition function samples
• Caching
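The idea of reverse iterative deepening is to solve the problem for horizon 1, then 2, and so on up to the full horizon, reusing the values of the shorter horizons each time. The sketch below shows only that ordering and reuse. It replaces LRTDP trials with exhaustive Bellman backups over a tiny state space, and it omits subsampling, correlated samples, and caching, so it is a simplified stand-in rather than Glutton's method. The function names and the two-state example are hypothetical.

```python
def reverse_iterative_deepening(states, actions, transition, reward, horizon):
    """Solve horizons 1, 2, ..., horizon in turn, reusing shorter-horizon values.

    transition(s, a) -> list of (probability, next_state)
    reward(s, a)     -> immediate reward
    Returns value[h][s] and policy[h][s] for h steps-to-go.
    """
    value = {0: {s: 0.0 for s in states}}   # nothing left to earn with 0 steps
    policy = {}
    for h in range(1, horizon + 1):
        value[h], policy[h] = {}, {}
        for s in states:
            best_a, best_q = None, float("-inf")
            for a in actions:
                # The (h-1)-step values were computed in the previous pass.
                q = reward(s, a) + sum(
                    p * value[h - 1][s2] for p, s2 in transition(s, a)
                )
                if q > best_q:
                    best_a, best_q = a, q
            value[h][s], policy[h][s] = best_q, best_a
    return value, policy

# Hypothetical two-state example: state 1 pays 1 per step and is absorbing;
# "go" from state 0 reaches state 1 with probability 0.5.
def toy_transition(s, a):
    if s == 0 and a == "go":
        return [(0.5, 1), (0.5, 0)]
    return [(1.0, s)]

def toy_reward(s, a):
    return 1.0 if s == 1 else 0.0

value, policy = reverse_iterative_deepening(
    [0, 1], ["stay", "go"], toy_transition, toy_reward, horizon=3
)
```

Because each pass only looks one step ahead into already solved values, an intermediate result for a shorter horizon is available at any time if the time budget runs out.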
POMDP Track: Challenges




- Agent acting under uncertainty.
- Stochastic sequential decision problems.
- Very large number of states.
- Compact representation needed.
1st POMDP: SARSOP
Successive Approximations of the Reachable Space under Optimal Policies
- Solve POMDPs by sampling belief space.
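Points in belief space are probability distributions over states, and new points are reached from old ones by a Bayesian belief update after each action and observation. The sketch below shows only that update, not SARSOP's sampling strategy or its value-function bounds. The nested-list layout of `T` and `O`, the function name, and the two-state listening example are assumptions made for illustration.

```python
def belief_update(belief, action, observation, T, O):
    """Bayes filter: b'(s') is proportional to O[a][s'][o] * sum_s T[a][s][s'] * b(s).

    belief       -- list of state probabilities
    T[a][s][s2]  -- probability of moving from s to s2 under action a
    O[a][s2][o]  -- probability of observing o after landing in s2 under a
    """
    n = len(belief)
    new_belief = []
    for s2 in range(n):
        predicted = sum(T[action][s][s2] * belief[s] for s in range(n))
        new_belief.append(O[action][s2][observation] * predicted)
    norm = sum(new_belief)
    if norm == 0.0:
        raise ValueError("observation has zero probability under this belief")
    return [p / norm for p in new_belief]

# Hypothetical example: one action (index 0) that leaves the hidden state
# unchanged and reports it correctly with probability 0.85.
T = [[[1.0, 0.0], [0.0, 1.0]]]
O = [[[0.85, 0.15], [0.15, 0.85]]]

b0 = [0.5, 0.5]
b1 = belief_update(b0, action=0, observation=0, T=T, O=O)
b2 = belief_update(b1, action=0, observation=0, T=T, O=O)
```

Repeating the same observation concentrates the belief on one state, which is why only a small part of the full belief space tends to be reachable in practice.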
2nd POMDP: KAIST-AILAB
Uses symbolic heuristic search value iteration
(symbolic HSVI) for factored POMDPs




- Alpha vector masking method.
- Algebraic decision diagram (ADD) representation.
- Elimination of symmetric structures in the domains.
Thanks!
[1] T. Keller and P. Eyerich, “PROST: Probabilistic Planning Based on UCT,” ICAPS’12, 2012.
[2] A. Kolobov, P. Dai, Mausam, and D. S. Weld, “Reverse Iterative Deepening for Finite-Horizon MDPs with Large Branching Factors,” in Twenty-Second International Conference on Automated Planning and Scheduling, 2012.
[3] H. Kurniawati, D. Hsu, and W. S. Lee, “SARSOP: Efficient point-based POMDP planning by approximating optimally reachable belief spaces,” in Proc. Robotics: Science and Systems, 2008, vol. 62.
[4] H. S. Sim, K. E. Kim, J. H. Kim, D. S. Chang, and M. W. Koo, “Symbolic heuristic search value iteration for factored POMDPs,” in Proc. Nat. Conf. on Artificial Intelligence, 2008, pp. 1088–1093.
