End-to-end Planning with Value Iteration Networks and The Predictron
1. The Predictron: End-to-end Learning and Planning
Yoonho Lee
Department of Computer Science and Engineering
Pohang University of Science and Technology
December 27, 2016
13. Value Iteration Networks
Bellman Operator as a CNN forward pass
Bellman Operator
Every V converges to V* under the Bellman operator T, defined as:

(T V)(s) = max_{a ∈ A} { R(s, a) + γ Σ_{s′ ∈ S} P(s′ | s, a) V(s′) }   (1)

V* = lim_{n → ∞} Tⁿ V   ∀V   (2)
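A minimal sketch of equations (1) and (2): repeated application of T on a hypothetical 2-state, 2-action MDP (the transition and reward numbers below are illustrative, not from the talk):

```python
import numpy as np

gamma = 0.9
# Hypothetical MDP: P[a, s, t] = P(t | s, a), R[s, a] = R(s, a)
P = np.array([[[0.8, 0.2],
               [0.1, 0.9]],
              [[0.5, 0.5],
               [0.6, 0.4]]])
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])

def bellman(V):
    # (T V)(s) = max_a { R(s, a) + gamma * sum_t P(t | s, a) V(t) }
    Q = R + gamma * np.einsum('ast,t->sa', P, V)
    return Q.max(axis=1)

# T^n V converges to V* from any starting V (eq. 2)
V = np.zeros(2)
for _ in range(200):
    V = bellman(V)
# At the fixed point, T V* = V*
assert np.allclose(bellman(V), V, atol=1e-6)
```

Since T is a γ-contraction, the loop converges geometrically regardless of the initial V.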
The Bellman operator can be viewed as a CNN forward pass:

(T* V)(s) = max_{a ∈ A} { R(s, a) + γ conv_{P(a)}(V)(s) }
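The convolutional view can be sketched on a gridworld: each action's transition model becomes a fixed 3×3 filter over the value map, and the max over action channels implements the Bellman backup. The kernels and reward map below are illustrative assumptions, not VIN's learned filters:

```python
import numpy as np

def conv2d_same(V, K):
    """'same'-size 2-D correlation of value map V with a 3x3 kernel K,
    zero-padded at the borders."""
    H, W = V.shape
    Vp = np.pad(V, 1)
    out = np.zeros_like(V)
    for i in range(3):
        for j in range(3):
            out += K[i, j] * Vp[i:i + H, j:j + W]
    return out

def make_kernel(di, dj, slip=0.1):
    # Move one cell in direction (di, dj) w.p. 1 - slip, stay otherwise.
    K = np.zeros((3, 3))
    K[1 + di, 1 + dj] = 1.0 - slip
    K[1, 1] += slip
    return K

gamma = 0.9
kernels = [make_kernel(*d) for d in [(-1, 0), (1, 0), (0, -1), (0, 1)]]
R = np.zeros((5, 5))
R[4, 4] = 1.0  # goal reward in one corner (shared across actions here)

def bellman_conv(V):
    # (T* V)(s) = max_a { R(s, a) + gamma * conv_{P(a)}(V)(s) }
    return np.max([R + gamma * conv2d_same(V, K) for K in kernels], axis=0)

V = np.zeros((5, 5))
for _ in range(200):
    V = bellman_conv(V)
```

VIN stacks this backup as the layers of a network, so the transition kernels and reward map are learned end-to-end rather than hand-specified as above.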
16. Value Iteration Networks
Summary
Neural network architecture that plans using value iteration
Assumes that the state is a sufficient statistic for the reward function
The small MDP must have a finite state space
Uses prior knowledge about the environment's structure
17. The Predictron: End-to-End Learning and Planning
David Silver, Hado van Hasselt, Matteo Hessel, Tom Schaul,
Arthur Guez, Tim Harley, Gabriel Dulac-Arnold, David Reichert,
Neil Rabinowitz, Andre Barreto, Thomas Degris
Under review for ICLR 2017
25. Predictron
Summary
End-to-end simulation of an MRP
Works with an arbitrary state space for the small MRP
The small MRP's state space is abstract and not interpretable
Designed for RL, but does not take actions into account
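A rough sketch of the predictron's k-step "preturn": an abstract model maps a hidden state to a next state, a reward, and a per-step discount, while a value head grounds the end of each rollout. The linear maps, dimensions, and nonlinearities below are placeholder assumptions, not the paper's architecture:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # abstract (non-interpretable) state dimension

# Placeholder parameters (the real predictron learns these end-to-end)
W_s = rng.normal(scale=0.3, size=(d, d))  # abstract transition
w_r = rng.normal(scale=0.1, size=d)       # reward head
w_g = rng.normal(scale=0.1, size=d)       # discount head
w_v = rng.normal(scale=0.1, size=d)       # value head

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def step(s):
    # One step of the abstract MRP: next state, reward, discount in (0, 1)
    return np.tanh(W_s @ s), w_r @ s, sigmoid(w_g @ s)

def value(s):
    return w_v @ s

def preturn(s, k):
    # g_k = r_1 + gamma_1 * (r_2 + gamma_2 * (... + v(s_k)))
    rewards, discounts = [], []
    for _ in range(k):
        s, r, g = step(s)
        rewards.append(r)
        discounts.append(g)
    g_k = value(s)
    for r, gam in zip(reversed(rewards), reversed(discounts)):
        g_k = r + gam * g_k
    return g_k

s0 = rng.normal(size=d)
estimates = [preturn(s0, k) for k in range(5)]  # k-step preturns, k = 0..4
```

The 0-step preturn is just the value estimate at s0; deeper rollouts mix in more simulated rewards, and the paper combines them (e.g. by λ-weighting) into a single prediction.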
26. Summary
Hierarchical Reinforcement Learning attempts to identify crucial decisions
Agents can now use NN-based planning for better decision making
Research directions
Theoretical bounds for the optimal policy of a smaller MDP
Learning a smaller MDP with abstract actions
End-to-end planning-based Q-networks or policy networks