Dream To Control:
Learning Behaviors by Latent Imagination
LEE, DOHYEON
leadh991114@gmail.com
4/18/2023 Deep Learning Paper Reading Seminar - Reinforcement Learning
ICLR 2020 (Oral)
NeurIPS Deep RL Workshop 2019 (Oral)
Contents
1. Introduction
2. Methods
3. Experiments
4. Conclusion
1. RL Comparison
Model-Free RL
• No model
• Learn value function (and/or policy) from real experience
Model-Based RL
• Learn a model from real experience
• Plan value function (and/or policy) from simulated experience
RL Comparison, from slides of Sergey Levine
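The contrast above can be sketched on a toy problem. This is only an illustration and is not taken from the slides: the three-state chain environment, the function names, and the hyperparameters are all assumptions made up for the example. The model-free learner updates values directly from real transitions, while the model-based learner first fits a tabular model and then plans on that model alone.

```python
GAMMA = 0.9
N_STATES = 3          # state 2 is terminal (hypothetical toy chain)
ACTIONS = (0, 1)      # 0 = move left, 1 = move right

def step(state, action):
    """Toy environment: reward 1.0 on reaching the last state."""
    nxt = min(state + 1, N_STATES - 1) if action == 1 else max(state - 1, 0)
    return nxt, (1.0 if nxt == N_STATES - 1 else 0.0)

def collect_experience():
    """Real experience: one transition per non-terminal (state, action) pair."""
    return [(s, a, *step(s, a)) for s in range(N_STATES - 1) for a in ACTIONS]

def backup(q, s2, r):
    """One-step target: reward plus discounted best next value (0 at terminal)."""
    if s2 == N_STATES - 1:
        return r
    return r + GAMMA * max(q[(s2, b)] for b in ACTIONS)

def model_free_q(data, lr=0.5, sweeps=50):
    """Q-learning: update values directly from real transitions, no model."""
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(sweeps):
        for s, a, s2, r in data:
            q[(s, a)] += lr * (backup(q, s2, r) - q[(s, a)])
    return q

def model_based_q(data, sweeps=50):
    """Fit a tabular model from real transitions, then plan on the model only."""
    model = {(s, a): (s2, r) for s, a, s2, r in data}
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(sweeps):
        for (s, a), (s2, r) in model.items():
            q[(s, a)] = backup(q, s2, r)  # simulated experience from the model
    return q

data = collect_experience()
q_free, q_model = model_free_q(data), model_based_q(data)
```

On this deterministic toy problem both routes arrive at the same values; the difference is only where the updates come from (real transitions versus the learned model).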
2. World Model
“Intelligent agents can achieve goals in complex environments
even though they never encounter the exact same situation twice.”
“This ability requires building representations of the world from past
experience that enable generalization to novel situations.”
“World models offer an explicit way to represent an agent’s knowledge
about the world in a parametric model that can make predictions about the
future.”
A World Model, from Scott McCloud’s Understanding Comics.
3. Visual Control
“When sensory inputs are high-dimensional images, latent dynamics models can
abstract observations to predict forward in compact state spaces.”
→ Latent states have a small memory footprint
“Behaviors can be derived from dynamics models in many ways.”
→ Considering only rewards within a fixed imagination horizon results in shortsighted behaviors
→ Prior work commonly resorts to derivative-free optimization for robustness
4. PlaNet
An RL agent that learns the environment dynamics from
images and chooses actions through fast online planning in
latent space.
Learning Latent Dynamics for Planning from Pixels (Danijar Hafner et al., 2019)
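Online planning of this kind can be sketched with the cross-entropy method (CEM), the derivative-free optimizer that PlaNet uses for its search. The sketch below is a minimal illustration under strong assumptions: `latent_step` and `predicted_reward` are hypothetical one-dimensional stand-ins for the learned transition and reward models, and the sample counts are arbitrary.

```python
import random
import statistics

def latent_step(state, action):
    # Hypothetical stand-in for a learned latent transition model.
    return state + action

def predicted_reward(state):
    # Hypothetical stand-in for a learned reward model; highest at state 1.0.
    return -(state - 1.0) ** 2

def rollout_return(state, actions):
    # Sum of predicted rewards along one imagined action sequence.
    total = 0.0
    for action in actions:
        state = latent_step(state, action)
        total += predicted_reward(state)
    return total

def cem_plan(state, horizon=5, candidates=200, elites=20, iterations=10, seed=0):
    rng = random.Random(seed)
    mean, std = [0.0] * horizon, [1.0] * horizon
    for _ in range(iterations):
        # Sample candidate action sequences from the current belief.
        samples = [[rng.gauss(m, s) for m, s in zip(mean, std)]
                   for _ in range(candidates)]
        # Keep the sequences with the highest imagined return.
        samples.sort(key=lambda acts: rollout_return(state, acts), reverse=True)
        best = samples[:elites]
        # Refit the belief to the elites (small floor keeps std positive).
        mean = [statistics.mean(col) for col in zip(*best)]
        std = [statistics.pstdev(col) + 1e-3 for col in zip(*best)]
    return mean

plan = cem_plan(state=0.0)
first_action = plan[0]  # only this action is executed before replanning
```

The search is repeated at every environment step, which is the cost that a learned actor network avoids.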
5. Dreamer
An RL agent that learns long-horizon behaviors from
images purely by latent imagination.
The three processes of the Dreamer agent.
1. The world model is learned from past experience.
2. From predictions of this model, the agent then learns a value network
to predict future rewards and an actor network to select actions.
3. The actor network is used to interact with the environment.
QnA
1. Learning the World Model
Dreamer learns a world model from experience. Using past images o_1 ~ o_3 and actions a_1 ~
a_2, it computes a sequence of compact model states (green circles) from which it reconstructs
the images o_1 ~ o_3 and predicts the rewards r_1 ~ r_3. → Leveraging PlaNet
2. Learning Behavior in Imagination
Dreamer learns long-sighted behaviors from predicted sequences of model states. It first
learns the long-term value v_2 ~ v_3 of each state, and then predicts actions a_1 ~ a_2 that lead to
high rewards and values by backpropagating them through the state sequence to the actor
network.
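The idea of pushing reward gradients through the imagined state sequence into the actor can be sketched on a scalar toy problem. Everything here is an assumption for illustration: the dynamics `s' = s + a`, the reward `-s'^2`, and the one-parameter actor `a = theta * s` are hypothetical, the derivative is accumulated by hand in forward mode rather than by reverse-mode automatic differentiation through neural networks, and the value term beyond the horizon is left out.

```python
def imagined_return_and_grad(theta, s0, horizon):
    """Roll out the actor in a toy differentiable model and return
    (imagined return, d(return)/d(theta))."""
    s, ds = s0, 0.0            # state and its derivative w.r.t. theta
    ret, dret = 0.0, 0.0
    for _ in range(horizon):
        a = theta * s          # actor: a = theta * s
        da = s + theta * ds    # chain rule through the actor
        s, ds = s + a, ds + da # toy dynamics: s' = s + a
        ret += -s * s          # toy reward: -s'^2
        dret += -2.0 * s * ds  # chain rule through the reward
    return ret, dret

def train_actor(theta=0.0, s0=1.0, horizon=3, lr=0.05, steps=200):
    """Gradient ascent on the imagined return."""
    for _ in range(steps):
        _, grad = imagined_return_and_grad(theta, s0, horizon)
        theta += lr * grad
    return theta

theta_star = train_actor()
```

In this toy setting the actor parameter moves toward -1, the setting that drives the state to zero in a single step. No search over action sequences is involved; the gradient of the imagined return does the work.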
PlaNet vs Dreamer
• For a given situation in the environment, PlaNet searches for the best action among many predictions for different
action sequences.
• Dreamer side-steps this expensive search by decoupling planning and acting. Once its actor network has been
trained on predicted sequences, it computes the actions for interacting with the environment without additional search.
In addition, Dreamer considers rewards beyond the planning horizon using a value function and leverages
backpropagation for efficient planning.
3. Act in the Environment
The agent encodes the history of the episode to compute the current model state and the next
action to execute in the environment.
4. Explained
↖ PlaNet (omitted)
1. Transition Model
2. Reward Model
3. Policy
4. Objective (Imagined Rewards)
Actor-Critic Method, from papers on Deep Multi-Agent (Taiki Fuji et al.)
5. Actor-Critic Model (parametrized by φ and ψ, respectively)
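In the notation of the Dreamer paper, the five items above correspond roughly to the expressions below. This is a reconstruction from the paper's definitions and should be checked against it; θ, φ, and ψ denote the world-model, actor, and critic parameters, γ the discount factor, t the time step where imagination starts, and H the imagination horizon.

```latex
\begin{align*}
\text{Transition model:}\quad & s_\tau \sim q_\theta(s_\tau \mid s_{\tau-1}, a_{\tau-1}) \\
\text{Reward model:}\quad & r_\tau \sim q_\theta(r_\tau \mid s_\tau) \\
\text{Policy:}\quad & a_\tau \sim q(a_\tau \mid s_\tau) \\
\text{Objective:}\quad & \max_{q}\; \mathbb{E}_q\Big[\textstyle\sum_{\tau=t}^{\infty} \gamma^{\tau-t} r_\tau\Big] \\
\text{Action model (actor):}\quad & a_\tau \sim q_\phi(a_\tau \mid s_\tau) \\
\text{Value model (critic):}\quad & v_\psi(s_\tau) \approx \mathbb{E}_{q(\cdot \mid s_\tau)}\Big[\textstyle\sum_{\tau'=\tau}^{t+H} \gamma^{\tau'-\tau} r_{\tau'}\Big]
\end{align*}
```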
6. Value Estimation
7. Learning Objective
Flow of Actor-Critic Method, from slides of Deep RL, Sergey Levine
w/ Imagined Trajectories
↖ Exponential decay for old trajectories
↖ Rewards beyond k steps with the learned value model
↖ Actor
↖ Critic
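The two value estimates annotated above can be sketched directly from their definitions in the Dreamer paper. The k-step estimate sums discounted imagined rewards for k steps and then adds the learned value of the state reached; the λ-estimate is an exponentially weighted average of the k-step estimates for k = 1 to H. In the paper the actor is trained to maximize the λ-estimates along imagined trajectories and the critic to regress onto them. The function names, the 0-based indexing, and the toy inputs below are assumptions for illustration.

```python
def n_step_value(rewards, values, tau, k, gamma):
    """k-step estimate from imagined rewards plus a bootstrapped value.

    rewards[n] is the imagined reward at step n (length H);
    values[n] is the critic's value of imagined state n (length H + 1).
    """
    horizon = len(rewards)
    h = min(tau + k, horizon)  # never look past the imagination horizon
    ret = sum(gamma ** (n - tau) * rewards[n] for n in range(tau, h))
    return ret + gamma ** (h - tau) * values[h]

def lambda_value(rewards, values, tau, gamma, lam):
    """Exponentially weighted average of the k-step estimates, k = 1..H."""
    horizon = len(rewards)
    mix = sum(lam ** (n - 1) * n_step_value(rewards, values, tau, n, gamma)
              for n in range(1, horizon))
    tail = lam ** (horizon - 1) * n_step_value(rewards, values, tau, horizon, gamma)
    return (1 - lam) * mix + tail

# Toy imagined trajectory with H = 3 (made-up numbers).
rewards = [1.0, 1.0, 1.0]
values = [0.0, 0.0, 0.0, 10.0]
one_step = lambda_value(rewards, values, tau=0, gamma=0.9, lam=0.0)
full_horizon = lambda_value(rewards, values, tau=0, gamma=0.9, lam=1.0)
```

With λ = 0 the estimate reduces to the one-step target, and with λ = 1 it uses all imagined rewards up to the horizon plus the value of the final state; intermediate λ trades off between the two.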
QnA
1. Control Tasks
• Dreamer learns to solve 20 challenging continuous control tasks with
image inputs, 5 of which are displayed here.
The tasks are designed to pose a variety of challenges to the RL agent, including difficult-to-predict
collisions, sparse rewards, chaotic dynamics, small but relevant objects, high degrees
of freedom, and 3D perspectives.
• The visualizations show the same 64x64 images that the agent receives
from the environment.
2. Comparison
Dreamer outperforms the previous best model-free
(D4PG) and model-based (PlaNet) methods on the
benchmark of 20 tasks in terms of final performance,
data efficiency, and computation time.
3. Atari Games
Dreamer learns successful behaviors on Atari games and DeepMind Lab
levels, which feature discrete actions and visually more diverse scenes,
including 3D environments with multiple objects.
QnA
Conclusion
1. Learning behaviors from sequences predicted by world models alone can solve
challenging visual control tasks from image inputs, surpassing the performance
of previous model-free approaches.
2. Dreamer demonstrates that learning behaviors by backpropagating value gradients through
predicted sequences of compact model states is successful and robust, solving a diverse
collection of continuous and discrete control tasks.
My Questions
1. Is there any relation between the world model and the “common sense” mentioned by Yann LeCun?
2. Is there any evidence for the mechanism of human prediction and dreaming?
What we see is based on our brain’s prediction of the future,
A. Kitaoka, Kanzen, 2002.
Dreamer Series!
Thank You for Listening!
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
ย 

Dream2Control paper review

  • 1. Dream To Control: Learning Behaviors by Latent Imagination. LEE, DOHYEON (leadh991114@gmail.com). 4/18/2023, 딥논읽 Seminar - Reinforcement Learning. ICLR 2020 (Oral); NeurIPS Deep RL Workshop 2019 (Oral).
  • 2. Contents: 1. Introduction 2. Methods 3. Experiments 4. Conclusion
  • 3. 1. RL Comparison. INDEX: Introduction, Methods, Performance, Conclusion. Model-Free RL: no model; learn the value function (and/or policy) from real experience. Model-Based RL: learn a model from real experience; plan the value function (and/or policy) from simulated experience. (RL comparison, from slides of Sergey Levine.)
  • 4. 2. World Model. “Intelligent agents can achieve goals in complex environments even though they never encounter the exact same situation twice.” “This ability requires building representations of the world from past experience that enable generalization to novel situations.” “World models offer an explicit way to represent an agent’s knowledge about the world in a parametric model that can make predictions about the future.” (A World Model, from Scott McCloud’s Understanding Comics.)
  • 5. 3. Visual Control. “When sensory inputs are high-dimensional images, latent dynamics models can abstract observations to predict forward in compact state spaces.” → Latent states have a small memory footprint. “Behaviors can be derived from dynamics models in many ways.” → Considering only rewards within a fixed imagination horizon results in shortsighted behaviors. → Prior work commonly resorts to derivative-free optimization for robustness.
  • 6. 4. PlaNet. An RL agent that learns the environment dynamics from images and chooses actions through fast online planning in latent space. Learning Latent Dynamics for Planning from Pixels (Danijar Hafner et al., 2019).
  • 7. 5. Dreamer. An RL agent that learns long-horizon behaviors from images purely by latent imagination. The three processes of the Dreamer agent: 1. The world model is learned from past experience. 2. From predictions of this model, the agent then learns a value network to predict future rewards and an actor network to select actions. 3. The actor network is used to interact with the environment.
  • 8. QnA
  • 9. 1. Learning the World Model. Dreamer learns a world model from experience. Using past images o_1 to o_3 and actions a_1 to a_2, it computes a sequence of compact model states (green circles) from which it reconstructs the images o_1 to o_3 and predicts the rewards r_1 to r_3. → Leveraging PlaNet
  • 10. 2. Learning Behavior in Imagination. Dreamer learns long-sighted behaviors from predicted sequences of model states. It first learns the long-term value v_2 to v_3 of each state, and then predicts actions a_1 to a_2 that lead to high rewards and values by backpropagating them through the state sequence to the actor network.
  • 11. 2. Learning Behavior in Imagination. PlaNet vs Dreamer: For a given situation in the environment, PlaNet searches for the best action among many predictions for different action sequences. Dreamer side-steps this expensive search by decoupling planning and acting. Once its actor network has been trained on predicted sequences, it computes the actions for interacting with the environment without additional search. In addition, Dreamer considers rewards beyond the planning horizon using a value function and leverages backpropagation for efficient planning.
  • 12. 3. Act in the Environment. The agent encodes the history of the episode to compute the current model state and the next action to execute in the environment.
  • 13. 4. Explained. ↖ PlaNet (omitted)
  • 14. 4. Explained. 1. Transition Model 2. Reward Model 3. Policy 4. Objective (Imagined Rewards) 5. Actor-Critic Model (parametrized by 𝜙 and 𝜓, respectively). Actor-Critic Method, from papers on Deep Multi-Agent (Taiki Fuji et al.).
  • 15. 4. Explained. 6. Value Estimation 7. Learning Objective. Flow of the Actor-Critic Method with imagined trajectories (from slides of Deep RL, Sergey Levine). ↖ Exponential decay for old trajectories ↖ Rewards beyond k steps with the learned value model ↖ Actor ↖ Critic
  • 16. QnA
  • 17. 1. Control Tasks. Dreamer learns to solve 20 challenging continuous control tasks with image inputs, 5 of which are displayed here. The tasks are designed to pose a variety of challenges to the RL agent, including difficult-to-predict collisions, sparse rewards, chaotic dynamics, small but relevant objects, high degrees of freedom, and 3D perspectives. The visualizations show the same 64x64 images that the agent receives from the environment.
  • 18. 2. Comparison. Dreamer outperforms the previous best model-free (D4PG) and model-based (PlaNet) methods on the benchmark of 20 tasks in terms of final performance, data efficiency, and computation time.
  • 19. 3. Atari Games. Dreamer learns successful behaviors on Atari games and DeepMind Lab levels, which feature discrete actions and visually more diverse scenes, including 3D environments with multiple objects.
  • 20. QnA
  • 21. Conclusion. 1. Learning behaviors from sequences predicted by world models alone can solve challenging visual control tasks from image inputs, surpassing the performance of previous model-free approaches. 2. Dreamer demonstrates that learning behaviors by backpropagating value gradients through predicted sequences of compact model states is successful and robust, solving a diverse collection of continuous and discrete control tasks.
  • 22. My Questions. 1. Is there any relation between the world model and the “common sense” mentioned by Yann LeCun? 2. Is there any evidence for the mechanism of human prediction and dreaming? (What we see is based on our brain’s prediction of the future, A. Kitaoka. Kanzen. 2002.)
  • 23. Dreamer Series!
  • 24. Dreamer Series!
  • 25. Thank you for listening! Learning Behaviors by Latent Imagination. DQN model for Text2image, PixRay.
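
Slide 15 mentions value estimation in which rewards beyond k steps are supplied by the learned value model and multi-step estimates are combined with exponentially decaying weights. The slide's equations are not part of this transcript, so the following is only a minimal sketch of a standard λ-return computed backwards over one imagined trajectory, not the presenter's or the paper's exact code. The function name `lambda_returns`, its arguments, and the default `gamma` and `lam` values are assumptions made for illustration.

```python
import numpy as np


def lambda_returns(rewards, values, bootstrap, gamma=0.99, lam=0.95):
    """Sketch of a lambda-return along one imagined trajectory (hypothetical helper).

    rewards[t]: predicted reward at imagined step t, for t = 0..H-1
    values[t]:  value-model estimate of the state at step t
    bootstrap:  value-model estimate of the state after the last step,
                standing in for rewards beyond the imagination horizon
    """
    rewards = np.asarray(rewards, dtype=float)
    values = np.asarray(values, dtype=float)
    horizon = len(rewards)

    # Value of the *next* state at every step; the last one is the bootstrap.
    next_values = np.append(values[1:], bootstrap)

    returns = np.zeros(horizon)
    next_return = bootstrap
    # Work backwards: each target mixes the one-step bootstrap (weight 1 - lam)
    # with the longer-horizon target from the following step (weight lam).
    for t in reversed(range(horizon)):
        next_return = rewards[t] + gamma * (
            (1.0 - lam) * next_values[t] + lam * next_return
        )
        returns[t] = next_return
    return returns


if __name__ == "__main__":
    # Toy numbers, chosen only to exercise the function.
    print(lambda_returns([1.0, 1.0, 1.0], [5.0, 2.0, 3.0], bootstrap=4.0, gamma=0.5, lam=0.0))
```

With `lam=0` each target reduces to the one-step estimate (reward plus discounted next value); with `lam=1` it becomes the discounted sum of imagined rewards plus the discounted bootstrap value. Intermediate values trade off between the two.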