Kyonggi Univ. AI Lab.
STOCHASTIC LATENT ACTOR-CRITIC: DEEP REINFORCEMENT
LEARNING WITH A LATENT VARIABLE MODEL
2020.11.16
KyuYeol Jung (정규열)
Artificial Intelligence Lab
Kyonggi University
Index
• Background
• SLAC (stochastic latent actor-critic)
• Experiments
• Conclusion and opinions
Background
• Learning from high-dimensional images is difficult.
• Two problems have to be solved:
  • Representation learning
  • Task learning
• The paper proposes SLAC.
  • It learns a latent representation from high-dimensional images.
    • A VAE (variational autoencoder) is adopted for this.
  • Reinforcement learning is then performed on the latent representation.
    • Soft Actor-Critic is adopted for this.
• Original authors' code (TensorFlow): https://github.com/alexlee-gk/slac
• PyTorch code: https://github.com/ku2482/slac.pytorch
SLAC (STOCHASTIC LATENT ACTOR-CRITIC)
• Training process
  Stage 1: latent learning (3 h)
  • Actions are chosen at random to collect actions and images.
  • The latent model is trained on the collected images.
  Stage 2: latent learning and reinforcement learning (20 h)
  • Reinforcement learning is performed using the learned latent.
  • Soft Actor-Critic is used to encourage exploration.
  Training took nearly 24 hours on a 2080 Ti GPU.
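
The two-stage schedule above can be sketched as follows. This is a minimal illustration, not the authors' implementation: `update_latent`, `update_sac`, and `collect_step` are hypothetical callbacks standing in for the latent-model update, the Soft Actor-Critic update, and one environment step with the current policy, and the step counts are arbitrary.

```python
def train_two_stage(buffer, update_latent, update_sac, collect_step,
                    pretrain_updates, rl_steps):
    """Run latent-only pretraining, then joint latent + RL training.

    All three callbacks are hypothetical placeholders (assumptions).
    """
    # Stage 1: train only the latent model on data gathered with random actions.
    for _ in range(pretrain_updates):
        update_latent(buffer)

    # Stage 2: keep collecting data with the current policy and update
    # the latent model and the SAC actor/critic at every step.
    for _ in range(rl_steps):
        buffer.append(collect_step())
        update_latent(buffer)
        update_sac(buffer)
    return buffer
```

Keeping the latent update inside the stage-2 loop reflects the slide's description of stage 2 as "latent learning and reinforcement learning", so the representation keeps adapting to the data the policy visits.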
• Stage 1: the latent model is trained first.
  • Data is collected for a set number of time steps.
    • States, actions, etc.
  • The VAE is trained on this data.
    • After training, a proper latent (z) can be obtained.
  (Diagram input: state. In practice, a CNN is used.)
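
A minimal sketch of the stage-1 data collection, assuming a continuous action space bounded in [-1, 1] and a hypothetical `env_step(action)` function that returns the next image observation. Neither the bounds nor the function name come from the slides.

```python
import random


def collect_random_transitions(env_step, action_dim, num_steps, seed=0):
    """Collect (image, action) pairs by acting uniformly at random.

    `env_step` is a hypothetical callable: action -> next image observation.
    """
    rng = random.Random(seed)
    buffer = []
    for _ in range(num_steps):
        # Sample each action dimension independently from [-1, 1] (assumed bounds).
        action = [rng.uniform(-1.0, 1.0) for _ in range(action_dim)]
        image = env_step(action)
        buffer.append((image, action))
    return buffer
```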
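• VAE (variational autoencoder)
  Reduces dimensionality to extract the essential information (the latent).
  Diagram: Encoder, dimensionality reduction, Decoder
  Variational inference: the latent posterior is approximated by a simple probability distribution q.
  $p(z \mid x) \approx q(z)$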
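
For a diagonal Gaussian approximate posterior and a standard normal prior, a common choice that the slides do not confirm for SLAC, the two VAE loss terms can be sketched as below. The squared-error reconstruction term and all function names are assumptions for illustration.

```python
import math
import random


def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ) in closed form."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv
                     for m, lv in zip(mu, log_var))


def vae_loss(x, x_recon, mu, log_var):
    """Negative ELBO up to a constant: reconstruction error plus KL."""
    recon = 0.5 * sum((a - b) ** 2 for a, b in zip(x, x_recon))
    return recon + kl_to_standard_normal(mu, log_var)
```

The reconstruction and KL terms correspond to the "decoder loss" and "KL loss" curves tracked in the experiments.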
• Stage 2: latent learning and reinforcement learning proceed together.
  • Soft Actor-Critic is adopted.
  Diagram labels: latent learning, critic learning, actor learning
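
The critic and actor updates of standard Soft Actor-Critic can be sketched per sample as follows. This follows the usual SAC formulation with twin Q-values, not code taken from the slides; in SLAC the inputs would be latent states, and every argument name here is illustrative.

```python
def soft_q_target(reward, done, next_q1, next_q2, next_log_prob,
                  alpha, gamma=0.99):
    """Soft Bellman target: r + gamma * (min Q' - alpha * log pi(a'|z'))."""
    soft_value = min(next_q1, next_q2) - alpha * next_log_prob
    return reward + gamma * (1.0 - done) * soft_value


def critic_loss(q_pred, target):
    """Squared error between a Q estimate and the soft target."""
    return 0.5 * (q_pred - target) ** 2


def actor_loss(log_prob, q1, q2, alpha):
    """Policy loss to minimize: alpha * log pi(a|z) - min Q(z, a)."""
    return alpha * log_prob - min(q1, q2)
```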
• Why SAC (Soft Actor-Critic) is adopted
  • To address the exploration-exploitation trade-off.
  • To address the sample inefficiency of on-policy methods (SAC is off-policy).
  Diagram: entropy-regularized RL compared with standard RL
  Entropy
  • More exploration is carried out.
  • The risk of trying actions with very low reward is also reduced.
  Hyperparameter α
  • Controls how strongly the entropy term is weighted.
  • Option 1: use a fixed value.
  • Option 2: use a variable value, adjusted according to the entropy.
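
Option 2 is commonly implemented by learning log α so that the policy entropy tracks a target value. Below is a minimal sketch of one gradient step, assuming the widely used temperature loss L = mean(-log α · (log π(a|s) + H_target)). The target entropy and learning rate are hypothetical hyperparameters, not values from the slides.

```python
import math


def update_log_alpha(log_alpha, log_probs, target_entropy, lr=3e-4):
    """One gradient-descent step on L = mean(-log_alpha * (log_pi + target)).

    Returns the new log_alpha; alpha = exp(log_alpha) stays positive.
    """
    entropy_estimate = -sum(log_probs) / len(log_probs)
    # dL/dlog_alpha = entropy_estimate - target_entropy
    grad = entropy_estimate - target_entropy
    return log_alpha - lr * grad


def entropy_augmented_reward(reward, log_prob, alpha):
    """Per-step term of the maximum-entropy objective: r - alpha * log pi."""
    return reward - alpha * log_prob
```

When the estimated entropy is below the target, α grows and exploration is weighted more heavily; when it is above the target, α shrinks.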
Experiments
• Experimental environments
  DeepMind Control: cheetah, walker, ball-in-cup catch, finger spin
  OpenAI Gym: half cheetah, walker, hopper, ant
• Example environment (cheetah)
• Quantitative evaluation
  • Comparison with models that learn from images (DeepMind Control)
    Overall, the proposed SLAC performs well.
  • Comparison with models that learn from images (OpenAI Gym)
    Overall, the proposed SLAC performs well.
• Qualitative evaluation (cheetah)
  Figure rows: ground truth; image sequence generated from the decoder; image sequence generated from the latent; image sequence generated from the encoder.
• Our own experiment results (cheetah)
  • Latent model
    Plots: decoder loss, KL loss
    The model handled the high-dimensional images better as training progressed.
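• Our own experiment results (cheetah)
  • Reinforcement learning
    Plots: return, α value, entropy
    • Performance was close to that reported in the paper.
    • The degree of exploration varied with the entropy value.
    • The α value was adjusted accordingly.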
Conclusion and opinions
• Conclusions of the paper
  • The goal is reinforcement learning from high-dimensional images.
  • Learning proceeds through a latent representation.
    • Variational inference is performed with a VAE-based model.
    • Soft Actor-Critic is then used for reinforcement learning.
      • It can address the exploration-exploitation trade-off.
      • It can address the sample inefficiency of on-policy methods.
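• Personal opinions
  • For image-based learning
    • In a complex environment, learning the latent alone is likely to take a long time.
      • For cheetah, it took 3 hours.
    • If the camera position (image viewpoint) changes, the model has to be retrained.
    • Running training in parallel seems advisable.
  • On α in Soft Actor-Critic (from personal experience)
    • For easy tasks, a fixed value is fine.
    • The more complex the task, the better a variable value seems to work.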
