Hyperbolic Deep Reinforcement Learning
2023.02.22.
Sangwoo Mo
1
• Some data naturally carries a hierarchical tree structure
Motivation
2
Image from https://towardsdatascience.com/https-medium-com-noa-weiss-the-hitchhikers-guide-to-hierarchical-classification-f8428ea1e076
Hierarchy of classes Tree structures of episodes
• Euclidean space may not reflect the hierarchical structure well
Motivation
3
Pets
Cats
Dogs
Can we say that…
d( pets, dogs ) > d( cats, dogs ) ?
• Hyperbolic space is a natural choice to embed trees
• Embed the few parents (e.g., “pets”) near the center and the many children (e.g., “dogs”) near the boundary
• In hyperbolic space, distances grow exponentially near the boundary, which makes it well suited to embedding many children
Motivation
4
Image from Peng et al., “Hyperbolic Deep Neural Networks: A Survey”
Pets
Dogs
Cats
• Why did I choose this paper?
• Interpreting a sequence (e.g., an episode or a video) as a tree is an interesting and useful idea
• It is the first paper to make hyperbolic embeddings work in RL
• Contributions
• The concept of hyperbolic space is appealing… but had not been successful in practice
• This is mostly due to an optimization issue, and a simple regularization trick makes it work well
• Trick. Reduce the gradient norm of the NN
(apply spectral normalization (SN) and then rescale the outputs)
TL;DR
5
• Episodes move from the root (center) to the leaves (boundary) as the agent progresses
• Green lines (good policy) move from the center to the boundary
• Red lines (random policy) move in random directions
• Hyperbolic embeddings give a natural explanation of RL agents’ behavior
Results
6
• S-RYM (proposed method) improves the learned policies
• Apply S-RYM to PPO (policy gradient) on Procgen
• Reducing the embedding dim to 32 improves performance (hyperbolic space embeds episodes more efficiently)
Results
7
• S-RYM (proposed method) improves the learned policies
• Apply S-RYM to Rainbow (Q-learning) on Atari 100K
Results
8
• Motivation. Deep RL policies already make their embeddings hyperbolic
• Return (both train and test) increases as 𝜹-hyperbolicity decreases
Method – What’s new?
9
• Motivation. Deep RL policies already make their embeddings hyperbolic
• Return (both train and test) increases as 𝜹-hyperbolicity decreases
• The gap (between train and test) widens when 𝜹-hyperbolicity follows a U-shaped curve
Method – What’s new?
10
• Motivation. Deep RL policies already make their embeddings hyperbolic
• Return (both train and test) increases as 𝜹-hyperbolicity decreases
• The gap (between train and test) widens when 𝜹-hyperbolicity follows a U-shaped curve
Method – What’s new?
11
𝜹-hyperbolicity
See https://en.wikipedia.org/wiki/Hyperbolic_metric_space for a detailed explanation of 𝛿-hyperbolicity (a reference definition is given below)
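For reference (this definition comes from the linked article, not the original slide): a metric space is 𝛿-hyperbolic, in Gromov's four-point sense, when all Gromov products satisfy

```latex
% Gromov product of x and y with respect to a base point w
(x \cdot y)_w = \tfrac{1}{2}\left( d(x,w) + d(y,w) - d(x,y) \right)

% X is \delta-hyperbolic if, for all x, y, z, w \in X,
(x \cdot z)_w \;\ge\; \min\left( (x \cdot y)_w,\; (y \cdot z)_w \right) - \delta
```

Trees have 𝛿 = 0, so a small 𝛿 measured on the learned embeddings indicates a nearly tree-like latent geometry.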
• Motivation. However, a naïve implementation of hyperbolic PPO is not successful
• Hyperbolic PPO is worse than PPO
• Returns of the hyperbolic models are lower than PPO’s
Method – What’s new?
12
• Motivation. However, a naïve implementation of hyperbolic PPO is not successful
• Hyperbolic PPO is worse than PPO
• …because the entropy loss of hyperbolic models converges more slowly (i.e., poor exploitation)
Method – What’s new?
13
• Motivation. However, a naïve implementation of hyperbolic PPO is not successful
• Hyperbolic PPO is worse than PPO
• …because the magnitudes and variances of the gradients explode for hyperbolic models
Method – What’s new?
14
• Motivation. However, a naïve implementation of hyperbolic PPO is not successful
• Hyperbolic PPO is worse than PPO; clipping the activation norm helps hyperbolic PPO, but not enough
• …because the magnitudes and variances of the gradients explode for hyperbolic models
Method – What’s new?
15
• Solution. Apply spectral normalization (SN) to regularize the gradient norms
• SN normalizes the weights 𝑾 to have spectral norm (largest singular value) 𝝈(𝑾) ≈ 𝟏
• It keeps the gradients from exploding or vanishing (updates occur in the normalized weight space)
• S-RYM (spectrally-regularized hyperbolic mappings) applies SN to all layers except the final hyperbolic layer
Method – What’s new?
16
Miyato et al. “Spectral Normalization for Generative Adversarial Networks,” ICLR 2018.
Image from https://blog.ml.cmu.edu/2022/01/21/why-spectral-normalization-stabilizes-gans-analysis-and-improvements/
(Figure) Transform the final embedding to a hyperbolic embedding
• Solution. Apply spectral normalization (SN) to regularize the gradient norms
• SN normalizes the weights 𝑾 to have spectral norm (largest singular value) 𝝈(𝑾) ≈ 𝟏
• It keeps the gradients from exploding or vanishing (updates occur in the normalized weight space)
• S-RYM (spectrally-regularized hyperbolic mappings) applies SN to all layers except the final hyperbolic layer
• However, to maintain the overall scale of the activations, it rescales them by √𝑛 (𝑛 = embedding dim); see the code sketch below
• If 𝑧 ∈ ℝⁿ follows a standard Gaussian distribution, ‖𝑧‖ follows the Chi distribution 𝜒ₙ and 𝔼‖𝑧‖ ≈ √𝑛
Method – What’s new?
17
Miyato et al. “Spectral Normalization for Generative Adversarial Networks,” ICLR 2018.
Image from https://blog.ml.cmu.edu/2022/01/21/why-spectral-normalization-stabilizes-gans-analysis-and-improvements/
(Figure) Rescale the activations by √𝑛₁ and √𝑛₂ after the SN layers; transform the final embedding to a hyperbolic embedding
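A minimal PyTorch-style sketch of this recipe (my reconstruction, not the authors’ code: the layer sizes, the per-layer placement of the √n rescaling, and the names SRYMEncoder/expmap0 are assumptions):

```python
import math
import torch
import torch.nn as nn
from torch.nn.utils.parametrizations import spectral_norm

class SRYMEncoder(nn.Module):
    """S-RYM-style encoder sketch: SN on every layer except the final
    hyperbolic map, with sqrt(n) rescaling to restore activation scale."""

    def __init__(self, in_dim=64, hidden_dim=256, embed_dim=32):
        super().__init__()
        # SN constrains each weight matrix to spectral norm ~ 1
        self.fc1 = spectral_norm(nn.Linear(in_dim, hidden_dim))
        self.fc2 = spectral_norm(nn.Linear(hidden_dim, embed_dim))

    @staticmethod
    def expmap0(v, c=1.0):
        # Exponential map from the origin of the Poincare ball (curvature c)
        norm = v.norm(dim=-1, keepdim=True).clamp_min(1e-6)
        return torch.tanh(math.sqrt(c) * norm) * v / (math.sqrt(c) * norm)

    def forward(self, x):
        # Rescale by sqrt(output dim) after each SN layer, following the
        # figure's "x sqrt(n1), x sqrt(n2)" annotation
        h = torch.relu(self.fc1(x)) * math.sqrt(self.fc1.out_features)
        z = self.fc2(h) * math.sqrt(self.fc2.out_features)
        return self.expmap0(z)  # final hyperbolic embedding (no SN here)

enc = SRYMEncoder()
emb = enc(torch.randn(8, 64))         # batch of 8 flattened observations
print(emb.norm(dim=-1).max().item())  # embeddings lie inside the unit ball
```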
• Solution. S-RYM makes hyperbolic PPO work well in practice
• Hyperbolic PPO + S-RYM outperforms PPO and hyperbolic PPO, and so does Euclidean PPO + S-RYM
Method – What’s new?
18
• Solution. S-RYM makes hyperbolic PPO work well in practice
• Hyperbolic PPO + S-RYM outperforms PPO and hyperbolic PPO, and so does Euclidean PPO + S-RYM
• …and the gradient norms under S-RYM are indeed reduced
Method – What’s new?
19
• Spectral normalization (SN)
• Normalize the weights as 𝑾/𝝈(𝑾) in the forward pass… but how do we compute the spectral norm 𝝈(𝑾)?
• A. Apply power iteration!
Method – Technical details
20
• Spectral normalization (SN)
• Power iteration finds the largest singular value by iterating 𝑏ₖ₊₁ = 𝑨𝑏ₖ / ‖𝑨𝑏ₖ‖
• Why does it work?
• 𝑏ₖ converges to the dominant eigenvector 𝑣₁, from which we recover the corresponding (largest) eigenvalue 𝜆₁ (see the sketch below)
Method – Technical details
21
See https://en.wikipedia.org/wiki/Power_iteration
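A minimal sketch of power iteration for the spectral norm of a (possibly non-square) weight matrix; the alternating u/v updates and the names are mine, not the paper’s code:

```python
import torch

def spectral_norm_power_iteration(W: torch.Tensor, n_iters: int = 20) -> float:
    """Estimate sigma(W), the largest singular value, by power iteration."""
    # Alternating u <- Wv/||Wv||, v <- W^T u/||W^T u|| is power iteration
    # on W^T W, so v converges to the top right-singular vector of W.
    v = torch.randn(W.shape[1])
    v = v / v.norm()
    u = W @ v
    for _ in range(n_iters):
        u = W @ v
        u = u / u.norm()
        v = W.t() @ u
        v = v / v.norm()
    # With unit-norm u and v, u^T W v approximates sigma(W)
    return float(u @ (W @ v))

W = torch.randn(256, 64)
print(spectral_norm_power_iteration(W))           # power-iteration estimate
print(torch.linalg.matrix_norm(W, ord=2).item())  # exact sigma(W) for comparison
```

In practice (as in the SN paper), a single iteration per training step suffices, since the weights change slowly between updates.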
• Hyperbolic embedding
• Recent hyperbolic models use standard Euclidean layers throughout,
converting only the final embedding 𝐱_E = 𝑓_E(𝐱) into a hyperbolic embedding 𝐱_ℍ
Method – Technical details
22
• Hyperbolic embedding
• Recent hyperbolic models use standard Euclidean layers throughout,
converting only the final embedding 𝐱_E = 𝑓_E(𝐱) into a hyperbolic embedding 𝐱_ℍ
• Specifically, 𝐱_ℍ is given by the exponential map 𝐱_ℍ = exp_𝟎(𝐱_E) from the origin 𝟎, using 𝐱_E as the velocity
• The exponential map projects a (local tangent) vector at 𝑋 onto the manifold 𝑀: exp_𝑋(𝑣) = 𝛾(1), where 𝛾 is the geodesic with 𝛾(0) = 𝑋 and 𝛾′(0) = 𝑣
Method – Technical details
23
Image from https://www.researchgate.net/figure/An-exponential-map-exp-X-TX-M-M_fig1_224150233
• Hyperbolic embedding
• Hyperbolic space has several coordinate systems
• Just as Euclidean space has Cartesian, spherical, and other coordinate systems
• How best to represent hyperbolic space is a research topic of its own, but S-RYM uses the Poincaré ball
Method – Technical details
24
Image from https://en.wikipedia.org/wiki/Coordinate_system
• Hyperbolic embedding
• In the Poincaré ball, the operations (to compute the final output) are the following (the slide’s formulas were images; reference forms are given after this slide):
• Exponential map (from the origin)
• Addition (of two vectors, i.e., Möbius addition)
• Distance (from a generalized hyperplane parametrized by 𝐩 and 𝐰)
• S-RYM computes the final policy/value scalars 𝑓_ℍ(𝐱_ℍ) = (𝑓ᵢ(𝐱_ℍ))ᵢ∈𝐴 for all (discrete) actions 𝑖 ∈ 𝐴
Method – Technical details
25
Image from https://en.wikipedia.org/wiki/Coordinate_system
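For reference (not on the original slide), these operations take the following standard forms in the hyperbolic neural network literature (Ganea et al., “Hyperbolic Neural Networks,” 2018); I am assuming a curvature parameter c, which the paper may fix to 1:

```latex
% Exponential map from the origin (velocity v)
\exp_{\mathbf{0}}^{c}(\mathbf{v}) =
  \tanh\!\left(\sqrt{c}\,\|\mathbf{v}\|\right)\frac{\mathbf{v}}{\sqrt{c}\,\|\mathbf{v}\|}

% Mobius addition of two vectors x and y
\mathbf{x} \oplus_c \mathbf{y} =
  \frac{\left(1 + 2c\langle\mathbf{x},\mathbf{y}\rangle + c\|\mathbf{y}\|^2\right)\mathbf{x}
        + \left(1 - c\|\mathbf{x}\|^2\right)\mathbf{y}}
       {1 + 2c\langle\mathbf{x},\mathbf{y}\rangle + c^2\|\mathbf{x}\|^2\|\mathbf{y}\|^2}

% Distance from the generalized hyperplane H parametrized by p and w
d_c\!\left(\mathbf{x}, H_{\mathbf{p},\mathbf{w}}\right) =
  \frac{1}{\sqrt{c}}\,\sinh^{-1}\!\left(
  \frac{2\sqrt{c}\,\left|\langle -\mathbf{p} \oplus_c \mathbf{x},\, \mathbf{w}\rangle\right|}
       {\left(1 - c\|{-\mathbf{p}} \oplus_c \mathbf{x}\|^2\right)\|\mathbf{w}\|}\right)
```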
• Tree structures may become more important in the era of multimodal (vision-language) and temporal (video) data
• They were less impactful in the era of ImageNet classification (no hierarchy over classes)
• For example, CLIP should understand the hierarchy of visual and textual information
• Using hyperbolic embeddings for model-based RL would also be an interesting direction
Final Remarks
26
Image from OpenAI CLIP and https://lexicala.com/review/2020/mccrae-rudnicka-bond-english-wordnet
Thank you for listening! 😀
27
