Learning Dexterous In-Hand Manipulation
(18.08)
Seungjae Ryan Lee
Human vs Robot
• Humans perform various dexterous manipulation tasks with hands
• Robots are designed for specific tasks in constrained settings
• Unable to utilize complex end-effectors
http://lasa.epfl.ch/research_new/grasping/
Robot with Hand
• Human hand has been an inspiration for research
• Limited results due to complexity of system
• Simulation-only manipulation (no attempt at transfer to real hardware)
• Physical-only manipulation (slow and costly to run)
https://www.cc.gatech.edu/~ybai30/hand/paper_hand.pdf
The Task: In-Hand Object Reorientation
• Object placed on robot hand
• Reorient the object to a desired target configuration in-hand
• A new goal is provided whenever the current goal is achieved
• Continues until the object is dropped or 50 consecutive goals are achieved
Dactyl: Approach
• Transfer Learning from simulator to real environment
• Use extensive randomizations and simulated unmodeled effects
• Use memory augmented control policies to learn how to adapt
• Use distributed RL
Dactyl: Results
• Shows unprecedented levels of dexterity
• Discovers human-like grasp types without guidance
The Real World: Shadow Dexterous Hand
• 24 Degrees of Freedom (DOF)
• Hand: Cartesian positions of the 5 fingertips
• Object: Cartesian pose, tracked with motion capture or estimated from 3 RGB cameras (to handle the real world)
The Simulation: MuJoCo and Unity
• Simulate the physics with MuJoCo
• Render images with Unity
• Rough approximation of the physical setup
• Direct torque actuation (sim) vs. tendon-based actuation (real)
• Rigid-body (sim) vs. deformable-body (real) contact model
• Creates a “reality gap”
Actions and Rewards
• Actions: relative joint angles
• Discretized into 11 bins per joint (performed better than continuous actions)
• Smoothed with an exponential moving average to prevent robot breakage (see the sketch below)
• Reward: r_t = d_t − d_{t+1}
• d_t, d_{t+1} are the rotation angles between the current and the goal object orientation before and after the transition
• +5 when a goal is achieved
• -20 when the object is dropped
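A minimal Python sketch of the action discretization, EMA smoothing, and reward described above; the joint count, bin range, smoothing factor, and helper names are illustrative assumptions, not values from the paper.

```python
import numpy as np

NUM_JOINTS = 20                                 # actuated joints (illustrative)
NUM_BINS = 11                                   # discrete bins per joint
BIN_VALUES = np.linspace(-1.0, 1.0, NUM_BINS)   # scaled relative joint-angle offsets
EMA_ALPHA = 0.8                                 # smoothing factor (assumed value)

def smoothed_action(bin_indices, prev_action):
    """Map per-joint bin indices to relative joint targets, smoothed with an EMA."""
    raw = BIN_VALUES[bin_indices]               # shape: (NUM_JOINTS,)
    return EMA_ALPHA * prev_action + (1.0 - EMA_ALPHA) * raw

def reward(d_t, d_t1, goal_reached, dropped):
    """r_t = d_t - d_{t+1}: positive when the object rotates toward the goal."""
    r = d_t - d_t1
    if goal_reached:
        r += 5.0
    if dropped:
        r -= 20.0
    return r
```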
Fostering Transferability: Observations
• Avoid using certain sensors as observations
• Subject to state-dependent noise that cannot be modelled
• Can be noisy and hard to calibrate
• Ex) Fingertip tactile sensors, which are affected by:
• Atmospheric pressure
• Temperature
• Shape of contact
• Intersection geometry
https://www.youtube.com/watch?v=W_O-u9PNUMU
Fostering Transferability: Randomization
• Robust: Less likely to overfit to the simulated environment
• More likely to transfer successfully to the physical robot
• Types of randomizations:
• Observation noise
• Physics randomizations
• Unmodeled effects
• Visual appearance randomizations
Randomization: Observation Noise
• Correlated noise: sampled once per episode and held fixed
• Uncorrelated noise: re-sampled at every timestep (see the sketch below)
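A sketch of the two noise levels, assuming additive Gaussian noise with made-up scales: a correlated offset drawn once per episode plus uncorrelated noise drawn at every timestep.

```python
import numpy as np

class ObservationNoise:
    def __init__(self, obs_dim, corr_scale=0.01, uncorr_scale=0.002, seed=None):
        self.rng = np.random.default_rng(seed)
        self.uncorr_scale = uncorr_scale
        # Correlated component: sampled once per episode, then held fixed
        self.episode_offset = self.rng.normal(0.0, corr_scale, size=obs_dim)

    def __call__(self, obs):
        # Uncorrelated component: re-sampled at every timestep
        step_noise = self.rng.normal(0.0, self.uncorr_scale, size=obs.shape)
        return obs + self.episode_offset + step_noise
```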
Randomization: Physics
• Parameters centered around the values found during model calibration
• Sampled once per episode (see the sketch below)
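A sketch of per-episode physics randomization; the parameter names, calibrated values, and the multiplicative log-normal perturbation are assumptions for illustration.

```python
import numpy as np

# Calibrated nominal values (made-up numbers for illustration)
CALIBRATED = {"object_mass": 0.05, "fingertip_friction": 1.0, "joint_damping": 0.1}

def sample_physics(rng=None, sigma=0.2):
    """Draw one set of physics parameters per episode, centered on calibration."""
    rng = rng or np.random.default_rng()
    return {name: value * rng.lognormal(mean=0.0, sigma=sigma)
            for name, value in CALIBRATED.items()}
```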
Randomization: Unmodeled effects
• Physical robot experiences effects not modeled in simulation
• Accommodated by approximating such effects in simulation (see the sketch below)
• Simulate motor backlash
• Simulate action delay and noise
• Simulate motion capture tracking errors
• Sometimes apply random forces on object
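A sketch of how some of these unmodeled effects could be approximated on top of a simulator step: a short action-delay queue, action noise, and occasional random forces on the object. The class, numbers, and method names are illustrative, not the paper's implementation.

```python
import collections
import numpy as np

class UnmodeledEffects:
    def __init__(self, delay_steps=1, action_noise=0.01, force_prob=0.1, seed=None):
        self.rng = np.random.default_rng(seed)
        self.action_noise = action_noise
        self.force_prob = force_prob
        self.pending = collections.deque(maxlen=delay_steps + 1)

    def delayed_noisy(self, action):
        """Emulate actuation delay and motor noise on the commanded action."""
        self.pending.append(np.asarray(action, dtype=float))
        executed = self.pending[0]                 # oldest queued action
        return executed + self.rng.normal(0.0, self.action_noise, size=executed.shape)

    def random_object_force(self):
        """Occasionally return a small random force to apply to the object."""
        if self.rng.random() < self.force_prob:
            return self.rng.normal(0.0, 0.1, size=3)
        return np.zeros(3)
```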
Randomization: Visual Appearance
• Randomize various visual aspects (sampling sketched below):
• Camera positions and intrinsics
• Lighting conditions
• Pose of the hand and object
• Materials and textures
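A sketch of sampling visual randomizations before rendering a frame; the setting names and ranges are hypothetical, standing in for whatever parameters the Unity renderer exposes.

```python
import numpy as np

def sample_render_settings(rng=None):
    """One random draw of visual settings (illustrative names and ranges)."""
    rng = rng or np.random.default_rng()
    return {
        "camera_position_jitter_m": rng.uniform(-0.01, 0.01, size=3),
        "camera_fov_deg": rng.uniform(40.0, 50.0),   # stand-in for intrinsics
        "light_intensity": rng.uniform(0.5, 1.5),
        "light_direction": rng.normal(size=3),
        "texture_hue_shift": rng.uniform(-0.1, 0.1),
    }
```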
Step A: Collect Data
• Generate data purely in simulation
• Using randomized environments
Step B: Train Control Policy for Simulation
• Map observations (and rewards during training) to actions
• Fingertip locations: available in both simulation and on the physical robot
• Object pose: ground truth available only in simulation
Policy Architecture: LSTM
• Randomized parameters are held fixed within an episode
• An agent with memory can therefore identify the environment’s properties and adapt accordingly
• Use a Long Short-Term Memory (LSTM) network (sketched below)
http://colah.github.io/posts/2015-08-Understanding-LSTMs/
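A minimal sketch of a recurrent policy over the discretized actions, written in PyTorch for illustration; the encoder, layer sizes, and joint/bin counts are assumptions rather than the paper's actual network.

```python
import torch
import torch.nn as nn

class RecurrentPolicy(nn.Module):
    """Sketch of an LSTM policy head over discretized joint actions."""

    def __init__(self, obs_dim, num_joints=20, num_bins=11, hidden=256):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.lstm = nn.LSTM(hidden, hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_joints * num_bins)
        self.num_joints, self.num_bins = num_joints, num_bins

    def forward(self, obs_seq, state=None):
        # obs_seq: (batch, time, obs_dim). The recurrent state lets the policy
        # infer the episode's fixed randomizations from how the hand responds.
        x = self.encoder(obs_seq)
        x, state = self.lstm(x, state)
        logits = self.head(x).view(x.shape[0], x.shape[1], self.num_joints, self.num_bins)
        return logits, state
```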
Policy Architecture: PPO
• Trained with Proximal Policy Optimization (PPO); its clipped objective is sketched below
• The policy network (actor) and value network (critic) have the same architecture
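A minimal sketch of PPO's standard clipped surrogate objective (PyTorch, with a commonly used clip coefficient); details such as entropy and value-loss terms are omitted.

```python
import torch

def ppo_clipped_loss(log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Clipped surrogate objective: discourages large policy-ratio updates."""
    ratio = torch.exp(log_probs - old_log_probs)              # pi_new / pi_old
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()              # minimized by the optimizer
```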
Policy Architecture: Asymmetric AC
• The value network has access to extra information that is not available on the real robot
• Simplifies learning good value estimates (see the sketch below)
https://arxiv.org/abs/1710.06542
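A sketch of the asymmetric split, assuming simple MLPs: the actor only sees inputs that exist on the real robot, while the critic additionally sees privileged simulator state, which is fine because the critic is only needed during training.

```python
import torch.nn as nn

class AsymmetricActorCritic(nn.Module):
    def __init__(self, obs_dim, full_state_dim, act_dim, hidden=256):
        super().__init__()
        # Actor: inputs must be observable on the physical robot
        self.actor = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(), nn.Linear(hidden, act_dim))
        # Critic: may use privileged simulator state (exact pose, velocities, ...)
        self.critic = nn.Sequential(
            nn.Linear(full_state_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, obs, full_state):
        return self.actor(obs), self.critic(full_state)
```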
Distributed Training with Rapid
• Same distributed implementation as OpenAI Five
• Training loop: generate new experience → pull new experience into the buffer → sample minibatches from the buffer → optimize with the minibatches (sketched below)
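A single-process sketch of that loop; the hypothetical `worker.rollout` and `update_fn` stand in for Rapid's distributed rollout workers and optimizer nodes.

```python
import random

def training_loop(workers, policy, update_fn, batch_size=512):
    """Generate experience, pull it into a buffer, sample minibatches, optimize."""
    buffer = []
    while True:
        for worker in workers:                     # 1. generate new experience
            buffer.extend(worker.rollout(policy))  #    (hypothetical worker API)
        buffer = buffer[-100_000:]                 # keep the buffer bounded
        batch = random.sample(buffer, k=min(batch_size, len(buffer)))  # 2. sample minibatch
        update_fn(policy, batch)                   # 3. optimize (e.g. with the PPO loss above)
```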
Step C: Train Pose Estimator
• Estimate the object’s pose from vision
• The object’s ground-truth pose is not available in the real world
State Estimation from Vision
• Gather 1M states from the simulator by running the trained policy
• Render each state from 3 RGB cameras with different viewpoints
• Predict both the position and the orientation of the object
• Train with minibatch gradient descent to minimize the MSE
• Convolutional, ResNet, and spatial-softmax (SSM) layers are shared across cameras
• Use data augmentation (network sketched below)
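A PyTorch sketch of the vision model's overall shape: a trunk shared across the three cameras, a spatial softmax per camera, and a fused head predicting position plus a quaternion, trained with MSE. The two-conv trunk and layer sizes are placeholders for the paper's conv + ResNet stack.

```python
import torch
import torch.nn as nn

def spatial_softmax(features):
    """Per channel, the softmax-weighted expected (x, y) image coordinate."""
    b, c, h, w = features.shape
    probs = torch.softmax(features.view(b, c, h * w), dim=-1).view(b, c, h, w)
    xs = torch.linspace(-1, 1, w, device=features.device).view(1, 1, 1, w)
    ys = torch.linspace(-1, 1, h, device=features.device).view(1, 1, h, 1)
    return torch.cat([(probs * xs).sum(dim=(2, 3)),
                      (probs * ys).sum(dim=(2, 3))], dim=1)   # (b, 2c)

class PoseEstimator(nn.Module):
    def __init__(self, channels=32):
        super().__init__()
        self.trunk = nn.Sequential(               # stand-in for the conv + ResNet stack
            nn.Conv2d(3, channels, 5, stride=2), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, stride=2), nn.ReLU())
        self.head = nn.Sequential(
            nn.Linear(3 * 2 * channels, 128), nn.ReLU(),
            nn.Linear(128, 3 + 4))                # xyz position + orientation quaternion

    def forward(self, cam_images):                # list of 3 tensors, each (B, 3, H, W)
        feats = [spatial_softmax(self.trunk(img)) for img in cam_images]  # shared weights
        out = self.head(torch.cat(feats, dim=1))
        pos, quat = out[:, :3], out[:, 3:]
        return pos, quat / quat.norm(dim=1, keepdim=True)

# Training target: minimize MSE between predicted and ground-truth pose.
```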
Step D: Combine and Transfer
• Combine B and C to create an agent usable in the real world
Qualitative Results
• Naturally learns humanlike grasps and dexterous manipulation strategies
Quantitative Results
• Metric: number of consecutive successful rotations (up to 50) until the object is dropped
• Challenging to obtain due to robot breakage during experiments
Quantitative Results: Reality Gap
• Metric: number of consecutive successful rotations (up to 50) until the object is dropped
Quantitative Results: Vision
• The vision-based model performs comparably to the state-based model in the real world
Quantitative Results: Different Object
• Policies for other objects also learn and transfer, but with a larger reality gap
• Failed on a sphere due to its rolling behavior
Ablation of Randomizations
• Randomizations are vital to real-world performance
Ablation of Randomizations: Observation
• Removing observation-noise randomization causes little performance drop
• Thanks to the accurate motion capture system
• Would be vital when pose estimates are much noisier (e.g., from vision)
Ablation of Randomizations: Limitations
• Training with all randomizations takes ≈ 33× longer (≈ 100 years of simulated experience vs. ≈ 3 years without randomizations)
Effect of Memory in Policies
• LSTM achieves better performance than feedforward (FF) network
• The LSTM state was able to predict the environment’s randomizations
• Ex) 80% accuracy in predicting whether the block was bigger or smaller than average
Effect of Memory in Policies
Sample Complexity and Scale
• Diminishing returns with further scaling
Sample Complexity and Scale
• Little effect on how fast the agent learns w.r.t. years of experience
Vision Performance
• Low error on synthetic data
• Successful Unity-to-MuJoCo transfer
• Higher error on real data
• Due to the reality gap, noise, occlusions, etc.
• Still performs reasonably well
What Surprised Us
1. Tactile sensing was not necessary
2. Randomization for one object generalized to similar objects
• System for a block worked well for an octagonal prism
• Had trouble with sphere due to its rolling behavior
3. System engineering is as important as a good algorithm
• The system without a timing bug performed consistently better
What Didn’t Work
1. Faster reaction time ≠ Better Performance
• No noticeable difference between 40ms and 80ms
2. Using real data to train the vision model did not work
• Latency and measurement error
• Hassle to collect, since data must be re-collected after configuration changes
Thank you!
Original Paper: https://arxiv.org/abs/1808.00177
Paper Recommendations:
• Using Simulation and Domain Adaptation to Improve Efficiency of
Deep Robotic Grasping
• Asymmetric Actor Critic for Image-Based Robot Learning
• Generalizing from Simulation
You can find more content at www.endtoend.ai/slides