Learning Montezuma's Revenge from a Single Demonstration

Presentation slides for 'Learning Montezuma's Revenge from a Single Demonstration' by T. Salimans and R. Chen. You can find more presentation slides on my website: https://www.endtoend.ai

Learning Montezuma’s Revenge from a Single Demonstration (18.07)
Ryan Lee

Exploration and Learning
● Exploration: Find action sequence with positive reward
● Learning: Remember and generalize action sequence
● Need both for a successful agent

Montezuma’s Revenge
● One of the hardest games on the Atari 2600
● Sparse rewards → Exploration is difficult
https://www.retrogames.cz/play_124-Atari2600.php?language=EN
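To make the "sparse rewards → exploration is difficult" point concrete, here is a minimal arithmetic sketch. It assumes the first reward requires several sub-tasks in a row and that a randomly exploring agent completes each one independently with some small probability. The value 0.1 and the count of five sub-tasks are illustrative assumptions, not measurements from the game.

```python
def chance_of_reward(step_success_probs):
    """Probability that every sub-task succeeds in one episode.

    Assumes the sub-tasks succeed independently, so the
    individual probabilities simply multiply.
    """
    total = 1.0
    for p in step_success_probs:
        total *= p
    return total


# Hypothetical per-sub-task success rates for a randomly exploring agent.
probs = [0.1] * 5

# Starting from the beginning: all five sub-tasks must succeed.
full_episode = chance_of_reward(probs)

# Starting just before the last sub-task: only one must succeed.
last_step_only = chance_of_reward(probs[-1:])
```

Under these assumptions the chance of seeing any reward shrinks exponentially with the number of sub-tasks, while an episode that only contains the final sub-task is rewarded far more often.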

Simplifying Exploration with Demonstrations
● Solution: Shorten the episode
○ Start the agent near the end of the demonstration
○ Train agent until it ties or beats the demonstrator’s score
○ Gradually move starting point back in time
[Figure: the demonstration split into stages: Go down Ladder 1 → Go down Rope → Go down Ladder 2 → Jump over Skull → Go up Ladder 3]
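The procedure above can be sketched on a toy problem. This is not the authors' implementation: the corridor environment, the tabular Q-learning agent, and every name and constant below (`N`, `DEMO`, `DEMO_SCORE`, `backward_curriculum`, the learning rate and epsilon) are assumptions made for illustration. Only the loop structure follows the slide: start near the end of the demonstration, train until the agent ties or beats the demonstrator's score, then move the starting point back in time.

```python
import random

N = 12                      # hypothetical corridor length; reward only at state N
DEMO = list(range(N + 1))   # demonstration: states visited from start to goal
DEMO_SCORE = 1.0            # score the demonstrator obtained


def step(state, action):
    """Toy dynamics: action 1 advances, action 0 ends the episode unrewarded."""
    if action == 0:
        return state, 0.0, True
    nxt = state + 1
    return nxt, (1.0 if nxt == N else 0.0), nxt == N


def run_episode(q, start, rng, epsilon, learn):
    """Play one episode from `start`; optionally apply Q-learning updates."""
    state, score, done = start, 0.0, False
    while not done:
        if rng.random() < epsilon:
            action = rng.randrange(2)                      # explore
        else:
            action = 1 if q[state][1] > q[state][0] else 0  # greedy
        nxt, reward, done = step(state, action)
        if learn:
            target = reward if done else reward + max(q[nxt])
            q[state][action] += 0.5 * (target - q[state][action])
        state, score = nxt, score + reward
    return score


def backward_curriculum(seed=0, max_episodes=10_000):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N + 1)]
    start_idx = len(DEMO) - 2          # begin one step before the end of the demo
    episodes = 0
    while start_idx >= 0 and episodes < max_episodes:
        # Train from the current starting point taken from the demo.
        run_episode(q, DEMO[start_idx], rng, epsilon=0.3, learn=True)
        episodes += 1
        # Evaluate greedily; advance the curriculum on a tie or a win.
        score = run_episode(q, DEMO[start_idx], rng, epsilon=0.0, learn=False)
        if score >= DEMO_SCORE:
            start_idx -= 1             # move the starting point back in time
    return q, start_idx, episodes


q, start_idx, episodes = backward_curriculum()
```

In this toy setting, an untrained agent started at state 0 would need 12 exploratory "advance" actions in a row, each chosen with probability 0.15, before it ever saw a reward. With the curriculum, every stage only asks the agent to discover one new step and then reuse what it already learned.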

Result
● 74500 points on Montezuma’s Revenge (State of the Art)
● Surpasses demo score of 71500
● Agent exploits an emulator flaw

Comparison with DeepMind’s approach
● DeepMind’s approach
○ Less control over environment needed
○ Agents imitate the demo
● This approach
○ Needs full game states from the demo
○ Directly optimizes game score → Less overfitting to a sub-optimal demo
○ Better in multiplayer games where performance should be optimized against various opponents

Remaining Challenges
● Agent cannot reach the exact states in the demo
○ Agent needs to generalize between similar states
○ Problematic in Gravitar or Pitfall
● Careful hyperparameter tuning needed
● High variance in each run
● Neural network does not generalize as well as a human
https://blog.openai.com/openai-baselines-ppo/

Thank you!
Original content by OpenAI
● Learning Montezuma’s Revenge from a Single Demonstration
You can find more content at
● github.com/seungjaeryanlee
● www.endtoend.ai
