UNDERSTANDING ALPHA GO
How Deep Learning Made the Impossible Possible
ABOUT MYSELF
 M.Sc. in Computer Science, HUJI
 Research interest: Deep Learning in Computer
Vision, NLP, Reinforcement learning.
 Also, DL Theory and other ML stuff.
 Working at a DL start-up (Imubit)
 Contact: mangate@gmail.com
CREDITS
 A lot of slides were taken from the following publicly
available slideshows:
 https://www.slideshare.net/ShaneSeungwhanMoon/how-
alphago-works
 https://www.slideshare.net/ckmarkohchang/alphago-in-depth
 https://www.slideshare.net/KarelHa1/alphago-mastering-the-
game-of-go-with-deep-neural-networks-and-tree-search
 Original AlphaGo article:
Silver, David, et al. "Mastering the game of Go with
deep neural networks and tree search." Nature 529.7587
(2016): 484-489.
Available here:
http://web.iitd.ac.in/~sumeet/Silver16.pdf
DEEP LEARNING IS CHANGING OUR LIVES
 Search engines (also for images and audio)
 Spam filters
 Recommender systems (Netflix, YouTube)
 Self-driving cars
 Cyber security (and physical security, via computer vision)
 Machine translation
 Speech-to-text, audio recognition
 Image recognition, smart shopping
 And more and more and more…
AI VERSUS HUMAN
 In 1997, a supercomputer called Deep Blue (IBM) defeated Garry Kasparov.
 This was the first defeat of a reigning world chess champion
by a computer under tournament conditions.
AI VERSUS HUMAN
 In 2011 Watson, another IBM supercomputer, crushed the two best players on Jeopardy!, a popular question-answering TV show.
GO
 An ancient Chinese game
(2,500 years old!)
 Despite its relatively simple rules, Go is very complex, even more so than chess.
 Playing Go well requires a great deal of intuition, and beating top humans was therefore considered unachievable by computers for at least another 30 years.
AI VERSUS HUMAN
 In 2016 AlphaGo, a computer program by DeepMind (part of Google), played a five-game Go match against Lee Sedol.
 Lee Sedol:
 professional 9-dan (the highest ranking in Go), considered among the top 3 players in the world.
 2nd in international titles.
 Won 97 out of 100 games
against European Go
champion Fan Hui.
AI VERSUS HUMAN
 “I’m confident that I can win, at least this time” – Lee Sedol
 AlphaGo won 4-1.
 “I kind of felt powerless… misjudged the capabilities of
AlphaGo” – Lee Sedol
 How is this possible? Deep Learning.
AI IN GAME PLAYING
 Almost every game can be “simulated” with a tree search.
 A move is chosen if it has the best chances of ending in a victory.
AI IN GAMES
 More formally: an optimal value function V*(s)
determines the outcome of the game:
 From every board position (state=s)
 Under perfect play by all players.
 This is done by going over the tree containing
possible move sequences where:
 b is the game's breadth (number of legal moves in each position)
 d is the game's depth (game length in moves)
 The search space size is roughly b^d:
 Tic-Tac-Toe: b ≈ 4, d ≈ 4
 Chess: b ≈ 35, d ≈ 80
TREE SEARCH IN GO
 However, in Go: b ≈ 250, d ≈ 150, so the search space far exceeds 10^100 (a googol).
 This is more than the number of atoms in the entire universe!
 Go is more complex than chess!
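To put these numbers in perspective, a quick back-of-the-envelope check (a sketch in Python, using the b and d figures above):

```python
import math

def log10_tree_size(b: int, d: int) -> float:
    """log10 of b**d, the rough number of possible move sequences."""
    return d * math.log10(b)

print(f"Chess: ~10^{log10_tree_size(35, 80):.0f}")    # ~10^123
print(f"Go:    ~10^{log10_tree_size(250, 150):.0f}")  # ~10^360
# For comparison: a googol is 10^100; atoms in the universe ~10^80.
```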
KEY: REDUCE THE SEARCH SPACE
 Reducing b (the space of possible actions)
KEY: REDUCE THE SEARCH SPACE
 Reducing d – evaluating positions ahead of time
 Instead of simulating all the way to the end, evaluate how good the position is directly.
Both reductions are done with Deep Learning.
SOME CONCEPTS
 Supervised Learning (classification)
 Given some data, predict a class (choose one option out of a known set of options)
SOME CONCEPTS
 Supervised Learning (regression)
 Given some data, predict a real number
SOME CONCEPTS
 Reinforcement Learning
 Given a state (observation), perform an action that leads toward the goal (e.g. winning a game)
SOME CONCEPTS
 CNNs (Convolutional Neural Networks) are able to learn abstract features of a given image
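As a rough illustration of the idea (a toy sketch, not any network from the paper), a tiny CNN in PyTorch that turns an image into a compact feature vector:

```python
import torch
import torch.nn as nn

# Toy CNN: stacked convolutions learn increasingly abstract image features.
features = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),   # edges, textures
    nn.MaxPool2d(2),
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),  # shapes, parts
    nn.MaxPool2d(2),
    nn.Flatten(),
)

image = torch.randn(1, 3, 32, 32)   # stand-in RGB image
print(features(image).shape)        # torch.Size([1, 2048]) = 32 channels * 8 * 8
```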
REDUCING ACTION CANDIDATES
 Done by learning to “imitate” expert moves
 Data: games of online Go experts. 160K games, ~30M moves.
 This is supervised classification (given a board position, predict the expert's move out of all possible ones)
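A minimal sketch of this supervised step in PyTorch (a toy stand-in, not the paper's 13-layer architecture; `boards` and `expert_moves` are hypothetical stand-ins for the encoded dataset):

```python
import torch
import torch.nn as nn

# Toy policy network: board feature planes in, one logit per board point out.
policy_net = nn.Sequential(
    nn.Conv2d(48, 64, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 1, kernel_size=1),
    nn.Flatten(),                           # -> 19*19 = 361 move logits
)
optimizer = torch.optim.SGD(policy_net.parameters(), lr=0.003)
loss_fn = nn.CrossEntropyLoss()             # classification over the 361 points

boards = torch.randn(32, 48, 19, 19)        # stand-in: encoded expert positions
expert_moves = torch.randint(0, 361, (32,)) # stand-in: the experts' chosen moves

loss = loss_fn(policy_net(boards), expert_moves)  # "imitate the expert"
optimizer.zero_grad()
loss.backward()
optimizer.step()
```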
REDUCING ACTION CANDIDATES
 This deep CNN achieved 55% test accuracy on predicting expert moves.
 Imitators with no Deep Learning reached only 22% accuracy.
 Small improvements in accuracy led to big improvements in playing ability.
ROLLOUT NETWORK
 Train an additional, smaller network
(p_π) for imitation.
 This network achieves only 24.2%
accuracy.
 It works about 1,000 times faster (2μs
per move compared to 3ms).
 This network is used for rollouts
(explained later).
IMPROVING THE NETWORK
 Improve the imitator network through self-play
(Reinforcement Learning)
 An entire game is played and the parameters are
updated according to the result.
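In spirit this is a REINFORCE-style policy-gradient update; a sketch under simplified assumptions, reusing the toy `policy_net` above (`game_states` and `moves_played` are hypothetical tensors for one self-played game; z = +1 for a win, -1 for a loss):

```python
import torch
import torch.nn.functional as F

def reinforce_update(policy_net, optimizer, game_states, moves_played, z):
    """Nudge the policy toward its own moves if it won, away if it lost."""
    logits = policy_net(game_states)                  # (T, 361) move logits
    log_probs = F.log_softmax(logits, dim=1)
    chosen = log_probs[torch.arange(len(moves_played)), moves_played]
    loss = -(z * chosen).sum()                        # policy-gradient objective
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```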
IMPROVING THE NETWORK
 Keep generating better models by self-playing newer models against older ones.
 The final network also won 85% of games against the best Go software (the model without self-play won only 11%).
 However, this model was eventually not used to pick moves during the games; it was used to generate the training data for the value function.
REDUCING SEARCH DEPTH - DATASET
 Self-play with the imitator model for some number of moves (0 to 450).
 Make one random move. This is the starting position s.
 Self-play until the end with the RL network (latest model).
 If black won, z = 1; otherwise z = 0.
 Save the pair (s, z) to the dataset.
 Generated 30M (s, z) pairs from 30M games (one position per game, so the samples are not strongly correlated).
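A sketch of that recipe as pseudocode-style Python; `new_game`, `play_moves`, `play_random_move`, and `play_to_end` are hypothetical helpers standing in for the actual self-play machinery:

```python
import random

def generate_value_example(sl_policy, rl_policy):
    """Produce one (position, outcome) pair following the steps above."""
    game = new_game()                      # hypothetical: fresh 19x19 board
    n = random.randint(0, 450)
    play_moves(game, sl_policy, n)         # imitator self-play for n moves
    play_random_move(game)                 # one random move -> position s
    s = game.position()
    winner = play_to_end(game, rl_policy)  # RL network finishes the game
    z = 1 if winner == "black" else 0
    return s, z
```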
REDUCING SEARCH DEPTH –
VALUE FUNCTION
 Regression task: for a given position s, output a number between 0 and 1.
 Now, for each possible position, we have an evaluation of how “good” it is for the black player.
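A minimal sketch of the regression step, assuming the (s, z) pairs above have been encoded as tensors (`positions` and `outcomes` are stand-ins; the real value network reused the policy network's architecture with a single scalar output):

```python
import torch
import torch.nn as nn

# Toy value network: board features in, one number in (0, 1) out.
value_net = nn.Sequential(
    nn.Conv2d(48, 64, kernel_size=3, padding=1), nn.ReLU(),
    nn.Flatten(),
    nn.Linear(64 * 19 * 19, 1),
    nn.Sigmoid(),                       # squash to (0, 1): black's win chance
)
optimizer = torch.optim.SGD(value_net.parameters(), lr=0.003)

positions = torch.randn(32, 48, 19, 19)          # stand-in: positions s
outcomes = torch.randint(0, 2, (32, 1)).float()  # stand-in: labels z

loss = nn.functional.mse_loss(value_net(positions), outcomes)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```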
REDUCING SEARCH SPACE
PUTTING IT ALL TOGETHER - MCTS
 During play, a method called Monte Carlo Tree Search (MCTS) is applied.
 This method has 4 steps:
 Selection
 Expansion
 Evaluation
 Backup (update)
 For each move in the game, this process is repeated about 10K times.
MCTS - SELECTION
 At each step we have a starting
position (the board at this point).
 An action is selected using a combination of the imitator network's prior P(s,a) and an action value Q(s,a), which is set to 0 at the start.
 Dividing by the number of times a state/action pair has been visited encourages exploring diverse moves:
u(s,a) ∝ P(s,a) / (1 + N(s,a))
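A sketch of the selection rule in plain Python (`Edge` is a hypothetical per-action record of the statistics above):

```python
from dataclasses import dataclass

@dataclass
class Edge:
    P: float        # prior from the imitator (policy) network
    N: int = 0      # visit count
    Q: float = 0.0  # mean value of simulations through this edge

def select_action(children: dict) -> object:
    """Pick the action maximizing Q(s,a) + u(s,a), with u ∝ P / (1 + N)."""
    return max(children, key=lambda a: children[a].Q
               + children[a].P / (1.0 + children[a].N))
```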
MCTS - EXPANSION
 When building the tree, a position can be expanded once (creating new leaves in the tree) with the imitator network.
 This way we have the new priors P(s,a), and thus u, for the next searches.
MCTS - EVALUATION
 After simulating 3-4 steps
with the imitating network,
we evaluate the board
position.
 This is done in two ways:
 The value network's prediction.
 Using the smaller imitator
network to self-play to the end
(a rollout), and saving the result
(1 for a black win, 0 for a white win)
 Both evaluations are combined
to give this board position a
number between 0 and 1.
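In the paper the two signals are mixed with equal weight (λ = 0.5); a sketch, with `value_net_predict` and `rollout_to_end` as hypothetical helpers wrapping the two evaluators:

```python
def evaluate_leaf(position, lam=0.5):
    """Blend the value network's estimate with a fast-rollout outcome."""
    v = value_net_predict(position)  # hypothetical: win probability in (0, 1)
    z = rollout_to_end(position)     # hypothetical: 1 if black wins, else 0
    return (1 - lam) * v + lam * z   # the paper mixes them equally
```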
MCTS – BACKUP (UPDATE)
 After the simulation we
update the tree.
 Update Q (which was
0 in the beginning) with
the value computed from
the value network and the
rollouts.
 Update N(s,a): increase it
by one for each
state/action pair visited.
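A sketch of the backup along the visited path, keeping Q as the running mean of the leaf evaluations (same hypothetical `Edge` record as before):

```python
def backup(path, leaf_value):
    """path: list of Edge objects visited from the root to the evaluated leaf."""
    for edge in path:
        edge.N += 1                               # one more visit
        edge.Q += (leaf_value - edge.Q) / edge.N  # running mean of leaf values
```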
CHOOSING AN ACTION
 For each move during the game, MCTS is run about
10K times.
 In the end, the action that was visited the most
times from the root position (the current board) is
taken (see the sketch after this list).
 Notes:
 Since this process is slow, the smaller
network had to be used for rollouts to keep it feasible
(otherwise each move would have taken the computer
several days to compute).
 The imitator network was better at choosing the first
actions than the RL network, probably because
humans take more diverse actions.
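Putting the four phases together, a skeletal search loop under the same hypothetical helpers (`select_action`, `expand`, `evaluate_leaf`, `backup` from the sketches above, plus assumed `children`, `child_node`, and `position` attributes on tree nodes and edges):

```python
def mcts_choose_move(root, n_simulations=10_000):
    """Run MCTS from the current board; return the most-visited root action."""
    for _ in range(n_simulations):
        node, path = root, []
        while node.children:                  # 1. selection
            a = select_action(node.children)
            path.append(node.children[a])
            node = node.children[a].child_node
        expand(node)                          # 2. expansion (imitator priors)
        v = evaluate_leaf(node.position)      # 3. evaluation
        backup(path, v)                       # 4. backup
    return max(root.children, key=lambda a: root.children[a].N)
```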
ALPHA GO WEAKNESSES
 In the 4th game, Lee Sedol steered the board into a
position that was not in AlphaGo's search tree,
causing the program to choose worse actions and
eventually lose the game.
 Most assumptions made for AlphaGo do not hold in
real-life RL problems. See:
https://medium.com/@karpathy/alphago-in-context-c47718cb95a5
RETIREMENT
 In May 2017, AlphaGo beat Ke Jie, the world's no. 1 ranked player, 3-0.
 Google's DeepMind unit announced that this would be the last event match the AI plays.
SUMMARY
 To this day, AlphaGo is considered one of the greatest AI
achievements in recent history.
 This achievement was made by combining Deep
Learning with standard methods (like MCTS) to “simplify”
the very complex game of Go.
 4 deep neural networks were used:
 3 almost identical Convolutional Neural Networks:
 The imitating (policy) network, for action-space reduction.
 The RL network, created through self-play, for generating the dataset
for the value network.
 The value network, for search-depth reduction.
 1 small network for rollouts.
 Deep Learning keeps achieving amazing new goals
every day, and is one of the fastest-growing fields in
both academia and industry.
QUESTIONS?
Thank you!