AlphaZero
Mastering Chess and Shogi
by Self-Play
with a General Reinforcement Learning Algorithm
Karel Ha
article by Google DeepMind
AI Seminar, 19th December 2017
Outline
The Alpha* Timeline
AlphaGo
AlphaGo Zero (AG0)
AlphaZero
Conclusion
The Alpha* Timeline
Fan Hui
■ professional 2 dan
■ European Go Champion in 2013, 2014 and 2015
■ European Professional Go Champion in 2016
https://en.wikipedia.org/wiki/Fan_Hui 2
AlphaGo (AlphaGo Fan) vs. Fan Hui
AlphaGo won 5:0 in a formal match in October 2015.
[AlphaGo] is very strong and stable, it seems
like a wall. ... I know AlphaGo is a computer,
but if no one told me, maybe I would think
the player was a little strange, but a very
strong player, a real person.
Fan Hui 3
Lee Sedol “The Strong Stone”
■ professional 9 dan
■ the 2nd in international titles
■ the 5th youngest to become a professional Go player in South
Korean history
■ Lee Sedol would win 97 out of 100 games against Fan Hui.
■ “Roger Federer” of Go
https://en.wikipedia.org/wiki/Lee_Sedol 4
I heard Google DeepMind’s AI is surprisingly
strong and getting stronger, but I am
confident that I can win, at least this time.
Lee Sedol
...even beating AlphaGo by 4:1 may allow
the Google DeepMind team to claim its de
facto victory and the defeat of him
[Lee Sedol], or even humankind.
interview in JTBC
Newsroom
AlphaGo (AlphaGo Lee) vs. Lee Sedol
In March 2016 AlphaGo won 4:1 against the legendary Lee Sedol.
AlphaGo won all but the 4th game; all games were won
by resignation.
https://en.wikipedia.org/wiki/AlphaGo_versus_Lee_Sedol 5
AlphaGo Master
In January 2017, DeepMind revealed that AlphaGo had played
a series of unofficial online games against some of the strongest
professional Go players under the pseudonyms “Master” and
“Magister”.
This AlphaGo was an improved version of the AlphaGo that played
Lee Sedol in 2016.
Over one week, AlphaGo played 60 online fast time-control games.
AlphaGo won this series of games 60:0.
https://deepmind.com/research/alphago/match-archive/master/ 6
■ 23 May - 27 May 2017 in Wuzhen, China
■ Team Go vs. AlphaGo 0:1
■ AlphaGo vs. world champion Ke Jie 3:0
https://events.google.com/alphago2017/ 7
AlphaGo Zero defeated AlphaGo Lee by 100 games to 0.
AlphaZero
an AI system that mastered chess, Shogi and Go to “superhuman levels” within a handful of hours;
it defeated AlphaGo Zero (the 20-block version trained for 3 days) by 60 games to 40
1 AlphaGo Fan
2 AlphaGo Lee
3 AlphaGo Master
4 AlphaGo Zero
5 AlphaZero
AlphaGo
Policy and Value Networks
[Silver et al. 2016] 9
Training the (Deep Convolutional) Neural Networks
[Silver et al. 2016] 10
AlphaGo Zero (AG0)
AG0: Differences Compared to AlphaGo {Fan, Lee, Master}
AlphaGo {Fan, Lee, Master} × AlphaGo Zero:
■ supervised learning from human expert positions × from
scratch by self-play reinforcement learning (“tabula rasa”)
■ additional (auxiliary) input features × only the black and
white stones from the board as input features (sketched below)
■ separate policy and value networks × single neural network
■ tree search also using Monte Carlo rollouts × simpler tree
search using only the single neural network to both evaluate
positions and sample moves
■ (AlphaGo Lee) distributed machines + 48 tensor processing
units (TPUs) × a single machine + 4 TPUs
■ (AlphaGo Lee) several months of training time × 72 h of
training time (outperforming AlphaGo Lee after 36 h)
[Silver et al. 2017b] 11
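The “only the black and white stones” input above is concrete enough to sketch. Below is a minimal illustration (not DeepMind's code), assuming the 17 binary 19×19 feature planes described in the AG0 paper: eight history planes of the current player's stones, eight of the opponent's, and one colour-to-play plane; the function name and array layout are mine.

```python
# Illustrative sketch only: encoding a Go position into AG0's 17 raw input planes.
import numpy as np

def encode_position(own_history, opp_history, black_to_play, board_size=19):
    """own_history / opp_history: up to 8 binary stone maps (board_size x board_size
    numpy arrays), most recent first; shorter histories are padded with empty planes."""
    planes = list(own_history[:8]) + list(opp_history[:8])
    planes += [np.zeros((board_size, board_size))] * (16 - len(planes))     # pad history
    planes.append(np.full((board_size, board_size), float(black_to_play)))  # colour plane
    return np.stack(planes)   # shape (17, board_size, board_size): the raw input s
```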
AG0 achieves this via
■ a new reinforcement learning algorithm
■ with lookahead search inside the training loop
[Silver et al. 2017b] 12
AG0: Self-Play Reinforcement Learning
[Silver et al. 2017b] 13
AG0: Self-Play Reinforcement Learning – Neural Network
deep neural network fθ with parameters θ:
■ input: raw board representation s
■ output:
■ move probabilities p
■ value v of the board position
■ fθ(s) = (p, v)
■ specifics:
■ (20 or 40) residual blocks (of convolutional layers)
■ batch normalization
■ rectifier non-linearities
[Silver et al. 2017b] 14
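Since the slide only lists the ingredients, here is a minimal PyTorch sketch of such a dual-head residual network (an illustration, not DeepMind's implementation). The sizes follow the AG0 paper (17 input planes, 256 channels, a 19×19 board, 20 residual blocks); the class and attribute names are mine.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    def __init__(self, channels=256):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        y = F.relu(self.bn1(self.conv1(x)))
        y = self.bn2(self.conv2(y))
        return F.relu(x + y)              # skip connection + rectifier

class PolicyValueNet(nn.Module):
    """f_theta(s) = (p, v): move probabilities (as logits) and a position value in [-1, 1]."""
    def __init__(self, board_size=19, in_planes=17, channels=256, n_blocks=20):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(in_planes, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels), nn.ReLU())
        self.blocks = nn.Sequential(*[ResidualBlock(channels) for _ in range(n_blocks)])
        # policy head: one logit per board point plus one for the pass move
        self.policy_head = nn.Sequential(
            nn.Conv2d(channels, 2, 1), nn.BatchNorm2d(2), nn.ReLU(), nn.Flatten(),
            nn.Linear(2 * board_size * board_size, board_size * board_size + 1))
        # value head: scalar evaluation of the position
        self.value_head = nn.Sequential(
            nn.Conv2d(channels, 1, 1), nn.BatchNorm2d(1), nn.ReLU(), nn.Flatten(),
            nn.Linear(board_size * board_size, 256), nn.ReLU(),
            nn.Linear(256, 1), nn.Tanh())

    def forward(self, s):
        x = self.blocks(self.stem(s))
        return self.policy_head(x), self.value_head(x)   # (p logits, v)
```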
AG0: Comparison of Various Neural Network Architectures
[Silver et al. 2017b] 15
AG0: Self-Play Reinforcement Learning – Steps
0. random weights θ0
1. at each iteration i > 0, self-play games are generated:
i. MCTS samples search probabilities πt based on the neural
network from the previous iteration fθi−1:
πt = αθi−1(st) for each time-step t = 1, 2, . . . , T
ii. move is sampled from πt
iii. data (st, πt, zt) for each t are stored for later training
iv. new neural network fθi is trained in order to minimize the loss
l = (z − v)² − π⊤ log p + c||θ||²
Loss l makes (p, v) = fθ(s) more closely match the improved search
probabilities and self-play winner (π, z).
[Silver et al. 2017b] 16
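The loss in step iv translates almost directly into code. A minimal PyTorch sketch, assuming a network such as the one sketched above that maps a batch of board representations to policy logits and values; the function name and training-loop details are illustrative, while c = 10⁻⁴ is the regularisation weight reported in the paper.

```python
# Illustrative sketch of step iv: training f_theta on stored self-play data (s_t, pi_t, z_t).
import torch
import torch.nn.functional as F

def ag0_loss(net, states, search_pis, outcomes, c=1e-4):
    """l = (z - v)^2 - pi^T log p + c * ||theta||^2, averaged over the mini-batch."""
    logits, values = net(states)                                                  # p (logits), v
    value_loss = F.mse_loss(values.squeeze(-1), outcomes)                         # (z - v)^2
    policy_loss = -(search_pis * F.log_softmax(logits, dim=1)).sum(dim=1).mean()  # -pi^T log p
    l2 = sum((p * p).sum() for p in net.parameters())                             # ||theta||^2
    return value_loss + policy_loss + c * l2

# One optimisation pass over a replay of self-play positions (steps i-iii produce `replay`):
# optimiser = torch.optim.SGD(net.parameters(), lr=0.01, momentum=0.9)
# for states, search_pis, outcomes in replay:
#     optimiser.zero_grad()
#     ag0_loss(net, states, search_pis, outcomes).backward()
#     optimiser.step()
```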
AG0: Monte Carlo Tree Search (1/2)
Monte Carlo Tree Search (MCTS) in AG0:
[Silver et al. 2017b] 17
AG0: Monte Carlo Tree Search (2/2)
[Silver et al. 2017b] 18
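The MCTS slides are figures; as a rough illustration of the selection rule they depict (the PUCT rule of AG0/AlphaZero), here is a minimal sketch. The Node class, names and the c_puct constant are illustrative, not DeepMind's code.

```python
# Rough sketch of the PUCT selection step: pick the child maximising Q + U,
# where U grows with the network prior P and shrinks with the visit count N.
import math

class Node:
    def __init__(self, prior):
        self.P = prior        # prior move probability from the policy head
        self.N = 0            # visit count N(s, a)
        self.W = 0.0          # total action value W(s, a)
        self.children = {}    # move -> Node

    @property
    def Q(self):              # mean action value Q(s, a) = W / N
        return self.W / self.N if self.N else 0.0

def select_child(node, c_puct=1.5):
    """Return (move, child) maximising Q(s,a) + c_puct * P(s,a) * sqrt(sum_b N(s,b)) / (1 + N(s,a))."""
    sqrt_total = math.sqrt(sum(child.N for child in node.children.values()))
    return max(node.children.items(),
               key=lambda kv: kv[1].Q + c_puct * kv[1].P * sqrt_total / (1 + kv[1].N))
```

Each simulation descends the tree with this rule, expands the reached leaf by evaluating (p, v) = fθ(s) with the network, and backs v up along the visited edges; the search probabilities π are then proportional to the (temperature-adjusted) visit counts.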
AG0: Self-Play Reinforcement Learning – Review
[Silver et al. 2017b] 19
AG0: Elo Rating over Training Time (RL vs. SL)
[Silver et al. 2017b] 20
AG0: Elo Rating over Training Time (AG0 with 40 blocks)
[Silver et al. 2017b] 21
AG0: Tournament between AI Go Programs
[Silver et al. 2017b] 22
AG0: Discovered Joseki (Corner Sequences)
(a) five human joseki
(b) five novel joseki variants eventually preferred by AG0
[Silver et al. 2017b] 23
AG0: Discovered Playing Styles
at 3 h greedy capture of stones
at 19 h the fundamentals of Go concepts (life-and-death,
influence, territory...)
at 70 h remarkably balanced game (multiple battles,
complicated ko fight, a half-point win for white...)
[Silver et al. 2017b] 24
AlphaZero
To watch such a strong programme like
Stockfish, against whom most top players
would be happy to win even one game out
of a hundred, being completely taken apart
is certainly definitive.
Viswanathan Anand
It’s like chess from another dimension.
Demis Hassabis
AlphaZero: Differences Compared to AlphaGo Zero
AlphaGo Zero × AlphaZero:
■ binary outcome (win / loss) × expected outcome (including
draws or potentially other outcomes)
■ board positions transformed before passing to neural networks
(by randomly selected rotation or reflection) × no data
augmentation
■ games generated by the best player from previous iterations
(margin of 55 %) × continual update using the latest
parameters (without the evaluation and selection steps)
■ hyper-parameters tuned by Bayesian optimisation × reused
the same hyper-parameters without game-specific tuning
[Silver et al. 2017a] 25
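The continual-update bullet is easiest to see side by side with AG0's gating. A minimal sketch under the assumption of some head-to-head evaluate(a, b) helper; all names are illustrative, only the 55 % threshold comes from the slide.

```python
# Illustrative contrast only (not DeepMind's code).
GATING_MARGIN = 0.55  # AlphaGo Zero: a candidate must win >= 55 % of evaluation games

def ag0_checkpoint_update(best_net, candidate_net, evaluate):
    """AlphaGo Zero: self-play games are always generated by the current *best* player,
    and a newly trained candidate replaces it only after winning an evaluation match."""
    win_rate = evaluate(candidate_net, best_net)  # fraction of head-to-head games won
    return candidate_net if win_rate >= GATING_MARGIN else best_net

def alphazero_checkpoint_update(latest_net, candidate_net):
    """AlphaZero: no evaluation or selection step; self-play always uses the latest parameters."""
    return candidate_net

# First bullet, for comparison: AG0 trains on a binary win/loss signal, while AlphaZero's
# value head targets the expected outcome z in {-1 (loss), 0 (draw), +1 (win)}.
```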
AlphaZero: Elo Rating over Training Time
[Silver et al. 2017a] 26
AlphaZero: Tournament between AI Programs
(Values are given from AlphaZero’s point of view.)
[Silver et al. 2017a] 27
AlphaZero: Openings Discovered by the Self-Play (1/2)
[Silver et al. 2017a] 28
AlphaZero: Openings Discovered by the Self-Play (2/2)
[Silver et al. 2017a] 29
Conclusion
Difficulties of Go
■ challenging decision-making
■ intractable search space
■ complex optimal solution
It appears infeasible to approximate the optimal solution directly using a policy or value function!
[Silver et al. 2017a] 30
AlphaZero: Summary
■ Monte Carlo tree search
■ effective move selection and position evaluation
■ through deep convolutional neural networks
■ trained by new self-play reinforcement learning algorithm
■ new search algorithm combining
■ evaluation by a single neural network
■ Monte Carlo tree search
■ more efficient when compared to previous AlphaGo versions
■ single machine
■ 4 TPUs
■ hours rather than months of training time
[Silver et al. 2017a] 31
Novel approach
During the matches (against Stockfish and Elmo), AlphaZero
evaluated thousands of times fewer positions than Deep Blue
against Kasparov.
It compensated for this by:
■ selecting those positions more intelligently (the neural
network)
■ evaluating them more precisely (the same neural network)
Deep Blue relied on a handcrafted evaluation function.
AlphaZero was trained tabula rasa from self-play. It used
general-purpose learning.
This approach is not specific to the game of Go. The algorithm
can be used for a much wider class of AI problems!
[Silver et al. 2017a] 32
Thank you!
Questions?
Backup Slides
Input Features of AlphaZero’s Neural Networks
[Silver et al. 2017a]
AlphaZero: Statistics of Training
[Silver et al. 2017a]
AlphaZero: Evaluation Speeds
[Silver et al. 2017a]
Scalability When Compared to Other Programs
[Silver et al. 2017a]
Further Reading I
AlphaGo:
■ Google Research Blog
http://googleresearch.blogspot.cz/2016/01/alphago-mastering-ancient-game-of-go.html
■ an article in Nature
http://www.nature.com/news/google-ai-algorithm-masters-ancient-game-of-go-1.19234
■ a reddit article claiming that AlphaGo is even stronger than it appears to be:
“AlphaGo would rather win by less points, but with higher probability.”
https://www.reddit.com/r/baduk/comments/49y17z/the_true_strength_of_alphago/
■ a video of how AlphaGo works (put in layman’s terms) https://youtu.be/qWcfiPi9gUU
Articles by Google DeepMind:
■ Atari player: a DeepRL system which combines Deep Neural Networks with Reinforcement Learning (Mnih
et al. 2015)
■ Neural Turing Machines (Graves, Wayne, and Danihelka 2014)
Artificial Intelligence:
■ Artificial Intelligence course at MIT
http://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-034-artificial-intelligence-fall-2010/index.htm
Further Reading II
■ Introduction to Artificial Intelligence at Udacity
https://www.udacity.com/course/intro-to-artificial-intelligence--cs271
■ General Game Playing course https://www.coursera.org/course/ggp
■ Singularity http://waitbutwhy.com/2015/01/artificial-intelligence-revolution-1.html + Part 2
■ The Singularity Is Near (Kurzweil 2005)
Combinatorial Game Theory (founded by John H. Conway to study endgames in Go):
■ Combinatorial Game Theory course https://www.coursera.org/learn/combinatorial-game-theory
■ On Numbers and Games (Conway 1976)
■ Computer Go as a sum of local games: an application of combinatorial game theory (Müller 1995)
Chess:
■ Deep Blue beats G. Kasparov in 1997 https://youtu.be/NJarxpYyoFI
Machine Learning:
■ Machine Learning course
https://www.coursera.org/learn/machine-learning/
■ Reinforcement Learning http://reinforcementlearning.ai-depot.com/
■ Deep Learning (LeCun, Bengio, and Hinton 2015)
Further Reading III
■ Deep Learning course https://www.udacity.com/course/deep-learning--ud730
■ Two Minute Papers https://www.youtube.com/user/keeroyz
■ Applications of Deep Learning https://youtu.be/hPKJBXkyTKM
Neuroscience:
■ http://www.brainfacts.org/
References I
Conway, John Horton (1976). “On Numbers and Games”. In: London Mathematical Society Monographs 6.
Graves, Alex, Greg Wayne, and Ivo Danihelka (2014). “Neural Turing Machines”. In: arXiv preprint
arXiv:1410.5401.
Kurzweil, Ray (2005). The Singularity is Near: When Humans Transcend Biology. Penguin.
LeCun, Yann, Yoshua Bengio, and Geoffrey Hinton (2015). “Deep Learning”. In: Nature 521.7553, pp. 436–444.
Mnih, Volodymyr et al. (2015). “Human-Level Control through Deep Reinforcement Learning”. In: Nature
518.7540, pp. 529–533. url:
https://storage.googleapis.com/deepmind-data/assets/papers/DeepMindNature14236Paper.pdf.
Müller, Martin (1995). “Computer Go as a Sum of Local Games: an Application of Combinatorial Game Theory”.
PhD thesis. TU Graz.
Silver, David et al. (2016). “Mastering the Game of Go with Deep Neural Networks and Tree Search”. In: Nature
529.7587, pp. 484–489.
Silver, David et al. (2017a). “Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning
Algorithm”. In: arXiv preprint arXiv:1712.01815.
Silver, David et al. (2017b). “Mastering the Game of Go without Human Knowledge”. In: Nature 550.7676,
pp. 354–359.

Contenu connexe

Tendances

Tendances (20)

Xgboost
XgboostXgboost
Xgboost
 
GANs and Applications
GANs and ApplicationsGANs and Applications
GANs and Applications
 
A brief overview of Reinforcement Learning applied to games
A brief overview of Reinforcement Learning applied to gamesA brief overview of Reinforcement Learning applied to games
A brief overview of Reinforcement Learning applied to games
 
Generative Adversarial Networks
Generative Adversarial NetworksGenerative Adversarial Networks
Generative Adversarial Networks
 
GAN - Theory and Applications
GAN - Theory and ApplicationsGAN - Theory and Applications
GAN - Theory and Applications
 
Decision Trees for Classification: A Machine Learning Algorithm
Decision Trees for Classification: A Machine Learning AlgorithmDecision Trees for Classification: A Machine Learning Algorithm
Decision Trees for Classification: A Machine Learning Algorithm
 
InfoGAN and Generative Adversarial Networks
InfoGAN and Generative Adversarial NetworksInfoGAN and Generative Adversarial Networks
InfoGAN and Generative Adversarial Networks
 
Generative Adversarial Networks (GANs) - Ian Goodfellow, OpenAI
Generative Adversarial Networks (GANs) - Ian Goodfellow, OpenAIGenerative Adversarial Networks (GANs) - Ian Goodfellow, OpenAI
Generative Adversarial Networks (GANs) - Ian Goodfellow, OpenAI
 
Markov decision process
Markov decision processMarkov decision process
Markov decision process
 
Generative adversarial networks
Generative adversarial networksGenerative adversarial networks
Generative adversarial networks
 
How DeepMind Mastered The Game Of Go
How DeepMind Mastered The Game Of GoHow DeepMind Mastered The Game Of Go
How DeepMind Mastered The Game Of Go
 
Modern Convolutional Neural Network techniques for image segmentation
Modern Convolutional Neural Network techniques for image segmentationModern Convolutional Neural Network techniques for image segmentation
Modern Convolutional Neural Network techniques for image segmentation
 
Understanding Black Box Models with Shapley Values
Understanding Black Box Models with Shapley ValuesUnderstanding Black Box Models with Shapley Values
Understanding Black Box Models with Shapley Values
 
Unified Approach to Interpret Machine Learning Model: SHAP + LIME
Unified Approach to Interpret Machine Learning Model: SHAP + LIMEUnified Approach to Interpret Machine Learning Model: SHAP + LIME
Unified Approach to Interpret Machine Learning Model: SHAP + LIME
 
A (Very) Gentle Introduction to Generative Adversarial Networks (a.k.a GANs)
 A (Very) Gentle Introduction to Generative Adversarial Networks (a.k.a GANs) A (Very) Gentle Introduction to Generative Adversarial Networks (a.k.a GANs)
A (Very) Gentle Introduction to Generative Adversarial Networks (a.k.a GANs)
 
Feed forward ,back propagation,gradient descent
Feed forward ,back propagation,gradient descentFeed forward ,back propagation,gradient descent
Feed forward ,back propagation,gradient descent
 
Adversarial examples in deep learning (Gregory Chatel)
Adversarial examples in deep learning (Gregory Chatel)Adversarial examples in deep learning (Gregory Chatel)
Adversarial examples in deep learning (Gregory Chatel)
 
Machine Learning Performance Evaluation: Tips and Pitfalls - Jose Hernandez O...
Machine Learning Performance Evaluation: Tips and Pitfalls - Jose Hernandez O...Machine Learning Performance Evaluation: Tips and Pitfalls - Jose Hernandez O...
Machine Learning Performance Evaluation: Tips and Pitfalls - Jose Hernandez O...
 
Generative Adversarial Networks (GANs)
Generative Adversarial Networks (GANs)Generative Adversarial Networks (GANs)
Generative Adversarial Networks (GANs)
 
20191019 sinkhorn
20191019 sinkhorn20191019 sinkhorn
20191019 sinkhorn
 

Plus de Karel Ha

Schrodinger poster 2020
Schrodinger poster 2020Schrodinger poster 2020
Schrodinger poster 2020
Karel Ha
 
transcript-bachelor-studies-Karel-Ha
transcript-bachelor-studies-Karel-Hatranscript-bachelor-studies-Karel-Ha
transcript-bachelor-studies-Karel-Ha
Karel Ha
 

Plus de Karel Ha (18)

transcript-master-studies-Karel-Ha
transcript-master-studies-Karel-Hatranscript-master-studies-Karel-Ha
transcript-master-studies-Karel-Ha
 
Schrodinger poster 2020
Schrodinger poster 2020Schrodinger poster 2020
Schrodinger poster 2020
 
CapsuleGAN: Generative Adversarial Capsule Network
CapsuleGAN: Generative Adversarial Capsule NetworkCapsuleGAN: Generative Adversarial Capsule Network
CapsuleGAN: Generative Adversarial Capsule Network
 
Dynamic Routing Between Capsules
Dynamic Routing Between CapsulesDynamic Routing Between Capsules
Dynamic Routing Between Capsules
 
AI Supremacy in Games: Deep Blue, Watson, Cepheus, AlphaGo, DeepStack and Ten...
AI Supremacy in Games: Deep Blue, Watson, Cepheus, AlphaGo, DeepStack and Ten...AI Supremacy in Games: Deep Blue, Watson, Cepheus, AlphaGo, DeepStack and Ten...
AI Supremacy in Games: Deep Blue, Watson, Cepheus, AlphaGo, DeepStack and Ten...
 
Solving Endgames in Large Imperfect-Information Games such as Poker
Solving Endgames in Large Imperfect-Information Games such as PokerSolving Endgames in Large Imperfect-Information Games such as Poker
Solving Endgames in Large Imperfect-Information Games such as Poker
 
transcript-bachelor-studies-Karel-Ha
transcript-bachelor-studies-Karel-Hatranscript-bachelor-studies-Karel-Ha
transcript-bachelor-studies-Karel-Ha
 
AlphaGo: Mastering the Game of Go with Deep Neural Networks and Tree Search
AlphaGo: Mastering the Game of Go with Deep Neural Networks and Tree SearchAlphaGo: Mastering the Game of Go with Deep Neural Networks and Tree Search
AlphaGo: Mastering the Game of Go with Deep Neural Networks and Tree Search
 
Mastering the game of Go with deep neural networks and tree search: Presentation
Mastering the game of Go with deep neural networks and tree search: PresentationMastering the game of Go with deep neural networks and tree search: Presentation
Mastering the game of Go with deep neural networks and tree search: Presentation
 
Real-time applications on IntelXeon/Phi
Real-time applications on IntelXeon/PhiReal-time applications on IntelXeon/Phi
Real-time applications on IntelXeon/Phi
 
HTCC poster for CERN Openlab opendays 2015
HTCC poster for CERN Openlab opendays 2015HTCC poster for CERN Openlab opendays 2015
HTCC poster for CERN Openlab opendays 2015
 
Separation Axioms
Separation AxiomsSeparation Axioms
Separation Axioms
 
Oddělovací axiomy v bezbodové topologii
Oddělovací axiomy v bezbodové topologiiOddělovací axiomy v bezbodové topologii
Oddělovací axiomy v bezbodové topologii
 
Algorithmic Game Theory
Algorithmic Game TheoryAlgorithmic Game Theory
Algorithmic Game Theory
 
Summer Student Programme
Summer Student ProgrammeSummer Student Programme
Summer Student Programme
 
Summer @CERN
Summer @CERNSummer @CERN
Summer @CERN
 
Tape Storage and CRC Protection
Tape Storage and CRC ProtectionTape Storage and CRC Protection
Tape Storage and CRC Protection
 
Question Answering with Subgraph Embeddings
Question Answering with Subgraph EmbeddingsQuestion Answering with Subgraph Embeddings
Question Answering with Subgraph Embeddings
 

Dernier

Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
PirithiRaju
 
Pests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdfPests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdf
PirithiRaju
 

Dernier (20)

Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...
 
Site Acceptance Test .
Site Acceptance Test                    .Site Acceptance Test                    .
Site Acceptance Test .
 
American Type Culture Collection (ATCC).pptx
American Type Culture Collection (ATCC).pptxAmerican Type Culture Collection (ATCC).pptx
American Type Culture Collection (ATCC).pptx
 
Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.
 
Call Girls Alandi Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Alandi Call Me 7737669865 Budget Friendly No Advance BookingCall Girls Alandi Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Alandi Call Me 7737669865 Budget Friendly No Advance Booking
 
Chemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfChemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdf
 
Nanoparticles synthesis and characterization​ ​
Nanoparticles synthesis and characterization​  ​Nanoparticles synthesis and characterization​  ​
Nanoparticles synthesis and characterization​ ​
 
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
 
Factory Acceptance Test( FAT).pptx .
Factory Acceptance Test( FAT).pptx       .Factory Acceptance Test( FAT).pptx       .
Factory Acceptance Test( FAT).pptx .
 
SAMASTIPUR CALL GIRL 7857803690 LOW PRICE ESCORT SERVICE
SAMASTIPUR CALL GIRL 7857803690  LOW PRICE  ESCORT SERVICESAMASTIPUR CALL GIRL 7857803690  LOW PRICE  ESCORT SERVICE
SAMASTIPUR CALL GIRL 7857803690 LOW PRICE ESCORT SERVICE
 
Connaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verified
Connaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verifiedConnaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verified
Connaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verified
 
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
 
Botany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdfBotany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdf
 
Botany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfBotany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdf
 
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...
 
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bNightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
 
Pests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdfPests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdf
 
FAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
FAIRSpectra - Enabling the FAIRification of Spectroscopy and SpectrometryFAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
FAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
 
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 60009654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
 
CELL -Structural and Functional unit of life.pdf
CELL -Structural and Functional unit of life.pdfCELL -Structural and Functional unit of life.pdf
CELL -Structural and Functional unit of life.pdf
 

AlphaZero

  • 1. AlphaZero Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm Karel Ha article by Google DeepMind AI Seminar, 19th December 2017
  • 2. Outline The Alpha* Timeline AlphaGo AlphaGo Zero (AG0) AlphaZero Conclusion 1
  • 5. Fan Hui ■ professional 2 dan https://en.wikipedia.org/wiki/Fan_Hui 2
  • 6. Fan Hui ■ professional 2 dan ■ European Go Champion in 2013, 2014 and 2015 https://en.wikipedia.org/wiki/Fan_Hui 2
  • 7. Fan Hui ■ professional 2 dan ■ European Go Champion in 2013, 2014 and 2015 ■ European Professional Go Champion in 2016 https://en.wikipedia.org/wiki/Fan_Hui 2
  • 8. AlphaGo (AlphaGo Fan) vs. Fan Hui 3
  • 9. AlphaGo (AlphaGo Fan) vs. Fan Hui AlphaGo won 5:0 in a formal match on October 2015. 3
  • 10. AlphaGo (AlphaGo Fan) vs. Fan Hui AlphaGo won 5:0 in a formal match on October 2015. [AlphaGo] is very strong and stable, it seems like a wall. ... I know AlphaGo is a computer, but if no one told me, maybe I would think the player was a little strange, but a very strong player, a real person. Fan Hui 3
  • 11. Lee Sedol “The Strong Stone” https://en.wikipedia.org/wiki/Lee_Sedol 4
  • 12. Lee Sedol “The Strong Stone” ■ professional 9 dan https://en.wikipedia.org/wiki/Lee_Sedol 4
  • 13. Lee Sedol “The Strong Stone” ■ professional 9 dan ■ the 2nd in international titles https://en.wikipedia.org/wiki/Lee_Sedol 4
  • 14. Lee Sedol “The Strong Stone” ■ professional 9 dan ■ the 2nd in international titles ■ the 5th youngest to become a professional Go player in South Korean history https://en.wikipedia.org/wiki/Lee_Sedol 4
  • 15. Lee Sedol “The Strong Stone” ■ professional 9 dan ■ the 2nd in international titles ■ the 5th youngest to become a professional Go player in South Korean history ■ Lee Sedol would win 97 out of 100 games against Fan Hui. https://en.wikipedia.org/wiki/Lee_Sedol 4
  • 16. Lee Sedol “The Strong Stone” ■ professional 9 dan ■ the 2nd in international titles ■ the 5th youngest to become a professional Go player in South Korean history ■ Lee Sedol would win 97 out of 100 games against Fan Hui. ■ “Roger Federer” of Go https://en.wikipedia.org/wiki/Lee_Sedol 4
  • 17. I heard Google DeepMind’s AI is surprisingly strong and getting stronger, but I am confident that I can win, at least this time. Lee Sedol 4
  • 18. I heard Google DeepMind’s AI is surprisingly strong and getting stronger, but I am confident that I can win, at least this time. Lee Sedol ...even beating AlphaGo by 4:1 may allow the Google DeepMind team to claim its de facto victory and the defeat of him [Lee Sedol], or even humankind. interview in JTBC Newsroom 4
  • 19. I heard Google DeepMind’s AI is surprisingly strong and getting stronger, but I am confident that I can win, at least this time. Lee Sedol ...even beating AlphaGo by 4:1 may allow the Google DeepMind team to claim its de facto victory and the defeat of him [Lee Sedol], or even humankind. interview in JTBC Newsroom 4
  • 20. AlphaGo (AlphaGo Lee) vs. Lee Sedol https://en.wikipedia.org/wiki/AlphaGo_versus_Lee_Sedol 5
  • 21. AlphaGo (AlphaGo Lee) vs. Lee Sedol In March 2016 AlphaGo won 4:1 against the legendary Lee Sedol. https://en.wikipedia.org/wiki/AlphaGo_versus_Lee_Sedol 5
  • 22. AlphaGo (AlphaGo Lee) vs. Lee Sedol In March 2016 AlphaGo won 4:1 against the legendary Lee Sedol. AlphaGo won all but the 4th game; all games were won by resignation. https://en.wikipedia.org/wiki/AlphaGo_versus_Lee_Sedol 5
  • 23. AlphaGo (AlphaGo Lee) vs. Lee Sedol In March 2016 AlphaGo won 4:1 against the legendary Lee Sedol. AlphaGo won all but the 4th game; all games were won by resignation. https://en.wikipedia.org/wiki/AlphaGo_versus_Lee_Sedol 5
  • 25. AlphaGo Master In January 2017, DeepMind revealed that AlphaGo had played a series of unofficial online games against some of the strongest professional Go players under the pseudonyms “Master” and ”Magister”. https://deepmind.com/research/alphago/match-archive/master/ 6
  • 26. AlphaGo Master In January 2017, DeepMind revealed that AlphaGo had played a series of unofficial online games against some of the strongest professional Go players under the pseudonyms “Master” and ”Magister”. This AlphaGo was an improved version of the AlphaGo that played Lee Sedol in 2016. https://deepmind.com/research/alphago/match-archive/master/ 6
  • 27. AlphaGo Master In January 2017, DeepMind revealed that AlphaGo had played a series of unofficial online games against some of the strongest professional Go players under the pseudonyms “Master” and ”Magister”. This AlphaGo was an improved version of the AlphaGo that played Lee Sedol in 2016. Over one week, AlphaGo played 60 online fast time-control games. https://deepmind.com/research/alphago/match-archive/master/ 6
  • 28. AlphaGo Master In January 2017, DeepMind revealed that AlphaGo had played a series of unofficial online games against some of the strongest professional Go players under the pseudonyms “Master” and ”Magister”. This AlphaGo was an improved version of the AlphaGo that played Lee Sedol in 2016. Over one week, AlphaGo played 60 online fast time-control games. AlphaGo won this series of games 60:0. https://deepmind.com/research/alphago/match-archive/master/ 6
  • 31. ■ 23 May - 27 May 2017 in Wuzhen, China https://events.google.com/alphago2017/ 7
  • 32. ■ 23 May - 27 May 2017 in Wuzhen, China ■ Team Go vs. AlphaGo 0:1 https://events.google.com/alphago2017/ 7
  • 33. ■ 23 May - 27 May 2017 in Wuzhen, China ■ Team Go vs. AlphaGo 0:1 ■ AlphaGo vs. world champion Ke Jie 3:0 https://events.google.com/alphago2017/ 7
  • 34. 7
  • 35. defeated AlphaGo Lee by 100 games to 0 7
  • 36. AI system that mastered chess, Shogi and Go to “superhuman levels” within a handful of hours AlphaZero 8
  • 37. AI system that mastered chess, Shogi and Go to “superhuman levels” within a handful of hours AlphaZero defeated AlphaGo Zero (version with 20 blocks trained for 3 days) by 60 games to 40 8
  • 38. 8
  • 40. 1 AlphaGo Fan 2 AlphaGo Lee 8
  • 41. 1 AlphaGo Fan 2 AlphaGo Lee 3 AlphaGo Master 8
  • 42. 1 AlphaGo Fan 2 AlphaGo Lee 3 AlphaGo Master 4 AlphaGo Zero 8
  • 43. 1 AlphaGo Fan 2 AlphaGo Lee 3 AlphaGo Master 4 AlphaGo Zero 5 AlphaZero 8
  • 45. Policy and Value Networks [Silver et al. 2016] 9
  • 46. Training the (Deep Convolutional) Neural Networks [Silver et al. 2016] 10
  • 48. AG0: Differences Compared to AlphaGo {Fan, Lee, Master} AlphaGo {Fan, Lee, Master} × AlphaGo Zero: [Silver et al. 2017b] 11
  • 49. AG0: Differences Compared to AlphaGo {Fan, Lee, Master} AlphaGo {Fan, Lee, Master} × AlphaGo Zero: ■ supervised learning from human expert positions × from scratch by self-play reinforcement learning (“tabula rasa”) [Silver et al. 2017b] 11
  • 50. AG0: Differences Compared to AlphaGo {Fan, Lee, Master} AlphaGo {Fan, Lee, Master} × AlphaGo Zero: ■ supervised learning from human expert positions × from scratch by self-play reinforcement learning (“tabula rasa”) ■ additional (auxialiary) input features × only the black and white stones from the board as input features [Silver et al. 2017b] 11
  • 51. AG0: Differences Compared to AlphaGo {Fan, Lee, Master} AlphaGo {Fan, Lee, Master} × AlphaGo Zero: ■ supervised learning from human expert positions × from scratch by self-play reinforcement learning (“tabula rasa”) ■ additional (auxialiary) input features × only the black and white stones from the board as input features ■ separate policy and value networks × single neural network [Silver et al. 2017b] 11
  • 52. AG0: Differences Compared to AlphaGo {Fan, Lee, Master} AlphaGo {Fan, Lee, Master} × AlphaGo Zero: ■ supervised learning from human expert positions × from scratch by self-play reinforcement learning (“tabula rasa”) ■ additional (auxialiary) input features × only the black and white stones from the board as input features ■ separate policy and value networks × single neural network ■ tree search using also Monte Carlo rollouts × simpler tree search using only the single neural network to both evaluate positions and sample moves [Silver et al. 2017b] 11
  • 53. AG0: Differences Compared to AlphaGo {Fan, Lee, Master} AlphaGo {Fan, Lee, Master} × AlphaGo Zero: ■ supervised learning from human expert positions × from scratch by self-play reinforcement learning (“tabula rasa”) ■ additional (auxialiary) input features × only the black and white stones from the board as input features ■ separate policy and value networks × single neural network ■ tree search using also Monte Carlo rollouts × simpler tree search using only the single neural network to both evaluate positions and sample moves ■ (AlphaGo Lee) distributed machines + 48 tensor processing units (TPUs) × single machines + 4 TPUs [Silver et al. 2017b] 11
  • 54. AG0: Differences Compared to AlphaGo {Fan, Lee, Master} AlphaGo {Fan, Lee, Master} × AlphaGo Zero: ■ supervised learning from human expert positions × from scratch by self-play reinforcement learning (“tabula rasa”) ■ additional (auxialiary) input features × only the black and white stones from the board as input features ■ separate policy and value networks × single neural network ■ tree search using also Monte Carlo rollouts × simpler tree search using only the single neural network to both evaluate positions and sample moves ■ (AlphaGo Lee) distributed machines + 48 tensor processing units (TPUs) × single machines + 4 TPUs ■ (AlphaGo Lee) several months of training time × 72 h of training time (outperforming AlphaGo Lee after 36 h) [Silver et al. 2017b] 11
  • 55. AG0 achieves this via [Silver et al. 2017b] 12
  • 56. AG0 achieves this via ■ a new reinforcement learning algorithm [Silver et al. 2017b] 12
  • 57. AG0 achieves this via ■ a new reinforcement learning algorithm ■ with lookahead search inside the training loop [Silver et al. 2017b] 12
  • 58. AG0: Self-Play Reinforcement Learning [Silver et al. 2017b] 13
  • 59. AG0: Self-Play Reinforcement Learning [Silver et al. 2017b] 13
  • 60. AG0: Self-Play Reinforcement Learning – Neural Network deep neural network fθ with parameters θ: [Silver et al. 2017b] 14
  • 61. AG0: Self-Play Reinforcement Learning – Neural Network deep neural network fθ with parameters θ: ■ input: raw board representation s [Silver et al. 2017b] 14
  • 62. AG0: Self-Play Reinforcement Learning – Neural Network deep neural network fθ with parameters θ: ■ input: raw board representation s ■ output: [Silver et al. 2017b] 14
  • 63. AG0: Self-Play Reinforcement Learning – Neural Network deep neural network fθ with parameters θ: ■ input: raw board representation s ■ output: ■ move probabilities p [Silver et al. 2017b] 14
  • 64. AG0: Self-Play Reinforcement Learning – Neural Network deep neural network fθ with parameters θ: ■ input: raw board representation s ■ output: ■ move probabilities p ■ value v of the board position [Silver et al. 2017b] 14
  • 65. AG0: Self-Play Reinforcement Learning – Neural Network deep neural network fθ with parameters θ: ■ input: raw board representation s ■ output: ■ move probabilities p ■ value v of the board position ■ fθ(s) = (p, v) [Silver et al. 2017b] 14
  • 66. AG0: Self-Play Reinforcement Learning – Neural Network deep neural network fθ with parameters θ: ■ input: raw board representation s ■ output: ■ move probabilities p ■ value v of the board position ■ fθ(s) = (p, v) ■ specifics: [Silver et al. 2017b] 14
  • 67. AG0: Self-Play Reinforcement Learning – Neural Network deep neural network fθ with parameters θ: ■ input: raw board representation s ■ output: ■ move probabilities p ■ value v of the board position ■ fθ(s) = (p, v) ■ specifics: ■ (20 or 40) residual blocks (of convolutional layers) [Silver et al. 2017b] 14
  • 68. AG0: Self-Play Reinforcement Learning – Neural Network deep neural network fθ with parameters θ: ■ input: raw board representation s ■ output: ■ move probabilities p ■ value v of the board position ■ fθ(s) = (p, v) ■ specifics: ■ (20 or 40) residual blocks (of convolutional layers) ■ batch normalization [Silver et al. 2017b] 14
  • 69. AG0: Self-Play Reinforcement Learning – Neural Network deep neural network fθ with parameters θ: ■ input: raw board representation s ■ output: ■ move probabilities p ■ value v of the board position ■ fθ(s) = (p, v) ■ specifics: ■ (20 or 40) residual blocks (of convolutional layers) ■ batch normalization ■ rectifier non-linearities [Silver et al. 2017b] 14
  • 70. AG0: Comparison of Various Neural Network Architectures [Silver et al. 2017b] 15
  • 71. AG0: Self-Play Reinforcement Learning – Steps 0. random weights θ0 [Silver et al. 2017b] 16
  • 72. AG0: Self-Play Reinforcement Learning – Steps 0. random weights θ0 1. at each iteration i > 0, self-play games are generated: [Silver et al. 2017b] 16
  • 73. AG0: Self-Play Reinforcement Learning – Steps 0. random weights θ0 1. at each iteration i > 0, self-play games are generated: i. MCTS samples search probabilities πt based on the neural network from the previous iteration fθi−1 : πt = αθi−1 (st) for each time-step t = 1, 2, . . . , T [Silver et al. 2017b] 16
  • 74. AG0: Self-Play Reinforcement Learning – Steps 0. random weights θ0 1. at each iteration i > 0, self-play games are generated: i. MCTS samples search probabilities πt based on the neural network from the previous iteration fθi−1 : πt = αθi−1 (st) for each time-step t = 1, 2, . . . , T ii. move is sampled from πt [Silver et al. 2017b] 16
  • 75. AG0: Self-Play Reinforcement Learning – Steps 0. random weights θ0 1. at each iteration i > 0, self-play games are generated: i. MCTS samples search probabilities πt based on the neural network from the previous iteration fθi−1 : πt = αθi−1 (st) for each time-step t = 1, 2, . . . , T ii. move is sampled from πt iii. data (st, πt, zt) for each t are stored for later training [Silver et al. 2017b] 16
  • 76. AG0: Self-Play Reinforcement Learning – Steps 0. random weights θ0 1. at each iteration i > 0, self-play games are generated: i. MCTS samples search probabilities πt based on the neural network from the previous iteration fθi−1 : πt = αθi−1 (st) for each time-step t = 1, 2, . . . , T ii. move is sampled from πt iii. data (st, πt, zt) for each t are stored for later training iv. new neural network fθi is trained in order to minimize the loss l = (z − v)2 − π⊤ log p + c||θ||2 [Silver et al. 2017b] 16
  • 77. AG0: Self-Play Reinforcement Learning – Steps 0. random weights θ0 1. at each iteration i > 0, self-play games are generated: i. MCTS samples search probabilities πt based on the neural network from the previous iteration fθi−1 : πt = αθi−1 (st) for each time-step t = 1, 2, . . . , T ii. move is sampled from πt iii. data (st, πt, zt) for each t are stored for later training iv. new neural network fθi is trained in order to minimize the loss l = (z − v)2 − π⊤ log p + c||θ||2 [Silver et al. 2017b] 16
  • 78. AG0: Self-Play Reinforcement Learning – Steps 0. random weights θ0 1. at each iteration i > 0, self-play games are generated: i. MCTS samples search probabilities πt based on the neural network from the previous iteration fθi−1 : πt = αθi−1 (st) for each time-step t = 1, 2, . . . , T ii. move is sampled from πt iii. data (st, πt, zt) for each t are stored for later training iv. new neural network fθi is trained in order to minimize the loss l = (z − v)2 − π⊤ log p + c||θ||2 Loss l makes (p, v) = fθ(s) more closely match the improved search probabilities and self-play winner (π, z). [Silver et al. 2017b] 16
  • 79. AG0: Monte Carlo Tree Search (1/2) Monte Carlo Tree Search (MCTS) in AG0: [Silver et al. 2017b] 17
  • 81. AG0: Monte Carlo Tree Search (2/2) [Silver et al. 2017b] 18
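The search keeps, for every edge, a visit count N, total value W, mean value Q and prior probability P from the network, and selects moves by maximizing Q + U with U ∝ P·√(ΣN)/(1+N); the search probabilities returned to training are proportional to N^(1/τ). A minimal sketch, with c_puct and the temperature τ as illustrative free parameters rather than the paper's tuned values:

```python
# A minimal sketch of the PUCT-style selection, backup and search-probability
# extraction used by AG0's MCTS; the Node class here is a simplified stand-in.
import math
import numpy as np

class Node:
    def __init__(self, prior):
        self.P = prior          # prior probability from the network's policy head
        self.N = 0              # visit count
        self.W = 0.0            # total action value
        self.children = {}      # move -> Node

    @property
    def Q(self):
        return self.W / self.N if self.N else 0.0

def select_child(node, c_puct=1.5):
    """a = argmax_a Q(s,a) + U(s,a), with U = c_puct * P * sqrt(sum_b N_b) / (1 + N_a)."""
    total_n = sum(child.N for child in node.children.values())
    def puct(child):
        return child.Q + c_puct * child.P * math.sqrt(total_n) / (1 + child.N)
    return max(node.children.items(), key=lambda kv: puct(kv[1]))

def backup(path, value):
    """Propagate the leaf evaluation v up the visited path, flipping sign each ply."""
    for node in reversed(path):
        node.N += 1
        node.W += value
        value = -value

def search_probabilities(root, tau=1.0):
    """pi(a) proportional to N(s,a)^(1/tau): the improved policy returned by the search."""
    moves, counts = zip(*[(m, c.N) for m, c in root.children.items()])
    counts = np.array(counts, dtype=float) ** (1.0 / tau)
    return dict(zip(moves, counts / counts.sum()))
```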
  • 82. AG0: Self-Play Reinforcement Learning – Review [Silver et al. 2017b] 19
  • 83. AG0: Elo Rating over Training Time (RL vs. SL) [Silver et al. 2017b] 20
  • 84. AG0: Elo Rating over Training Time (AG0 with 40 blocks) [Silver et al. 2017b] 21
  • 85. AG0: Tournament between AI Go Programs [Silver et al. 2017b] 22
  • 86. AG0: Discovered Joseki (Corner Sequences) [Silver et al. 2017b] 23
  • 87. AG0: Discovered Joseki (Corner Sequences) (a) five human joseki [Silver et al. 2017b] 23
  • 88. AG0: Discovered Joseki (Corner Sequences) (a) five human joseki; (b) five novel joseki variants eventually preferred by AG0 [Silver et al. 2017b] 23
  • 89. AG0: Discovered Playing Styles [Silver et al. 2017b] 24
  • 90. AG0: Discovered Playing Styles at 3 h greedy capture of stones [Silver et al. 2017b] 24
  • 91. AG0: Discovered Playing Styles at 3 h greedy capture of stones at 19 h the fundamentals of Go concepts (life-and-death, influence, territory...) [Silver et al. 2017b] 24
  • 92. AG0: Discovered Playing Styles at 3 h greedy capture of stones at 19 h the fundamentals of Go concepts (life-and-death, influence, territory...) at 70 h remarkably balanced game (multiple battles, complicated ko fight, a half-point win for white...) [Silver et al. 2017b] 24
  • 95. To watch such a strong programme like Stockfish, against whom most top players would be happy to win even one game out of a hundred, being completely taken apart is certainly definitive. Viswanathan Anand 24
  • 96. To watch such a strong programme like Stockfish, against whom most top players would be happy to win even one game out of a hundred, being completely taken apart is certainly definitive. Viswanathan Anand It’s like chess from another dimension. Demis Hassabis 24
  • 97. AlphaZero: Differences Compared to AlphaGo Zero AlphaGo Zero × AlphaZero: [Silver et al. 2017a] 25
  • 98. AlphaZero: Differences Compared to AlphaGo Zero AlphaGo Zero × AlphaZero: ■ binary outcome (win / loss) × expected outcome (including draws or potentially other outcomes) [Silver et al. 2017a] 25
  • 99. AlphaZero: Differences Compared to AlphaGo Zero AlphaGo Zero × AlphaZero: ■ binary outcome (win / loss) × expected outcome (including draws or potentially other outcomes) ■ board positions transformed before passing to neural networks (by randomly selected rotation or reflection) × no data augmentation [Silver et al. 2017a] 25
  • 100. AlphaZero: Differences Compared to AlphaGo Zero AlphaGo Zero × AlphaZero: ■ binary outcome (win / loss) × expected outcome (including draws or potentially other outcomes) ■ board positions transformed before passing to neural networks (by randomly selected rotation or reflection) × no data augmentation ■ games generated by the best player from previous iterations (margin of 55 %) × continual update using the latest parameters (without the evaluation and selection steps) [Silver et al. 2017a] 25
  • 101. AlphaZero: Differences Compared to AlphaGo Zero AlphaGo Zero × AlphaZero: ■ binary outcome (win / loss) × expected outcome (including draws or potentially other outcomes) ■ board positions transformed before passing to neural networks (by randomly selected rotation or reflection) × no data augmentation ■ games generated by the best player from previous iterations (margin of 55 %) × continual update using the latest parameters (without the evaluation and selection steps) ■ hyper-parameters tuned by Bayesian optimisation × reused the same hyper-parameters without game-specific tuning [Silver et al. 2017a] 25
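To make the third difference concrete, here is a rough sketch of the two update schemes in Python; `train`, `win_rate` and the 400-game evaluation match length are illustrative assumptions, not values taken from these slides.

```python
# A rough contrast of the two update schemes; all helpers are hypothetical.

def ag0_iteration(best_net, candidate_net, selfplay_games):
    """AlphaGo Zero: self-play data always comes from the current best player;
    the freshly trained candidate replaces it only after winning an evaluation
    match by a margin of 55 %."""
    train(candidate_net, selfplay_games)
    if win_rate(candidate_net, best_net, n_games=400) >= 0.55:
        best_net = candidate_net
    return best_net

def alphazero_iteration(net, selfplay_games):
    """AlphaZero: a single network, updated continually and used immediately
    for the next batch of self-play; there is no evaluation/selection step."""
    train(net, selfplay_games)
    return net
```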
  • 102. AlphaZero: Elo Rating over Training Time [Silver et al. 2017a] 26
  • 106. AlphaZero: Tournament between AI Programs (Values are given from AlphaZero’s point of view.) [Silver et al. 2017a] 27
  • 107. AlphaZero: Openings Discovered by the Self-Play (1/2) [Silver et al. 2017a] 28
  • 108. AlphaZero: Openings Discovered by the Self-Play (2/2) [Silver et al. 2017a] 29
  • 110. Difficulties of Go ■ challenging decision-making [Silver et al. 2017a] 30
  • 111. Difficulties of Go ■ challenging decision-making ■ intractable search space [Silver et al. 2017a] 30
  • 112. Difficulties of Go ■ challenging decision-making ■ intractable search space ■ complex optimal solution It appears infeasible to approximate the optimal solution directly using a policy or value function! [Silver et al. 2017a] 30
  • 113. AlphaZero: Summary ■ Monte Carlo tree search [Silver et al. 2017a] 31
  • 114. AlphaZero: Summary ■ Monte Carlo tree search ■ effective move selection and position evaluation [Silver et al. 2017a] 31
  • 115. AlphaZero: Summary ■ Monte Carlo tree search ■ effective move selection and position evaluation ■ through deep convolutional neural networks [Silver et al. 2017a] 31
  • 116. AlphaZero: Summary ■ Monte Carlo tree search ■ effective move selection and position evaluation ■ through deep convolutional neural networks ■ trained by new self-play reinforcement learning algorithm [Silver et al. 2017a] 31
  • 117. AlphaZero: Summary ■ Monte Carlo tree search ■ effective move selection and position evaluation ■ through deep convolutional neural networks ■ trained by new self-play reinforcement learning algorithm ■ new search algorithm combining [Silver et al. 2017a] 31
  • 118. AlphaZero: Summary ■ Monte Carlo tree search ■ effective move selection and position evaluation ■ through deep convolutional neural networks ■ trained by new self-play reinforcement learning algorithm ■ new search algorithm combining ■ evaluation by a single neural network [Silver et al. 2017a] 31
  • 119. AlphaZero: Summary ■ Monte Carlo tree search ■ effective move selection and position evaluation ■ through deep convolutional neural networks ■ trained by new self-play reinforcement learning algorithm ■ new search algorithm combining ■ evaluation by a single neural network ■ Monte Carlo tree search [Silver et al. 2017a] 31
  • 120. AlphaZero: Summary ■ Monte Carlo tree search ■ effective move selection and position evaluation ■ through deep convolutional neural networks ■ trained by new self-play reinforcement learning algorithm ■ new search algorithm combining ■ evaluation by a single neural network ■ Monte Carlo tree search ■ more efficient when compared to previous AlphaGo versions [Silver et al. 2017a] 31
  • 121. AlphaZero: Summary ■ Monte Carlo tree search ■ effective move selection and position evaluation ■ through deep convolutional neural networks ■ trained by new self-play reinforcement learning algorithm ■ new search algorithm combining ■ evaluation by a single neural network ■ Monte Carlo tree search ■ more efficient when compared to previous AlphaGo versions ■ single machine [Silver et al. 2017a] 31
  • 122. AlphaZero: Summary ■ Monte Carlo tree search ■ effective move selection and position evaluation ■ through deep convolutional neural networks ■ trained by new self-play reinforcement learning algorithm ■ new search algorithm combining ■ evaluation by a single neural network ■ Monte Carlo tree search ■ more efficient when compared to previous AlphaGo versions ■ single machine ■ 4 TPUs [Silver et al. 2017a] 31
  • 123. AlphaZero: Summary ■ Monte Carlo tree search ■ effective move selection and position evaluation ■ through deep convolutional neural networks ■ trained by new self-play reinforcement learning algorithm ■ new search algorithm combining ■ evaluation by a single neural network ■ Monte Carlo tree search ■ more efficient when compared to previous AlphaGo versions ■ single machine ■ 4 TPUs ■ hours rather than months of training time [Silver et al. 2017a] 31
  • 124. Novel approach [Silver et al. 2017a] 32
  • 125. Novel approach During the matches (against Stockfish and Elmo), AlphaZero evaluated thousands of times fewer positions than Deep Blue against Kasparov. [Silver et al. 2017a] 32
  • 126. Novel approach During the matches (against Stockfish and Elmo), AlphaZero evaluated thousands of times fewer positions than Deep Blue against Kasparov. It compensated for this by: ■ selecting those positions more intelligently (the neural network) [Silver et al. 2017a] 32
  • 127. Novel approach During the matches (against Stockfish and Elmo), AlphaZero evaluated thousands of times fewer positions than Deep Blue against Kasparov. It compensated for this by: ■ selecting those positions more intelligently (the neural network) ■ evaluating them more precisely (the same neural network) [Silver et al. 2017a] 32
  • 129. Novel approach During the matches (against Stockfish and Elmo), AlphaZero evaluated thousands of times fewer positions than Deep Blue against Kasparov. It compensated for this by: ■ selecting those positions more intelligently (the neural network) ■ evaluating them more precisely (the same neural network) Deep Blue relied on a handcrafted evaluation function. [Silver et al. 2017a] 32
  • 130. Novel approach During the matches (against Stockfish and Elmo), AlphaZero evaluated thousands of times fewer positions than Deep Blue against Kasparov. It compensated for this by: ■ selecting those positions more intelligently (the neural network) ■ evaluating them more precisely (the same neural network) Deep Blue relied on a handcrafted evaluation function. AlphaZero was trained tabula rasa from self-play. It used general-purpose learning. [Silver et al. 2017a] 32
  • 131. Novel approach During the matches (against Stockfish and Elmo), AlphaZero evaluated thousands of times fewer positions than Deep Blue against Kasparov. It compensated for this by: ■ selecting those positions more intelligently (the neural network) ■ evaluating them more precisely (the same neural network) Deep Blue relied on a handcrafted evaluation function. AlphaZero was trained tabula rasa from self-play. It used general-purpose learning. This approach is not specific to the game of Go. The algorithm can be used for a much wider class of AI problems! [Silver et al. 2017a] 32
  • 134. Input Features of AlphaZero’s Neural Networks [Silver et al. 2017a]
  • 135. AlphaZero: Statistics of Training [Silver et al. 2017a]
  • 137. Scalability When Compared to Other Programs [Silver et al. 2017a]
  • 138. Further Reading I AlphaGo: ■ Google Research Blog http://googleresearch.blogspot.cz/2016/01/alphago-mastering-ancient-game-of-go.html ■ an article in Nature http://www.nature.com/news/google-ai-algorithm-masters-ancient-game-of-go-1.19234 ■ a reddit article claiming that AlphaGo is even stronger than it appears to be: “AlphaGo would rather win by less points, but with higher probability.” https://www.reddit.com/r/baduk/comments/49y17z/the_true_strength_of_alphago/ ■ a video of how AlphaGo works (put in layman’s terms) https://youtu.be/qWcfiPi9gUU Articles by Google DeepMind: ■ Atari player: a DeepRL system which combines Deep Neural Networks with Reinforcement Learning (Mnih et al. 2015) ■ Neural Turing Machines (Graves, Wayne, and Danihelka 2014) Artificial Intelligence: ■ Artificial Intelligence course at MIT http://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-034-artificial-intelligence-fall-2010/index.htm
  • 139. Further Reading II ■ Introduction to Artificial Intelligence at Udacity https://www.udacity.com/course/intro-to-artificial-intelligence--cs271 ■ General Game Playing course https://www.coursera.org/course/ggp ■ Singularity http://waitbutwhy.com/2015/01/artificial-intelligence-revolution-1.html + Part 2 ■ The Singularity Is Near (Kurzweil 2005) Combinatorial Game Theory (founded by John H. Conway to study endgames in Go): ■ Combinatorial Game Theory course https://www.coursera.org/learn/combinatorial-game-theory ■ On Numbers and Games (Conway 1976) ■ Computer Go as a sum of local games: an application of combinatorial game theory (Müller 1995) Chess: ■ Deep Blue beats G. Kasparov in 1997 https://youtu.be/NJarxpYyoFI Machine Learning: ■ Machine Learning course https://www.coursera.org/learn/machine-learning/ ■ Reinforcement Learning http://reinforcementlearning.ai-depot.com/ ■ Deep Learning (LeCun, Bengio, and Hinton 2015)
  • 140. Further Reading III ■ Deep Learning course https://www.udacity.com/course/deep-learning--ud730 ■ Two Minute Papers https://www.youtube.com/user/keeroyz ■ Applications of Deep Learning https://youtu.be/hPKJBXkyTKM Neuroscience: ■ http://www.brainfacts.org/
  • 141. References I Conway, John Horton (1976). “On Numbers and Games”. In: London Mathematical Society Monographs 6. Graves, Alex, Greg Wayne, and Ivo Danihelka (2014). “Neural Turing Machines”. In: arXiv preprint arXiv:1410.5401. Kurzweil, Ray (2005). The Singularity is Near: When Humans Transcend Biology. Penguin. LeCun, Yann, Yoshua Bengio, and Geoffrey Hinton (2015). “Deep Learning”. In: Nature 521.7553, pp. 436–444. Mnih, Volodymyr et al. (2015). “Human-Level Control through Deep Reinforcement Learning”. In: Nature 518.7540, pp. 529–533. url: https://storage.googleapis.com/deepmind-data/assets/papers/DeepMindNature14236Paper.pdf. Müller, Martin (1995). “Computer Go as a Sum of Local Games: an Application of Combinatorial Game Theory”. PhD thesis. TU Graz. Silver, David et al. (2016). “Mastering the Game of Go with Deep Neural Networks and Tree Search”. In: Nature 529.7587, pp. 484–489. Silver, David et al. (2017a). “Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm”. In: arXiv preprint arXiv:1712.01815. Silver, David et al. (2017b). “Mastering the Game of Go without Human Knowledge”. In: Nature 550.7676, pp. 354–359.