London Deep Learning Lab Meetup – April 19, 2018
© Peter Morgan, April 2018 https://www.meetup.com/Deep-Learning-Lab/
Towards a General Theory
of Intelligence
Peter Morgan www.deeplp.com
Thanks to our Sponsors
Wizebit
Upcoming Conferences
London 9-11 October
Announcements
• TensorFlow Dev Summit March 30, 2018
• Summary of TF developments over the last year
• Held in Mountain View CA
• https://www.youtube.com/watch?v=bUjMAzCgk2A&list=PLQY2H8rRoyvxjVx3zfw4vA4cvlKogyLNN
• Coincided with Release 1.7
• 11 million downloads so far
• Many highlights – check it out.
Announcements
• HOUSE OF LORDS Select Committee on Artificial Intelligence releases AI Report
on 16 April: “AI in the UK: ready, willing and able?”
• https://www.parliament.uk/business/committees/committees-a-z/lords-select/ai-committee/news-parliament-2017/ai-report-published/
• The Select Committee on Artificial Intelligence was appointed by the House of
Lords on 29 June 2017 “to consider the economic, ethical and social implications
of advances in artificial intelligence”
• “Our inquiry has concluded that the UK is in a strong position to be among the world leaders in the development of artificial intelligence during the twenty-first century”.
Announcements
(due to be published by end of April)
Outline of Talk
• Physical Systems
• Biological
• Non-biological
• Deep Learning
• Description
• CNN, RNN, LSTM, GAN
• Reinforcement Learning
• Latest Research in DL
• Other (Better) Theories?
• Overview
• Comparisons
• AGI
• Conclusions
Motivation
• Solve (general) intelligence
• Use it to solve everything else
• Medicine
• Cancer
• Brain disease (Alzheimer's, etc.)
• Longevity
• Physics
• Maths
• Materials science
• Social
The Big Picture – a ToE?
Physics, Computer Science, Neuroscience
Physical Systems
• Biological
• Plants, bacteria, insects, reptiles, mammals – biological brains
• Non-biological
• CPU - Intel Xeon SP, AMD RyZen, Qualcomm, IBM PowerPC, ARM
• GPU - Nvidia (Volta), AMD (Vega)
• FPGA - Intel (Altera), Xilinx, etc.
• ASIC - Google TPU, Graphcore IPU, Intel Nervana, Wave, …
• Neuromorphic (Human Brain Project - SpiNNaker, BrainScaleS; IBM TrueNorth; Intel Loihi, …)
• Quantum
• IBM, Microsoft, Intel, Google, DWave, Rigetti, …
• Quantum biology? (photosynthesis, navigation, …)
• QuantumML, Quantum Intelligence
Types of Physical Computation Systems*
*Can we find a theory that unifies them all (classical, quantum, biological, non-biological)?
Digital, Neuromorphic, Quantum, Biological
Biology
Biological Systems are Hierarchical
Biological Neuron Microstructure
Biological Neuron
Hand-drawn neuron types
From "Structure of the Mammalian Retina", c. 1900, by Santiago Ramón y Cajal
Neuron – scanning electron microscope
Cortical columns in the cortex
Human Connectome
Central Nervous System (CNS)
Social Systems
A Comparison of Neuron Models
Non-biological Hardware
• Digital
• CPU
• GPU
• FPGA
• ASIC
• Neuromorphic
• Various architectures
• SpiNNaker, BrainScaleS, …
• Quantum
• Different qubits
• Anyons, superconducting, photonic, …
Digital Computing
• Abacus
• Charles Babbage
• Ada Lovelace
• Vacuum tubes (valves)
• Turing
• Von Neumann
• ENIAC
• Transistor (Bardeen, Brattain, Shockley, 1947)
• Intel
• ARM
• Nvidia
Cray-1 (1976) – 160 MFlops
CPU – Intel Xeon
Up to 18 cores, ~1 TFlops
GPU – Nvidia Volta V100
21 billion transistors, 120 TFlops
DGX-2 – released 27 Mar 2018
16 V100s, 2 PFlops, 30 TB storage ($400k)
ASIC – Google TPU v2
180 TFlops
ASIC – Graphcore IPU
>200 TFlops
Graph computations – Graphcore (ResNet-50)
TPU Pod
64 2nd-gen TPUs, 11.5 PetaFlops, 4 terabytes of memory
Cloud TPUs
HPC – what's next?
Currently 100 PFlops; by 2020 – exascale
Processor Performance (MFlops)
More specific →
End to End Hardware Example
Neuromorphic Computing
• Biologically inspired
• First proposed by Carver Mead, Caltech, 1980s
• Uses analogue signals – spiking neural networks (SNN)
• SpiNNaker (Manchester, HBP, Furber)
• BrainScaleS (Heidelberg, HBP, Schemmel)
• TrueNorth (IBM, Modha)
• Intel Loihi
• Startups (Knowm, Spaun, etc.)
• Up to 1 million cores, 1 billion “neurons” (mouse)
• Need to scale 100X → human brain
• Relatively low power
• Available on the (HBP) cloud today
SpiNNaker Neuromorphic Computer
Neuromorphic vs von Neumann
TrueNorth Performance
Neuromorphic vs ASIC – Analogue vs Digital
Quantum Computing
• First proposed by Richard Feynman, Caltech, 1980’s
• Qubits – basis states 1 and 0, plus superposition states (QM)
• (Nature is) fundamentally probabilistic at atomic scale
• Have to be kept cold (mK) to avoid noise/decoherence
• Building is an engineering problem (theory is known)
• Several approaches - superconductors, trapped ions,
semiconductors, topological structures
• Several initiatives (with access available)
• Microsoft, IBM, Google, Intel, Dwave, Rigetti, etc.
• Can login today
• Many applications – optimization, cryptography, drug
discovery, etc.
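The superposition and measurement behaviour mentioned above comes down to a little linear algebra over complex amplitudes. A minimal pure-Python sketch (a single qubit and a Hadamard gate; not tied to any vendor platform listed):

```python
import math

# A qubit is a 2-vector of complex amplitudes. A Hadamard gate takes |0>
# to an equal superposition, so a measurement yields 0 or 1 with
# probability 1/2 each (the Born rule) -- nature being "fundamentally
# probabilistic at atomic scale", as noted above.
def apply(gate, state):
    return [sum(gate[r][c] * state[c] for c in range(2)) for r in range(2)]

H = [[1 / math.sqrt(2),  1 / math.sqrt(2)],
     [1 / math.sqrt(2), -1 / math.sqrt(2)]]

ket0 = [1 + 0j, 0 + 0j]            # the |0> basis state
psi = apply(H, ket0)               # equal superposition of |0> and |1>
probs = [abs(a) ** 2 for a in psi] # measurement probabilities (Born rule)
```

Applying H twice returns the qubit to |0>, which is what distinguishes amplitudes from ordinary probabilities.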
IBM 50-Qubit Quantum Computer
Quantum Logic Gates
Summary – we now have three non-biological stacks: Digital, Neuromorphic and Quantum
Each stack comprises the same layers: Algorithms / Distributed Layer / OS / Hardware
Outline
• Physical Systems
• Biological
• Non-biological
• Deep Learning
• Description
• CNN, RNN, LSTM, GAN
• Reinforcement Learning
• Latest Research in DL
• Other (Better) Theories?
• Overview
• Comparisons
• AGI
• Conclusions
Deep Learning
• Artificial Neural Networks (ANNs)
• Universal Approximation Theorem
• Computation graph
• Hyperparameters
• AutoML
• Optimization
• CNN
• RNN (LSTM)
Deep Learning (cont.)
• GAN
• Different Models
• AlexNet, VGG, ResNet, Inception
• SqueezeNet, MobileNet
• DL Frameworks
• TensorFlow
• MXNet, CNTK, PyTorch
• Training data sets
• Text, speech, images, video, time series
Early papers
Nodes and Layers
More Neural Networks (“Neural Network Zoo”)
Computation in each node
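The per-node computation can be written out directly: a weighted sum of inputs plus a bias, passed through a nonlinear activation. A minimal sketch (the weights and inputs are arbitrary illustrative values):

```python
import math

# The computation in each node: weighted sum plus bias, then a
# nonlinearity (here a sigmoid).
def neuron(inputs, weights, bias):
    z = sum(x * w for x, w in zip(inputs, weights)) + bias  # pre-activation
    return 1.0 / (1.0 + math.exp(-z))                       # sigmoid activation

y = neuron([1.0, 2.0], [0.5, -0.25], bias=0.0)  # z = 0.5 - 0.5 = 0, so y = 0.5
```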
Universal Approximation Theorem
• A feed-forward network with a single hidden layer containing a finite number of neurons can approximate continuous functions on compact subsets of R^n, under mild assumptions on the activation function
• We can define F(x) = Σi vi φ(wi·x + bi) as an approximate realization of f(x)
• One of the first versions of the theorem was proved by George Cybenko in 1989
for sigmoid activation functions
• Kurt Hornik showed in 1991 that it is not the specific choice of the activation function, but
rather the multilayer feedforward architecture which gives neural networks the potential
of being universal approximators
• Cybenko, G. (1989) "Approximations by superpositions of sigmoidal functions",
Mathematics of Control, Signals, and Systems, 2 (4), 303-314
• Kurt Hornik (1991) "Approximation Capabilities of Multilayer Feedforward Networks",
Neural Networks, 4(2), 251–257
Computation Graph
https://www.tensorflow.org/programmers_guide/graph_viz
Hyperparameters
• Activation function
• Loss (cost) function
• Learning rate
• Initialization
• Batch normalization
• Automation
• Hyperparameter tuning
• AutoML
• https://research.googleblog.com/2018/03/using-machine-learning-to-discover.html
Optimizations
• Initializers: Constant, Uniform, Gaussian, Glorot Uniform, Xavier, Kaiming, IdentityInit, Orthonormal
• Optimizers: Gradient Descent with Momentum, RMSProp, Adadelta, Adam, Adagrad, MultiOptimizer
• Activations: Rectified Linear, Softmax, Tanh, Logistic, Identity, ExpLin
• Layers: Linear, Convolution, Pooling, Deconvolution, Dropout, Recurrent, Long Short-Term Memory, Gated Recurrent Unit, BatchNorm, LookupTable, Local Response Normalization, Bidirectional-RNN, Bidirectional-LSTM
• Cost functions: Binary Cross Entropy, Multiclass Cross Entropy, Sum of Squares Error
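One optimizer from the list above, written out in full: gradient descent with momentum, minimising a one-dimensional quadratic. A minimal sketch with made-up hyperparameters:

```python
# Gradient descent with momentum on f(w) = (w - 3)^2.
# The velocity term accumulates past gradients, damping oscillation
# and speeding convergence along consistent directions.
def sgd_momentum(grad, w0, lr=0.1, beta=0.9, steps=300):
    w, v = w0, 0.0
    for _ in range(steps):
        v = beta * v - lr * grad(w)   # update velocity with current gradient
        w = w + v                     # step by the velocity
    return w

w_star = sgd_momentum(lambda w: 2.0 * (w - 3.0), w0=0.0)  # converges toward 3
```

Swapping the velocity update for a running average of squared gradients gives RMSProp; combining both ideas gives Adam.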
Deep Learning Performance
Image classification
Deep Learning Performance
ImageNet error rate is now around 2.2%, less than half that of average humans
Convolutional Neural Networks
• First developed in the 1970s
• Widely used for image recognition and classification
• Inspired by biological processes, CNNs are a type of feed-forward ANN
• The individual neurons are tiled so that they respond to overlapping regions in the visual field
• Yann LeCun – Bell Labs, 1990s
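The core operation of a CNN can be written in a few lines: a 2-D convolution (strictly, cross-correlation, as in most DL libraries) slid over an image. The toy image and hand-picked vertical-edge kernel below are illustrative:

```python
# 2-D "valid" convolution: slide the kernel over the image, taking a
# weighted sum at each position -- the feature-detection step of a CNN.
def conv2d(img, k):
    kh, kw = len(k), len(k[0])
    out_h, out_w = len(img) - kh + 1, len(img[0]) - kw + 1
    return [[sum(img[i + di][j + dj] * k[di][dj]
                 for di in range(kh) for dj in range(kw))
             for j in range(out_w)] for i in range(out_h)]

img = [[0, 0, 1, 1],          # a dark-to-bright vertical edge
       [0, 0, 1, 1],
       [0, 0, 1, 1],
       [0, 0, 1, 1]]
vertical_edge = [[-1, 0, 1],  # Prewitt-style vertical edge detector
                 [-1, 0, 1],
                 [-1, 0, 1]]
fmap = conv2d(img, vertical_edge)  # strong response where 0 meets 1
```

In a real CNN the kernel weights are learned rather than hand-picked, and many kernels run in parallel to produce a stack of feature maps.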
Recurrent Neural Networks
• First developed in the 1970s
• RNNs are neural networks used to predict the next element in a sequence or time series
• This could be, for example, words in a sentence or letters in a word
• Applications include predicting or generating music, stories, news, code, financial instrument pricing, text, speech – in fact the next element in any event stream
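A single forward pass of a vanilla RNN cell shows how the hidden state carries context between sequence elements. The scalar weights here are arbitrary illustrative values, not a trained model:

```python
import math

# One forward pass of a minimal (scalar) RNN cell over a sequence.
# The hidden state h mixes the current input with a memory of the past,
# which is what lets an RNN predict the next element of a stream.
def rnn_forward(xs, w_xh=1.0, w_hh=0.5, w_hy=1.0):
    h, ys = 0.0, []
    for x in xs:
        h = math.tanh(w_xh * x + w_hh * h)  # new state = f(input, old state)
        ys.append(w_hy * h)                 # per-step output/prediction
    return ys

# An impulse followed by zeros: the outputs after the impulse are nonzero
# only because the hidden state remembers it.
ys = rnn_forward([1.0, 0.0, 0.0])
```

LSTM and GRU cells replace the single tanh update with gated updates so this memory can persist over much longer sequences.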
GANs
• Generative Adversarial Networks – introduced by Ian Goodfellow et al in 2014 (see references)
• A class of artificial intelligence algorithms used in unsupervised deep learning
• A theory of adversarial examples, resembling what we have for normal supervised learning
• Implemented as a system of two neural networks, a discriminator D and a generator G
• D and G contest with each other in a zero-sum game framework
• The generator generates candidates and the discriminator evaluates them
Stacked Generative Adversarial Networks
https://arxiv.org/abs/1612.04357v1
Collection Style Transfer
Season Transfer
Models
AlexNet (Toronto)
VGG (Oxford)
ResNet (Microsoft)
Inception (Google)
DenseNet (Cornell)
SqueezeNet (Berkeley)
MobileNet (Google)
NASNet (Google)
Deep Learning Frameworks
Top 20 ML/DL Frameworks
KD Nuggets Feb 2018 https://www.kdnuggets.com/2018/02/top-20-python-ai-machine-learning-open-source-projects.html
(Chart legend: * = Deep Learning, o = Machine Learning; frameworks shown include MXNet and CNTK.)
TensorFlow
• TensorFlow is the open-sourced deep learning library from Google (Nov 2015)
• It is their second generation system for the implementation and deployment of
large-scale machine learning models
• Written in C++ with a python interface, originated from research and deploying
machine learning projects throughout a wide range of Google products and
services
• Initially TF ran only on a single node (your laptop, say), but now runs on distributed
clusters
• Available across all the major cloud providers (TFaaS)
• Second most popular framework on GitHub
• Close to 100,000 stars as of March 2018
• https://www.tensorflow.org/
TensorFlow supports many platforms
CPU, GPU, 1st-gen TPU, Cloud TPU, Android, iOS, Raspberry Pi
Growth of Deep Learning at Google
Directories containing model description files – and many more…
TensorFlow Popularity
Other Frameworks
• CNTK (Microsoft)
• MXNet (Amazon)
• Keras (Open source community)
• PyTorch (Facebook)
• Caffe (Berkeley)
• Neon (Intel)
• Chainer (Preferred Networks)
Data Sets
• Text, speech, images, video, time series
• Examples of recorded data sets include the MNIST and Labeled Faces in the Wild
(LFW).
Other Data Sets
• Images: CIFAR-10, ImageNet, PASCAL VOC, Mini-Places2, Food 101
• Text: IMDB, Penn Treebank, Shakespeare Text, bAbI, Hutter-prize
• Video: UCF101, Kinetics, YouTube-8M, CMU mocap
• Others: flickr8k, flickr30k, COCO
• List of data sets for machine learning
https://en.wikipedia.org/wiki/List_of_datasets_for_machine_learning_research
Open Source
• ML Frameworks – open source (e.g., TensorFlow)
• Operating systems – open source (Linux)
• Hardware – open source (OCP = Open Compute
Project)
• Data sets – open source (see previous slide)
• Research – open source (see arXiv)
• The fourth industrial revolution will be open source
Reinforcement Learning
• TD Learning
• DQN
• Latest research
• NIPS Workshop Dec 2017
• http://metalearning-symposium.ml
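TD learning, the first item above, fits in a few lines: TD(0) value estimation on a toy three-state chain (the environment, step size and episode count are made up for illustration):

```python
# TD(0) on a deterministic 3-state chain: s0 -> s1 -> s2 (terminal),
# reward 1 on reaching s2, no discounting. The true values are
# V(s0) = V(s1) = 1, and the TD error drives the estimates toward them.
def td0(episodes=2000, alpha=0.1, gamma=1.0):
    V = [0.0, 0.0, 0.0]               # value estimate per state
    for _ in range(episodes):
        s = 0
        while s != 2:
            s_next = s + 1
            r = 1.0 if s_next == 2 else 0.0
            # TD(0) update: move V[s] toward the bootstrapped target
            V[s] += alpha * (r + gamma * V[s_next] - V[s])
            s = s_next
    return V

V = td0()
```

DQN replaces the table `V` with a deep network and learns action values, but the bootstrapped TD target is the same idea.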
RL Research Directions
• Graphcore https://www.graphcore.ai/posts/directions-of-ai-research
• Bristol ASIC
• Geometric Deep Learning http://geometricdeeplearning.com
• Gary Marcus
• Berkeley (BAIR) http://bair.berkeley.edu
• Pieter Abbeel
• Sergey Levine
• Deepmind https://deepmind.com
• IMPALA (DMLab) https://deepmind.com/blog/impala-scalable-distributed-deeprl-dmlab-30/
• OpenAI https://openai.com
• Research white papers
Outline
• Physical Systems
• Biological
• Non-biological
• Deep Learning
• Description
• CNN, RNN, LSTM, GAN
• Reinforcement Learning
• Latest Research in DL
• Other (Better) Theories?
• Overview
• Comparisons
• AGI
• Conclusions
Other Theories of Intelligence
• What do we need?
• Active Inference
• Gauge theories
• Other approaches
• Applications
• Building AGI
What do we need to build AGI? A Principle of Principles?
• Free Energy Principle
• Systems act to minimize their expected free energy
• Reduce uncertainty (or surprisal)
• F = Complexity – Accuracy
• Prediction error = expected outcome – actual outcome = surprise
• Theory of Everything (ToE)
• In physics - try to unify gravity and quantum mechanics → call this a ToE
• But actually Active Inference is more encompassing than even this
• It encompasses all interactions and dynamics (physical phenomena)
• Over all time scales
• Over all distance scales
• Also see Constructor Theory
• David Deutsch (Oxford)
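The F = Complexity − Accuracy line above can be checked numerically: writing F(Q) = D_KL[Q(s)||P(s)] − E_Q[ln P(o|s)], the exact posterior minimises F and the minimum equals surprise, −ln P(o). A toy two-state example (the numbers are illustrative, not from the talk):

```python
import math

# Variational free energy for one observed outcome o in a two-state world:
# F = complexity - accuracy = D_KL[Q(s)||P(s)] - E_Q[ln P(o|s)].
P_s = [0.5, 0.5]          # prior over hidden states
P_o_given_s = [0.9, 0.2]  # likelihood of the observed o under each state

def free_energy(Q):
    complexity = sum(q * math.log(q / p) for q, p in zip(Q, P_s) if q > 0)
    accuracy = sum(q * math.log(l) for q, l in zip(Q, P_o_given_s))
    return complexity - accuracy

# Exact posterior by Bayes' rule, and the model evidence P(o).
evidence = sum(p * l for p, l in zip(P_s, P_o_given_s))
posterior = [p * l / evidence for p, l in zip(P_s, P_o_given_s)]

F_min = free_energy(posterior)       # equals -ln P(o), the surprise
F_bad = free_energy([0.5, 0.5])      # any other Q gives a larger F
```

Minimising F over Q therefore does approximate Bayesian inference while bounding surprise, which is why reducing F reduces uncertainty.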
So what are the principles? Hint: we already have them
• Newtonian mechanics – three laws
• Special relativity – invariance of laws under a Lorentz transformation
• GR – Principle of Equivalence
• Electromagnetism – Maxwell's equations
• Thermodynamics – three laws
• Quantum mechanics – uncertainty principle
• Relativistic QM – Dirac equation
• Dark energy/dark matter – we don't know yet
• All of the above = Principle of Least Action
Analogy – Einstein’s General Theory of Relativity
• Made some very general (and insightful)
assumptions about the laws of physics in a
gravitational field (non-inertial frames)
• Equivalence principle
• Covariance of laws of physics
• Generalised coordinate system –
Riemannian geometry
• Spacetime is curved
• Standing on the shoulders of giants
• After ten years of hard work he finally
wrote down his now famous field equations
All known physics – Field theoretic
Active Inference – information theoretic (uses generalised free energy)

Perceptual inference: Q(s_τ) = arg min_Q F(π, τ), with variational free energy
F(π, τ) = E_Q[ln Q(s_τ|π) − ln P(o_τ, s_τ|π)]   (entropy − energy)

Policy selection: ln P(π) = −Σ_τ G(π, τ), with expected free energy
G(π, τ) = E_Q[ln Q(s_τ|π) − ln P(o_τ, s_τ|π)]
= D_KL[Q(s_τ|π) || P(s_τ)] − D_KL[Q(o_τ, s_τ|π) || Q(s_τ|π) Q(o_τ|π)]   (expected cost − epistemic value, i.e. mutual information)

Generalised free energy – with some care: for future time points (τ > t) the outcome distribution is the likelihood, Q(o_τ|s_τ) = P(o_τ|s_τ); for past time points (τ ≤ t) it is a delta function on the outcome actually observed.
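A toy numeric reading of expected free energy, using the standard decomposition G = D_KL[Q(s|π)||P(s)] + E_Q[H[P(o|s)]] (expected cost plus expected ambiguity). The two-state, two-outcome numbers below are illustrative, not from the talk:

```python
import math

def entropy(p):
    return -sum(x * math.log(x) for x in p if x > 0)

# Expected free energy of a policy:
# cost      = KL from preferred states to the states the policy visits
# ambiguity = expected entropy of outcomes given states
def expected_G(Q_s, P_pref, A):
    cost = sum(q * math.log(q / p) for q, p in zip(Q_s, P_pref) if q > 0)
    ambiguity = sum(q * entropy(A[s]) for s, q in enumerate(Q_s))
    return cost + ambiguity

A = [[0.9, 0.1],   # P(o|s=0): informative outcomes
     [0.5, 0.5]]   # P(o|s=1): maximally ambiguous outcomes
P_pref = [0.8, 0.2]  # prior preference over states

G_good = expected_G([0.9, 0.1], P_pref, A)  # preferred, informative states
G_bad = expected_G([0.1, 0.9], P_pref, A)   # dispreferred, ambiguous states
```

Policies with lower G are more probable, so the agent is drawn to states that are both preferred and informative.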
Active Inference
Karl Friston - UCL
Expected surprise and free energy

Discrete formulation (policy selection): ln P(π) = −γ · G(π), with G(π) = Σ_τ G(π, τ) and
G(π, τ) = E_Q[ln Q(s_τ|π) − ln P(o_τ, s_τ)]
= D_KL[Q(s_τ|π) || P(s_τ)] + E_Q[H[P(o_τ|s_τ)]]   (expected cost + expected ambiguity)

Dynamic formulation (continuous time): sensory and external states evolve as ṡ = f_s(b, η) + ω and η̇ = f_η(b, η) + ω, while active states descend the free-energy gradient, ȧ ∝ −∇_a F.

[Figure: these quantities mapped onto brain structures – prefrontal cortex, VTA/SN, motor cortex, occipital cortex, striatum and hippocampus.]
What is free energy?
Free energy is basically prediction error (sensations − predictions = prediction error), where small errors mean low surprise.
General principle – systems act to minimize uncertainty (their expected free energy).
The Markov blanket of cells to brains
External states: ψ̇ = f_ψ(s, a, ψ) + ω_ψ
Sensory states: ṡ = f_s(s, a, ψ) + ω_s
Active states: ȧ ≈ f_a(s, a, μ)
Internal states: μ̇ ≈ f_μ(s, a, μ)
The same partition applies at every scale, from a single cell to a brain.
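The blanket partition can be simulated directly. A toy Euler integration with made-up linear flow functions, chosen so that internal (μ) and external (ψ) states interact only through the blanket states (sensory s and active a), mirroring the dependency structure above:

```python
import random

# Toy Markov-blanket dynamics: external -> sensory -> internal -> active
# -> external. Neither mu nor psi appears in the other's flow; they only
# "see" each other through the blanket. All flow functions are invented
# linear examples, not from the slide.
def simulate(T=5000, dt=0.01):
    random.seed(0)
    psi, s, a, mu = 1.0, 0.0, 0.0, 0.0
    for _ in range(T):
        d_psi = -psi + a + random.gauss(0, 0.1)  # external: driven by a, noisy
        d_s = psi - s + random.gauss(0, 0.1)     # sensory: driven by psi, noisy
        d_a = mu - a                             # active: driven by mu only
        d_mu = s - mu                            # internal: driven by s only
        psi += dt * d_psi
        s += dt * d_s
        a += dt * d_a
        mu += dt * d_mu
    return psi, s, a, mu

states = simulate()  # all four states settle into a bounded regime
```

Given samples of such trajectories, the internal states come to track the external ones via the blanket, which is the statistical picture behind "life as we know it".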
But what about the Markov blanket?
Perception and action are gradient flows on log model evidence, of the form f(x) = (Γ − Q)∇ ln p(x|m):
Perception: μ̇ = (Γ − Q)∇_μ ln p(s̃|m)
Action: ȧ = (Γ − Q)∇_a ln p(s̃|m)

The same quantity appears under four names:
• Value = ln p(s̃|m) – reinforcement learning, optimal control and expected utility theory (Pavlov)
• Surprise = −ln p(s̃|m), bounded by F – infomax, minimum redundancy and the free-energy principle (Barlow)
• Entropy = E_t[−ln p(s̃|m)] – self-organisation, synergetics and homoeostasis (Haken)
• Model evidence = p(s̃|m) – Bayesian brain, evidence accumulation and predictive coding (Helmholtz)
Application
Summary
• Biological agents resist the second law of thermodynamics
• They must minimize their average surprise (entropy)
• They minimize surprise by suppressing prediction error (free-energy)
• Prediction error can be reduced by changing predictions (perception)
• Prediction error can be reduced by changing sensations (action)
• Perception entails recurrent message passing in the brain to optimise predictions
• Action makes predictions come true (and minimises surprise)
• Perception: birdsong and categorization; simulated lesions
• Action: active inference; goal-directed reaching
• Policies: control and attractors; the mountain-car problem
Techniques from Maths and Physics
• We’ve already been here before
• We use various mathematical techniques to describe physical phenomena
• Maths: higher dimensions, group theory, transformations, symmetries, path integrals, variational
calculus, gauge theories, topology, vector spaces, category theory, algebraic geometry, …
• Physics: special relativity, general relativity, QM, QFT, QED, standard model, particle physics,
statistical physics, information theory, classical physics, EM, gravitation, string theory, unification
theory, …
• Apply above tools to the brain – after all the brain is a (hierarchical) physical system
• For example – mirror symmetry
• Transform to another mathematical space where the calculation is more easily performed, then
transform back (“duality”)
Gauge Theories
• Invariance of laws under transformations – Gauge theories
• Give rise to conservation laws
• Noether’s theorem
• Examples:
• Neuronal gauge theory - many aspects of neurobiology can be seen
as consequences of fundamental invariance properties
• See references section
Invariance under transformation → conserved quantity:
• Space (translation) → momentum
• Time (translation) → energy
• Rotation → angular momentum
Types of Intelligence
Comparisons - ANN vs BNN
• Neural circuits in the brain develop via synaptic pruning; a process by which
connections are overproduced and then eliminated over time
• In contrast, computer scientists typically design networks by starting with an
initially sparse topology and gradually adding connections
• AI (specific) vs AGI (general)
• Yann LeCun – CNN’s Bell Labs in ’80/90’s – “mathematical, not biological”
• Gone as far as we can with ”just” mathematics
• Now almost every researcher looking to biology for inspiration
• Costa et al, 2018, etc. (see “Bio-plausible Deep Learning” in reference section)
Approaches
• Helmholtz (Late 1800’s)
• Friston – Active Inference
• Tishby – Information bottleneck
• Bialek – Biophysics
• Hutter - AIXI
• Schmidhuber – Gödel Machine
• Etc.
Key Concepts
• Bayesian inference
• Predictive coding
• Generative models
• Cortical organization
• Perception
• Action
• Learning
• Decision making
• Affect
• Computational psychiatry
Probabilistic Programming
• A probabilistic programming language (PPL) is a programming language designed
to describe probabilistic models and then perform inference in those models
• Define a probability model on a programme
• Closely related to graphical models and Bayesian networks, but are more
expressive and flexible. Probabilistic programming represents an attempt to unify
general purpose programming with probabilistic modeling
• Languages include Edward, Church, Anglican, Pyro, PyMC, MetaProb, Gen, Stan,
Turing.jl, Infer.NET
• Introducing TensorFlow Probability https://medium.com/tensorflow/introducing-tensorflow-probability-dca4c304e245
• Announced at TF Dev Summit, March 30, 2018 (see next slide)
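The core move of probabilistic programming – write down the generative model, let the machinery invert it – can be miniaturised in plain Python with grid inference. None of the PPLs above are used; the coin-bias model is an illustrative stand-in:

```python
# "Probabilistic programming" in miniature: a generative model (a coin
# with unknown bias) inverted by brute-force grid inference. Real PPLs
# automate exactly this inversion with far better algorithms.
def posterior_grid(heads, tails, n_grid=1001):
    grid = [i / (n_grid - 1) for i in range(n_grid)]   # candidate biases
    prior = [1.0] * n_grid                             # uniform prior
    like = [p ** heads * (1 - p) ** tails for p in grid]
    unnorm = [pr * l for pr, l in zip(prior, like)]
    z = sum(unnorm)                                    # model evidence (up to grid scale)
    return grid, [u / z for u in unnorm]

grid, post = posterior_grid(heads=7, tails=3)
mean = sum(p * w for p, w in zip(grid, post))  # posterior mean, ~ 8/12 for Beta(8,4)
```

A PPL replaces the explicit grid with samplers or variational inference, so the same model scales to thousands of latent variables.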
TensorFlow Probability
Implementations & Applications
• BNN Frameworks – SPM, PyNN, NEST, NEURON, Brian
• Various open source frameworks on github
• Hearing aids - GN Group (DK)
• Order of Magnitude - Christian Kaiser (SV)
Building AGI
Can we build general intelligence?
• We have the theory – active inference
• We have the algorithms/software
• We have the hardware (ASIC, neuromorphic)
• We have the data sets (Internet plus open data sets)
• Need to build out libraries
• A TensorFlow for general intelligence
• Open source? (Open/closed)
• Apollo Project of our time – “Fourth Revolution”
• Human Brain Project
• Deepmind
• BRAIN project
• Should we build AGI/ASI? – safety, ethics, singularity?
Other AGI Projects
• OpenCog – Ben Goertzel (US)
• Numenta – Jeff Hawkins (US)
• Curious AI – (Finland)
• AGI Innovations – Peter Voss (US)
• Eurisko – Doug Lenat (US)
• GoodAI – Marek Rosa (Czech)
• OpenAI – Sam Altman (US)
• NNAISENSE – Jürgen Schmidhuber (Switzerland)
• Deepmind – Demis Hassabis (UK)
• Vicarious – Dileep George (US)
• SOAR – CMU
• ACT-R – CMU
• Sigma – Paul Rosenbloom – USC
• Plus many more
Conclusions
• Deep Learning (ANN) is lacking many of the characteristics and attributes needed
for a general theory of intelligence
• Active inference is such a theory (A ToE* which includes AGI)
• ANN research groups are now (finally) turning to biology for inspiration
• Bioplausible models are starting to appear
• Some groups are starting to look at active inference
• AGI in five years? Ten years?
• Still have to wait for hardware to mature
• Neuromorphic might be the platform that gets us *there*
* ToE = Theory of Everything
References
Neuroscience - Books
• Saxe, G. et al, Brain entropy and human intelligence: A resting-state fMRI study, PLOS One,
Feb 12, 2018
• Sterling, P. and Laughlin, S., Principles of Neural Design, MIT Press, 2017
• Slotnick, S., Cognitive Neuroscience of Memory, Cambridge Univ Press, 2017
• Engel, Friston, Kragic, Eds, The Pragmatic Turn - Toward Action-Oriented Views in Cognitive
Science, MIT Press, 2016
• Gerstner, W. et al, Neuronal Dynamics, Cambridge Univ Press, 2014
• Kandel, E., Principles of Neural Science, 5th ed, McGraw-Hill, 2012
• Rabinovich, Friston and Varona, Eds, Principles of Brain Dynamics, MIT Press, 2012
• Jones, E. G. Thalamus, Cambridge Univ. Press, 2007
• Dayan, P. and L. Abbott, Theoretical Neuroscience, MIT Press, 2005
Neuroscience - Papers
• Crick, F., The recent excitement about neural networks, Nature 337, 129–132, 1989
• Rao RP and DH Ballard, Predictive coding in the visual cortex, Nature Neuroscience 2:79–87, 1999
• Izhikevich, E. M., Solving the distal reward problem through linkage of STDP and dopamine
signalling, Cereb. Cortex 17, 2443–2452, 2007
• How the brain constructs the world, 2018 https://medicalxpress.com/news/2018-02-brain-world.html
• Lamme, V. A. F. & Roelfsema, P. R., The distinct modes of vision offered by feedforward and recurrent
processing, Trends Neurosci. 23, 571–579, 2000
• Sherman, S. M., Thalamus plays a central role in ongoing cortical functioning, Nat. Neurosci. 16, 533–
541, 2016
• Harris, K. D. & Shepherd, G. M. G., The neocortical circuit: themes and variations, Nat.
Neurosci. 18, 170–181, 2015
• van Kerkoerle, T. et al, Effects of attention and working memory in the different layers of monkey
primary visual cortex, Nat. Commun. 8, 13804, 2017
• Roelfsema, P.R. and A. Holtmaat, Control of synaptic plasticity in deep cortical networks, Nature
Reviews Neuroscience, 19, pages 166–180, 2018
Hardware
• Lacey, G. et al, Deep Learning on FPGAs: Past, Present, and Future, Feb 2016
https://arxiv.org/abs/1602.04283
• AI ASICs https://www.nanalyze.com/2017/05/12-ai-hardware-startups-new-ai-chips/
• Suri, M. Advances in Neuromorphic Hardware, Springer, 2017
• Human Brain Project, Silicon Brains https://www.humanbrainproject.eu/en/silicon-
brains/
• Artificial Brains http://www.artificialbrains.com
• The Future is Quantum https://www.microsoft.com/en-us/research/blog/future-is-
quantum-with-dr-krysta-svore/?OCID=MSR_podcast_ksvore_fb
• Wang, Z. et al, Fully memristive neural networks for pattern classification with
unsupervised learning, Nature Electronics, 8 Feb, 2018
Classical Deep Learning
• Schmidhuber, Jurgen, Deep learning in neural networks: An overview, Neural Networks, 61:85–117, 2015
• Bengio, Yoshua et al, Deep Learning, MIT Press, 2016
• LeCun, Y., Bengio, Y., and Hinton, G., Deep Learning, Nature, v.521, p.436–444, May 2015
http://www.nature.com/nature/journal/v521/n7553/abs/nature14539.html
• Britz, D. et al, Massive Exploration of Neural Machine Translation Architectures, Mar 2017
https://arxiv.org/abs/1703.03906
• Liu H. et al, Hierarchical representations for efficient architecture search, 2017
https://arxiv.org/abs/1711.00436
• NIPS 2017 Proceedings https://papers.nips.cc/book/advances-in-neural-information-processing-systems-30-
2017
• Deepmind papers https://deepmind.com/blog/deepmind-papers-nips-2017/
• Jeff Dean, Building Intelligent Systems with Large Scale Deep Learning, TensorFlow slides, Google Brain,
2017
• Rawat, W. and Z. Wang, Deep Convolutional Neural Networks for Image Classification: A Comprehensive
Review, Neural Computation, 29(9), Sept 2017
New Ideas in Deep Learning
• Sabour, S. et al, Dynamic Routing Between Capsules, Nov 2017, https://arxiv.org/abs/1710.09829
• Chaudhari, P. and S. Soatto, Stochastic gradient descent performs variational inference, Jan 2018,
https://arxiv.org/abs/1710.11029
• Vidal, R. et al, The Mathematics of Deep Learning, Dec 2017, https://arxiv.org/abs/1712.04741
• Chaudhari, P. and S. Soatto, On the energy landscape of deep networks, Apr 2017,
https://arxiv.org/abs/1511.06485
• Pearl, Judea, Theoretical Impediments to Machine Learning With Seven Sparks from the Causal
Revolution, Jan 2018, https://arxiv.org/abs/1801.04016
• Marcus, Gary, Deep Learning: A Critical Appraisal, Jan 2018, https://arxiv.org/abs/1801.00631
• Scellier, B. and Y. Bengio, Equilibrium propagation: bridging the gap between energy-based models
and backpropagation, Front. Comput. Neurosci. 11, 24, 2017
• Pham H. et al, Efficient Neural Architecture Search via Parameter Sharing, Feb 2018,
https://arxiv.org/abs/1802.03268
• Jaderberg, M. et al, Population Based Training of Neural Networks, 28 Nov, 2017,
https://arxiv.org/abs/1711.09846
Bio-plausible Deep Learning
• Hassabis, D. et al, Neuroscience-Inspired Artificial Intelligence, Neuron, 95(2), July
2017
• Marblestone, A.H. et al, Toward an Integration of Deep Learning and Neuroscience,
Front Comput Neurosci., 14 Sept, 2016
• Costa, R.P. et al, Cortical microcircuits as gated-recurrent neural networks, Jan
2018 https://arxiv.org/abs/1711.02448
• Lillicrap T.P. et al, Random synaptic feedback weights support error
backpropagation for deep learning, Nature Communications 7:13276, 2016
• Sacramento, J. et al, Dendritic error backpropagation in deep cortical microcircuits,
Dec 2017, https://arxiv.org/abs/1801.00062
• Guerguiev, J. et al, Towards deep learning with segregated dendrites, eLife
Neuroscience, 5 Dec, 2017
• Webb, S., Deep learning for biology, Nature, 20 Feb 2018,
https://www.nature.com/articles/d41586-018-02174-z
Cognitive Science
• Barbey, A., Network Neuroscience Theory of Human Intelligence, Trends in Cognitive Sciences,
22(1), Jan 2018
• Navlakha, B. et al, Network Design and the Brain, Trends in Cognitive Sciences, 22 (1), Jan 2018
• Lake, B. et al, Building Machines That Learn and Think Like People, 2016
https://arxiv.org/abs/1604.00289
• Lake, B., et al, Human-level concept learning through probabilistic program induction, Science,
350(6266) Dec 2015
• Tenenbaum, J.B. et al, How to Grow a Mind: Statistics, Structure, and Abstraction, Science,
331(1279) March 2011
• Trends in Cognitive Sciences, Special Issue: The Genetics of Cognition 15 (9), Sept 2011
• William Bialek, Princeton https://www.princeton.edu/~wbialek/categories.html
• Dissecting artificial intelligence to better understand the human brain
https://medicalxpress.com/news/2018-03-artificial-intelligence-human-brain.html
Active Inference
• Friston K., The free-energy principle: a unified brain theory? Nature Reviews
Neuroscience, 11(2), 2010
• Friston, K., Life as we know it, Journal of the Royal Society Interface, 3 July, 2013
• Friston, K. et al, Active Inference: A Process Theory, Neural Computation, 29(1),
Jan 2017
• Friston, K., Consciousness is not a thing, but a process of inference, Aeon, 18 May,
2017
• Kirchhoff, M. et al, The Markov blankets of life, Journal of the Royal Society
Interface, 17 Jan 2018
© Peter Morgan, April 2018
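The references above share one core idea: perception as minimisation of (variational) free energy. A toy, hypothetical sketch of that idea — a single Gaussian belief updated by gradient descent on precision-weighted prediction error; the function name, precisions and learning rate are illustrative assumptions, not Friston's actual variational scheme:

```python
# Toy sketch of free-energy minimisation: gradient descent on a
# precision-weighted sum of squared prediction errors. All names,
# precisions and step sizes here are illustrative assumptions.

def update_belief(mu, observation, prior, pi_obs=1.0, pi_prior=1.0,
                  lr=0.1, steps=200):
    """Minimise F(mu) = pi_obs*(obs-mu)**2/2 + pi_prior*(mu-prior)**2/2."""
    for _ in range(steps):
        grad = -pi_obs * (observation - mu) + pi_prior * (mu - prior)
        mu -= lr * grad  # descend the free-energy gradient
    return mu

# Equal precisions: the belief settles midway between the
# prior (0.0) and the observation (2.0).
print(round(update_belief(0.0, 2.0, 0.0), 2))  # ≈ 1.0
```

Raising pi_obs pulls the belief toward the data; raising pi_prior anchors it to the prior — the qualitative precision-weighting behaviour the papers above formalise.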
Gauge Theories and Beyond
• Sengupta et al, Towards a Neuronal Gauge Theory, PLOS Biology, Mar 8, 2016
• Information geometry https://en.wikipedia.org/wiki/Information_geometry
• Algebraic geometry – HBP https://www.wired.com/story/the-mind-boggling-math-that-
maybe-mapped-the-brain-in-11-dimensions/
• Guss, W.H., Deep function machines: Generalized neural networks for topological layer
expression, 2016 https://arxiv.org/abs/1612.04799
• Guss, W.H. and R. Salakhutdinov, On Characterizing the Capacity of Neural Networks
using Algebraic Topology, 2018 https://arxiv.org/abs/1802.04443
• Fok, R. et al, Spontaneous Symmetry Breaking in Deep Neural Networks, ICLR Conference
Submission, Feb 2018
• Bronstein, M.M. et al, Geometric deep learning: going beyond Euclidean data, May 2017,
https://arxiv.org/abs/1611.08097
© Peter Morgan, April 2018
AGI
• Veness, J. et al, A Monte Carlo AIXI Approximation, Dec 2010, https://arxiv.org/abs/0909.0801
• Schmidhuber, J., Goedel Machines: Self-Referential Universal Problem Solvers Making Provably
Optimal Self-Improvements, Dec 2006, https://arxiv.org/abs/cs/0309048
• Hutter, M., One Decade of Universal Artificial Intelligence, Feb 2012,
https://arxiv.org/abs/1202.6153
• Sunehag, P. and M. Hutter, Principles of Solomonoff Induction and AIXI, Nov 2011,
https://arxiv.org/abs/1111.6117
• Silver, D. et al, Mastering the game of Go without human knowledge, Nature, Vol 550, 19 Oct,
2017
• Wolpert, D., Physical limits of inference, Oct 2008, https://arxiv.org/abs/0708.1362
• Goertzel, B., Toward a Formal Model of Cognitive Synergy, Mar 2017,
https://arxiv.org/abs/1703.04361
• Hauser, Hermann, Are Machines Better than Humans? Evening lecture on machine intelligence at
SCI, London, 25 October 2017 https://www.youtube.com/watch?v=SVOMyEeXUow
© Peter Morgan, April 2018
Information Theory
• Chaitin, G.J., From Philosophy to Program Size, Mar 2003,
https://arxiv.org/abs/math/0303352
• Solomonoff, R.J., Machine Learning — Past and Future, Revision of lecture given at AI@50,
The Dartmouth Artificial Intelligence Conference, July 13-15, 2006
• Publications of A. N. Kolmogorov, Annals of Probability, 17(3), July 1989
• Levin, L. A., Universal Sequential Search Problems, Problems of Information Transmission,
9(3), 1973
• Shannon, C.E., A Mathematical Theory of Communication, Bell System Technical Journal, 27
(3):379–423, July 1948
• Shwartz-Ziv, R. & N. Tishby, Opening the Black Box of Deep Neural Networks via
Information, Apr 2017, https://arxiv.org/abs/1703.00810
• AIT https://en.m.wikipedia.org/wiki/Algorithmic_information_theory
© Peter Morgan, April 2018
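Shannon's 1948 measure, central to several of the works above, is easy to compute empirically from symbol frequencies. A minimal sketch (the function name is our own):

```python
import math
from collections import Counter

def shannon_entropy(message):
    """Empirical Shannon entropy H = -sum(p_i * log2(p_i)), in bits per symbol."""
    counts = Counter(message)
    n = len(message)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

print(shannon_entropy("abab"))  # 1.0 bit/symbol: two equiprobable symbols
print(shannon_entropy("abcd"))  # 2.0 bits/symbol: four equiprobable symbols
```

Algorithmic information theory (Kolmogorov, Chaitin, Solomonoff) replaces this statistical measure with the length of the shortest program that outputs the message.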
Classic Papers
• Turing, A.M., Computing Machinery and Intelligence, Mind, 59:433–460, 1950
• Schrodinger, E., What is Life? Based on lectures delivered at Trinity College, Dublin, Feb
1943 http://www.whatislife.ie/downloads/What-is-Life.pdf
• Marletto, C., Constructor Theory of Life, Journal of the Royal Society Interface,
12(104), 2015
• Rumelhart DE, Hinton GE, Williams RJ, Learning representations by back-propagating
errors, Nature 323:533–536, 1986
• Crick F., The recent excitement about neural networks, Nature 337:129–132, 1989
• Kolmogorov, A., On Analytical Methods in the Theory of Probability, Mathematische
Annalen, 104(1) 1931
• Solomonoff, R.J., A Formal Theory of Inductive Inference, Part 1, Information and Control,
7(1), Mar, 1964, http://world.std.com/~rjs/1964pt1.ps
• McCulloch, W.S. and W. Pitts, A logical calculus of the ideas immanent in nervous activity,
Bulletin of Mathematical Biophysics, 5(4):115–133, 1943
© Peter Morgan, April 2018
Mathematical
• Cybenko, George, Approximation by superpositions of a sigmoidal function,
Mathematics of Control, Signals, and Systems, 2(4):303–314, 1989
• Hornik, K., Approximation Capabilities of Multilayer Feedforward Networks,
Neural Networks, 4(2):251–257, 1991
• Stinchcombe, M.B., Neural network approximation of continuous functionals and
continuous functions on compactifications, Neural Networks, 12(3):467–477,
1999
© Peter Morgan, April 2018
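The Cybenko and Hornik results above say that finite superpositions of sigmoids are dense in the continuous functions. A hypothetical two-unit illustration (the function names and the steepness k are our own choices): the difference of two shifted sigmoids approximates an indicator function, a standard constructive intuition for why such superpositions are universal.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def bump(x, a=0.3, b=0.7, k=50.0):
    """Two sigmoid units approximating the indicator function of [a, b]."""
    return sigmoid(k * (x - a)) - sigmoid(k * (x - b))

print(round(bump(0.5), 3))  # ≈ 1.0 inside the interval
print(round(bump(0.0), 3))  # ≈ 0.0 outside it
```

Summing scaled bumps of this kind approximates any continuous function on a compact set to any desired accuracy as k grows and the partition is refined.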
Books
• Sutton, R. S. & A.G. Barto, Reinforcement Learning, 2nd ed., MIT Press, 2018
• Goodfellow, I. et al, Deep Learning, MIT Press, 2016
• Li, Ming and Paul Vitanyi, An Introduction to Kolmogorov Complexity and Its
Applications. Springer-Verlag, N.Y., 2008
• Hutter, M., Universal Artificial Intelligence, Springer-Verlag, 2004
• Wolfram, S., A New Kind of Science, Wolfram Media, 2002
• MacKay, David, Information Theory, Inference, and Learning Algorithms, Cambridge
University Press, 2003
• Hebb, D. O. The Organization of Behavior. A Neuropsychological Theory, John Wiley &
Sons, 1949
© Peter Morgan, April 2018
Final Word …
https://www.youtube.com/watch?v=7ottuFZYflg
© Peter Morgan, April 2018
Questions
© Peter Morgan, April 2018
Towards a General Theory of Intelligence - April 2018

  • 1. London Deep Learning Lab Meetup – April 19, 2018 © Peter Morgan, April 2018 https://www.meetup.com/Deep-Learning-Lab/
  • 2. Towards a General Theory of Intelligence Peter Morgan www.deeplp.com
  • 3. Thanks to our Sponsors Wizebit © Peter Morgan, April 2018
  • 5. © Peter Morgan, April 2018
  • 6. London 9-11 October © Peter Morgan, April 2018
  • 7. © Peter Morgan, April 2018
  • 8. Announcements • TensorFlow Dev Summit March 30, 2018 • Summary of TF developments over the last year • Held in Mountain View CA • https://www.youtube.com/watch?v=bUjMAzCg k2A&list=PLQY2H8rRoyvxjVx3zfw4vA4cvlKogyL NN • Coincided with Release 1.7 • 11 million downloads so far • Many highlights – check it out. © Peter Morgan, April 2018
  • 9. Announcements • HOUSE OF LORDS Select Committee on Artificial Intelligence releases AI Report on 16 April: “AI in the UK: ready, willing and able?” • https://www.parliament.uk/business/committees/committees-a-z/lords- select/ai-committee/news-parliament-2017/ai-report-published/ • The Select Committee on Artificial Intelligence was appointed by the House of Lords on 29 June 2017 “to consider the economic, ethical and social implications of advances in artificial intelligence” • “Our inquiry has concluded that the UK is in a strong position to be among the world leaders in the development of artificial intelligence during the twenty- first century”. © Peter Morgan, April 2018
  • 10. Announcements (due to be published by end of April) © Peter Morgan, April 2018
  • 11. Outline of Talk • Physical Systems • Biological • Non-biological • Deep Learning • Description • CNN, RNN, LSTM, GAN • Reinforcement Learning • Latest Research in DL • Other (Better) Theories? • Overview • Comparisons • AGI • Conclusions © Peter Morgan, April 2018
  • 12. Motivation • Solve (general) intelligence • Use it to solve everything else • Medicine • Cancer • Brain disease (Alzheimer's, etc.) • Longevity • Physics • Maths • Materials science • Social © Peter Morgan, April 2018
  • 13. The Big Picture - a ToE? Physics Computer Science Neuroscience © Peter Morgan, April 2018
  • 14. Physical Systems • Biological • Plants, bacteria, insects, reptiles, mammalian, biological brains • Non-biological • CPU - Intel Xeon SP, AMD RyZen, Qualcomm, IBM PowerPC, ARM • GPU - Nvidia (Volta), AMD (Vega) • FPGA - Intel (Altera, Xylinx etc.) • ASIC - Google TPU, Graphcore IPU, Intel Nervana, Wave, … • Neuromorphic (Human Brain Project - SpiNNaker, BrainScaleS; IBM TrueNorth; Intel Liohi, … • Quantum • IBM, Microsoft, Intel, Google, DWave, Rigetti, … • Quantum biology? (photosynthesis, navigation, …) • QuantumML, Quantum Intelligence © Peter Morgan, April 2018
  • 15. Types of Physical Computation Systems* *Can we find a theory that unifies them all (classical, quantum, biological, non-biological) Digital Neuromorphic Quantum Biological © Peter Morgan, April 2018
  • 20. Hand drawn neuron types From "Structure of the Mammalian Retina" c.1900, by Santiago Ramon y Cajal. © Peter Morgan, April 2018
  • 22. © Peter Morgan, April 2018
  • 23. Cortical columns in the cortex © Peter Morgan, April 2018
  • 24. © Peter Morgan, April 2018
  • 29. Non-biological Hardware • Digital • CPU • GPU • FPGA • ASIC • Neuromorphic • Various architectures • SpiNNaker, BrainScaleS, … • Quantum • Different qubits • Anyons, superconducting, photonic, … © Peter Morgan, April 2018
  • 30. Digital Computing • Abacus • Charles Babbage • Ada Lovelace • Vacuum tubes (valves) • Turing • Von Neumann • ENIAC • Transistor (Bardeen, Brattain, Shockey, 1947) • Intel • ARM • Nvidia © Peter Morgan, April 2018
  • 31. © Peter Morgan, April 2018
  • 32. Cray-1 1976 160 MFlops© Peter Morgan, April 2018
  • 33. CPU – Intel Xeon Up to 18 cores, ~1 TFlops© Peter Morgan, April 2018
  • 34. GPU – Nvidia Volta V100 21 billion transistors, 120 TFlops© Peter Morgan, April 2018
  • 35. DGX-2 - released 27 Mar 2018 16 V100’s, 2 PFlops, 30TB storage ($400k) 2 PFlops! © Peter Morgan, April 2018
  • 36. ASIC - Google TPU v2 180 TFlops© Peter Morgan, April 2018
  • 37. © Peter Morgan, April 2018
  • 38. ASIC - Graphcore IPU © Peter Morgan, April 2018 >200 TFlops
  • 39. Graph computations – Graphcore (ResNet-50) © Peter Morgan, April 2018
  • 40. TPU Pod 64 2nd-genTPUs 11.5 PetaFlops 4 Terabytes ofmemory Cloud TPU’s © Peter Morgan, April 2018
  • 41. HPC – what’s next? Currently 100PFlops By 2020 - Exascale© Peter Morgan, April 2018
  • 43. End to End Hardware Example © Peter Morgan, April 2018
  • 44. Neuromorphic Computing • Biologically inspired • First proposed Carver Mead, Caltech, 1980’s • Uses analogue signals – spiking neural networks (SNN) • SpiNNaker (Manchester, HBP, Furber) • BrainScaleS (Heidelberg, HBP, Schemmel) • TrueNorth (IBM, Modha) • Intel Liohi • Startups (Knowm, Spaun, etc.) • Up to 1 million cores, 1 billion “neurons” (mouse) • Need to scale 100X à human brain • Relatively low power • Available on the (HBP) cloud today © Peter Morgan, April 2018
  • 48. © Peter Morgan, April 2018
  • 49. Neuromorphic v ASIC Analogue v Digital © Peter Morgan, April 2018
  • 50. Quantum Computing • First proposed by Richard Feynman, Caltech, 1980’s • Qubits – spin 1, 0 and superposition states (QM) • (Nature is) fundamentally probabilistic at atomic scale • Have to be kept cold (mKelvin) to avoid noise/decoherence • Building is an engineering problem (theory is known) • Several approaches - superconductors, trapped ions, semiconductors, topological structures • Several initiatives (with access available) • Microsoft, IBM, Google, Intel, Dwave, Rigetti, etc. • Can login today • Many applications – optimization, cryptography, drug discovery, etc. © Peter Morgan, April 2018
  • 51. IBM 50 Qubit Quantum Computer © Peter Morgan, April 2018
  • 52. © Peter Morgan, April 2018
  • 53. Quantum Logic Gates © Peter Morgan, April 2018
  • 54. Summary – Now have three non-biological stacks Algorithms Distributed Layer OS Hardware Digital Neuromorphic Quantum © Peter Morgan, April 2018
  • 55. Outline • Physical Systems • Biological • Non-biological • Deep Learning • Description • CNN, RNN, LSTM, GAN • Reinforcement Learning • Latest Research in DL • Other (Better) Theories? • Overview • Comparisons • AGI • Conclusions © Peter Morgan, April 2018
  • 56. Deep Learning • Artificial Neural Networks (ANNs) • Universal Approximation Theorem • Computation graph • Hyperparameters • AutoML • Optimization • CNN • RNN (LSTM) © Peter Morgan, April 2018
  • 57. Deep Learning (cont.) • GAN • Different Models • AlexNet, VGG, ResNet, Inception • Squeeznet, MobileNet • DL Frameworks • TensorFlow • MXnet, CNTK, PyTorch • Training data sets • Text, speech, images, video, time series © Peter Morgan, April 2018
  • 58. Early papers © Peter Morgan, April 2018
  • 59. Nodes and Layers © Peter Morgan, April 2018
  • 60. © Peter Morgan, April 2018
  • 61. More Neural Networks (“Neural Network Zoo”) © Peter Morgan, April 2018
  • 62. Computation in each node © Peter Morgan, April 2018
  • 63. Universal Approximation Theorem • A feed-forward network with a single hidden layer containing a finite number of neurons, can approximate continuous functions in Rn, under mild assumptions on the activation function • We can define as an approximate realization of f(x): • One of the first versions of the theorem was proved by George Cybenko in 1989 for sigmoid activation functions • Kurt Hornik showed in 1991 that it is not the specific choice of the activation function, but rather the multilayer feedforward architecture which gives neural networks the potential of being universal approximators • Cybenko, G. (1989) "Approximations by superpositions of sigmoidal functions", Mathematics of Control, Signals, and Systems, 2 (4), 303-314 • Kurt Hornik (1991) "Approximation Capabilities of Multilayer Feedforward Networks", Neural Networks, 4(2), 251–257 © Peter Morgan, April 2018
  • 65. Hyperparameters • Activation function • Loss (cost) function • Learning rate • Initialization • Batch normalization • Automation • Hyperparameter tuning • AutoML • https://research.googleblog.com/2018/03/using-machine-learning-to-discover.html © Peter Morgan, April 2018
  • 66. Optimizations • Initializers Constant, Uniform, Gaussian, Glorot Uniform, Xavier, Kaiming, IdentityInit, Orthonormal • Optimizers Gradient Descent with Momentum, RMSProp, Adadelta, Adam, Adagrad, MultiOptimizer • Activations Rectified Linear, Softmax, Tanh, Logistic, Identity, ExpLin • Layers Linear, Convolution, Pooling, Deconvolution, Dropout, Recurrent, Long Short- Term Memory, Gated Recurrent Unit, BatchNorm, LookupTable, Local Response Normaliz ation, Bidirectional-RNN, Bidirectional-LSTM • Cost functions Binary Cross Entropy, Multiclass Cross Entropy, Sum of Squares Error © Peter Morgan, April 2018
  • 68. Deep Learning Performance • ImageNet error rate is now around 2.2%, less than half that of average humans © Peter Morgan, April 2018
  • 69. Convolutional Neural Networks • First developed in the 1970s • Widely used for image recognition and classification • Inspired by biological processes, CNNs are a type of feed-forward ANN • The individual neurons are tiled in such a way that they respond to overlapping regions in the visual field • Yann LeCun – Bell Labs, 1990s © Peter Morgan, April 2018
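The core CNN operation can be shown in a few lines: a minimal pure-Python sketch (illustrative, not from the slides) of a "valid" 2-D convolution, here detecting the vertical edge in a tiny dark-to-bright image.

```python
def conv2d(image, kernel):
    """'Valid' 2-D cross-correlation: slide the kernel over the image and
    take a weighted sum of each overlapping patch (the core CNN operation)."""
    H, W = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(H - kh + 1):
        row = []
        for j in range(W - kw + 1):
            s = sum(image[i + a][j + b] * kernel[a][b]
                    for a in range(kh) for b in range(kw))
            row.append(s)
        out.append(row)
    return out

# vertical-edge detector on a tiny image with a dark-to-bright boundary
image = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1]]
kernel = [[-1, 1]]  # responds where intensity increases left-to-right
edges = conv2d(image, kernel)
```

In a real CNN the kernel weights are learned rather than hand-set, and many kernels are applied in parallel, but the sliding weighted sum is the same.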
  • 70. Recurrent Neural Networks • First developed in the 1970s • RNNs are neural networks used to predict the next element in a sequence or time series • This could be, for example, words in a sentence or letters in a word • Applications include predicting or generating music, stories, news, code, financial instrument pricing, text and speech – in fact, the next element in any event stream © Peter Morgan, April 2018
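The recurrence itself is simple: a minimal one-unit sketch in pure Python (illustrative weights, not a trained model) showing how the same weights are reused at every time step while the hidden state carries a summary of the history, which is why the final state depends on input order.

```python
import math

def rnn(inputs, w_in, w_rec, w_out):
    """Minimal one-unit recurrent network: the hidden state h is a running
    summary of the sequence, updated by the same weights at every step."""
    h = 0.0
    outputs = []
    for x in inputs:
        h = math.tanh(w_in * x + w_rec * h)   # shared weights across time
        outputs.append(w_out * h)             # prediction for the next element
    return outputs, h

outputs, h_ab = rnn([1.0, 0.0, 1.0, 0.0], w_in=1.0, w_rec=0.5, w_out=1.0)
_, h_ba = rnn([0.0, 1.0, 0.0, 1.0], w_in=1.0, w_rec=0.5, w_out=1.0)
# same multiset of inputs, different order -> different final hidden state
```

LSTMs and GRUs refine this loop with gates that control what the hidden state keeps and forgets, but the shared-weight recurrence is the common core.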
  • 71. GANs Generative Adversarial Networks – introduced by Ian Goodfellow et al in 2014 (see references) A class of artificial intelligence algorithms used in unsupervised deep learning A theory of adversarial examples, resembling what we have for normal supervised learning Implemented by a system of two neural networks, a discriminator D and a generator G D & G contest with each other in a zero-sum game framework The generator produces candidate samples and the discriminator evaluates them, trying to tell generated samples from real data © Peter Morgan, April 2018
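The zero-sum objective V(D, G) = E[log D(x)] + E[log(1 − D(G(z)))] can be evaluated directly with toy players: a pure-Python sketch (illustrative functions and parameters, no training) where a generator that matches the real data distribution fools the discriminator and drives V lower, exactly what G is trying to achieve.

```python
import math, random

random.seed(0)

def generator(z, theta):
    # toy generator: shifts standard-normal noise by a "learned" offset theta
    return z + theta

def discriminator(x, mu_real=3.0):
    # toy discriminator: probability that x is real, peaked at the real mean
    return math.exp(-(x - mu_real) ** 2 / 2)

def value(theta, n=1000):
    """Monte Carlo estimate of the zero-sum objective V(D, G):
    D maximises it, G minimises it."""
    zs = [random.gauss(0, 1) for _ in range(n)]
    xs = [random.gauss(3.0, 1) for _ in range(n)]     # "real" data
    v_real = sum(math.log(discriminator(x) + 1e-9) for x in xs) / n
    v_fake = sum(math.log(1 - discriminator(generator(z, theta)) + 1e-9)
                 for z in zs) / n
    return v_real + v_fake

# a generator matching the real mean (theta=3) fools D, lowering V,
# compared with a poor generator (theta=0) that D separates easily
v_good, v_bad = value(theta=3.0), value(theta=0.0)
```

In a real GAN both players are neural networks and their parameters are updated by alternating gradient steps on this same objective.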
  • 72. Stacked Generative Adversarial Networks https://arxiv.org/abs/1612.04357v1 © Peter Morgan, April 2018
  • 73. Collection Style Transfer © Peter Morgan, April 2018
  • 74. Season Transfer © Peter Morgan, April 2018
  • 75. Models AlexNet (Toronto) VGG (Oxford) ResNet (Microsoft) Inception (Google) DenseNet (Cornell) SqueezeNet (Berkeley) MobileNet (Google) NASNet (Google) © Peter Morgan, April 2018
  • 77. Top 20 ML/DL Frameworks KD Nuggets Feb 2018 https://www.kdnuggets.com/2018/02/top-20-python-ai-machine-learning-open-source-projects.html © Peter Morgan, April 2018
  • 78. TensorFlow • TensorFlow is the open sourced deep learning library from Google (Nov 2015) • It is their second-generation system for the implementation and deployment of large-scale machine learning models • Written in C++ with a Python interface, it originated from research on, and deployment of, machine learning projects throughout a wide range of Google products and services • Initially TF ran only on a single node (your laptop, say), but it now runs on distributed clusters • Available across all the major cloud providers (TFaaS) • Second most popular framework on GitHub • Close to 100,000 stars as of March 2018 • https://www.tensorflow.org/ © Peter Morgan, April 2018
  • 79. TensorFlow supports many platforms CPU GPU 1st-gen TPU Cloud TPU Android iOS Raspberry Pi © Peter Morgan, April 2018
  • 80. Growth of Deep Learning at Google [Chart: directories containing model description files – and many more . . .] © Peter Morgan, April 2018
  • 81. TensorFlow Popularity © Peter Morgan, April 2018
  • 82. Other Frameworks • CNTK (Microsoft) • MXNet (Amazon) • Keras (Open source community) • PyTorch (Facebook) • Caffe (Berkeley) • Neon (Intel) • Chainer (Preferred Networks) © Peter Morgan, April 2018
  • 83. Data Sets • Text, speech, images, video, time series • Examples of recorded data sets include the MNIST and Labeled Faces in the Wild (LFW). MNIST LFW © Peter Morgan, April 2018
  • 84. Other Data Sets • Images: CIFAR-10, ImageNet, PASCAL VOC, Mini-Places2, Food 101 • Text: IMDB, Penn Treebank, Shakespeare Text, bAbI, Hutter-prize • Video: UCF101, Kinetics, YouTube-8M, CMU mocap • Others: flickr8k, flickr30k, COCO • List of data sets for machine learning https://en.wikipedia.org/wiki/List_of_datasets_for_machine_learning_research © Peter Morgan, April 2018
  • 85. Open Source • ML Frameworks – open source (e.g., TensorFlow) • Operating systems – open source (Linux) • Hardware – open source (OCP = Open Compute Project) • Data sets – open source (see previous slide) • Research – open source (see arXiv) • The fourth industrial revolution will be open source © Peter Morgan, April 2018
  • 86. Reinforcement Learning • TD Learning • DQN • Latest research • NIPS Workshop Dec 2017 • http://metalearning-symposium.ml © Peter Morgan, April 2018
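The TD learning mentioned above reduces to one update rule; a minimal TD(0) sketch in pure Python (toy chain MDP of my own construction, not from the slides) that learns state values by nudging each estimate toward the bootstrapped target r + γV(s′).

```python
# three-state chain A -> B -> C; the transition into terminal state C pays 1
transitions = {'A': 'B', 'B': 'C'}
reward = {('A', 'B'): 0.0, ('B', 'C'): 1.0}

def td0(episodes=500, alpha=0.1, gamma=1.0):
    """TD(0): nudge V(s) toward the bootstrapped target r + gamma * V(s')."""
    V = {'A': 0.0, 'B': 0.0, 'C': 0.0}
    for _ in range(episodes):
        s = 'A'
        while s != 'C':
            s_next = transitions[s]
            r = reward[(s, s_next)]
            V[s] += alpha * (r + gamma * V[s_next] - V[s])
            s = s_next
    return V

V = td0()  # converges toward the true values V(A) = V(B) = 1
```

DQN applies the same bootstrapped target to action values Q(s, a), with a deep network as the function approximator and tricks such as replay buffers and target networks for stability.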
  • 87. RL Research Directions • Graphcore https://www.graphcore.ai/posts/directions-of-ai-research • Bristol ASIC • Geometric Deep Learning http://geometricdeeplearning.com • Gary Marcus • Berkeley (BAIR) http://bair.berkeley.edu • Pieter Abbeel • Sergey Levine • Deepmind https://deepmind.com • IMPALA (DMLab) https://deepmind.com/blog/impala-scalable-distributed-deeprl-dmlab-30/ • OpenAI https://openai.com • Research white papers © Peter Morgan, April 2018
  • 88. Outline • Physical Systems • Biological • Non-biological • Deep Learning • Description • CNN, RNN, LSTM, GAN • Reinforcement Learning • Latest Research in DL • Other (Better) Theories? • Overview • Comparisons • AGI • Conclusions © Peter Morgan, April 2018
  • 89. Other Theories of Intelligence • What do we need? • Active Inference • Gauge theories • Other approaches • Applications • Building AGI © Peter Morgan, April 2018
  • 90. What do we need to build AGI? A Principle of Principles? • Free Energy Principle • Systems act to minimize their expected free energy • Reduce uncertainty (or surprisal) • F = Complexity – Accuracy • Prediction error = expected outcome – actual outcome = surprise • Theory of Everything (ToE) • In physics, attempts to unify gravity and quantum mechanics → call this a ToE • But Active Inference is more encompassing than even this • It encompasses all interactions and dynamics (physical phenomena) • Over all time scales • Over all distance scales • Also see Constructor Theory • David Deutsch (Oxford) © Peter Morgan, April 2018
  • 91. So what are the principles? Hint: we already have them Newtonian mechanics – three laws Special relativity – invariance of laws under a Lorentz transformation GR – Principle of Equivalence Electromagnetism – Maxwell’s equations Thermodynamics – three laws Quantum mechanics – uncertainty principle Relativistic QM – Dirac equation Dark energy/dark matter – we don’t know yet All of the above = Principle of Least Action © Peter Morgan, April 2018
  • 92. Analogy – Einstein’s General Theory of Relativity • Made some very general (and insightful) assumptions about the laws of physics in a gravitational field (non-inertial frames) • Equivalence principle • Covariance of laws of physics • Generalised coordinate system – Riemannian geometry • Spacetime is curved • Standing on the shoulders of giants • After ten years of hard work he finally wrote down his now famous field equations © Peter Morgan, April 2018
  • 93. All known physics – Field theoretic © Peter Morgan, April 2018
  • 94. Active Inference – Information theoretic (uses generalised free energy) • Perceptual inference: Q(s̃) = argmin_Q F, where the variational free energy F(π, τ) = E_Q[ln Q(s_τ | π) − ln P(o_τ, s_τ | π)] (energy minus entropy), summed over time, bounds surprise: F ≥ −ln P(õ) • Policy selection: Q(π) minimises the expected free energy G(π) = Σ_τ G(π, τ), with G(π, τ) = E_{Q(o_τ, s_τ | π)}[ln Q(s_τ | π) − ln P(o_τ, s_τ | π)] = expected cost − epistemic value (the mutual information between hidden states and outcomes) • Generalised free energy – with some care: outcomes already observed (τ ≤ t) enter as delta functions on their observed values, Q(o_τ | s_τ) = δ(o_τ), while future outcomes (τ > t) are predicted under the likelihood, Q(o_τ | s_τ) = P(o_τ | s_τ) © Peter Morgan, April 2018
  • 95. Active Inference Karl Friston - UCL © Peter Morgan, April 2018
  • 96. Expected surprise and free energy • Discrete formulation: π = argmin_π G(π), with G(π) = Σ_τ G(π, τ) and G(π, τ) = E_Q[ln Q(s_τ | π) − ln P(o_τ, s_τ)] = D_KL[Q(s_τ | π) ‖ P(s_τ)] + E_Q[H[P(o_τ | s_τ)]] (expected cost plus expected ambiguity); policies are then selected via a softmax over −γ·G(π) • Dynamic formulation: active states minimise expected free energy over trajectories, a[τ] = argmin_a (expected complexity + expected ambiguity), with sensory dynamics ṡ = f_s(η, b) + ω_s, external dynamics η̇ = f_η(η, b) + ω_η, and action performing a gradient descent ȧ ∝ −∇_a F • [Figure: both schemes mapped onto brain systems – prefrontal cortex, VTA/SN (precision γ/β), motor cortex, occipital cortex, striatum (policies π), hippocampus (states s)] © Peter Morgan, April 2018
  • 97. What is free-energy? Free-energy is basically prediction error, where small errors mean low surprise: sensations – predictions = prediction error General Principle – Systems act to minimize uncertainty (their expected free energy) © Peter Morgan, April 2018
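In the simplest Gaussian case this principle collapses to gradient descent on squared prediction error; a toy pure-Python sketch (my own illustration of the idea, not Friston's scheme) in which perception is just belief updating that suppresses the error between sensation and prediction.

```python
def perceive(s, mu=0.0, lr=0.1, steps=100):
    """Toy perceptual inference: the agent's belief mu about a hidden cause
    descends the squared prediction error between sensation s and prediction.
    With a Gaussian model, free energy reduces to this squared error."""
    errors = []
    for _ in range(steps):
        prediction = mu            # trivial generative model: sensation = cause
        error = s - prediction     # prediction error (proxy for surprise)
        mu += lr * error           # change predictions to reduce the error
        errors.append(error ** 2)
    return mu, errors

mu, errors = perceive(s=2.0)
# belief converges to the cause of the sensation; the error is suppressed
```

Action would close the same loop the other way, changing sensations (s) instead of beliefs (mu) until predictions come true.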
  • 98. The Markov blanket of cells to brains • A Markov blanket partitions states into external (ψ), sensory (s), active (a) and internal (μ) states, with coupled dynamics: ψ̇ = f_ψ(ψ, s, a) + ω_ψ, ṡ = f_s(ψ, s, a) + ω_s, ȧ ≈ f_a(s, a, μ), μ̇ ≈ f_μ(s, a, μ) • The same partition applies at every scale, from a single cell to a brain © Peter Morgan, April 2018
  • 99. But what about the Markov blanket? • Perception and action both climb the gradient of log model evidence: μ̇ ∝ ∇_μ ln p(s̃ | m) (perception), ȧ ∝ ∇_a ln p(s̃ | m) (action) • Value = ln p(s̃ | m) = model evidence = −surprise; entropy = E[−ln p(s̃ | m)] • The same quantity recovers four traditions: reinforcement learning, optimal control and expected utility theory (Pavlov – value); self-organisation, synergetics and homoeostasis (Haken – entropy); Bayesian brain, evidence accumulation and predictive coding (Helmholtz – model evidence); infomax, minimum redundancy and the free-energy principle (Barlow – surprise) © Peter Morgan, April 2018
  • 101. Summary • Biological agents resist the second law of thermodynamics • They must minimize their average surprise (entropy) • They minimize surprise by suppressing prediction error (free-energy) • Prediction error can be reduced by changing predictions (perception) • Prediction error can be reduced by changing sensations (action) • Perception entails recurrent message passing in the brain to optimise predictions • Action makes predictions come true (and minimises surprise) • [Simulations: perception – birdsong and categorization, simulated lesions; action – active inference, goal-directed reaching; policies – control and attractors, the mountain-car problem] © Peter Morgan, April 2018
  • 102. Techniques from Maths and Physics • We’ve already been here before • We use various mathematical techniques to describe physical phenomena • Maths: higher dimensions, group theory, transformations, symmetries, path integrals, variational calculus, gauge theories, topology, vector spaces, category theory, algebraic geometry, … • Physics: special relativity, general relativity, QM, QFT, QED, standard model, particle physics, statistical physics, information theory, classical physics, EM, gravitation, string theory, unification theory, … • Apply above tools to the brain – after all the brain is a (hierarchical) physical system • For example – mirror symmetry • Transform to another mathematical space where the calculation is more easily performed, then transform back (“duality”) © Peter Morgan, April 2018
  • 103. Gauge Theories • Invariance of laws under transformations – gauge theories • Give rise to conservation laws • Noether’s theorem: invariance under a transformation implies a conserved quantity – space → momentum, time → energy, rotation → angular momentum • Examples: • Neuronal gauge theory – many aspects of neurobiology can be seen as consequences of fundamental invariance properties • See references section © Peter Morgan, April 2018
  • 104. Types of Intelligence © Peter Morgan, April 2018
  • 105. Comparisons - ANN vs BNN • Neural circuits in the brain develop via synaptic pruning: a process by which connections are overproduced and then eliminated over time • In contrast, computer scientists typically design networks by starting with an initially sparse topology and gradually adding connections • AI (specific) vs AGI (general) • Yann LeCun – CNNs at Bell Labs in the ’80s/’90s – “mathematical, not biological” • We have gone as far as we can with “just” mathematics • Now almost every researcher is looking to biology for inspiration • Costa et al, 2018, etc. (see “Bio-plausible Deep Learning” in reference section) © Peter Morgan, April 2018
  • 110. Approaches • Helmholtz (late 1800s) • Friston – Active Inference • Tishby – Information bottleneck • Bialek – Biophysics • Hutter – AIXI • Schmidhuber – Gödel Machine • Etc. © Peter Morgan, April 2018
  • 111. Key Concepts • Bayesian inference • Predictive coding • Generative models • Cortical organization • Perception • Action • Learning • Decision making • Affect • Computational psychiatry © Peter Morgan, April 2018
  • 112. Probabilistic Programming • A probabilistic programming language (PPL) is a programming language designed to describe probabilistic models and then perform inference in those models • Define a probability model as a program • Closely related to graphical models and Bayesian networks, but more expressive and flexible • Probabilistic programming represents an attempt to unify general-purpose programming with probabilistic modeling • Languages include Edward, Church, Anglican, Pyro, PyMC, MetaProb, Gen, Stan, Turing.jl, Infer.NET • Introducing TensorFlow Probability https://medium.com/tensorflow/introducing-tensorflow-probability-dca4c304e245 • Announced at TF Dev Summit, March 30, 2018 © Peter Morgan, April 2018
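What a PPL automates can be written out by hand for the textbook case: a pure-Python sketch (my own illustration, using none of the libraries above) of a Bernoulli model with a uniform prior over a coin's bias, with the posterior computed exactly on a grid.

```python
def infer_bias(heads, tails, grid=1000):
    """Grid-based Bayesian inference for a coin's bias theta.
    A PPL would let you state the model (uniform prior, Bernoulli likelihood)
    and would perform an equivalent of this inference for you."""
    thetas = [(i + 0.5) / grid for i in range(grid)]
    # likelihood of the data under each candidate bias, times a flat prior
    weights = [t ** heads * (1 - t) ** tails for t in thetas]
    z = sum(weights)                      # normalising constant (evidence)
    posterior = [w / z for w in weights]
    return sum(t * p for t, p in zip(thetas, posterior))  # posterior mean

posterior_mean = infer_bias(heads=7, tails=3)
# matches the analytic Beta(8, 4) posterior mean of 8/12
```

Real PPLs scale this idea to models where no closed form or grid exists, using MCMC or variational inference behind the same model-as-program interface.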
  • 114. Implementations & Applications • BNN Frameworks – SPM, PyNN, NEST, NEURON, Brian • Various open source frameworks on github • Hearing aids - GN Group (DK) • Order of Magnitude - Christian Kaiser (SV) © Peter Morgan, April 2018
  • 116. Building AGI © Peter Morgan, April 2018
  • 117. Can we build general intelligence? • We have the theory – active inference • We have the algorithms/software • We have the hardware (ASIC, neuromorphic) • We have the data sets (Internet plus open data sets) • Need to build out libraries • A TensorFlow for general intelligence • Open source? (Open/closed) • Apollo Project of our time – “Fourth Revolution” • Human Brain Project • Deepmind • BRAIN project • Should we build AGI/ASI? – safety, ethics, singularity? © Peter Morgan, April 2018
  • 118. Other AGI Projects • OpenCog – Ben Goertzel (US) • Numenta – Jeff Hawkins (US) • Curious AI – (Finland) • AGI Innovations – Peter Voss (US) • Eurisko – Doug Lenat (US) • GoodAI – Marek Rosa (Czech) • OpenAI – Sam Altman (US) • NNAISENSE – Jürgen Schmidhuber (Swiss) • Deepmind – Demis Hassabis (UK) • Vicarious – Dileep George (US) • SOAR – CMU • ACT-R – CMU • Sigma – Paul Rosenbloom – USC • Plus many more
  • 119. Conclusions • Deep Learning (ANN) is lacking many of the characteristics and attributes needed for a general theory of intelligence • Active inference is such a theory (A ToE* which includes AGI) • ANN research groups are now (finally) turning to biology for inspiration • Bioplausible models are starting to appear • Some groups are starting to look at active inference • AGI in five years? Ten years? • Still have to wait for hardware to mature • Neuromorphic might be the platform that gets us *there* * ToE = Theory of Everything © Peter Morgan, April 2018
  • 121. Neuroscience - Books • Saxe, G. et al, Brain entropy and human intelligence: A resting-state fMRI study, PLOS One, Feb 12, 2018 • Sterling, P. and Laughlin, S., Principles of Neural Design, MIT Press, 2017 • Slotnick, S., Cognitive Neuroscience of Memory, Cambridge Univ Press, 2017 • Engel, Friston, Kragic, Eds, The Pragmatic Turn - Toward Action-Oriented Views in Cognitive Science, MIT Press, 2016 • Gerstner, W. et al, Neuronal Dynamics, Cambridge Univ Press, 2014 • Kandel, E., Principles of Neural Science, 5th ed, McGraw-Hill, 2012 • Rabinovich, Friston and Varona, Eds, Principles of Brain Dynamics, MIT Press, 2012 • Jones, E. G. Thalamus, Cambridge Univ. Press, 2007 • Dayan, P. and L. Abbott, Theoretical Neuroscience, MIT Press, 2005 © Peter Morgan, April 2018
  • 122. Neuroscience - Papers • Crick, F., The recent excitement about neural networks, Nature 337, 129–132, 1989 • Rao RP and DH Ballard, Predictive coding in the visual cortex, Nature Neuroscience 2:79–87, 1999 • Izhikevich, E. M., Solving the distal reward problem through linkage of STDP and dopamine signalling, Cereb. Cortex 17, 2443–2452, 2007 • How the brain constructs the world, 2018 https://medicalxpress.com/news/2018-02-brain-world.html • Lamme, V. A. F. & Roelfsema, P. R., The distinct modes of vision offered by feedforward and recurrent processing, Trends Neurosci. 23, 571–579, 2000 • Sherman, S. M., Thalamus plays a central role in ongoing cortical functioning, Nat. Neurosci. 16, 533–541, 2016 • Harris, K. D. & Shepherd, G. M. G., The neocortical circuit: themes and variations, Nat. Neurosci. 18, 170–181, 2015 • van Kerkoerle, T. et al, Effects of attention and working memory in the different layers of monkey primary visual cortex, Nat. Commun. 8, 13804, 2017 • Roelfsema, P.R. and A. Holtmaat, Control of synaptic plasticity in deep cortical networks, Nature Reviews Neuroscience 19, 166–180, 2018 © Peter Morgan, April 2018
  • 123. Hardware • Lacey, G. et al, Deep Learning on FPGAs: Past, Present, and Future, Feb 2016 https://arxiv.org/abs/1602.04283 • AI ASICs https://www.nanalyze.com/2017/05/12-ai-hardware-startups-new-ai-chips/ • Suri, M. Advances in Neuromorphic Hardware, Springer, 2017 • Human Brain Project, Silicon Brains https://www.humanbrainproject.eu/en/silicon- brains/ • Artificial Brains http://www.artificialbrains.com • The Future is Quantum https://www.microsoft.com/en-us/research/blog/future-is- quantum-with-dr-krysta-svore/?OCID=MSR_podcast_ksvore_fb • Wang, Z. et al, Fully memristive neural networks for pattern classification with unsupervised learning, Nature Electronics, 8 Feb, 2018 © Peter Morgan, April 2018
  • 124. Classical Deep Learning • Schmidhuber, Jurgen, Deep learning in neural networks: An overview, Neural Networks, 61:85–117, 2015 • Goodfellow, I., Bengio, Y. and Courville, A., Deep Learning, MIT Press, 2016 • LeCun, Y., Bengio, Y., and Hinton, G., Deep Learning, Nature, v.521, p.436–444, May 2015 http://www.nature.com/nature/journal/v521/n7553/abs/nature14539.html • Britz, D. et al, Massive Exploration of Neural Machine Translation Architectures, Mar 2017 https://arxiv.org/abs/1703.03906 • Liu H. et al, Hierarchical representations for efficient architecture search, 2017 https://arxiv.org/abs/1711.00436 • NIPS 2017 Proceedings https://papers.nips.cc/book/advances-in-neural-information-processing-systems-30-2017 • Deepmind papers https://deepmind.com/blog/deepmind-papers-nips-2017/ • Jeff Dean, Building Intelligent Systems with Large Scale Deep Learning, TensorFlow slides, Google Brain, 2017 • Rawat, W. and Z. Wang, Deep Convolutional Neural Networks for Image Classification: A Comprehensive Review, Neural Computation, 29(9), Sept 2017 © Peter Morgan, April 2018
  • 125. New Ideas in Deep Learning • Sabour, S. et al, Dynamic Routing Between Capsules, Nov 2017, https://arxiv.org/abs/1710.09829 • Chaudhari, P. and S. Soatto, Stochastic gradient descent performs variational inference, Jan 2018, https://arxiv.org/abs/1710.11029 • Vidal, R. et al, The Mathematics of Deep Learning, Dec 2017, https://arxiv.org/abs/1712.04741 • Chaudhari, P. and S. Soatto, On the energy landscape of deep networks, Apr 2017, https://arxiv.org/abs/1511.06485 • Pearl, Judea, Theoretical Impediments to Machine Learning With Seven Sparks from the Causal Revolution, Jan 2018, https://arxiv.org/abs/1801.04016 • Marcus, Gary, Deep Learning: A Critical Appraisal, Jan 2018, https://arxiv.org/abs/1801.00631 • Scellier, B. and Y. Bengio, Equilibrium propagation: bridging the gap between energy-based models and backpropagation, Front. Comput. Neurosci. 11, 24, 2017 • Pham H. et al, Efficient Neural Architecture Search via Parameter Sharing, Feb 2018, https://arxiv.org/abs/1802.03268 • Jaderberg, M. et al, Population Based Training of Neural Networks, 28 Nov, 2017, https://arxiv.org/abs/1711.09846 © Peter Morgan, April 2018
  • 126. Bio-plausible Deep Learning • Hassabis, D. et al, Neuroscience-Inspired Artificial Intelligence, Neuron, 95(2), July 2017 • Marblestone, A.H. et al, Toward an Integration of Deep Learning and Neuroscience, Front Comput Neurosci., 14 Sept, 2016 • Costa, R.P. et al, Cortical microcircuits as gated-recurrent neural networks, Jan 2018 https://arxiv.org/abs/1711.02448 • Lillicrap T.P. et al, Random synaptic feedback weights support error backpropagation for deep learning, Nature Communications 7:13276, 2016 • Sacramento, J. et al, Dendritic error backpropagation in deep cortical microcircuits, Dec 2017, https://arxiv.org/abs/1801.00062 • Guerguiev, J. et al, Towards deep learning with segregated dendrites, eLife Neuroscience, 5 Dec, 2017 • Webb, S., Deep learning for biology, Nature, 20 Feb,2018, https://www.nature.com/articles/d41586-018-02174-z © Peter Morgan, April 2018
  • 127. Cognitive Science • Barbey, A., Network Neuroscience Theory of Human Intelligence, Trends in Cognitive Sciences, 22(1), Jan 2018 • Navlakha, B. et al, Network Design and the Brain, Trends in Cognitive Sciences, 22 (1), Jan 2018 • Lake, B. et al, Building Machines That Learn and Think Like People, 2016 https://arxiv.org/abs/1604.00289 • Lake, B., et al, Human-level concept learning through probabilistic program induction, Science, 350(6266) Dec 2015 • Tenenbaum, J.B. et al, How to Grow a Mind: Statistics, Structure, and Abstraction, Science, 331(1279) March 2011 • Trends in Cognitive Sciences, Special Issue: The Genetics of Cognition 15 (9), Sept 2011 • William Bialek, Princeton https://www.princeton.edu/~wbialek/categories.html • Dissecting artificial intelligence to better understand the human brain https://medicalxpress.com/news/2018-03-artificial-intelligence-human-brain.html © Peter Morgan, April 2018
  • 128. Active Inference • Friston K., The free-energy principle: a unified brain theory? Nature Reviews Neuroscience, 11(2), 2010 • Friston, K., Life as we know it, Journal of the Royal Society Interface, 3 July, 2013 • Friston, K. et al, Active Inference: A Process Theory, Neural Computation, 29(1), Jan 2017 • Friston, K., Consciousness is not a thing, but a process of inference, Aeon, 18 May, 2017 • Kirchoff, M. et al, The Markov blankets of life, Journal of the Royal Society Interface, 17 Jan, 2018 © Peter Morgan, April 2018
  • 129. Gauge Theories and Beyond • Sengupta et al, Towards a Neuronal Gauge Theory, PLOS Biology, Mar 8, 2016 • Information geometry https://en.wikipedia.org/wiki/Information_geometry • Algebraic geometry – HBP https://www.wired.com/story/the-mind-boggling-math-that- maybe-mapped-the-brain-in-11-dimensions/ • Guss, W.H., Deep function machines: Generalized neural networks for topological layer expression, 2016 https://arxiv.org/abs/1612.04799 • Guss, W.H. and R. Salakhutdinov, On Characterizing the Capacity of Neural Networks using Algebraic Topology, 2018 https://arxiv.org/abs/1802.04443 • Fok, R. et al, Spontaneous Symmetry Breaking in Deep Neural Networks ICLR Conference Submission, Feb 2018 • Bronstein, M.M. et al, Geometric deep learning: going beyond Euclidean data, May 2017, https://arxiv.org/abs/1611.08097 © Peter Morgan, April 2018
  • 130. AGI • Veness, J. et al, A Monte Carlo AIXI Approximation, Dec 2010, https://arxiv.org/abs/0909.0801 • Schmidhuber, J., Goedel Machines: Self-Referential Universal Problem Solvers Making Provably Optimal Self-Improvements, Dec 2006, https://arxiv.org/abs/cs/0309048 • Hutter, M., One Decade of Universal Artificial Intelligence, Feb 2012, https://arxiv.org/abs/1202.6153 • Sunehag, P. and M. Hutter, Principles of Solomonoff Induction and AIXI, Nov 2011, https://arxiv.org/abs/1111.6117 • Silver, D. et al, Mastering the game of Go without human knowledge, Nature, Vol 550, 19 Oct, 2017 • Wolpert, D., Physical limits of inference, Oct 2008, https://arxiv.org/abs/0708.1362 • Goertzel, B., Toward a Formal Model of Cognitive Synergy, Mar 2017, https://arxiv.org/abs/1703.04361 • Hauser, Hermann, Are Machines Better than Humans? Evening lecture on machine intelligence at SCI, London, 25 October 2017 https://www.youtube.com/watch?v=SVOMyEeXUow © Peter Morgan, April 2018
  • 131. Information Theory • Chaitin, G.J., From Philosophy to Program Size, Mar 2013, https://arxiv.org/abs/math/0303352 • Solomonoff, R.J., Machine Learning — Past and Future, Revision of lecture given at AI@50, The Dartmouth Artificial Intelligence Conference, July 13-15, 2006 • Publications of A. N. Kolmogorov, Annals of Probability, 17(3), July 1989 • Levin, L. A., Universal Sequential Search Problems, Problems of Information Transmission, 9(3), 1973 • Shannon, C.E., A Mathematical Theory of Communication, Bell System Technical Journal, 27 (3):379–423, July 1948 • Tishby, N. & R. Schwartz-Ziv, Opening the Black Box of Deep Neural Networks via Information, Apr 29 2017, https://arxiv.org/abs/1703.00810 • AIT https://en.m.wikipedia.org/wiki/Algorithmic_information_theory © Peter Morgan, April 2018
  • 132. Classic Papers • Turing, A.M., Computing Machinery and Intelligence, Mind 49:433-460, 1950 • Schrodinger, E., What is Life? Based on lectures delivered at Trinity College, Dublin, Feb 1943 http://www.whatislife.ie/downloads/What-is-Life.pdf • Deutsch, David, The Constructor Theory of Life, Journal of the Royal Society Interface, 12(104), 2016 • Rumelhart DE, Hinton GE, Williams RJ, Learning representations by back-propagating errors, Nature 323:533–536, 1986 • Crick F., The recent excitement about neural networks, Nature 337:129–132, 1989 • Kolmogorov, A., On Analytical Methods in the Theory of Probability, Mathematische Annalen, 104(1) 1931 • Solomonoff, R.J., A Formal Theory of Inductive Inference, Part 1, Information and Control, 7(1), Mar, 1964, http://world.std.com/~rjs/1964pt1.ps • McCulloch, W.S. and W. Pitts, A logical calculus of the ideas immanent in nervous activity, Bulletin of Mathematical Biophysics, 5(4):115–133, 1943 © Peter Morgan, April 2018
  • 133. Mathematical • Cybenko, George, Approximation by superpositions of a sigmoidal function, Mathematics of Control, Signals, and Systems, 2(4):303–314, 1989 • Kurt Hornik (1991) "Approximation Capabilities of Multilayer Feedforward Networks", Neural Networks, 4(2), 251–257 • Stinchcombe, M.B., Neural network approximation of continuous functionals and continuous functions on compactifications, Neural Networks, 12(3):467–477, 1999 © Peter Morgan, April 2018
  • 134. Books • Sutton, R. S. & A.G. Barto, Reinforcement Learning, 2nd ed., MIT Press, 2018 • Goodfellow, I. et al, Deep Learning, MIT Press, 2016 • Li, Ming and Paul Vitanyi, An Introduction to Kolmogorov Complexity and Its Applications, Springer-Verlag, N.Y., 2008 • Hutter M., Universal Artificial Intelligence, Springer–Verlag, 2004 • Wolfram, S., A New Kind of Science, Wolfram Media, 2002 • MacKay, David, Information theory, inference and learning algorithms, Cambridge University Press, 2003 • Hebb, D. O. The Organization of Behavior. A Neuropsychological Theory, John Wiley & Sons, 1949 © Peter Morgan, April 2018