Proprietary and confidential. Do not distribute.
Rethinking computation: A processor architecture for machine intelligence
17 May 2016
Amir Khosrowshahi
Co-founder and CTO, Nervana
MAKING MACHINES SMARTER.™

About nervana
• A platform for machine intelligence
• Enable deep learning at scale
• Optimized from algorithms to silicon

Model and substrate for computation
[Figure: mammalian cortex, functional model, and machine learning model, linked by question marks]
Hard!

Model and substrate for computation
Do this instead:
Custom ASIC, deep learning model
• Model description language
• Hardware abstraction layer
• Distributed primitives
• Compilers, drivers
Feasible, but still hard.

Application areas
Healthcare, Agriculture, Finance, Online Services, Automotive, Energy

nervana cloud
Data: Images, Text, Tabular, Speech, Time series, Video
Cloud: import, build, train, deploy

Deep learning as a core technology
‘Google Brain’ model: DL at the core of Photos, Maps, Voice, Search, Self-driving car, Ad Targeting, Machine Translation
Nervana Platform: DL at the core of Image Classification, Object Localization, Video Indexing, Speech Recognition, Natural Language

nervana neon
• Fastest library
• Model support
  Models: Convnet; RNN, LSTM; MLP; DQN; NTM
  Domains: Images, Video, Speech, Text, Time series
• Cloud integration
  Running locally:
  % python rnn.py # or neon rnn.yaml
  Running in nervana cloud:
  % ncloud submit --py rnn.py # or --yaml rnn.yaml
  % ncloud show <model_id>
  % ncloud list
  % ncloud deploy <model_id>
  % ncloud predict <model_id> <data> # or use REST api
• Multiple backends
  Backends: CPU, GPU, Multiple GPUs, Parameter server, (Xeon Phi), nervana TPU
• Optimized at assembler level

nervana tensor processing unit (TPU)
• Unprecedented compute density
  1 nervana engine @ 200 watts = 10 GPUs @ 2,000 watts = 200 CPUs @ 20,000 watts
• Scalable distributed architecture
• Memory near computation
  [Figure: CPU (instruction and data memory, Ctrl, ALU) compared with Nervana (data memory, Ctrl)]
• Learning and inference
• Exploit limited precision
• Incorporate latest advances
• Power efficiency
  • 10-100x gain
  • Architecture optimized for algorithm
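
The compute-density equivalence on this slide can be turned into a quick back-of-the-envelope calculation. The sketch below takes the slide's figures at face value (they are the deck's claims, not measurements) and derives the per-device power and the total power of each equivalent setup relative to one engine. The names `equivalents`, `watts_per_device`, and `power_ratio` are illustrative, not part of any Nervana tool.

```python
# Figures as stated on the slide: (device count, total watts) for setups
# the deck presents as equivalent. These are claims, not measurements.
equivalents = {
    "nervana engine": (1, 200),
    "GPU": (10, 2000),
    "CPU": (200, 20000),
}


def watts_per_device(count, total_watts):
    """Average power drawn by a single device in the setup."""
    return total_watts / count


def power_ratio(name, baseline="nervana engine"):
    """Total power of an equivalent setup relative to the baseline setup."""
    return equivalents[name][1] / equivalents[baseline][1]


for name, (count, watts) in equivalents.items():
    print(
        f"{count:>3} x {name}: {watts_per_device(count, watts):.0f} W each, "
        f"{power_ratio(name):.0f}x the power of one engine"
    )
```

Under these figures the GPU setup draws 10 times and the CPU setup 100 times the power of one engine for the same work, which is consistent with the 10-100x gain quoted on the slide.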

General purpose computation
2000s: SoC
Motivation: reduce power and cost, fungible computing.
Enabled inexpensive mobile devices.

Dennard scaling has ended
[Figure: trends in transistors, clock speed, power, and perf / clock]
What’s next?
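
Why the end of Dennard scaling matters can be sketched with the textbook dynamic-power relation P = C · V² · f. This is a minimal illustration and is not taken from the slides: it assumes ideal scaling in which shrinking linear dimensions by a factor k reduces capacitance by 1/k, raises frequency by k, and packs k² more transistors into the same area. The function names and the `voltage_scales` switch are hypothetical.

```python
def dynamic_power(capacitance, voltage, frequency):
    """Switching power of one transistor: P = C * V^2 * f (relative units)."""
    return capacitance * voltage ** 2 * frequency


def relative_power_density(k, voltage_scales=True):
    """Power per unit area after shrinking linear dimensions by factor k.

    Assumes ideal scaling: capacitance falls by 1/k, frequency rises by k,
    and transistor density rises by k^2. Supply voltage falls by 1/k only
    when voltage_scales is True (the classic Dennard regime).
    """
    capacitance = 1.0 / k
    voltage = 1.0 / k if voltage_scales else 1.0
    frequency = k
    per_transistor = dynamic_power(capacitance, voltage, frequency)
    return per_transistor * k ** 2


print(relative_power_density(2.0))                        # classic regime
print(relative_power_density(2.0, voltage_scales=False))  # voltage stuck
```

With voltage scaling, power density stays constant from one generation to the next. Once voltage stops scaling, the same shrink multiplies power density by k² unless clock frequency is held back, which is one common reading of why clock speed flattened while transistor counts kept growing.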

Many-core tiled architectures
2010s: multi-core, GPGPU
Motivation: increased performance without clock rate increase or smaller devices.
Requires changes in programming paradigm.
Examples shown: Tilera, NVIDIA GM204, Intel Xeon Phi Knights Landing
[Figure: page excerpt from "Tile Processor Architecture Overview for the TILEPro Series", including Figure 2-1, "Tile Processor Hardware Architecture": the 64-core TILEPro64 with tiles at grid positions 0,0 through 7,7; a tile detail showing the processor engine, cache engine, and switch engine; the UDN, STN, MDN, IDN, TDN, and CDN networks; and DDR2, PCIe (x4 lane), XAUI (10GbE), RGMII (GbE), FlexI/O, I2C, JTAG, HPI, UART, and SPI ROM interfaces.]
Excerpt text: "[...] and provides high bandwidth and extremely low latency communication among tiles. The Tile Processor™ integrates external memory and I/O interfaces on chip and is a complete programmable multicore processor. External memory and I/O interfaces are connected to the tiles via the iMesh interconnect. Figure 2-1 shows the 64-core TILEPro64™ Tile processor with details of an individual tile’s structure. Each tile is a powerful, full-featured computing system that can independently run an entire operating system, such as Linux. Each tile implements a 32-bit integer processor engine utilizing a three-way Very Long Instruction Word (VLIW) architecture with its own program counter (PC), cache, and DMA subsystem. An individual tile is capable of executing up to three operations per cycle."

Special purpose computation: Anton
[Figure, panels (a) to (c): torus of Anton 2 ASICs with X+/X-, Y+/Y-, Z+/Z- links; ASIC schematic with flex tiles, HTIS tiles, HOST, and LA; physical layout]
(a) The Anton 2 ASICs are directly connected by high-speed channels to form a three-dimensional torus topology. (b) Schematic view of an Anton [...] contains 2 connections to each neighbor in the torus topology, 16 flexible subsystem (“flex”) tiles, 2 high-throughput interaction subsystem (HTIS) t[...]erface (HOST), and an on-die logic analyzer (LA). (c) Physical layout of a 20.4 mm × 20 mm Anton 2 ASIC implemented in 40-nm technology. On[...]
(Shaw et al., 2014)

Computational motifs
#   Motif                        Examples
1   Dense linear algebra         Matrix multiply (GEMM)
2   Sparse linear algebra        SpMV
3   Spectral methods             FFT
4   N-Body methods               Molecular dynamics
5   Structured grids             Lattice Boltzmann
6   Unstructured grids           CFD
7   Map-Reduce                   Expectation maximization
8   Combinational logic          Encryption, hashing
9   Graph traversal              Decision trees, quicksort
10  Dynamic programming          Forward-backward
11  Backtrack, branch and bound  Constraint satisfaction
12  Graphical models             HMM, Bayesian networks
13  Finite state machines        Compilers
(Asanovic et al., 2006)
Can be implemented using:
• Silicon
• Software
• Neural network architectures!
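
As a small illustration of the first motif implemented in software, the sketch below writes a naive matrix multiply (GEMM) in plain Python and uses it as the forward pass of a fully connected neural-network layer, which reduces to a single GEMM when bias and nonlinearity are left out. The function names `gemm` and `dense_layer` and the sample values are illustrative and do not come from the deck or from neon.

```python
def gemm(a, b):
    """Naive dense matrix multiply C = A @ B on lists of lists."""
    inner = len(b)
    if any(len(row) != inner for row in a):
        raise ValueError("inner dimensions of A and B do not match")
    cols = len(b[0])
    return [
        [sum(row[p] * b[p][j] for p in range(inner)) for j in range(cols)]
        for row in a
    ]


def dense_layer(inputs, weights):
    """Forward pass of a fully connected layer (no bias, no nonlinearity).

    Each row of `inputs` is one example; the whole batch is one GEMM.
    """
    return gemm(inputs, weights)


batch = [[1.0, 2.0], [3.0, 4.0]]                  # 2 examples, 2 features
weights = [[1.0, 0.0, -1.0], [0.5, 1.0, 2.0]]     # 2 inputs, 3 outputs
print(dense_layer(batch, weights))  # [[2.0, 2.0, 3.0], [5.0, 4.0, 5.0]]
```

The triple loop hidden in the nested comprehension is the part that optimized libraries and dedicated hardware replace with tuned kernels.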

Summary
• Computers are tools for solving problems of their time
  • Was: coding, calculation, graphics, web
  • Today: learning and inference on data
• Deep learning as a computational paradigm
• Custom architecture can do vastly better
• We are hiring! Summer interns and full time.

Take control of your SAP testing with UiPath Test Suite
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 

Rethinking computation: A processor architecture for machine intelligence

  • 1. Proprietary and confidential. Do not distribute. Rethinking computation: A processor architecture for machine intelligence. 17 May 2016. Amir Khosrowshahi, Co-founder and CTO, Nervana. MAKING MACHINES SMARTER.™
  • 2. About nervana • A platform for machine intelligence • Enable deep learning at scale • Optimized from algorithms to silicon
  • 3. Model and substrate for computation • Diagram labels: Functional model, ?, ?, Machine learning model, Mammalian cortex • Hard!
  • 4. Model and substrate for computation • Do this instead: Custom ASIC, Deep learning model • Model description language • Hardware abstraction layer • Distributed primitives • Compilers, drivers • Feasible, but still hard.
  • 5. Application areas • Healthcare • Agriculture • Finance • Online Services • Automotive • Energy
  • 6. nervana cloud • Data: Images, Text, Tabular, Speech, Time series, Video • Cloud: import, train, build, deploy
  • 7. Deep learning as a core technology • ‘Google Brain’ model (DL): Photos, Maps, Voice Search, Self-driving car, Ad Targeting, Machine Translation • Nervana Platform (DL): Image Classification, Object Localization, Video Indexing, Speech Recognition, Natural Language
  • 8. nervana neon • Fastest library
  • 9. nervana neon • Fastest library
  • 10. nervana neon • Fastest library • Model support • Models: Convnet; RNN, LSTM; MLP; DQN; NTM • Domains: Images, Video, Speech, Text, Time series
  • 11. nervana neon • Fastest library • Model support • Cloud integration
      Running locally:
        % python rnn.py    # or neon rnn.yaml
      Running in nervana cloud:
        % ncloud submit --py rnn.py    # or --yaml rnn.yaml
        % ncloud show <model_id>
        % ncloud list
        % ncloud deploy <model_id>
        % ncloud predict <model_id> <data>    # or use REST api
  • 12. nervana neon • Fastest library • Model support • Cloud integration • Multiple backends • Backends: CPU, GPU, Multiple GPUs, Parameter server, (Xeon Phi), nervana TPU
  • 13. nervana neon • Fastest library • Model support • Cloud integration • Multiple backends • Optimized at assembler level
  • 14. nervana tensor processing unit (TPU) • Unprecedented compute density: 1 nervana engine @ 200 watts = 10 GPUs @ 2000 watts = 200 CPUs @ 20,000 watts
  • 15. nervana tensor processing unit (TPU) • Unprecedented compute density • Scalable distributed architecture
  • 16. nervana tensor processing unit (TPU) • Unprecedented compute density • Scalable distributed architecture • Memory near computation • Diagram labels: CPU (Instruction and data memory, Ctrl, ALU); Nervana (Data Memory, Ctrl)
  • 17. nervana tensor processing unit (TPU) • Unprecedented compute density • Scalable distributed architecture • Memory near computation • Learning and inference • Exploit limited precision • Incorporate latest advances • Power efficiency
  • 18. nervana tensor processing unit (TPU) • Unprecedented compute density • Scalable distributed architecture • Memory near computation • Learning and inference • Exploit limited precision • Incorporate latest advances • Power efficiency • 10-100x gain • Architecture optimized for algorithm
  • 19. General purpose computation • 2000s: SoC. Motivation: reduce power and cost, fungible computing. Enabled inexpensive mobile devices.
  • 20. Dennard scaling has ended • What’s next? • Plot series: Transistors, Clock speed, Power, Perf / clock
  • 21. Many-core tiled architectures • 2010s: multi-core, GPGPU. Motivation: increased performance without clock rate increase or smaller devices. Requires changes in programming paradigm. • Examples shown: NVIDIA GM204, Tilera, Intel Xeon Phi Knights Landing
      Excerpt shown on the slide (Tile Processor Architecture Overview for the TILEPro Series): "[...] and provides high bandwidth and extremely low latency communication among tiles. The Tile Processor™ integrates external memory and I/O interfaces on chip and is a complete programmable multicore processor. External memory and I/O interfaces are connected to the tiles via the iMesh interconnect. Figure 2-1 shows the 64-core TILEPro64™ Tile processor with details of an individual tile’s structure. Each tile is a powerful, full-featured computing system that can independently run an entire operating system, such as Linux. Each tile implements a 32-bit integer processor engine utilizing a three-way Very Long Instruction Word (VLIW) architecture with its own program counter (PC), cache, and DMA subsystem. An individual tile is capable of executing up to three operations per cycle."
      [Figure 2-1. Tile Processor Hardware Architecture: 8 × 8 grid of tiles with DDR2, PCIe, XAUI (10GbE), RGMII (GbE), and FlexI/O interfaces; tile detail shows the Switch Engine, Cache Engine, and Processor Engine connected to the UDN, STN, MDN, IDN, TDN, and CDN networks.]
  • 22. Special purpose computation: Anton (Shaw et al., 2014)
      [Figure: (a) The Anton 2 ASICs are directly connected by high-speed channels to form a three-dimensional torus topology. (b) Schematic view of an Anton [...] contains 2 connections to each neighbor in the torus topology, 16 flexible subsystem (“flex”) tiles, 2 high-throughput interaction subsystem (HTIS) t[...]erface (HOST), and an on-die logic analyzer (LA). (c) Physical layout of a 20.4 mm × 20 mm Anton 2 ASIC implemented in 40-nm technology.]
  • 23. Computational motifs (Asanovic et al., 2006)
      #    Motif                         Examples
      1    Dense linear algebra          Matrix multiply (GEMM)
      2    Sparse linear algebra         SpMV
      3    Spectral methods              FFT
      4    N-Body methods                Molecular dynamics
      5    Structured grids              Lattice Boltzmann
      6    Unstructured grids            CFD
      7    Map-Reduce                    Expectation maximization
      8    Combinational logic           Encryption, hashing
      9    Graph traversal               Decision trees, quicksort
      10   Dynamic programming           Forward-backward
      11   Backtrack, branch and bound   Constraint satisfaction
      12   Graphical models              HMM, Bayesian networks
      13   Finite state machines         Compilers
      Can be implemented using: • Silicon • Software • Neural network architectures!
  • 24. Summary • Computers are tools for solving problems of their time • Was: Coding, calculation, graphics, web • Today: Learning and Inference on data • Deep learning as a computational paradigm • Custom architecture can do vastly better • We are hiring! Summer interns and full time.
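
The power comparison on slide 14 can be checked with a little arithmetic. The sketch below is illustrative only: it takes the three wattage figures as stated on the slide and assumes (the slide implies this but does not spell it out) that the three configurations deliver the same throughput, so that the power ratio is also the efficiency ratio. The names `SYSTEMS` and `power_ratio` are hypothetical.

```python
# Wattage figures as stated on slide 14. Treating the three configurations
# as delivering equal throughput is an assumption made for this sketch.
SYSTEMS = {
    "1 nervana engine": 200,   # watts
    "10 GPUs": 2_000,          # watts
    "200 CPUs": 20_000,        # watts
}


def power_ratio(systems, baseline):
    """Return each system's power draw as a multiple of the baseline's."""
    base_watts = systems[baseline]
    return {name: watts / base_watts for name, watts in systems.items()}


ratios = power_ratio(SYSTEMS, "1 nervana engine")
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.0f}x the power of one nervana engine")
```

Under that equal-throughput assumption, the GPU and CPU configurations draw 10 and 100 times the power of one engine, which matches the range of the "10-100x gain" bullet on slide 18, although the slides do not explicitly tie the two figures together.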