The main focus of the Custom Computing research group at Imperial College London is hardware acceleration for a range of applications, including finance, genomics, energy, image recognition and mathematical optimisation. We present how different machine learning techniques are employed in our research to provide cutting-edge solutions to a multitude of industrial applications. It is worth emphasising that machine learning has become an important part of the global research community, with a strong presence in many research groups regardless of their primary areas of expertise.
3. Next Generation Computing
• Existing computers:
- Slow
- Power hungry
- Complex to implement applications
• Our focus: custom computing
- Customise hardware/software to applications
- Enhance design quality and designer productivity
• Research strategy:
- FPGA: reconfigurable acceleration
- DFE: dataflow engine = FPGA + memory + dataflow
4. Our Hardware Devices
• 3 MPC-X nodes with 8 cards each, every card carrying a Stratix V FPGA
with a measured main memory throughput of 65 GB/s, giving:
- 1.5 TB/s of potential aggregate memory access
- 130 PB/day of potential processing
• 10 FPGAs (e.g. Altera Stratix V, Xilinx Virtex-6)
• 6 GPUs (e.g. NVIDIA Tesla C2070: 448 cores at 1.15 GHz; NVIDIA Kepler K40/K80)
• At the Imperial College London HPC centre (https://wiki.imperial.ac.uk/display/HPC/Systems):
- ax4: 15 TB of RAM, 1280 cores, 1.5 PB of fast RAID storage
- cx2: 456 nodes, 5272 cores of SGI Altix ICE hardware
- cx1: 1395 nodes, 13558 cores, 8 NVIDIA K80, 4 NVIDIA K40
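As a sanity check, the aggregate figures above follow from simple arithmetic on the per-card bandwidth (the exact PB/day value depends on whether decimal or binary petabytes are assumed, which is why the slide's 130 PB/day is a rounded figure):

```python
# Back-of-envelope check of the aggregate bandwidth figures quoted above:
# 3 MPC-X nodes x 8 cards, 65 GB/s measured main-memory throughput per card.
nodes, cards_per_node, per_card_gbs = 3, 8, 65

total_gbs = nodes * cards_per_node * per_card_gbs        # 1560 GB/s
print(f"aggregate access: {total_gbs / 1000:.2f} TB/s")  # ~1.5 TB/s

seconds_per_day = 24 * 60 * 60
pb_per_day = total_gbs * seconds_per_day / 1e6           # decimal petabytes
print(f"potential processing: {pb_per_day:.0f} PB/day")  # ~130 PB/day
```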
5. DFE: Speed + Energy Efficiency
Financial simulation (Least Squares Monte Carlo method):
163 times faster¹, 170 times less energy [1]
Genomic analysis (string matching):
88 times faster, 3 times less energy [2]
[1] Chow et al. (FPGA Conference, 2012)
[2] Arram et al. (FPGA Conference, 2015)
¹ faster than the equivalent single/multi-core implementation
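As a software illustration of the simulation pattern behind [1], here is a plain Monte Carlo pricer for a European call option. This is deliberately much simpler than the mixed-precision least-squares Monte Carlo of [1], and all parameter values are illustrative; the shared structure is many independent simulated paths reduced to one expectation, which parallelises well on a DFE.

```python
import math
import random

def mc_european_call(s0, k, r, sigma, t, n_paths=200_000, seed=0):
    """Price a European call by simulating terminal prices under
    geometric Brownian motion and averaging discounted payoffs."""
    rng = random.Random(seed)
    payoff_sum = 0.0
    for _ in range(n_paths):
        z = rng.gauss(0.0, 1.0)  # one standard normal draw per path
        st = s0 * math.exp((r - 0.5 * sigma**2) * t + sigma * math.sqrt(t) * z)
        payoff_sum += max(st - k, 0.0)
    return math.exp(-r * t) * payoff_sum / n_paths

# Illustrative parameters: spot 100, strike 100, 5% rate, 20% vol, 1 year.
price = mc_european_call(s0=100, k=100, r=0.05, sigma=0.2, t=1.0)
print(round(price, 2))  # close to the Black-Scholes value (~10.45)
```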
6. DFE: Speed + Energy Efficiency
Climate modelling (stencil computation): 13 times faster¹ [3]
Air traffic management (Sequential Monte Carlo method):
17 times faster, 15 times less energy [4]
[3] Russell et al. (FCCM Conference, 2015)
[4] Chau et al. (HEART Conference, 2013)
¹ faster than the equivalent single/multi-core implementation
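A minimal 1D heat-diffusion example sketches what a stencil computation is: each output point is a fixed combination of a small neighbourhood, streamed over the whole grid. This regularity is what lets the atmospheric modelling work in [3] produce one result per cycle from a deep DFE pipeline. The grid size and diffusion constant here are invented for illustration.

```python
def step(u, alpha=0.1):
    """One 3-point stencil update: u[i] += alpha*(u[i-1] - 2*u[i] + u[i+1]),
    with fixed (Dirichlet) boundary values at both ends."""
    return ([u[0]] +
            [u[i] + alpha * (u[i - 1] - 2 * u[i] + u[i + 1])
             for i in range(1, len(u) - 1)] +
            [u[-1]])

u = [0.0] * 5 + [100.0] + [0.0] * 5   # a hot spot in the middle of the rod
for _ in range(50):
    u = step(u)
print([round(x, 1) for x in u])       # the heat has diffused outwards
```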
7. DFE: Speed + Energy Efficiency
Iterative sparse linear solvers for Computational Fluid Dynamics and
Power Systems Simulation:
optimal architecture up to 47 times faster (UoF Benchmark) [5]
[5] Grigoras et al. (FPGA Conference, 2016)
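To make "iterative sparse linear solver" concrete, here is a minimal conjugate-gradient solver over a simple sparse row format. This is a pure-software sketch only; [5] generates custom DFE architectures for such kernels, where the sparse matrix-vector product dominates. The example system is a small tridiagonal matrix chosen for illustration.

```python
def spmv(rows, x):
    """Sparse matrix-vector product; rows is a list of (col, value) lists."""
    return [sum(v * x[c] for c, v in row) for row in rows]

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def conjugate_gradient(rows, b, iters=100, tol=1e-10):
    """Solve A x = b for symmetric positive definite sparse A."""
    x = [0.0] * len(b)
    r = b[:]                  # residual b - A x, with x = 0 initially
    p = r[:]
    rs = dot(r, r)
    for _ in range(iters):
        ap = spmv(rows, p)
        alpha = rs / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * ai for ri, ai in zip(r, ap)]
        rs_new = dot(r, r)
        if rs_new < tol:
            break
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x

# 1D Poisson-style system: A = tridiag(-1, 2, -1), n = 4.
rows = [[(0, 2.0), (1, -1.0)],
        [(0, -1.0), (1, 2.0), (2, -1.0)],
        [(1, -1.0), (2, 2.0), (3, -1.0)],
        [(2, -1.0), (3, 2.0)]]
b = [1.0, 0.0, 0.0, 1.0]
print(conjugate_gradient(rows, b))  # -> approximately [1.0, 1.0, 1.0, 1.0]
```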
8. Machine Learning on DFEs
Multi-Objective Machine Learning Optimizer
• Self-optimization of reconfigurable designs through automatic
analysis and adaptation of design parameters
• Can switch between a fast/power hungry design and a
relatively slow/low power alternative
• Uses:
- Gaussian Process Regression
- Support Vector Machine Classification
- Particle Swarm Optimization
[6] Kurek et al. (FCCM Conference, 2014)
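As one hedged illustration of the optimisation component listed above, here is a minimal particle swarm optimiser over a toy two-parameter cost surface. The cost function and all constants are invented for illustration; the actual tool of [6] combines PSO with Gaussian Process surrogate models trained on real design builds.

```python
import random

def cost(x, y):
    # Hypothetical stand-in for a design-quality metric (e.g. runtime):
    # a quadratic bowl with its minimum at (3, -2).
    return (x - 3) ** 2 + (y + 2) ** 2

def pso(n_particles=20, iters=100, seed=0):
    """Standard PSO: inertia plus cognitive and social attraction."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-10, 10), rng.uniform(-10, 10)] for _ in range(n_particles)]
    vel = [[0.0, 0.0] for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                  # per-particle best position
    pbest_val = [cost(*p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # swarm-wide best
    w, c1, c2 = 0.7, 1.4, 1.4                    # inertia/cognitive/social weights
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(2):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            v = cost(*pos[i])
            if v < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], v
                if v < gbest_val:
                    gbest, gbest_val = pos[i][:], v
    return gbest, gbest_val

best, val = pso()
print(best, val)  # converges near (3, -2)
```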
9. Machine Learning on DFEs
Pipelined Genetic Propagation
Travelling salesman problem: 90 times faster [8]
Neural Networks Simulation
Polychronous spiking neural network: 34 times faster¹ [7]
[7] Cheung et al. (Frontiers in Neuroscience, 2016)
[8] Guo et al. (FCCM Conference, 2015)
¹ faster than the equivalent single/multi-core implementation
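A minimal software genetic algorithm for the travelling salesman problem sketches the search that [8] accelerates; the pipelined genetic propagation architecture maps selection, crossover and mutation onto hardware pipeline stages rather than running them sequentially as below. The instance (8 points on a circle) and all GA constants are illustrative.

```python
import math
import random

def tour_length(tour, pts):
    """Total length of a closed tour over the given points."""
    return sum(math.dist(pts[tour[i]], pts[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))

def evolve(pts, pop_size=60, gens=200, seed=1):
    """Elitist GA: keep the best half, refill with crossover + mutation."""
    rng = random.Random(seed)
    n = len(pts)
    pop = [rng.sample(range(n), n) for _ in range(pop_size)]
    for _ in range(gens):
        pop.sort(key=lambda t: tour_length(t, pts))
        survivors = pop[:pop_size // 2]
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, n)          # ordered crossover
            head = a[:cut]
            child = head + [c for c in b if c not in head]
            i, j = rng.sample(range(n), 2)     # swap mutation
            child[i], child[j] = child[j], child[i]
            children.append(child)
        pop = survivors + children
    return min(pop, key=lambda t: tour_length(t, pts))

# 8 cities evenly spaced on a unit circle: the optimal tour is the circle.
pts = [(math.cos(2 * math.pi * k / 8), math.sin(2 * math.pi * k / 8))
       for k in range(8)]
best = evolve(pts)
print(tour_length(best, pts))  # best tour length found
```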
10. Machine Learning on DFEs
Incremental Support Vector Machine
Stock trading: 41 times faster¹ [9]
One-class Support Vector Machine
Network anomaly detection: 6 times faster [10]
[9] Shao et al. (FPT Conference, 2016)
[10] Bara et al. (FPT Conference, 2014)
¹ faster than the equivalent single/multi-core implementation
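To show the streaming setting these designs target, here is a minimal online anomaly detector using Welford's running mean/variance with a z-score threshold. This is far simpler than the one-class SVM of [10], but it shares the property that matters for dataflow hardware: a fixed amount of work per incoming record. The traffic values and threshold are invented for illustration.

```python
import math

class StreamingDetector:
    """Flag values far from the running mean (Welford's online algorithm)."""

    def __init__(self, threshold=4.0):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0            # running sum of squared deviations
        self.threshold = threshold

    def update(self, x):
        """Return True if x is anomalous, then fold x into the model."""
        anomalous = False
        if self.n >= 2:
            std = math.sqrt(self.m2 / (self.n - 1))
            if std > 0 and abs(x - self.mean) / std > self.threshold:
                anomalous = True
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return anomalous

det = StreamingDetector()
traffic = [100, 102, 98, 101, 99, 103, 97, 100, 500]  # e.g. packet rates
flags = [det.update(x) for x in traffic]
print(flags)  # only the final burst is flagged
```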
11. Machine Learning for Financial Applications on DFEs
Challenges:
• Quantity of data
• Speed of processing
• Accuracy of results
13. DFE Speedup over CPU
DFE: Maxeler Maia DFE, 8 customised computing units
CPU: dual Intel Xeon E5-2640, 12 cores
Result: 20 times speedup on 992 expressions
14. Capability from Acceleration
For a financial institution, the 20x speedup means 3.5x higher returns;
for regulators, it means 20x more rules can be analyzed.
(Figure: returns against number of data points)
15. Machine Learning on DFEs: Future Work
• Deep Boltzmann Machine for financial market
direction prediction
• Support Vector Machines for satellite image
classification
• Data analysis and clustering methods such as
DBSCAN
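Since DBSCAN is named above as a candidate for future acceleration, a minimal reference implementation (with a naive O(n²) neighbour search) makes the computation concrete; the points, eps and min_pts below are illustrative.

```python
import math

def dbscan(points, eps, min_pts):
    """Return one label per point: a cluster id >= 0, or -1 for noise."""
    n = len(points)
    labels = [None] * n
    cluster = 0

    def neighbours(i):
        # Naive all-pairs search; real accelerators optimise this step.
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    for i in range(n):
        if labels[i] is not None:
            continue
        nbrs = neighbours(i)
        if len(nbrs) < min_pts:
            labels[i] = -1            # noise (may become a border point)
            continue
        labels[i] = cluster           # i is a core point: start a cluster
        seeds = list(nbrs)
        while seeds:
            j = seeds.pop()
            if labels[j] == -1:
                labels[j] = cluster   # noise reached by a core: border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            jn = neighbours(j)
            if len(jn) >= min_pts:    # j is also a core point: keep expanding
                seeds.extend(jn)
        cluster += 1
    return labels

pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (50, 50)]
print(dbscan(pts, eps=2.0, min_pts=2))  # two clusters and one noise point
```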
16. Summary
• FPGAs accelerate many machine learning applications:
- Genetic Programming for optimized trading strategies
- Incremental Support Vector Machine for stock trading
- Deep Boltzmann Machine for financial market direction
prediction
- Support Vector Machine for satellite image classification
• Tools to enhance designer productivity:
- Aid users without electronic design experience
- Ensure high quality implementation: speed, accuracy, energy
efficiency.
17. References
[1] Gary C.T. Chow, Anson H.T. Tse, Qiwei Jin, Wayne Luk, Philip H.W. Leong, David B. Thomas, "A Mixed Precision Monte Carlo Methodology for Reconfigurable Accelerator Systems", FPGA 2012.
[2] James Arram, Wayne Luk, Peiyong Jiang, "Ramethy: Reconfigurable Acceleration of Bisulfite Sequence Alignment", FPGA 2015.
[3] Francis P. Russell, Peter D. Duben, Xinyu Niu, Wayne Luk, T. N. Palmer, "Architectures and Precision Analysis for Modelling Atmospheric Variables with Chaotic Behaviour", FCCM 2015.
[4] Thomas C.P. Chau, James Targett, Marlon Wijeyasinghe, Wayne Luk, Peter Y.K. Cheung, Benjamin Cope, Alison Eele, Jan Maciejowski, "Accelerating Sequential Monte Carlo Method for Real-time Air Traffic Management", HEART 2013.
[5] Paul Grigoras, Pavel Burovskiy, Wayne Luk, "CASK – Open-Source Custom Architectures for Sparse Kernels", FPGA 2016.
18. References
[6] Maciej Kurek, Tobias Becker, Thomas P. Chau, Wayne Luk, "Automating Optimization of Reconfigurable Designs", FCCM 2014.
[7] Kit Cheung, Simon R. Schultz, Wayne Luk, "NeuroFlow: A General Purpose Spiking Neural Network Simulation Platform Using Customizable Processors", Frontiers in Neuroscience, 2016.
[8] Liucheng Guo, Ce Guo, David B. Thomas, Wayne Luk, "Pipelined Genetic Propagation", FCCM 2015.
[9] Shengjia Shao, Oskar Mencer, Wayne Luk, "Dataflow Design for Optimal Incremental SVM Training", FPT 2016.
[10] Andrei Bara, Xinyu Niu, Wayne Luk, "A Dataflow System for Anomaly Detection and Analysis", FPT 2014.
[11] Andreea-Ingrid Funie, Paul Grigoras, Pavel Burovskiy, Wayne Luk, Mark Salmon, "Reconfigurable Acceleration of Fitness Evaluation in Trading Strategies", ASAP 2015.