The main focus of the Custom Computing research group at Imperial College London is hardware acceleration for a range of applications, including finance, genomics, energy, image recognition and mathematical optimisation. We present how different machine learning techniques are employed in our research to provide cutting-edge solutions to a multitude of industrial applications. It is worth emphasising that machine learning has become an important part of the global research community, with a strong presence in many research groups regardless of their primary areas of expertise.
3. Next Generation Computing
• Existing computers:
- Slow
- Power hungry
- Complex to implement applications
• Our focus: custom computing
- Customise hardware/software to applications
- Enhance design quality and designer productivity
• Research strategy:
- FPGA: reconfigurable acceleration
- DFE: dataflow engine = FPGA + memory + dataflow
4. Our Hardware Devices
• 3 MPC-X nodes with 8 cards each, every card carrying a Stratix V FPGA
with a measured main memory throughput of 65 GB/s, giving:
- 1.5 TB/s of potential aggregate memory access
- 130 PB/day of potential processing
• 10 FPGAs (e.g. Altera Stratix V, Xilinx Virtex-6)
• 6 GPUs (e.g. NVIDIA Tesla C2070: 448 cores at 1.15 GHz; NVIDIA Kepler K40/K80)
• At the Imperial College London HPC centre (https://wiki.imperial.ac.uk/display/HPC/Systems):
- ax4: 15 TB of RAM, 1280 cores, 1.5 PB of fast RAID storage
- cx2: 456 nodes, 5272 cores of SGI Altix ICE hardware
- cx1: 1395 nodes, 13558 cores, 8 NVIDIA K80, 4 NVIDIA K40
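As a sanity check, the aggregate figures above follow from simple arithmetic on the per-card bandwidth (the exact PB/day value depends on whether decimal or binary petabytes are assumed, which is why the slide's 130 PB/day is a rounded figure):

```python
# Back-of-envelope check of the aggregate bandwidth figures quoted above:
# 3 MPC-X nodes x 8 cards, 65 GB/s measured main-memory throughput per card.
nodes, cards_per_node, per_card_gbs = 3, 8, 65

total_gbs = nodes * cards_per_node * per_card_gbs        # 1560 GB/s
print(f"aggregate access: {total_gbs / 1000:.2f} TB/s")  # ~1.5 TB/s

seconds_per_day = 24 * 60 * 60
pb_per_day = total_gbs * seconds_per_day / 1e6           # decimal petabytes
print(f"potential processing: {pb_per_day:.0f} PB/day")  # ~130 PB/day
```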
5. DFE: Speed + Energy Efficiency
Financial simulation (Least Squares Monte Carlo method):
163 times faster¹, 170 times less energy [1]
Genomic analysis (string matching):
88 times faster, 3 times less energy [2]
[1] Chow et al. (FPGA Conference, 2012)
[2] Arram et al. (FPGA Conference, 2015)
¹ faster than the equivalent single/multi-core implementation
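As a software illustration of the simulation pattern behind [1], here is a plain Monte Carlo pricer for a European call option. This is deliberately much simpler than the mixed-precision least-squares Monte Carlo of [1], and all parameter values are illustrative; the shared structure is many independent simulated paths reduced to one expectation, which parallelises well on a DFE.

```python
import math
import random

def mc_european_call(s0, k, r, sigma, t, n_paths=200_000, seed=0):
    """Price a European call by simulating terminal prices under
    geometric Brownian motion and averaging discounted payoffs."""
    rng = random.Random(seed)
    payoff_sum = 0.0
    for _ in range(n_paths):
        z = rng.gauss(0.0, 1.0)  # one standard normal draw per path
        st = s0 * math.exp((r - 0.5 * sigma**2) * t + sigma * math.sqrt(t) * z)
        payoff_sum += max(st - k, 0.0)
    return math.exp(-r * t) * payoff_sum / n_paths

# Illustrative parameters: spot 100, strike 100, 5% rate, 20% vol, 1 year.
price = mc_european_call(s0=100, k=100, r=0.05, sigma=0.2, t=1.0)
print(round(price, 2))  # close to the Black-Scholes value (~10.45)
```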
6. DFE: Speed + Energy Efficiency
Climate modelling (stencil computation): 13 times faster¹ [3]
Air traffic management (Sequential Monte Carlo method):
17 times faster, 15 times less energy [4]
[3] Russell et al. (FCCM Conference, 2015)
[4] Chau et al. (HEART Conference, 2013)
¹ faster than the equivalent single/multi-core implementation
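A minimal 1D heat-diffusion example sketches what a stencil computation is: each output point is a fixed combination of a small neighbourhood, streamed over the whole grid. This regularity is what lets the atmospheric modelling work in [3] produce one result per cycle from a deep DFE pipeline. The grid size and diffusion constant here are invented for illustration.

```python
def step(u, alpha=0.1):
    """One 3-point stencil update: u[i] += alpha*(u[i-1] - 2*u[i] + u[i+1]),
    with fixed (Dirichlet) boundary values at both ends."""
    return ([u[0]] +
            [u[i] + alpha * (u[i - 1] - 2 * u[i] + u[i + 1])
             for i in range(1, len(u) - 1)] +
            [u[-1]])

u = [0.0] * 5 + [100.0] + [0.0] * 5   # a hot spot in the middle of the rod
for _ in range(50):
    u = step(u)
print([round(x, 1) for x in u])       # the heat has diffused outwards
```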
7. DFE: Speed + Energy Efficiency
Iterative sparse linear solvers for Computational Fluid Dynamics and
Power Systems Simulation:
optimal architecture up to 47 times faster (UoF Benchmark) [5]
[5] Grigoras et al. (FPGA Conference, 2016)
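To make "iterative sparse linear solver" concrete, here is a minimal conjugate-gradient solver over a simple sparse row format. This is a pure-software sketch only; [5] generates custom DFE architectures for such kernels, where the sparse matrix-vector product dominates. The example system is a small tridiagonal matrix chosen for illustration.

```python
def spmv(rows, x):
    """Sparse matrix-vector product; rows is a list of (col, value) lists."""
    return [sum(v * x[c] for c, v in row) for row in rows]

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def conjugate_gradient(rows, b, iters=100, tol=1e-10):
    """Solve A x = b for symmetric positive definite sparse A."""
    x = [0.0] * len(b)
    r = b[:]                  # residual b - A x, with x = 0 initially
    p = r[:]
    rs = dot(r, r)
    for _ in range(iters):
        ap = spmv(rows, p)
        alpha = rs / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * ai for ri, ai in zip(r, ap)]
        rs_new = dot(r, r)
        if rs_new < tol:
            break
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x

# 1D Poisson-style system: A = tridiag(-1, 2, -1), n = 4.
rows = [[(0, 2.0), (1, -1.0)],
        [(0, -1.0), (1, 2.0), (2, -1.0)],
        [(1, -1.0), (2, 2.0), (3, -1.0)],
        [(2, -1.0), (3, 2.0)]]
b = [1.0, 0.0, 0.0, 1.0]
print(conjugate_gradient(rows, b))  # -> approximately [1.0, 1.0, 1.0, 1.0]
```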
8. Machine Learning on DFEs
Multi-Objective Machine Learning Optimizer
• Self-optimization of reconfigurable designs through automatic
analysis and adaptation of design parameters
• Can switch between a fast/power hungry design and a
relatively slow/low power alternative
• Uses:
- Gaussian Process Regression
- Support Vector Machine Classification
- Particle Swarm Optimization
[6] Kurek et al. (FCCM Conference, 2014)
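As one hedged illustration of the optimisation component listed above, here is a minimal particle swarm optimiser over a toy two-parameter cost surface. The cost function and all constants are invented for illustration; the actual tool of [6] combines PSO with Gaussian Process surrogate models trained on real design builds.

```python
import random

def cost(x, y):
    # Hypothetical stand-in for a design-quality metric (e.g. runtime):
    # a quadratic bowl with its minimum at (3, -2).
    return (x - 3) ** 2 + (y + 2) ** 2

def pso(n_particles=20, iters=100, seed=0):
    """Standard PSO: inertia plus cognitive and social attraction."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-10, 10), rng.uniform(-10, 10)] for _ in range(n_particles)]
    vel = [[0.0, 0.0] for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                  # per-particle best position
    pbest_val = [cost(*p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # swarm-wide best
    w, c1, c2 = 0.7, 1.4, 1.4                    # inertia/cognitive/social weights
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(2):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            v = cost(*pos[i])
            if v < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], v
                if v < gbest_val:
                    gbest, gbest_val = pos[i][:], v
    return gbest, gbest_val

best, val = pso()
print(best, val)  # converges near (3, -2)
```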
9. Machine Learning on DFEs
Pipelined Genetic Propagation
Travelling salesman problem: 90 times faster [8]
Neural Networks Simulation
Polychronous spiking neural network: 34 times faster¹ [7]
[7] Cheung et al. (Frontiers in Neuroscience, 2016)
[8] Guo et al. (FCCM Conference, 2015)
¹ faster than the equivalent single/multi-core implementation
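A minimal software genetic algorithm for the travelling salesman problem sketches the search that [8] accelerates; the pipelined genetic propagation architecture maps selection, crossover and mutation onto hardware pipeline stages rather than running them sequentially as below. The instance (8 points on a circle) and all GA constants are illustrative.

```python
import math
import random

def tour_length(tour, pts):
    """Total length of a closed tour over the given points."""
    return sum(math.dist(pts[tour[i]], pts[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))

def evolve(pts, pop_size=60, gens=200, seed=1):
    """Elitist GA: keep the best half, refill with crossover + mutation."""
    rng = random.Random(seed)
    n = len(pts)
    pop = [rng.sample(range(n), n) for _ in range(pop_size)]
    for _ in range(gens):
        pop.sort(key=lambda t: tour_length(t, pts))
        survivors = pop[:pop_size // 2]
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, n)          # ordered crossover
            head = a[:cut]
            child = head + [c for c in b if c not in head]
            i, j = rng.sample(range(n), 2)     # swap mutation
            child[i], child[j] = child[j], child[i]
            children.append(child)
        pop = survivors + children
    return min(pop, key=lambda t: tour_length(t, pts))

# 8 cities evenly spaced on a unit circle: the optimal tour is the circle.
pts = [(math.cos(2 * math.pi * k / 8), math.sin(2 * math.pi * k / 8))
       for k in range(8)]
best = evolve(pts)
print(tour_length(best, pts))  # best tour length found
```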
10. Machine Learning on DFEs
Incremental Support Vector Machine
Stock trading: 41 times faster¹ [9]
One-class Support Vector Machine
Network anomaly detection: 6 times faster [10]
[9] Shao et al. (FPT Conference, 2016)
[10] Bara et al. (FPT Conference, 2014)
¹ faster than the equivalent single/multi-core implementation
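To show the streaming setting these designs target, here is a minimal online anomaly detector using Welford's running mean/variance with a z-score threshold. This is far simpler than the one-class SVM of [10], but it shares the property that matters for dataflow hardware: a fixed amount of work per incoming record. The traffic values and threshold are invented for illustration.

```python
import math

class StreamingDetector:
    """Flag values far from the running mean (Welford's online algorithm)."""

    def __init__(self, threshold=4.0):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0            # running sum of squared deviations
        self.threshold = threshold

    def update(self, x):
        """Return True if x is anomalous, then fold x into the model."""
        anomalous = False
        if self.n >= 2:
            std = math.sqrt(self.m2 / (self.n - 1))
            if std > 0 and abs(x - self.mean) / std > self.threshold:
                anomalous = True
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return anomalous

det = StreamingDetector()
traffic = [100, 102, 98, 101, 99, 103, 97, 100, 500]  # e.g. packet rates
flags = [det.update(x) for x in traffic]
print(flags)  # only the final burst is flagged
```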
11. Machine Learning for Financial Applications on DFEs
Challenges:
• Quantity of data
• Speed of processing
• Accuracy of results
13. DFE Speedup over CPU
DFE: Maxeler Maia DFE, 8 customised computing units
CPU: dual Intel Xeon E5-2640, 12 cores
Result: 20 times speedup on 992 expressions
14. Capability from Acceleration
For a financial institution, the 20x speedup means 3.5x higher returns;
for regulators, it means 20x more rules can be analyzed.
(Figure: returns against number of data points)
15. Machine Learning on DFEs: Future Work
• Deep Boltzmann Machine for financial market
direction prediction
• Support Vector Machines for satellite image
classification
• Data analysis and clustering methods such as
DBSCAN
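Since DBSCAN is named above as a candidate for future acceleration, a minimal reference implementation (with a naive O(n²) neighbour search) makes the computation concrete; the points, eps and min_pts below are illustrative.

```python
import math

def dbscan(points, eps, min_pts):
    """Return one label per point: a cluster id >= 0, or -1 for noise."""
    n = len(points)
    labels = [None] * n
    cluster = 0

    def neighbours(i):
        # Naive all-pairs search; real accelerators optimise this step.
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    for i in range(n):
        if labels[i] is not None:
            continue
        nbrs = neighbours(i)
        if len(nbrs) < min_pts:
            labels[i] = -1            # noise (may become a border point)
            continue
        labels[i] = cluster           # i is a core point: start a cluster
        seeds = list(nbrs)
        while seeds:
            j = seeds.pop()
            if labels[j] == -1:
                labels[j] = cluster   # noise reached by a core: border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            jn = neighbours(j)
            if len(jn) >= min_pts:    # j is also a core point: keep expanding
                seeds.extend(jn)
        cluster += 1
    return labels

pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (50, 50)]
print(dbscan(pts, eps=2.0, min_pts=2))  # two clusters and one noise point
```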
16. Summary
• FPGAs accelerate many machine learning applications:
- Genetic Programming for optimized trading strategies
- Incremental Support Vector Machine for stock trading
- Deep Boltzmann Machine for financial market direction
prediction
- Support Vector Machine for satellite image classification
• Tools to enhance designer productivity:
- Aid users without electronic design experience
- Ensure high quality implementation: speed, accuracy, energy
efficiency.
17. References
[1] Gary C.T. Chow, Anson H.T. Tse, Qiwei Jin, Wayne Luk, Philip H.W. Leong, David B. Thomas, "A Mixed Precision Monte Carlo Methodology for Reconfigurable Accelerator Systems", FPGA 2012.
[2] James Arram, Wayne Luk, Peiyong Jiang, "Ramethy: Reconfigurable Acceleration of Bisulfite Sequence Alignment", FPGA 2015.
[3] Francis P. Russell, Peter D. Duben, Xinyu Niu, Wayne Luk, T. N. Palmer, "Architectures and Precision Analysis for Modelling Atmospheric Variables with Chaotic Behaviour", FCCM 2015.
[4] Thomas C.P. Chau, James Targett, Marlon Wijeyasinghe, Wayne Luk, Peter Y.K. Cheung, Benjamin Cope, Alison Eele, Jan Maciejowski, "Accelerating Sequential Monte Carlo Method for Real-time Air Traffic Management", HEART 2013.
[5] Paul Grigoras, Pavel Burovskiy, Wayne Luk, "CASK – Open-Source Custom Architectures for Sparse Kernels", FPGA 2016.
18. References
[6] Maciej Kurek, Tobias Becker, Thomas P. Chau, Wayne Luk, "Automating Optimization of Reconfigurable Designs", FCCM 2014.
[7] Kit Cheung, Simon R. Schultz, Wayne Luk, "NeuroFlow: A General Purpose Spiking Neural Network Simulation Platform Using Customizable Processors", Frontiers in Neuroscience, 2016.
[8] Liucheng Guo, Ce Guo, David B. Thomas, Wayne Luk, "Pipelined Genetic Propagation", FCCM 2015.
[9] Shengjia Shao, Oskar Mencer, Wayne Luk, "Dataflow Design for Optimal Incremental SVM Training", FPT 2016.
[10] Andrei Bara, Xinyu Niu, Wayne Luk, "A Dataflow System for Anomaly Detection and Analysis", FPT 2014.
[11] Andreea-Ingrid Funie, Paul Grigoras, Pavel Burovskiy, Wayne Luk, Mark Salmon, "Reconfigurable Acceleration of Fitness Evaluation in Trading Strategies", ASAP 2015.