BlastFunctions: FPGA acceleration for serverless computing

The increasing need for on-demand, scalable and efficient computing power in the cloud has driven the shift from monolithic applications to distributed, decoupled applications based on microservices and serverless computing. At the same time, common x86 CPUs are barely improving in performance. For this reason, heterogeneous computing is becoming an appealing way to squeeze out more performance and continue to meet Service Level Agreements. Hardware accelerators should be designed to optimize latency, as they face spiky requests that usually cannot be batched. In this context, unfortunately, a hardware accelerator such as an FPGA might remain idle most of the time if it is used exclusively by a single service.
In this talk, we will describe our proposed system to enable accelerated serverless computing using FPGAs. The system time-shares the FPGA-based accelerators in the cluster, increasing their utilization and allowing integration with current serverless frameworks. In addition, the integration is seamless and vendor-independent (supporting both Xilinx and Altera FPGAs) thanks to a transparent custom OpenCL implementation. We will describe the system architecture and its inner workings, along with experimental results showing a low overhead with respect to native execution, and the future opportunities and features we are working on.
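
To make the time-sharing idea concrete, here is a minimal sketch of how requests from several functions could be serialized onto one accelerator, and how that raises the fraction of time the device is busy. It is not the BlastFunction implementation: the `Task` type, the `time_share` function, the first-come-first-served policy, and all timings are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Task:
    client: str        # hypothetical name of the function issuing the request
    arrival_ms: float  # time at which the request reaches the device
    kernel_ms: float   # time the accelerator stays busy serving the request


def time_share(tasks: List[Task]) -> Tuple[List[Tuple[str, float, float]], float]:
    """Serve tasks one at a time on a single device, in arrival order.

    Returns the (client, start_ms, end_ms) schedule and the device time
    utilization, i.e. busy time divided by the span from the first start
    to the last completion. FIFO order is an assumption of this sketch.
    """
    schedule: List[Tuple[str, float, float]] = []
    device_free_at = 0.0
    busy_ms = 0.0
    for task in sorted(tasks, key=lambda t: t.arrival_ms):
        # A request waits if the device is still serving an earlier one.
        start = max(task.arrival_ms, device_free_at)
        end = start + task.kernel_ms
        schedule.append((task.client, start, end))
        device_free_at = end
        busy_ms += task.kernel_ms
    if not schedule:
        return schedule, 0.0
    span_ms = schedule[-1][2] - schedule[0][1]
    return schedule, busy_ms / span_ms


# One function alone: a 5 ms kernel every 10 ms leaves the device idle in between.
alone = [Task("fn-a", t, 5.0) for t in (0.0, 10.0, 20.0)]
# A second function whose requests fall into the idle gaps of the first.
shared = alone + [Task("fn-b", t, 5.0) for t in (5.0, 15.0, 25.0)]

_, util_alone = time_share(alone)
_, util_shared = time_share(shared)
print(f"exclusive use: {util_alone:.0%}, shared: {util_shared:.0%}")
```

With these toy inputs the device is busy 60% of the time when used by one function and 100% of the time when shared by two. The numbers only illustrate the mechanism and say nothing about the measured results reported in the talk.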


  1. BlastFunction: FPGA acceleration for serverless computing. Marco Bacis <marco.bacis@mail.polimi.it>. NGCX @ SF & Bay Area, 20-30/05/2019
  2. Cloud computing trends: Virtual Machines, Containers, Serverless
  3. What about FPGAs!
  4. Cloud FPGAs issues [diagram labels: FPGA, VM, App, Runtime; plot axes: req/s, time]
  5. Cloud FPGAs issues [diagram labels: FPGA, VM, App, Runtime, shown twice; plot axes: req/s, time]
  6. Cloud FPGAs issues [diagram labels: FPGA, VM, App, Runtime, shown three times]
  7. Our approach: BlastFunction [diagram labels: FPGA, App, shown three times]
  8. Our approach: BlastFunction [diagram labels: FPGA, App, shown three times]
  9. Our approach: BlastFunction [diagram labels: three App, two FPGA]
  10. BlastFunction in a nutshell: FPGA sharing system for microservices and serverless functions; fine-grained control over reconfigurable resources in the cloud; easy, transparent integration and scalability
  11. High level architecture [diagram labels: Node, FPGA Device Manager]
  12. High level architecture [diagram labels: Accelerator Registry, Node, FPGA Device Manager]
  13. High level architecture [diagram labels: Containers / Functions Orchestrator, Accelerator Registry, Node, FPGA Device Manager]
  14. High level architecture [diagram labels: Containers / Functions Orchestrator, Accelerator Registry, Node, FPGA Device Manager, App]
  15. Experimental results (benchmarks from: https://github.com/KastnerRG/spector)
  16. Experimental results (benchmarks from: https://github.com/KastnerRG/spector): Sobel Filter (CPU + FPGA processing)
  17. Experimental results (benchmarks from: https://github.com/KastnerRG/spector): Sobel Filter (CPU + FPGA processing); Matrix Multiplication (memory/bandwidth intensive)
  18. Experimental results (benchmarks from: https://github.com/KastnerRG/spector): Sobel Filter (CPU + FPGA processing); Matrix Multiplication (memory/bandwidth intensive); Graph BFS (iterative and multi-kernel)
  19. Experimental results (benchmarks from: https://github.com/KastnerRG/spector): Sobel Filter (CPU + FPGA processing); Matrix Multiplication (memory/bandwidth intensive); Graph BFS (iterative and multi-kernel) [test setup diagram labels: Wrk2, Function, FPGA Device Manager]
  20. Experimental results [plots]
  21. Experimental results [plot: Matrix Multiplication]
  22. Experimental results [plots: Matrix Multiplication, Sobel Filter]
  23. Experimental results [plots: Matrix Multiplication, Sobel Filter, Graph BFS]
  24. Experimental results [plots: Matrix Multiplication, Sobel Filter, Graph BFS]: results considered until saturation (inflection point); small overhead (max 20 ms, depends on gRPC flow control); high FPGA time utilization reached (up to 96%); tested the isolation and sharing mechanism, not scheduling
  25. Ideas / Future development: selected set of high performance accelerators available natively; implementation of space-sharing strategies; more templates / languages (now supporting Golang/C++)
  26. Thank you!! BlastFunction: FPGA acceleration for serverless computing. Rolando Brondolin <rolando.brondolin@polimi.it>, Marco Bacis <marco.bacis@mail.polimi.it>
