Ce diaporama a bien été signalé.
Le téléchargement de votre SlideShare est en cours. ×
Publicité
Publicité
Publicité
Publicité
Publicité
Publicité
Publicité
Publicité
Publicité
Publicité
Publicité
Publicité
Prochain SlideShare
IAC 2020
IAC 2020
Chargement dans…3
×

Consultez-les par la suite

1 sur 28 Publicité

Plus De Contenu Connexe

Similaire à Dasia 2022 (20)

Plus récents (20)

Publicité

Dasia 2022

  1. 1. A LOW POWER AND HIGH PERFORMANCE SOFTWARE APPROACH TO ARTIFICIAL INTELLIGENCE ON- BOARD DASIA 2022 pablo.ghiglino@klepsydra.com www.klepsydra.com
  2. 2. Part 1 Lock-free programming
  3. 3. CONTEXT: PARALLEL PROCESSING • Recurrent mission failures due to software • Access to sensor data from Earth is time consuming. • Satellites struggle to meet power requirements Consequences for Space applications Challenges on on-board processing CPU Usage Low Medium Data volume Modern hardware and old software: • Computers max out with low to medium data volumes • Inef fi cient use of resources • Excessive power for low data processing
  4. 4. COMPARE AND SWAP • Compare-and-swap (CAS) is an instruction used in multithreading to achieve synchronisation. It compares the contents of a memory location with a given value and, only if they are the same, modi fi es the contents of that memory location to a new given value. This is done as a single atomic operation. • Compare-and-Swap has been an integral part of the IBM 370 architectures since 1970. • Maurice Herlihy (1991) proved that CAS can implement more of these algorithms than atomic read, write, and fetch-and-add
  5. 5. LOCK BASED PARALLELISATION VS LOCK FREE PARALLELISATION • Threads need to acquire lock to access resource. • Context switch: • Suspended while resource is locked by someone else • Awaken when resource is available. • Not deterministic, power consuming context switch. • Threads access resources using ‘Atomic Operations’ • Compare and Swap (CAS): • Try to update a memory entry • If not possible tried again • No locks involved, but ‘busy wait’ • No context switch required.
  6. 6. BENCHMARK TEST Mutex based queue Lock-free ring buffer Sensor data serialisation example • Sensor data is sent to a queue for processing. • Consumer listening to the queue that collects sensor data. • When a number of data instance is reached. It is serialised and stored.
  7. 7. Power consumption 2 Topic CPU (%) 42 49 55 Data Rate (Hz) 0 1 2 Klepsydra Single Queue Power consumption 4 Topic CPU (%) 50 60 70 Data Rate (Hz) 0 1 2 Klepsydra Multi queue Single Queue Data Throughout 2 Topic Process data rate (Hz) 0 1 2 Data Rate (Hz) 0 1 2 Klepsydra Single Queue Data Throughout 4 Topic Process data rate (Hz) 0 1 2 Data Rate (Hz) 0 1 2 Klepsydra Multi queue Single Queue GR716 Benchmark (RTEMS)
  8. 8. PROS AND CONS OF LOCK-FREE PROGRAMMING CPU Usage Data volume CPU Usage Data volume Lock-free programming Pros: • Less CPU consumption required • Lower latency and higher data throughput • Substantial increase in determinism Cons: • Extremely dif fi cult programming technique • Requires processor with CAS instructions (90% of the market have them, though)
  9. 9. Part 2 Pipelining
  10. 10. LOCK-FREE AS ALTERNATIVE TO PARALLELISATION Parallelisation Pipeline
  11. 11. APPROACH Input Matrix B = A x A C = B x B Output Matrix Input Matrix B = A x A Output Matrix C = B x B Klepsydra Parallel Streaming Setup OpenMP Sequential Setup { Thread 1 { Thread 2 { Parallelised { Parallelised
  12. 12. BENCHMARK DESCRIPTION Description • Given an input matrix, a number of sequential multiplications will be performed: • Step 1: A => B = A x A => Step 2 : C = B x B… • Matrix A randomly generated on each new sequence Parameters: • Matrix dimensions: 100x100 • Data type: Float, integer • Number of multiplications per matrix: [10, 60] • Processing frequency: [2Hz - 100Hz] Technical Spec • Computer: Odroid XU4 • OS: Ubuntu 18.04
  13. 13. FLOAT PERFORMANCE RESULTS I CPU Usage. 10 Steps 0,0 22,5 45,0 67,5 90,0 Publishing Rate (Hz) 2,00 26,50 51,00 75,50 100,00 OpenMp Klepsydra Throughput. 10 Steps 0,00 25,00 50,00 75,00 100,00 Publishing Rate (Hz) 2,00 26,50 51,00 75,50 100,00 OpenMp Klepsydra Latency. 10 Steps 0,00 12,50 25,00 37,50 50,00 Publishing Rate (Hz) 2,00 26,50 51,00 75,50 100,00 OpenMp Klepsydra Throughput. 20 Steps 0,00 10,00 20,00 30,00 40,00 Publishing Rate (Hz) 2,00 11,50 21,00 30,50 40,00 OpenMp Klepsydra Latency. 20 Steps 0,00 27,50 55,00 82,50 110,00 Publishing Rate (Hz) 2,00 11,50 21,00 30,50 40,00 OpenMp Klepsydra CPU Usage. 20 Steps 0,0 22,5 45,0 67,5 90,0 Publishing Rate (Hz) 2,00 11,50 21,00 30,50 40,00 OpenMp Klepsydra
  14. 14. FLOAT PERFORMANCE RESULTS II CPU Usage. 30 Steps 0,0 20,0 40,0 60,0 80,0 Publishing Rate (Hz) 2,00 6,50 11,00 15,50 20,00 OpenMp Klepsydra Throughput. 30 Steps 0,00 5,00 10,00 15,00 20,00 Publishing Rate (Hz) 2,00 6,50 11,00 15,50 20,00 OpenMp Klepsydra CPU Usage. 40 Steps 0,0 17,5 35,0 52,5 70,0 Publishing Rate (Hz) 2,00 5,00 8,00 11,00 14,00 OpenMp Klepsydra Throughput. 40 Steps 0,00 3,50 7,00 10,50 14,00 Publishing Rate (Hz) 2,00 5,00 8,00 11,00 14,00 OpenMp Klepsydra Latency. 40 Steps 0,00 60,00 120,00 180,00 240,00 Publishing Rate (Hz) 2,00 5,00 8,00 11,00 14,00 OpenMp Klepsydra Latency. 30 Steps 0,00 45,00 90,00 135,00 180,00 Publishing Rate (Hz) 2,00 6,50 11,00 15,50 20,00 OpenMp Klepsydra
  15. 15. FLOAT PERFORMANCE RESULTS III CPU Usage. 50 Steps 0,0 15,0 30,0 45,0 60,0 Publishing Rate (Hz) 2,00 4,00 6,00 8,00 10,00 OpenMp Klepsydra Throughput. 50 Steps 0,00 2,75 5,50 8,25 11,00 Publishing Rate (Hz) 2,00 4,00 6,00 8,00 10,00 OpenMp Klepsydra Latency. 50 Steps 0,00 100,00 200,00 300,00 400,00 Publishing Rate (Hz) 2,00 4,00 6,00 8,00 10,00 OpenMp Klepsydra CPU Usage. 60 Steps 0,0 15,0 30,0 45,0 60,0 Publishing Rate (Hz) 2,00 3,50 5,00 6,50 8,00 OpenMp Klepsydra Throughput. 60 Steps 0,00 2,00 4,00 6,00 8,00 Publishing Rate (Hz) 2,00 3,50 5,00 6,50 8,00 OpenMp Klepsydra Latency. 60 Steps 0,00 225,00 450,00 675,00 900,00 Publishing Rate (Hz) 2,00 3,50 5,00 6,50 8,00 OpenMp Klepsydra
  16. 16. Part 3 The threading model
  17. 17. 2-DIM THREADING MODEL Input Data Layer Output Data First dimension: pipelining { Thread 1 (Core 1) Layer Layer Layer Layer Layer { Thread 2 (Core 2) Layer Layer Layer Layer
  18. 18. 2-DIM THREADING MODEL Input Data Output Data Second dimension: Matrix multiplication parallelisation { T hread 1 (Core 1) Layer { T hread 2 (Core 2) { T hread 3 (Core 3)
  19. 19. 2-DIM THREADING MODEL Core 1 Core 2 Core 3 Core 4 Layer Layer Layer Layer Layer Layer Core 1 Core 2 Core 3 Core 4 Layer Layer Layer Layer Layer Layer Core 1 Core 2 Core 3 Core 4 Layer Layer Layer Layer Layer Layer Layer Layer Layer • Low CPU • High throughput CPU • High latency • Mid CPU • Mid throughput CPU • Mid latency • High CPU • Mid throughput CPU • Low latency Threading model con fi guration
  20. 20. Performance tuning Performance Criteria • CPU usage • RAM usage • Throughput (output data rate) • Latency 20 Performance parameters: • pool_size Size of the internal queues of the event loop publish/ subscribe pairs. High throughput requires large numbers, i.e., more RAM usage, low throughout requires smaller number, therefore less RAM. Performance parameters • number_of_cores Number of cores where event loops will be distributed (by default one event loop per core). High throughput requires more cores, i.e., more CPU usage, low throughput requires low number of cores, therefore substantial reduction in CPU usage. Performance parameters • number_of_parallel_threads Number of threads assigned to parallelise layers. For low latency requirements, assign large numbers (maximum = number of cores), i.e., increase CPU usage. For no latency requirements, use low numbers (minimum = 1), therefore substantial reduction in CPU usage.
  21. 21. 21 Example of performance benchmarks TensorFlow Klepsydra AI Latency: 56ms Latency: 35ms
  22. 22. Part 4 Space applications
  23. 23. Vision-based navigation Earth Observation Telecommunications • Process more images per second • Increase con fi dence in the mission • Reduce power consumption up to 50% • Faster access to data from Earth • Increase processed request per second (increase revenue) • Enable AI telecomm (Cognitive radios) APPLICATION TO SPACE 23
  24. 24. KATESU PROJECT • On-going activity with ESA Software: KLEPSYDRA AI TECHNOLOGY EVALUATION FOR SPACE USE • The main goal of the activity is to evaluate Klepsydra AI on Space quali fi ed computers. • Main target is: LS1046 with Linux and docker running inside.
  25. 25. QORIQ® LAYERSCAPE LS1046A MULTICORE PROCESSOR QorIQ® Layerscape LS1046A Klepsydra AI Container
  26. 26. Part 5 Conclusions and Future work
  27. 27. CONCLUSIONS • Lock-free programming techniques, together with pipelining can bring three main bene fi ts to on-board processing: • Faster data processing • Reduce power consumption on-board • Determinism • The bene fi ts for Space system are clear for those areas needing large data processing: EO, Navigation and telecom.
  28. 28. FUTURE WORK • FreeRTOS support Q3 2022 • Support to hardware acceleration (FPGA and GPU) Q4 2022 • Support to there architectures (RISC-V, Sparc) 2023 • Space quali fi cation of the software (Already technically ‘friendly’) 2024 Dr Pablo Ghiglino pablo.ghiglino@klepsydra.com +41786931544 www.klepsydra.com linkedin.com/company/klepsydra-technologies

×