A LOW POWER AND HIGH PERFORMANCE SOFTWARE APPROACH TO ARTIFICIAL INTELLIGENCE ON-BOARD
DASIA 2022
pablo.ghiglino@klepsydra.com
www.klepsydra.com
Part 1: Lock-free programming
CONTEXT: PARALLEL PROCESSING
Challenges on on-board processing
Modern hardware and old software:
• Computers max out with low to medium data volumes
• Inefficient use of resources
• Excessive power for low data processing
[Chart: CPU usage versus data volume (low to medium)]
Consequences for Space applications
• Recurrent mission failures due to software
• Access to sensor data from Earth is time consuming
• Satellites struggle to meet power requirements
COMPARE AND SWAP
• Compare-and-swap (CAS) is an instruction used in multithreading to achieve synchronisation. It compares the contents of a memory location with a given value and, only if they are the same, modifies the contents of that memory location to a new given value. This is done as a single atomic operation.
• Compare-and-swap has been an integral part of the IBM 370 architectures since 1970.
• Maurice Herlihy (1991) proved that CAS can implement more lock-free and wait-free algorithms than atomic read, write, or fetch-and-add.
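The CAS behaviour described above can be sketched with the C++ standard library. This is only an illustration: `cas_demo` and `cas_walkthrough` are hypothetical names, and `std::atomic<int>::compare_exchange_strong` stands in for the hardware instruction.

```cpp
#include <atomic>
#include <cassert>

// Illustrative wrapper (hypothetical name): stores `desired` into `location`
// only if `location` currently holds `expected`. Returns true on success.
inline bool cas_demo(std::atomic<int>& location, int expected, int desired) {
    // On failure the standard call overwrites its first argument with the
    // value actually found; here that only touches the local copy `expected`.
    return location.compare_exchange_strong(expected, desired);
}

// Walks through one successful and one failed compare-and-swap.
inline void cas_walkthrough() {
    std::atomic<int> value{5};
    assert(cas_demo(value, 5, 7));   // contents match, so 5 is replaced by 7
    assert(!cas_demo(value, 5, 9));  // contents are now 7, so nothing changes
    assert(value.load() == 7);
}
```

The compare and the store happen as one atomic step, so no other thread can modify the location between them.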
LOCK BASED PARALLELISATION VS LOCK FREE PARALLELISATION
Lock-based parallelisation:
• Threads need to acquire a lock to access a resource.
• Context switch:
  • Suspended while the resource is locked by another thread
  • Awakened when the resource is available.
• Not deterministic; the context switch consumes power.
Lock-free parallelisation:
• Threads access resources using ‘Atomic Operations’
• Compare and Swap (CAS):
  • Try to update a memory entry
  • If not possible, try again
• No locks involved, but ‘busy wait’
• No context switch required.
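The contrast between the two approaches can be sketched on a shared counter. This is a minimal sketch and not Klepsydra code: `LockedCounter`, `LockFreeCounter` and `run_counter` are hypothetical names chosen for the example.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Lock-based: a thread that cannot take the mutex may be suspended by the OS
// and woken later, which implies a context switch.
struct LockedCounter {
    std::mutex m;
    long value = 0;
    void add(long n) {
        std::lock_guard<std::mutex> guard(m);
        value += n;
    }
};

// Lock-free: read the value, compute the update, then try to publish it with
// CAS. If another thread got there first, try again.
struct LockFreeCounter {
    std::atomic<long> value{0};
    void add(long n) {
        long seen = value.load(std::memory_order_relaxed);
        // On failure compare_exchange_weak refreshes `seen` with the current
        // value, so the next attempt uses up-to-date data.
        while (!value.compare_exchange_weak(seen, seen + n)) {
            // busy wait: no lock is held and the thread is never suspended here
        }
    }
};

// Hypothetical helper: increments one counter from several threads.
template <typename Counter>
long run_counter(int threads, int iterations) {
    assert(threads > 0 && iterations >= 0);
    Counter c;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t)
        pool.emplace_back([&c, iterations] {
            for (int i = 0; i < iterations; ++i) c.add(1);
        });
    for (auto& th : pool) th.join();
    return c.value;
}
```

Both counters give the same result. The difference is in how a thread waits: the mutex version may sleep and be rescheduled, while the CAS version simply retries.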
BENCHMARK TEST
Mutex based queue vs lock-free ring buffer
Sensor data serialisation example
• Sensor data is sent to a queue for processing.
• A consumer listens to the queue and collects the sensor data.
• When a given number of data instances is reached, the batch is serialised and stored.
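A lock-free ring buffer for this kind of sensor queue could look roughly like the sketch below. It assumes exactly one producer thread and one consumer thread, in which case atomic loads and stores are enough; a multi-producer variant would typically add CAS on the write index. `SpscRing` and `collect_batch` are hypothetical names, not the benchmarked implementation.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

// Single-producer / single-consumer ring buffer.
// One slot is always left empty so that "full" and "empty" can be told apart.
template <typename T, std::size_t Capacity>
class SpscRing {
    static_assert(Capacity >= 2, "one slot is always left empty");
    std::array<T, Capacity> slots_{};
    std::atomic<std::size_t> head_{0};  // next slot to read, advanced by the consumer
    std::atomic<std::size_t> tail_{0};  // next slot to write, advanced by the producer

public:
    // Producer side: returns false instead of blocking when the ring is full.
    bool try_push(const T& item) {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t next = (tail + 1) % Capacity;
        if (next == head_.load(std::memory_order_acquire)) return false;  // full
        slots_[tail] = item;
        tail_.store(next, std::memory_order_release);  // publish the new item
        return true;
    }

    // Consumer side: returns false instead of blocking when the ring is empty.
    bool try_pop(T& out) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        if (head == tail_.load(std::memory_order_acquire)) return false;  // empty
        out = slots_[head];
        head_.store((head + 1) % Capacity, std::memory_order_release);  // free the slot
        return true;
    }
};

// Consumer helper: moves samples into `batch` until it holds `batch_size`
// items. Returns true once the batch is complete and ready to be serialised.
template <typename T, std::size_t Capacity>
bool collect_batch(SpscRing<T, Capacity>& ring, std::vector<T>& batch,
                   std::size_t batch_size) {
    assert(batch_size > 0);
    T sample{};
    while (batch.size() < batch_size && ring.try_pop(sample)) batch.push_back(sample);
    return batch.size() == batch_size;
}
```

Neither side ever takes a lock: a full or empty ring is reported to the caller, who decides whether to retry, drop the sample or do other work.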
GR716 Benchmark (RTEMS)
[Charts: power consumption (CPU %) and data throughput (processed data rate, Hz) versus data rate (Hz). 2 topics: Klepsydra vs single queue. 4 topics: Klepsydra vs multi queue vs single queue.]
PROS AND CONS OF LOCK-FREE PROGRAMMING
[Charts: CPU usage versus data volume, one of them labelled "Lock-free programming"]
Pros:
• Less CPU consumption required
• Lower latency and higher data throughput
• Substantial increase in determinism
Cons:
• Extremely difficult programming technique
• Requires a processor with CAS instructions (90% of the market has them, though)
Part 2: Pipelining
LOCK-FREE AS ALTERNATIVE TO PARALLELISATION
[Diagram: parallelisation versus pipeline]
APPROACH
• OpenMP Sequential Setup: Input Matrix => B = A x A => C = B x B => Output Matrix, with each multiplication parallelised.
• Klepsydra Parallel Streaming Setup: Input Matrix => B = A x A (Thread 1) => C = B x B (Thread 2) => Output Matrix.
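The streaming setup can be sketched as a two-stage pipeline in which one thread computes B = A x A while a second thread computes C = B x B for the previous input. This is a simplified sketch, not Klepsydra's implementation: `Mailbox`, `multiply` and `run_pipeline` are hypothetical names, and the hand-over uses a single-slot buffer guarded by an atomic flag instead of a full lock-free queue.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <utility>
#include <vector>

using Matrix = std::vector<double>;  // row-major n x n

Matrix multiply(const Matrix& a, const Matrix& b, std::size_t n) {
    assert(a.size() == n * n && b.size() == n * n);
    Matrix c(n * n, 0.0);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t k = 0; k < n; ++k)
            for (std::size_t j = 0; j < n; ++j)
                c[i * n + j] += a[i * n + k] * b[k * n + j];
    return c;
}

// Single-slot hand-over between exactly one producer and one consumer.
// No mutex is used; each side polls an atomic flag and yields while waiting.
struct Mailbox {
    Matrix payload;
    std::atomic<bool> full{false};

    void put(Matrix m) {
        while (full.load(std::memory_order_acquire)) std::this_thread::yield();
        payload = std::move(m);
        full.store(true, std::memory_order_release);
    }
    Matrix take() {
        while (!full.load(std::memory_order_acquire)) std::this_thread::yield();
        Matrix m = std::move(payload);
        full.store(false, std::memory_order_release);
        return m;
    }
};

// Stage 1 (thread 1) computes B = A x A; stage 2 (thread 2) computes C = B x B.
// While stage 2 works on one matrix, stage 1 can already start the next one.
std::vector<Matrix> run_pipeline(const std::vector<Matrix>& inputs, std::size_t n) {
    Mailbox stage1_to_stage2;
    std::vector<Matrix> outputs;
    outputs.reserve(inputs.size());
    std::thread stage1([&] {
        for (const Matrix& a : inputs) stage1_to_stage2.put(multiply(a, a, n));
    });
    std::thread stage2([&] {
        for (std::size_t i = 0; i < inputs.size(); ++i) {
            Matrix b = stage1_to_stage2.take();
            outputs.push_back(multiply(b, b, n));
        }
    });
    stage1.join();
    stage2.join();
    return outputs;
}
```

Each stage stays on its own thread, so a stream of input matrices keeps both threads busy without splitting any single multiplication across threads.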
BENCHMARK DESCRIPTION
Description
• Given an input matrix, a number of sequential multiplications is performed:
• Step 1: A => B = A x A => Step 2: C = B x B…
• Matrix A is randomly generated for each new sequence
Parameters:
• Matrix dimensions: 100x100
• Data type: Float, integer
• Number of multiplications per matrix: [10, 60]
• Processing frequency: [2 Hz - 100 Hz]
Technical Spec
• Computer: Odroid XU4
• OS: Ubuntu 18.04
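The sequential setup used as the baseline can be sketched as a chain of squarings in which each multiplication is parallelised with OpenMP. This is an assumption about the shape of the benchmark kernel rather than the actual benchmark code; `multiply_parallel` and `run_chain` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<float>;  // row-major n x n, float as in the results slides

Matrix multiply_parallel(const Matrix& a, const Matrix& b, int n) {
    assert(n > 0);
    assert(a.size() == static_cast<std::size_t>(n) * static_cast<std::size_t>(n));
    assert(b.size() == a.size());
    Matrix c(a.size(), 0.0f);
    // Rows of the result are independent, so OpenMP can split them across threads.
    // Without OpenMP enabled (e.g. no -fopenmp) the pragma is ignored and the
    // loop runs serially with the same result.
    #pragma omp parallel for
    for (int i = 0; i < n; ++i)
        for (int k = 0; k < n; ++k)
            for (int j = 0; j < n; ++j)
                c[i * n + j] += a[i * n + k] * b[k * n + j];
    return c;
}

// Applies `steps` successive squarings: B = A x A, C = B x B, and so on.
// Each step must finish before the next one starts.
Matrix run_chain(Matrix m, int n, int steps) {
    assert(steps >= 0);
    for (int s = 0; s < steps; ++s) m = multiply_parallel(m, m, n);
    return m;
}
```

With the parameters listed above, a run would call something like `run_chain(a, 100, steps)` with `steps` between 10 and 60, once per published matrix.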
FLOAT PERFORMANCE RESULTS I
[Charts: CPU usage, throughput and latency versus publishing rate (Hz), OpenMP vs Klepsydra. 10 steps: publishing rate from 2 to 100 Hz. 20 steps: publishing rate from 2 to 40 Hz.]
FLOAT PERFORMANCE RESULTS II
[Charts: CPU usage, throughput and latency versus publishing rate (Hz), OpenMP vs Klepsydra. 30 steps: publishing rate from 2 to 20 Hz. 40 steps: publishing rate from 2 to 14 Hz.]
FLOAT PERFORMANCE RESULTS III
[Charts: CPU usage, throughput and latency versus publishing rate (Hz), OpenMP vs Klepsydra. 50 steps: publishing rate from 2 to 10 Hz. 60 steps: publishing rate from 2 to 8 Hz.]
Part 3: The threading model
2-DIM THREADING MODEL
First dimension: pipelining
[Diagram: input data flows through a chain of layers to the output data; the layers are split between Thread 1 (Core 1) and Thread 2 (Core 2)]
2-DIM THREADING MODEL
Second dimension: matrix multiplication parallelisation
[Diagram: a single layer between input data and output data is processed in parallel by Thread 1 (Core 1), Thread 2 (Core 2) and Thread 3 (Core 3)]
2-DIM THREADING MODEL
Threading model configuration
[Diagram: three ways of distributing the layers over Core 1 to Core 4]
First configuration:
• Low CPU
• High throughput CPU
• High latency
Second configuration:
• Mid CPU
• Mid throughput CPU
• Mid latency
Third configuration:
• High CPU
• Mid throughput CPU
• Low latency
Performance tuning
Performance criteria:
• CPU usage
• RAM usage
• Throughput (output data rate)
• Latency
Performance parameters:
• pool_size
Size of the internal queues of the event loop publish/subscribe pairs. High throughput requires large values, i.e., more RAM usage; low throughput requires smaller values and therefore less RAM.
• number_of_cores
Number of cores across which event loops will be distributed (by default, one event loop per core). High throughput requires more cores, i.e., more CPU usage; low throughput requires fewer cores, with a substantial reduction in CPU usage.
• number_of_parallel_threads
Number of threads assigned to parallelise layers. For low latency requirements, assign large numbers (maximum = number of cores), which increases CPU usage. When there are no latency requirements, use low numbers (minimum = 1), with a substantial reduction in CPU usage.
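The three parameters and their stated limits can be captured in a small configuration type. This is a hypothetical sketch and not Klepsydra's actual API: only the parameter names come from the slides, while `ThreadingConfig`, `normalise`, the two presets and the pool sizes are invented for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical container for the tuning parameters named above.
struct ThreadingConfig {
    std::size_t pool_size;                // size of the internal queues (drives RAM usage)
    unsigned number_of_cores;             // cores over which event loops are distributed
    unsigned number_of_parallel_threads;  // threads used to parallelise a layer
};

// Enforces the range stated in the text:
// 1 <= number_of_parallel_threads <= number_of_cores.
inline ThreadingConfig normalise(ThreadingConfig cfg) {
    assert(cfg.number_of_cores >= 1);
    cfg.number_of_parallel_threads =
        std::max(1u, std::min(cfg.number_of_parallel_threads, cfg.number_of_cores));
    return cfg;
}

// Preset favouring throughput: large queues, all cores, no intra-layer
// parallelism. The pool size of 64 is a placeholder, not a recommended value.
inline ThreadingConfig throughput_first(unsigned cores) {
    return normalise(ThreadingConfig{64, cores, 1});
}

// Preset favouring latency: as many parallel threads as there are cores.
// The pool size of 8 is a placeholder, not a recommended value.
inline ThreadingConfig latency_first(unsigned cores) {
    return normalise(ThreadingConfig{8, cores, cores});
}
```

Keeping the limits in one function makes the trade-off explicit: raising `number_of_parallel_threads` buys latency at the cost of CPU usage, and it can never exceed the number of cores.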
Example of performance benchmarks
• TensorFlow: latency 56 ms
• Klepsydra AI: latency 35 ms
Part 4: Space applications
APPLICATION TO SPACE
Vision-based navigation
• Process more images per second
• Increase confidence in the mission
Earth Observation
• Reduce power consumption by up to 50%
• Faster access to data from Earth
Telecommunications
• Increase processed requests per second (increase revenue)
• Enable AI telecom (cognitive radios)
KATESU PROJECT
• On-going activity with ESA Software: KLEPSYDRA AI TECHNOLOGY EVALUATION FOR SPACE USE
• The main goal of the activity is to evaluate Klepsydra AI on Space qualified computers.
• Main target: LS1046 running Linux, with Docker running inside.
QORIQ® LAYERSCAPE LS1046A MULTICORE PROCESSOR
[Diagram: Klepsydra AI container running on the QorIQ® Layerscape LS1046A]
Part 5: Conclusions and Future work
CONCLUSIONS
• Lock-free programming techniques, together with pipelining, can bring three main benefits to on-board processing:
  • Faster data processing
  • Reduced on-board power consumption
  • Determinism
• The benefits for Space systems are clear for those areas needing large data processing: EO, navigation and telecom.
FUTURE WORK
• FreeRTOS support: Q3 2022
• Support for hardware acceleration (FPGA and GPU): Q4 2022
• Support for other architectures (RISC-V, SPARC): 2023
• Space qualification of the software (already technically ‘friendly’): 2024
Dr Pablo Ghiglino
pablo.ghiglino@klepsydra.com
+41786931544
www.klepsydra.com
linkedin.com/company/klepsydra-technologies
