A low power and high performance software approach to Artificial Intelligence on-board.
https://klepsydra.com/klepsydra-ai-technology-evaluation-space-use/
3. CONTEXT: PARALLEL PROCESSING
Consequences for Space applications
• Recurrent mission failures due to software
• Access to sensor data from Earth is time consuming.
• Satellites struggle to meet power requirements
Challenges of on-board processing
Modern hardware and old software:
• Computers max out with low to medium data volumes
• Inefficient use of resources
• Excessive power for low data processing
[Figure: CPU usage against data volume (low, medium)]
4. COMPARE AND SWAP
• Compare-and-swap (CAS) is an instruction used in multithreading to achieve synchronisation. It compares the contents of a memory location with a given value and, only if they are the same, modifies the contents of that memory location to a new given value. This is done as a single atomic operation.
• Compare-and-Swap has been an integral part of the IBM 370 architectures since 1970.
• Maurice Herlihy (1991) proved that CAS can implement more lock-free and wait-free algorithms than atomic read, write, and fetch-and-add.
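The CAS behaviour described on this slide can be sketched in C++ with `std::atomic`. This is an illustration only: the function names `compare_and_swap` and `cas_example` are chosen for the example and are not taken from Klepsydra.

```cpp
#include <atomic>
#include <cassert>

// Compares `location` with `expected` and, only if they are equal, writes
// `desired`. The comparison and the write happen as one atomic operation.
// Returns true if the write took place.
bool compare_and_swap(std::atomic<int>& location, int expected, int desired) {
    // On failure compare_exchange_strong overwrites its first argument with
    // the current value; `expected` is a by-value copy, so callers are
    // unaffected.
    return location.compare_exchange_strong(expected, desired);
}

// Walks through the two possible outcomes (hypothetical helper).
int cas_example() {
    std::atomic<int> value{5};
    bool first = compare_and_swap(value, 5, 7);   // match: 5 becomes 7
    bool second = compare_and_swap(value, 5, 9);  // no match: value stays 7
    assert(first && !second);
    return value.load();
}
```

The second call fails because the location no longer holds 5, which is exactly the case a caller has to handle, typically by re-reading and trying again.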
5. LOCK BASED PARALLELISATION VS LOCK FREE PARALLELISATION
Lock based parallelisation:
• Threads need to acquire a lock to access a resource.
• Context switch: a thread is suspended while the resource is locked by someone else, and awakened when the resource is available.
• The context switch is not deterministic and consumes power.
Lock free parallelisation:
• Threads access resources using ‘Atomic Operations’.
• Compare and Swap (CAS): try to update a memory entry; if that is not possible, try again.
• No locks involved, but ‘busy wait’.
• No context switch required.
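The two approaches on this slide can be contrasted with a shared counter. The sketch below is a generic C++ illustration, not Klepsydra code; the type names `LockedCounter` and `LockFreeCounter` and the helper `run_workers` are assumptions made for the example.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Lock based: a thread that finds the mutex taken may be suspended by the OS
// and woken up later, which costs a context switch.
struct LockedCounter {
    std::mutex m;
    long value = 0;
    void add(long n) {
        std::lock_guard<std::mutex> guard(m);
        value += n;
    }
    long get() { std::lock_guard<std::mutex> guard(m); return value; }
};

// Lock free: try to update the memory entry with CAS; if another thread
// changed it first, the CAS fails and the loop tries again (busy wait).
struct LockFreeCounter {
    std::atomic<long> value{0};
    void add(long n) {
        long seen = value.load();
        // On failure compare_exchange_weak reloads `seen` with the current
        // value, so the next attempt uses fresh data.
        while (!value.compare_exchange_weak(seen, seen + n)) {
        }
    }
    long get() { return value.load(); }
};

// Starts `threads` workers, each calling add(1) `per_thread` times, and
// returns the final count once all workers have finished.
template <typename Counter>
long run_workers(Counter& counter, int threads, int per_thread) {
    assert(threads > 0 && per_thread >= 0);
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&counter, per_thread] {
            for (int i = 0; i < per_thread; ++i) counter.add(1);
        });
    }
    for (auto& worker : pool) worker.join();
    return counter.get();
}
```

Both counters give the same result; the difference is in how a thread waits when it loses the race.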
6. BENCHMARK TEST
Sensor data serialisation example, comparing a mutex based queue with a lock-free ring buffer:
• Sensor data is sent to a queue for processing.
• A consumer listens to the queue and collects the sensor data.
• When a set number of data instances is reached, they are serialised and stored.
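A minimal sketch of the lock-free side of this benchmark is shown below, assuming a single producer (the sensor) and a single consumer. It is not the Klepsydra implementation: `SpscRingBuffer` and `drain_into_batches` are hypothetical names, and with exactly one producer and one consumer atomic loads and stores are enough, so no CAS loop appears here.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

// Single-producer, single-consumer lock-free ring buffer.
// One slot is always left empty to tell "full" apart from "empty".
template <typename T, std::size_t Capacity>
class SpscRingBuffer {
    static_assert(Capacity > 1, "need at least two slots");
public:
    // Producer side only. Returns false when the buffer is full.
    bool push(const T& item) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t next = (head + 1) % Capacity;
        if (next == tail_.load(std::memory_order_acquire)) return false;
        slots_[head] = item;
        head_.store(next, std::memory_order_release);  // publish the item
        return true;
    }
    // Consumer side only. Returns false when the buffer is empty.
    bool pop(T& out) {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        if (tail == head_.load(std::memory_order_acquire)) return false;
        out = slots_[tail];
        tail_.store((tail + 1) % Capacity, std::memory_order_release);
        return true;
    }
private:
    std::array<T, Capacity> slots_{};
    std::atomic<std::size_t> head_{0};  // next slot to write
    std::atomic<std::size_t> tail_{0};  // next slot to read
};

// Consumer step: collect samples and, each time `batch_size` of them have
// been gathered, append the batch to `stored` (a stand-in for serialising
// and storing it). Returns the number of batches completed in this call.
template <typename T, std::size_t Capacity>
std::size_t drain_into_batches(SpscRingBuffer<T, Capacity>& buffer,
                               std::vector<T>& batch, std::size_t batch_size,
                               std::vector<std::vector<T>>& stored) {
    assert(batch_size > 0);
    std::size_t completed = 0;
    T sample{};
    while (buffer.pop(sample)) {
        batch.push_back(sample);
        if (batch.size() == batch_size) {
            stored.push_back(batch);
            batch.clear();
            ++completed;
        }
    }
    return completed;
}
```

A partially filled batch stays in `batch` between calls, so samples are never dropped while waiting for the count to be reached.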
7. GR716 Benchmark (RTEMS)
[Figure: Power consumption, 2 topics. CPU (%) against data rate (Hz). Series: Klepsydra, Single Queue]
[Figure: Power consumption, 4 topics. CPU (%) against data rate (Hz). Series: Klepsydra, Multi queue, Single Queue]
[Figure: Data throughput, 2 topics. Processed data rate (Hz) against data rate (Hz). Series: Klepsydra, Single Queue]
[Figure: Data throughput, 4 topics. Processed data rate (Hz) against data rate (Hz). Series: Klepsydra, Multi queue, Single Queue]
8. PROS AND CONS OF LOCK-FREE PROGRAMMING
[Figure: two plots of CPU usage against data volume]
Lock-free programming
Pros:
• Less CPU consumption required
• Lower latency and higher data throughput
• Substantial increase in determinism
Cons:
• Extremely difficult programming technique
• Requires processor with CAS instructions (90% of the market have them, though)
11. APPROACH
[Figure: two processing chains, each running from Input Matrix through B = A x A and C = B x B to Output Matrix. Labels: Klepsydra Parallel Streaming Setup, OpenMP Sequential Setup, Thread 1, Thread 2, Parallelised, Parallelised]
12. BENCHMARK DESCRIPTION
Description
• Given an input matrix, a number of sequential multiplications will be performed:
• Step 1: A => B = A x A => Step 2: C = B x B…
• Matrix A randomly generated on each new sequence
Parameters:
• Matrix dimensions: 100x100
• Data type: Float, integer
• Number of multiplications per matrix: [10, 60]
• Processing frequency: [2Hz - 100Hz]
Technical Spec
• Computer: Odroid XU4
• OS: Ubuntu 18.04
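The workload described on this slide can be sketched as a chain of matrix squarings. This is a plain single-threaded C++ illustration of the computation only, not the benchmark code itself; the names `Matrix`, `square`, `run_sequence` and `random_matrix`, and the value range of the random input, are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Square matrix in row-major order; T is float or int as in the benchmark.
template <typename T>
struct Matrix {
    std::size_t n;
    std::vector<T> data;
    explicit Matrix(std::size_t size) : n(size), data(size * size, T{}) {}
    T& at(std::size_t r, std::size_t c) {
        assert(r < n && c < n);
        return data[r * n + c];
    }
    const T& at(std::size_t r, std::size_t c) const {
        assert(r < n && c < n);
        return data[r * n + c];
    }
};

// One step of the sequence: returns m x m.
template <typename T>
Matrix<T> square(const Matrix<T>& m) {
    Matrix<T> out(m.n);
    for (std::size_t i = 0; i < m.n; ++i)
        for (std::size_t k = 0; k < m.n; ++k)
            for (std::size_t j = 0; j < m.n; ++j)
                out.at(i, j) += m.at(i, k) * m.at(k, j);
    return out;
}

// Applies `steps` sequential multiplications: A => B = A x A => C = B x B ...
template <typename T>
Matrix<T> run_sequence(Matrix<T> a, int steps) {
    for (int s = 0; s < steps; ++s) a = square(a);
    return a;
}

// Fresh random input for a new sequence. The range [-1, 1] is an assumption;
// the slide does not state how the values are drawn.
Matrix<float> random_matrix(std::size_t n, unsigned seed) {
    std::mt19937 gen(seed);
    std::uniform_real_distribution<float> dist(-1.0f, 1.0f);
    Matrix<float> m(n);
    for (auto& v : m.data) v = dist(gen);
    return m;
}
```

With the parameters above, one sequence would be `run_sequence(random_matrix(100, seed), steps)` with `steps` between 10 and 60, repeated at the chosen processing frequency.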
19. 2-DIM THREADING MODEL
Threading model configuration
[Figure: three diagrams, each mapping Layers onto Core 1 to Core 4, with one set of characteristics per diagram]
• Low CPU, high throughput CPU, high latency
• Mid CPU, mid throughput CPU, mid latency
• High CPU, mid throughput CPU, low latency
20. Performance tuning
Performance Criteria
• CPU usage
• RAM usage
• Throughput (output data rate)
• Latency
Performance parameters:
• pool_size
Size of the internal queues of the event loop publish/subscribe pairs. High throughput requires large numbers, i.e., more RAM usage; low throughput requires smaller numbers and therefore less RAM.
• number_of_cores
Number of cores where event loops will be distributed (by default one event loop per core). High throughput requires more cores, i.e., more CPU usage; low throughput requires fewer cores and therefore gives a substantial reduction in CPU usage.
• number_of_parallel_threads
Number of threads assigned to parallelise layers. For low latency requirements, assign large numbers (maximum = number of cores), i.e., increased CPU usage. When there are no latency requirements, use low numbers (minimum = 1), which gives a substantial reduction in CPU usage.
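The bounds stated for these parameters can be written down as a small validation step. The sketch below is hypothetical: only the three parameter names come from the slides, while the struct `TuningParameters`, the function `clamp_to_hardware`, and the reading of "maximum = number of cores" as the configured `number_of_cores` are assumptions, not the Klepsydra configuration API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical container for the three tuning parameters named above.
struct TuningParameters {
    std::size_t pool_size;                // size of the internal queues
    unsigned number_of_cores;             // cores hosting event loops
    unsigned number_of_parallel_threads;  // threads parallelising layers
};

// Brings the values into the ranges given on the slides:
// number_of_parallel_threads lies in [1, number_of_cores], and
// number_of_cores cannot exceed what the machine offers.
TuningParameters clamp_to_hardware(TuningParameters p, unsigned available_cores) {
    available_cores = std::max(available_cores, 1u);
    p.number_of_cores = std::min(std::max(p.number_of_cores, 1u), available_cores);
    p.number_of_parallel_threads =
        std::min(std::max(p.number_of_parallel_threads, 1u), p.number_of_cores);
    // Assumption: a queue needs at least one slot.
    p.pool_size = std::max<std::size_t>(p.pool_size, 1);
    assert(p.number_of_parallel_threads <= p.number_of_cores);
    return p;
}
```

A low-CPU setup would sit near the lower bounds (one core, one parallel thread, a small pool), while a low-latency setup would push `number_of_parallel_threads` up to `number_of_cores`.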
23. APPLICATION TO SPACE
Vision-based navigation, Earth Observation, Telecommunications:
• Process more images per second
• Increase confidence in the mission
• Reduce power consumption by up to 50%
• Faster access to data from Earth
• Increase processed requests per second (increase revenue)
• Enable AI telecom (Cognitive radios)
24. KATESU PROJECT
• On-going activity with ESA Software: KLEPSYDRA AI TECHNOLOGY EVALUATION FOR SPACE USE
• The main goal of the activity is to evaluate Klepsydra AI on Space qualified computers.
• Main target is the LS1046 with Linux and Docker running inside.
27. CONCLUSIONS
• Lock-free programming techniques, together with pipelining, can bring three main benefits to on-board processing:
• Faster data processing
• Reduced power consumption on-board
• Determinism
• The benefits for Space systems are clear for those areas needing large data processing: EO, Navigation and telecom.
28. FUTURE WORK
• FreeRTOS support Q3 2022
• Support for hardware acceleration (FPGA and GPU) Q4 2022
• Support for other architectures (RISC-V, SPARC) 2023
• Space qualification of the software (Already technically ‘friendly’) 2024
Dr Pablo Ghiglino
pablo.ghiglino@klepsydra.com
+41786931544
www.klepsydra.com
linkedin.com/company/klepsydra-technologies