Elastic multicore scheduling with
the XiTAO runtime
Jing Chen, Pirah Noor, Mustafa Abduljabbar,
Miquel Pericàs
Chalmers University of Technology
Embedded Multicore Programming -
Industrial state-of-the-art and future directions
Edinburgh, April 17th, 2019
22/01/2019 HiPEAC CSW Spring 2019 2
Heterogeneous-Parallel Platforms
Heterogeneity + parallelism are common in embedded platforms
● Power efficiency, battery-constrained devices
● Examples:
  – ARM big.LITTLE
  – Nvidia Jetson TX2 (Denver2/A57/Pascal)
  – Dynamic heterogeneity: DVFS, interference, cache partitioning
[Figure: HiKEY 960, Nvidia Jetson TX2]
Heterogeneity as a dynamic property
Heterogeneity: cores in the system differ in performance, energy efficiency, etc.
Two types of heterogeneity: static and dynamic
● Static:
  – big.LITTLE, CPU-GPU
● Dynamic:
  – DVFS, cache partitioning, interference
  – Interference:
    ● Intra-process: cache, memory oversubscription
    ● Inter-process: cache, memory, processor timesharing
● Heterogeneity needs to be addressed dynamically by the runtime!
EU LEGaTO Project
• Create software-stack support for energy-efficient heterogeneous computing
EU LEGaTO Project
XiTAO

Mixed-mode parallelism
● Many applications can be expressed as mixed-mode parallel applications := external task parallelism + internal data parallelism
● Naturally supports hierarchy/heterogeneity in modern architectures
● Challenge: how to schedule? How many resources?
[Figure annotation: "#pragma omp parallel for ..." can be generalized to other forms of parallelism!]
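As a rough illustration of the mixed-mode idea, the sketch below runs independent tasks concurrently (external task parallelism) while each task splits its own loop across `width` workers (internal data parallelism). It is written in Python with thread pools purely to show the structure; the names `chunk_bounds`, `run_task`, and `run_mixed_mode` are hypothetical and are not part of XiTAO.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk_bounds(n, width, rank):
    """Return the [begin, end) slice of n iterations owned by worker `rank`."""
    base, extra = divmod(n, width)
    begin = rank * base + min(rank, extra)
    end = begin + base + (1 if rank < extra else 0)
    return begin, end


def run_task(data, width):
    """One external task: a sum of squares whose loop is split over `width` workers."""
    def partial_sum(rank):
        begin, end = chunk_bounds(len(data), width, rank)
        return sum(x * x for x in data[begin:end])

    # Internal data parallelism: every worker handles one contiguous chunk.
    with ThreadPoolExecutor(max_workers=width) as inner:
        return sum(inner.map(partial_sum, range(width)))


def run_mixed_mode(tasks, outer_workers):
    """`tasks` is a list of (data, width) pairs; independent tasks run concurrently."""
    # External task parallelism: tasks are submitted to an outer pool.
    with ThreadPoolExecutor(max_workers=outer_workers) as outer:
        futures = [outer.submit(run_task, data, width) for data, width in tasks]
        return [f.result() for f in futures]
```

The open question raised on the slide is exactly the pair of parameters left to the caller here: how many outer workers to use and which `width` to give each task.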

XiTAO mixed-mode runtime
1. Schedule external task parallelism via work stealing + locally expand internal parallel tasks across multiple cores
2. Reduce inter-task interference by decoupling internal parallelism from resources: Task Assembly Objects (TAO)
● Improves parallel slackness
● Bulk creation of parallelism (low overhead)
● Interference avoidance
● Constructive sharing
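The two steps above can be sketched as follows: an idle core steals a ready TAO from another core, then expands it locally over a group of neighbouring cores. This is a simplified Python model, not XiTAO code. The queue layout, the width-aligned placement rule in `place_for`, and the names `expand` and `steal` are all assumptions made for illustration; it also assumes that `width` divides the number of cores.

```python
import random
from collections import deque


def place_for(core, width):
    """Cores of the width-aligned partition that contains `core` (assumed rule)."""
    leader = core - core % width
    return list(range(leader, leader + width))


def expand(tao, width, core, assembly_queues):
    """Local expansion: give one slice of the TAO to every core in its partition."""
    cores = place_for(core, width)
    for rank, c in enumerate(cores):
        assembly_queues[c].append((tao, rank, width))
    return cores


def steal(thief, ready_queues, rng=random):
    """Work stealing: take the oldest ready TAO from a random non-empty victim."""
    victims = [c for c, q in enumerate(ready_queues) if c != thief and q]
    if not victims:
        return None
    return ready_queues[rng.choice(victims)].popleft()
```

For example, with four cores, a TAO of width 2 stolen by core 3 is expanded over cores 2 and 3, leaving cores 0 and 1 free for other TAOs.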
XiTAO application
● Example of 2D stencil execution on XiTAO
[Figure: 2D stencil application executed with TAOs of width w=2 and w=1]
Elastic Places: Adaptivity
● Example: Cilksort reduction on 48 cores. Places are dynamically resized as external parallelism decreases and the TAO working set increases
● Each colored box is a resource container executing one TAO
Quick generation of parallelism, low overheads, and good isolation + constructive sharing
XiTAO implementation
● XiTAO is fully implemented in C++11
● Decentralized design targeting scalability
XiTAO API:
  – Basic TAO class (XiTAO)
  – User-level API for defining TAOs
  – User-level API for defining TAO-DAGs + locality-awareness
Heterogeneous scheduling
Main idea: map to high-performance cores only those tasks that benefit, either due to criticality or due to performance characteristics
Heterogeneous platforms: HiKEY 960, Nvidia Jetson TX2
[Figure: external task DAG with its critical path; each Task Assembly Object (TAO) holds an internal DAG in a fixed resource container (cores, caches, ...). A performance monitor feeds the PTT ("Performance Trace Table"), which guides the schedule onto faster and slower cores.]
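To make the criticality part of the main idea concrete, the sketch below marks the tasks that lie on a longest path of a task DAG and sends only those to the faster cores. It is an illustrative Python model: measuring criticality as the number of tasks on the longest path to a sink, and the names `criticality`, `critical_tasks`, and `assign_clusters`, are assumptions rather than the XiTAO implementation.

```python
def criticality(dag):
    """Longest path (counted in tasks) from each task to a sink.

    `dag` maps every task name to the list of its successors.
    """
    memo = {}

    def depth(task):
        if task not in memo:
            memo[task] = 1 + max((depth(s) for s in dag[task]), default=0)
        return memo[task]

    for task in dag:
        depth(task)
    return memo


def critical_tasks(dag):
    """Tasks on a longest path: start at the most critical tasks and follow
    successors whose criticality is exactly one less."""
    crit = criticality(dag)
    longest = max(crit.values())
    frontier = [t for t in dag if crit[t] == longest]
    critical = set()
    while frontier:
        task = frontier.pop()
        if task in critical:
            continue
        critical.add(task)
        frontier.extend(s for s in dag[task] if crit[s] == crit[task] - 1)
    return critical


def assign_clusters(dag):
    """Critical tasks go to the fast cluster, all others to the slow one."""
    critical = critical_tasks(dag)
    return {t: ("fast" if t in critical else "slow") for t in dag}
```

In the DAG `a -> b -> d`, `a -> c`, the chain `a, b, d` is critical and `c` can run on a slower core without lengthening the schedule.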
Performance Trace Table (PTT)
• Function: record the execution time observed on each core for each resource width
• Aim: determine the best core and the best width for executing a TAO on the available resources, so that resources are used efficiently
• Implementation: table of size core_number * resource_width
1 PTT for each task type (in XiTAO: for each TAO type)
Resource width := number of cores that execute a TAO
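A minimal sketch of such a table, in Python, is given below. The table shape (one row per core, one column per width) follows the slide. Everything else is an assumption made for illustration: the class and method names, the weighted-average update with `old_weight`, treating 0.0 as "not measured yet", restricting leaders to width-aligned cores, and choosing the entry that minimises time × width as the "best" one.

```python
class PTT:
    """Performance Trace Table for one TAO type (illustrative sketch)."""

    def __init__(self, num_cores, widths=(1, 2, 4)):
        self.widths = tuple(widths)
        # One row per core, one column per width; 0.0 means "not measured yet",
        # so untried configurations look cheap and get explored first.
        self.table = [[0.0] * len(self.widths) for _ in range(num_cores)]

    def update(self, leader, width, elapsed, old_weight=4):
        """Blend a new measurement into the entry for (leader core, width)."""
        col = self.widths.index(width)
        old = self.table[leader][col]
        if old == 0.0:
            self.table[leader][col] = elapsed
        else:
            self.table[leader][col] = (old_weight * old + elapsed) / (old_weight + 1)

    def best(self):
        """Return the (leader, width) pair with the lowest time * width."""
        candidates = []
        for col, width in enumerate(self.widths):
            # Only width-aligned cores can lead a partition of that width.
            for leader in range(0, len(self.table) - width + 1, width):
                candidates.append((self.table[leader][col] * width, leader, width))
        _, leader, width = min(candidates)
        return leader, width
```

Weighting time by width penalises configurations that occupy many cores for little speed-up, which is one possible reading of the "efficient resource usage" goal.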
Random DAGs
[Figure: two heatmaps of throughput (TAOs/s, color scale 500 to 1500) over task number (250, 500, 1000, 2000, 4000) and average DAG parallelism (1, 2, 4, 8, 16), one for the performance-based scheduler (PTT-based) and one for the homogeneous scheduler (random work stealing)]
● Runtime assessment of resource partitions + criticality-aware scheduling
Interference-awareness
● Detects interference episodes and migrates critical tasks
[Figure: per-thread schedule (threads 0 to 9) over elapsed time (0 to 14 s), marking tasks with multiple resources, critical task schedules, and an interference episode, together with the PTT evolution for core=0 & width=1 (PTT value in ms, axis from 8 to 20)]
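One way to picture the detection step is to watch the PTT value of a core over time and flag the intervals in which it rises well above its usual level, then steer critical tasks to a core that is not affected. The Python sketch below does this on a plain list of values. The threshold rule, the `baseline` parameter, and the names `interference_episodes` and `pick_core` are assumptions for illustration; the slides do not state how XiTAO detects an episode.

```python
def interference_episodes(values, baseline, threshold=1.5):
    """Return (start, end) index pairs where values exceed threshold * baseline.

    `end` is exclusive, so values[start:end] is the whole episode.
    """
    episodes = []
    start = None
    for i, value in enumerate(values):
        above = value > threshold * baseline
        if above and start is None:
            start = i
        elif not above and start is not None:
            episodes.append((start, i))
            start = None
    if start is not None:
        episodes.append((start, len(values)))
    return episodes


def pick_core(ptt_times, interfered):
    """Choose the core with the lowest recorded time, skipping interfered cores."""
    candidates = [(t, core) for core, t in enumerate(ptt_times) if core not in interfered]
    return min(candidates)[1]
```

With a trace such as `[10, 10, 18, 19, 11, 10]` ms and a baseline of 10 ms, samples 2 and 3 form one episode, and a critical task would be moved from core 0 to the fastest remaining core.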
Current directions: VGG-16
● Porting VGG-16 in Darknet framework to XiTAO
[Figure: VGG-16 layer stack (CONV3-64 ×2, maxpool, CONV3-128 ×2, maxpool, CONV3-256 ×3, maxpool, CONV3-512 ×3, maxpool, CONV3-512 ×3, maxpool, FC-4096 ×2, FC-1000, softmax); the convolutional and fully connected layers appear as GEMM operations, mapped to TAO 0, TAO 1, ..., TAO N in XiTAO]
● PTT automatically finds the best widths to execute VGG-16 on the dual-socket Intel platform (20 cores)
Percentage of TAOs per TAO width, by number of threads:
Threads   w=1     w=2     w=4     w=8     w=16
2         69.06   30.94   -       -       -
4         90.89   5.83    3.28    -       -
8         66.67   3.38    0.74    29.21   -
16        53.81   1.68    14.76   29.31   0.45
Future Directions
● Front-ends for XiTAO
  – OmpSs to XiTAO
  – Array (tensor) programming
● Low-energy runtime optimizations
● Automatic DAG partitioning for generation of mixed-mode computations
Thank you!
Acknowledgements:
The XiTAO team
Jing Chen Pirah Noor Mustafa Abduljabbar Miquel Pericàs

Contenu connexe

Tendances

SRDS18: Security, Performance and Energy Trade-offs of Hardware-assisted Memo...
SRDS18: Security, Performance and Energy Trade-offs of Hardware-assisted Memo...SRDS18: Security, Performance and Energy Trade-offs of Hardware-assisted Memo...
SRDS18: Security, Performance and Energy Trade-offs of Hardware-assisted Memo...LEGATO project
 
Greenplum for Kubernetes - Greenplum Summit 2019
Greenplum for Kubernetes - Greenplum Summit 2019Greenplum for Kubernetes - Greenplum Summit 2019
Greenplum for Kubernetes - Greenplum Summit 2019VMware Tanzu
 
HPC Midlands - Supercomputing for Research and Industry (Hartree Centre prese...
HPC Midlands - Supercomputing for Research and Industry (Hartree Centre prese...HPC Midlands - Supercomputing for Research and Industry (Hartree Centre prese...
HPC Midlands - Supercomputing for Research and Industry (Hartree Centre prese...Martin Hamilton
 
KDB database (EPAM tech talks, Sofia, April, 2015)
KDB database (EPAM tech talks, Sofia, April, 2015)KDB database (EPAM tech talks, Sofia, April, 2015)
KDB database (EPAM tech talks, Sofia, April, 2015)Martin Toshev
 
40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facility
40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facility40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facility
40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facilityinside-BigData.com
 
Scrooge Attack: Undervolting ARM Processors for Profit
Scrooge Attack: Undervolting ARM Processors for ProfitScrooge Attack: Undervolting ARM Processors for Profit
Scrooge Attack: Undervolting ARM Processors for ProfitLEGATO project
 
HPC Midlands - E.ON Supercomputing Case Study
HPC Midlands - E.ON Supercomputing Case StudyHPC Midlands - E.ON Supercomputing Case Study
HPC Midlands - E.ON Supercomputing Case StudyMartin Hamilton
 
OpenPOWER Application Optimisation meet up
OpenPOWER Application Optimisation meet up OpenPOWER Application Optimisation meet up
OpenPOWER Application Optimisation meet up Ganesan Narayanasamy
 
HPC Midlands Launch - Introduction to HPC Midlands
HPC Midlands Launch - Introduction to HPC MidlandsHPC Midlands Launch - Introduction to HPC Midlands
HPC Midlands Launch - Introduction to HPC MidlandsMartin Hamilton
 
OPTIMIZING THE TICK STACK
OPTIMIZING THE TICK STACKOPTIMIZING THE TICK STACK
OPTIMIZING THE TICK STACKInfluxData
 

Tendances (10)

SRDS18: Security, Performance and Energy Trade-offs of Hardware-assisted Memo...
SRDS18: Security, Performance and Energy Trade-offs of Hardware-assisted Memo...SRDS18: Security, Performance and Energy Trade-offs of Hardware-assisted Memo...
SRDS18: Security, Performance and Energy Trade-offs of Hardware-assisted Memo...
 
Greenplum for Kubernetes - Greenplum Summit 2019
Greenplum for Kubernetes - Greenplum Summit 2019Greenplum for Kubernetes - Greenplum Summit 2019
Greenplum for Kubernetes - Greenplum Summit 2019
 
HPC Midlands - Supercomputing for Research and Industry (Hartree Centre prese...
HPC Midlands - Supercomputing for Research and Industry (Hartree Centre prese...HPC Midlands - Supercomputing for Research and Industry (Hartree Centre prese...
HPC Midlands - Supercomputing for Research and Industry (Hartree Centre prese...
 
KDB database (EPAM tech talks, Sofia, April, 2015)
KDB database (EPAM tech talks, Sofia, April, 2015)KDB database (EPAM tech talks, Sofia, April, 2015)
KDB database (EPAM tech talks, Sofia, April, 2015)
 
40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facility
40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facility40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facility
40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facility
 
Scrooge Attack: Undervolting ARM Processors for Profit
Scrooge Attack: Undervolting ARM Processors for ProfitScrooge Attack: Undervolting ARM Processors for Profit
Scrooge Attack: Undervolting ARM Processors for Profit
 
HPC Midlands - E.ON Supercomputing Case Study
HPC Midlands - E.ON Supercomputing Case StudyHPC Midlands - E.ON Supercomputing Case Study
HPC Midlands - E.ON Supercomputing Case Study
 
OpenPOWER Application Optimisation meet up
OpenPOWER Application Optimisation meet up OpenPOWER Application Optimisation meet up
OpenPOWER Application Optimisation meet up
 
HPC Midlands Launch - Introduction to HPC Midlands
HPC Midlands Launch - Introduction to HPC MidlandsHPC Midlands Launch - Introduction to HPC Midlands
HPC Midlands Launch - Introduction to HPC Midlands
 
OPTIMIZING THE TICK STACK
OPTIMIZING THE TICK STACKOPTIMIZING THE TICK STACK
OPTIMIZING THE TICK STACK
 

Similaire à Elastic multicore scheduling with the XiTAO runtime

LEGaTO: Software Stack Runtimes
LEGaTO: Software Stack RuntimesLEGaTO: Software Stack Runtimes
LEGaTO: Software Stack RuntimesLEGATO project
 
DATE 2020: Design, Automation and Test in Europe Conference
DATE 2020: Design, Automation and Test in Europe ConferenceDATE 2020: Design, Automation and Test in Europe Conference
DATE 2020: Design, Automation and Test in Europe ConferenceLEGATO project
 
[EWiLi2016] Towards a performance-aware power capping orchestrator for the Xe...
[EWiLi2016] Towards a performance-aware power capping orchestrator for the Xe...[EWiLi2016] Towards a performance-aware power capping orchestrator for the Xe...
[EWiLi2016] Towards a performance-aware power capping orchestrator for the Xe...Matteo Ferroni
 
HiPEAC 2020: Energy-aware Task Scheduling in LEGaTO: Low Energy Toolset for H...
HiPEAC 2020: Energy-aware Task Scheduling in LEGaTO: Low Energy Toolset for H...HiPEAC 2020: Energy-aware Task Scheduling in LEGaTO: Low Energy Toolset for H...
HiPEAC 2020: Energy-aware Task Scheduling in LEGaTO: Low Energy Toolset for H...LEGATO project
 
PADAL19: Runtime-Assisted Locality Abstraction Using Elastic Places and Virtu...
PADAL19: Runtime-Assisted Locality Abstraction Using Elastic Places and Virtu...PADAL19: Runtime-Assisted Locality Abstraction Using Elastic Places and Virtu...
PADAL19: Runtime-Assisted Locality Abstraction Using Elastic Places and Virtu...LEGATO project
 
SAMOS 2018: LEGaTO: first steps towards energy-efficient toolset for heteroge...
SAMOS 2018: LEGaTO: first steps towards energy-efficient toolset for heteroge...SAMOS 2018: LEGaTO: first steps towards energy-efficient toolset for heteroge...
SAMOS 2018: LEGaTO: first steps towards energy-efficient toolset for heteroge...LEGATO project
 
Red Hat Storage: Emerging Use Cases
Red Hat Storage: Emerging Use CasesRed Hat Storage: Emerging Use Cases
Red Hat Storage: Emerging Use CasesRed_Hat_Storage
 
Arm A64fx and Post-K: Game-Changing CPU & Supercomputer for HPC, Big Data, & AI
Arm A64fx and Post-K: Game-Changing CPU & Supercomputer for HPC, Big Data, & AIArm A64fx and Post-K: Game-Changing CPU & Supercomputer for HPC, Big Data, & AI
Arm A64fx and Post-K: Game-Changing CPU & Supercomputer for HPC, Big Data, & AIinside-BigData.com
 
Opportunities of ML-based data analytics in ABCI
Opportunities of ML-based data analytics in ABCIOpportunities of ML-based data analytics in ABCI
Opportunities of ML-based data analytics in ABCIRyousei Takano
 
TWISummit 2019 - Return of Reconfigurable Computing
TWISummit 2019 - Return of Reconfigurable ComputingTWISummit 2019 - Return of Reconfigurable Computing
TWISummit 2019 - Return of Reconfigurable ComputingThoughtworks
 
Lecture_IIITD.pptx
Lecture_IIITD.pptxLecture_IIITD.pptx
Lecture_IIITD.pptxachakracu
 
The CAOS framework: Democratize the acceleration of compute intensive applica...
The CAOS framework: Democratize the acceleration of compute intensive applica...The CAOS framework: Democratize the acceleration of compute intensive applica...
The CAOS framework: Democratize the acceleration of compute intensive applica...NECST Lab @ Politecnico di Milano
 
Proof of Concept for Hadoop: storage and analytics of electrical time-series
Proof of Concept for Hadoop: storage and analytics of electrical time-seriesProof of Concept for Hadoop: storage and analytics of electrical time-series
Proof of Concept for Hadoop: storage and analytics of electrical time-seriesDataWorks Summit
 
FUNDAMENTALS OF COMPUTER DESIGN
FUNDAMENTALS OF COMPUTER DESIGNFUNDAMENTALS OF COMPUTER DESIGN
FUNDAMENTALS OF COMPUTER DESIGNvenkatraman227
 
electronics-11-03883.pdf
electronics-11-03883.pdfelectronics-11-03883.pdf
electronics-11-03883.pdfRioCarthiis
 
MIG 5th Data Centre Summit 2016 PTS Presentation v1
MIG 5th Data Centre Summit 2016 PTS Presentation v1MIG 5th Data Centre Summit 2016 PTS Presentation v1
MIG 5th Data Centre Summit 2016 PTS Presentation v1blewington
 
Panel: NRP Science Impacts​
Panel: NRP Science Impacts​Panel: NRP Science Impacts​
Panel: NRP Science Impacts​Larry Smarr
 
Data Plane Evolution: Towards Openness and Flexibility
Data Plane Evolution: Towards Openness and FlexibilityData Plane Evolution: Towards Openness and Flexibility
Data Plane Evolution: Towards Openness and FlexibilityAPNIC
 
Automatically partitioning packet processing applications for pipelined archi...
Automatically partitioning packet processing applications for pipelined archi...Automatically partitioning packet processing applications for pipelined archi...
Automatically partitioning packet processing applications for pipelined archi...Ashley Carter
 

Similaire à Elastic multicore scheduling with the XiTAO runtime (20)

LEGaTO: Software Stack Runtimes
LEGaTO: Software Stack RuntimesLEGaTO: Software Stack Runtimes
LEGaTO: Software Stack Runtimes
 
DATE 2020: Design, Automation and Test in Europe Conference
DATE 2020: Design, Automation and Test in Europe ConferenceDATE 2020: Design, Automation and Test in Europe Conference
DATE 2020: Design, Automation and Test in Europe Conference
 
[EWiLi2016] Towards a performance-aware power capping orchestrator for the Xe...
[EWiLi2016] Towards a performance-aware power capping orchestrator for the Xe...[EWiLi2016] Towards a performance-aware power capping orchestrator for the Xe...
[EWiLi2016] Towards a performance-aware power capping orchestrator for the Xe...
 
HiPEAC 2020: Energy-aware Task Scheduling in LEGaTO: Low Energy Toolset for H...
HiPEAC 2020: Energy-aware Task Scheduling in LEGaTO: Low Energy Toolset for H...HiPEAC 2020: Energy-aware Task Scheduling in LEGaTO: Low Energy Toolset for H...
HiPEAC 2020: Energy-aware Task Scheduling in LEGaTO: Low Energy Toolset for H...
 
PADAL19: Runtime-Assisted Locality Abstraction Using Elastic Places and Virtu...
PADAL19: Runtime-Assisted Locality Abstraction Using Elastic Places and Virtu...PADAL19: Runtime-Assisted Locality Abstraction Using Elastic Places and Virtu...
PADAL19: Runtime-Assisted Locality Abstraction Using Elastic Places and Virtu...
 
NWU and HPC
NWU and HPCNWU and HPC
NWU and HPC
 
SAMOS 2018: LEGaTO: first steps towards energy-efficient toolset for heteroge...
SAMOS 2018: LEGaTO: first steps towards energy-efficient toolset for heteroge...SAMOS 2018: LEGaTO: first steps towards energy-efficient toolset for heteroge...
SAMOS 2018: LEGaTO: first steps towards energy-efficient toolset for heteroge...
 
Red Hat Storage: Emerging Use Cases
Red Hat Storage: Emerging Use CasesRed Hat Storage: Emerging Use Cases
Red Hat Storage: Emerging Use Cases
 
Arm A64fx and Post-K: Game-Changing CPU & Supercomputer for HPC, Big Data, & AI
Arm A64fx and Post-K: Game-Changing CPU & Supercomputer for HPC, Big Data, & AIArm A64fx and Post-K: Game-Changing CPU & Supercomputer for HPC, Big Data, & AI
Arm A64fx and Post-K: Game-Changing CPU & Supercomputer for HPC, Big Data, & AI
 
Opportunities of ML-based data analytics in ABCI
Opportunities of ML-based data analytics in ABCIOpportunities of ML-based data analytics in ABCI
Opportunities of ML-based data analytics in ABCI
 
TWISummit 2019 - Return of Reconfigurable Computing
TWISummit 2019 - Return of Reconfigurable ComputingTWISummit 2019 - Return of Reconfigurable Computing
TWISummit 2019 - Return of Reconfigurable Computing
 
Lecture_IIITD.pptx
Lecture_IIITD.pptxLecture_IIITD.pptx
Lecture_IIITD.pptx
 
The CAOS framework: Democratize the acceleration of compute intensive applica...
The CAOS framework: Democratize the acceleration of compute intensive applica...The CAOS framework: Democratize the acceleration of compute intensive applica...
The CAOS framework: Democratize the acceleration of compute intensive applica...
 
Proof of Concept for Hadoop: storage and analytics of electrical time-series
Proof of Concept for Hadoop: storage and analytics of electrical time-seriesProof of Concept for Hadoop: storage and analytics of electrical time-series
Proof of Concept for Hadoop: storage and analytics of electrical time-series
 
FUNDAMENTALS OF COMPUTER DESIGN
FUNDAMENTALS OF COMPUTER DESIGNFUNDAMENTALS OF COMPUTER DESIGN
FUNDAMENTALS OF COMPUTER DESIGN
 
electronics-11-03883.pdf
electronics-11-03883.pdfelectronics-11-03883.pdf
electronics-11-03883.pdf
 
MIG 5th Data Centre Summit 2016 PTS Presentation v1
MIG 5th Data Centre Summit 2016 PTS Presentation v1MIG 5th Data Centre Summit 2016 PTS Presentation v1
MIG 5th Data Centre Summit 2016 PTS Presentation v1
 
Panel: NRP Science Impacts​
Panel: NRP Science Impacts​Panel: NRP Science Impacts​
Panel: NRP Science Impacts​
 
Data Plane Evolution: Towards Openness and Flexibility
Data Plane Evolution: Towards Openness and FlexibilityData Plane Evolution: Towards Openness and Flexibility
Data Plane Evolution: Towards Openness and Flexibility
 
Automatically partitioning packet processing applications for pipelined archi...
Automatically partitioning packet processing applications for pipelined archi...Automatically partitioning packet processing applications for pipelined archi...
Automatically partitioning packet processing applications for pipelined archi...
 

Plus de LEGATO project

A practical approach for updating an integrity-enforced operating system
A practical approach for updating an integrity-enforced operating systemA practical approach for updating an integrity-enforced operating system
A practical approach for updating an integrity-enforced operating systemLEGATO project
 
TEEMon: A continuous performance monitoring framework for TEEs
TEEMon: A continuous performance monitoring framework for TEEsTEEMon: A continuous performance monitoring framework for TEEs
TEEMon: A continuous performance monitoring framework for TEEsLEGATO project
 
secureTF: A Secure TensorFlow Framework
secureTF: A Secure TensorFlow FrameworksecureTF: A Secure TensorFlow Framework
secureTF: A Secure TensorFlow FrameworkLEGATO project
 
PipeTune: Pipeline Parallelism of Hyper and System Parameters Tuning for Deep...
PipeTune: Pipeline Parallelism of Hyper and System Parameters Tuning for Deep...PipeTune: Pipeline Parallelism of Hyper and System Parameters Tuning for Deep...
PipeTune: Pipeline Parallelism of Hyper and System Parameters Tuning for Deep...LEGATO project
 
LEGaTO: Machine Learning Use Case
LEGaTO: Machine Learning Use CaseLEGaTO: Machine Learning Use Case
LEGaTO: Machine Learning Use CaseLEGATO project
 
Smart Home AI at the edge
Smart Home AI at the edgeSmart Home AI at the edge
Smart Home AI at the edgeLEGATO project
 
LEGaTO: Low-Energy Heterogeneous Computing Use of AI in the project
LEGaTO: Low-Energy Heterogeneous Computing Use of AI in the projectLEGaTO: Low-Energy Heterogeneous Computing Use of AI in the project
LEGaTO: Low-Energy Heterogeneous Computing Use of AI in the projectLEGATO project
 
LEGaTO: Software Stack Programming Models
LEGaTO: Software Stack Programming ModelsLEGaTO: Software Stack Programming Models
LEGaTO: Software Stack Programming ModelsLEGATO project
 
LEGaTO Heterogeneous Hardware
LEGaTO Heterogeneous HardwareLEGaTO Heterogeneous Hardware
LEGaTO Heterogeneous HardwareLEGATO project
 
LEGaTO: Low-Energy Heterogeneous Computing Workshop
LEGaTO: Low-Energy Heterogeneous Computing WorkshopLEGaTO: Low-Energy Heterogeneous Computing Workshop
LEGaTO: Low-Energy Heterogeneous Computing WorkshopLEGATO project
 
Infection Research with Maxeler Dataflow Computing
Infection Research with Maxeler Dataflow ComputingInfection Research with Maxeler Dataflow Computing
Infection Research with Maxeler Dataflow ComputingLEGATO project
 
Smart Home - AI at the edge
Smart Home - AI at the edgeSmart Home - AI at the edge
Smart Home - AI at the edgeLEGATO project
 
FPGA Undervolting and Checkpointing for Energy-Efficiency and Error-Resiliency
FPGA Undervolting and Checkpointing for Energy-Efficiency and Error-ResiliencyFPGA Undervolting and Checkpointing for Energy-Efficiency and Error-Resiliency
FPGA Undervolting and Checkpointing for Energy-Efficiency and Error-ResiliencyLEGATO project
 
Device Data Directory and Asynchronous execution: A path to heterogeneous com...
Device Data Directory and Asynchronous execution: A path to heterogeneous com...Device Data Directory and Asynchronous execution: A path to heterogeneous com...
Device Data Directory and Asynchronous execution: A path to heterogeneous com...LEGATO project
 
Scheduling Task-parallel Applications in Dynamically Asymmetric Environments
Scheduling Task-parallel Applications in Dynamically Asymmetric EnvironmentsScheduling Task-parallel Applications in Dynamically Asymmetric Environments
Scheduling Task-parallel Applications in Dynamically Asymmetric EnvironmentsLEGATO project
 
RECS – Cloud to Edge Microserver Platform for Energy-Efficient Computing
RECS – Cloud to Edge Microserver Platform for Energy-Efficient ComputingRECS – Cloud to Edge Microserver Platform for Energy-Efficient Computing
RECS – Cloud to Edge Microserver Platform for Energy-Efficient ComputingLEGATO project
 
Secure Task-Based Programming with OmpSs and SGX
Secure Task-Based Programming with OmpSs and SGXSecure Task-Based Programming with OmpSs and SGX
Secure Task-Based Programming with OmpSs and SGXLEGATO project
 
HiPerMAb: A statistical tool for judging the potential of short fat data
HiPerMAb: A statistical tool for judging the potential of short fat dataHiPerMAb: A statistical tool for judging the potential of short fat data
HiPerMAb: A statistical tool for judging the potential of short fat dataLEGATO project
 

Plus de LEGATO project (20)

A practical approach for updating an integrity-enforced operating system
A practical approach for updating an integrity-enforced operating systemA practical approach for updating an integrity-enforced operating system
A practical approach for updating an integrity-enforced operating system
 
TEEMon: A continuous performance monitoring framework for TEEs
TEEMon: A continuous performance monitoring framework for TEEsTEEMon: A continuous performance monitoring framework for TEEs
TEEMon: A continuous performance monitoring framework for TEEs
 
secureTF: A Secure TensorFlow Framework
secureTF: A Secure TensorFlow FrameworksecureTF: A Secure TensorFlow Framework
secureTF: A Secure TensorFlow Framework
 
PipeTune: Pipeline Parallelism of Hyper and System Parameters Tuning for Deep...
PipeTune: Pipeline Parallelism of Hyper and System Parameters Tuning for Deep...PipeTune: Pipeline Parallelism of Hyper and System Parameters Tuning for Deep...
PipeTune: Pipeline Parallelism of Hyper and System Parameters Tuning for Deep...
 
LEGaTO: Machine Learning Use Case
LEGaTO: Machine Learning Use CaseLEGaTO: Machine Learning Use Case
LEGaTO: Machine Learning Use Case
 
Smart Home AI at the edge
Smart Home AI at the edgeSmart Home AI at the edge
Smart Home AI at the edge
 
LEGaTO: Low-Energy Heterogeneous Computing Use of AI in the project
LEGaTO: Low-Energy Heterogeneous Computing Use of AI in the projectLEGaTO: Low-Energy Heterogeneous Computing Use of AI in the project
LEGaTO: Low-Energy Heterogeneous Computing Use of AI in the project
 
LEGaTO Integration
LEGaTO IntegrationLEGaTO Integration
LEGaTO Integration
 
LEGaTO: Use cases
LEGaTO: Use casesLEGaTO: Use cases
LEGaTO: Use cases
 
LEGaTO: Software Stack Programming Models
LEGaTO: Software Stack Programming ModelsLEGaTO: Software Stack Programming Models
LEGaTO: Software Stack Programming Models
 
LEGaTO Heterogeneous Hardware
LEGaTO Heterogeneous HardwareLEGaTO Heterogeneous Hardware
LEGaTO Heterogeneous Hardware
 
LEGaTO: Low-Energy Heterogeneous Computing Workshop
LEGaTO: Low-Energy Heterogeneous Computing WorkshopLEGaTO: Low-Energy Heterogeneous Computing Workshop
LEGaTO: Low-Energy Heterogeneous Computing Workshop
 
Infection Research with Maxeler Dataflow Computing
Infection Research with Maxeler Dataflow ComputingInfection Research with Maxeler Dataflow Computing
Infection Research with Maxeler Dataflow Computing
 
Smart Home - AI at the edge
Smart Home - AI at the edgeSmart Home - AI at the edge
Smart Home - AI at the edge
 
FPGA Undervolting and Checkpointing for Energy-Efficiency and Error-Resiliency
FPGA Undervolting and Checkpointing for Energy-Efficiency and Error-ResiliencyFPGA Undervolting and Checkpointing for Energy-Efficiency and Error-Resiliency
FPGA Undervolting and Checkpointing for Energy-Efficiency and Error-Resiliency
 
Device Data Directory and Asynchronous execution: A path to heterogeneous com...
Device Data Directory and Asynchronous execution: A path to heterogeneous com...Device Data Directory and Asynchronous execution: A path to heterogeneous com...
Device Data Directory and Asynchronous execution: A path to heterogeneous com...
 
Scheduling Task-parallel Applications in Dynamically Asymmetric Environments
Scheduling Task-parallel Applications in Dynamically Asymmetric EnvironmentsScheduling Task-parallel Applications in Dynamically Asymmetric Environments
Scheduling Task-parallel Applications in Dynamically Asymmetric Environments
 
RECS – Cloud to Edge Microserver Platform for Energy-Efficient Computing
RECS – Cloud to Edge Microserver Platform for Energy-Efficient ComputingRECS – Cloud to Edge Microserver Platform for Energy-Efficient Computing
RECS – Cloud to Edge Microserver Platform for Energy-Efficient Computing
 
Secure Task-Based Programming with OmpSs and SGX
Secure Task-Based Programming with OmpSs and SGXSecure Task-Based Programming with OmpSs and SGX
Secure Task-Based Programming with OmpSs and SGX
 
HiPerMAb: A statistical tool for judging the potential of short fat data
HiPerMAb: A statistical tool for judging the potential of short fat dataHiPerMAb: A statistical tool for judging the potential of short fat data
HiPerMAb: A statistical tool for judging the potential of short fat data
 

Dernier

Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesBoston Institute of Analytics
 
Elastic multicore scheduling with the XiTAO runtime

  • 1. Elastic multicore scheduling with the XiTAO runtime
    Jing Chen, Pirah Noor, Mustafa Abduljabbar, Miquel Pericàs
    Chalmers University of Technology
    Embedded Multicore Programming - Industrial state-of-the-art and future directions
    Edinburgh, April 17th, 2019
  • 2. Heterogeneous-Parallel Platforms (22/01/2019, HiPEAC CSW Spring 2019)
    Heterogeneity + parallelism are common in embedded platforms
    ● Power-efficiency, battery-constrained devices
    ● Examples:
      – ARM big.LITTLE
      – Nvidia Jetson TX2 (Denver2/A57/Pascal)
      – Dynamic heterogeneity: DVFS, interference, cache partitioning
    [Figures: HiKEY 960, Nvidia Jetson TX2]
  • 3. Heterogeneity as a dynamic property (04/25/19, CSW Spring 2019)
    Heterogeneity: cores in the system differ in performance, energy efficiency, etc.
    Two types of heterogeneity: static and dynamic
    ● Static:
      – big.LITTLE, CPU-GPU
    ● Dynamic:
      – DVFS, cache partitioning, interference
      – Interference:
        ● Intra-process: cache, memory oversubscription
        ● Inter-process: cache, memory, processor timesharing
    ● Heterogeneity needs to be addressed dynamically by the runtime!
  • 4. EU LEGaTO Project
    • Create software-stack support for energy-efficient heterogeneous computing
  • 5. EU LEGaTO Project
    [Figure label: XiTAO]
  • 6. Mixed-mode parallelism
    ● Many applications can be expressed as mixed-mode parallel applications := external task parallelism + internal data parallelism
    ● Naturally supports hierarchy/heterogeneity in modern architectures
    ● Challenge: how to schedule? how many resources?
    [Figure annotation: "#pragma omp parallel for..." can be generalized to other forms of parallelism!]
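The mixed-mode decomposition of slide 6 (external task parallelism + internal data parallelism) can be illustrated with plain OpenMP. This is only a sketch and not the XiTAO API: the function names `scale_block` and `run_mixed_mode` and the `width` parameter are hypothetical, chosen for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Internal data parallelism: scale one block in place.
// `width` is a hypothetical stand-in for the number of cores given to the task.
inline void scale_block(std::vector<double>& block, double factor, int width) {
    assert(width > 0);
    const long n = static_cast<long>(block.size());
    #pragma omp parallel for num_threads(width)
    for (long i = 0; i < n; ++i) {
        block[static_cast<std::size_t>(i)] *= factor;
    }
}

// External task parallelism: every block is an independent task,
// and each task is itself data parallel.
inline void run_mixed_mode(std::vector<std::vector<double>>& blocks,
                           double factor, int width) {
    const long n = static_cast<long>(blocks.size());
    #pragma omp parallel for
    for (long t = 0; t < n; ++t) {
        scale_block(blocks[static_cast<std::size_t>(t)], factor, width);
    }
}
```

Built without OpenMP support, the pragmas are ignored and the code runs sequentially with the same result. With OpenMP, nested parallel regions additionally have to be enabled (for example through the `OMP_MAX_ACTIVE_LEVELS` environment variable) before the inner loop actually uses `width` threads.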
  • 7. XiTAO mixed-mode runtime
    ● Improves parallel slackness
    ● Bulk creation of parallelism (low overhead)
    ● Interference avoidance
    ● Constructive sharing
    1. Schedule external task parallelism via work stealing + locally expand internal parallel tasks across multiple cores
    2. Reduce inter-task interference by decoupling internal parallelism from resources: Task Assembly Objects (TAO)
  • 8. XiTAO application
    ● Example of 2D stencil execution on XiTAO
    [Figure labels: w=2, w=1, Application]
  • 9. Elastic Places: Adaptivity
    ● Example: Cilksort reduction on 48 cores. Places are dynamically resized as external parallelism decreases and the TAO working set increases
    ● Each colored box is a resource container, executing one TAO
    Quick generation of parallelism, low overheads and good isolation + constructive sharing
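To make the notion of a place (a resource container of a given width that executes one TAO) more concrete, here is a minimal sketch. It assumes that places are aligned partitions, so a place of width w starts at a core id that is a multiple of w. That alignment rule, the `Place` struct and the helper names are assumptions of this example and are not stated on the slides.

```cpp
#include <cassert>
#include <vector>

// A place: a contiguous group of cores that jointly executes one TAO.
struct Place {
    int leader;  // first core of the place
    int width;   // number of cores in the place
};

// Assumption: places are aligned partitions, i.e. a place of width w
// starts at a core id that is a multiple of w.
inline Place place_for(int core, int width) {
    assert(core >= 0 && width > 0);
    return Place{(core / width) * width, width};
}

// Cores that take part in executing a TAO scheduled on `place`.
inline std::vector<int> cores_of(const Place& place) {
    std::vector<int> cores;
    for (int c = place.leader; c < place.leader + place.width; ++c) {
        cores.push_back(c);
    }
    return cores;
}
```

Under this assumption, resizing a place amounts to calling `place_for` with a different width: core 5 belongs to the place {5} at width 1, {4, 5} at width 2 and {4, 5, 6, 7} at width 4.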
  • 10. XiTAO implementation
    ● XiTAO is fully implemented in C++11
    ● Decentralized design targeting scalability
    [Figure labels: XiTAO API; Basic TAO class (XiTAO); User-level API for defining TAOs; User-level API for defining TAO-DAGs + locality-awareness]
  • 11. Heterogeneous scheduling
    Main idea: map to the high-performance cores only those tasks that benefit, either due to criticality or due to their performance characteristics
    Heterogeneous platforms: HiKEY 960, Nvidia Jetson TX2
    [Figure labels: critical path; internal DAG; fixed resource container (cores, caches, ...); Task Assembly Object (TAO); external task DAG; Faster Cores; Slower Cores; PTT ("Performance Trace Table"); Performance Monitor; schedule]
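The criticality half of the main idea on slide 11 can be sketched as follows. The example assumes that criticality is measured as the length of the longest path from a task to the end of the DAG and that tasks are numbered in topological order. Both conventions, as well as the names `criticality`, `on_critical_path` and `pick_cluster`, are assumptions of this sketch; it does not cover the selection by performance characteristics.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// successors[i] lists the tasks that depend on task i.
// Assumption: every edge goes from a lower to a higher task id.
using Dag = std::vector<std::vector<std::size_t>>;

// Length (in tasks) of the longest path that starts at each task.
inline std::vector<int> criticality(const Dag& successors) {
    std::vector<int> crit(successors.size(), 1);
    for (std::size_t i = successors.size(); i-- > 0;) {
        for (std::size_t s : successors[i]) {
            assert(s > i);  // topological numbering
            crit[i] = std::max(crit[i], crit[s] + 1);
        }
    }
    return crit;
}

// Marks the tasks that lie on a longest path through the DAG.
inline std::vector<bool> on_critical_path(const Dag& successors) {
    const std::vector<int> crit = criticality(successors);
    std::vector<bool> critical(crit.size(), false);
    if (crit.empty()) return critical;
    const int longest = *std::max_element(crit.begin(), crit.end());
    for (std::size_t i = 0; i < crit.size(); ++i) {
        if (crit[i] == longest) critical[i] = true;  // start of a longest path
        if (!critical[i]) continue;
        for (std::size_t s : successors[i]) {
            if (crit[s] == crit[i] - 1) critical[s] = true;  // path continues here
        }
    }
    return critical;
}

enum class Cluster { Fast, Slow };

// Critical tasks go to the faster cores, all others to the slower cores.
inline Cluster pick_cluster(bool critical) {
    return critical ? Cluster::Fast : Cluster::Slow;
}
```

For a DAG with edges 0→1, 0→2 and 1→3, the longest path is 0, 1, 3, so only task 2 would be sent to the slower cores.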
  • 12. Performance Trace Table (PTT)
    • Function: record the running time observed on each core for each resource width
    • Aim: identify the best core and the best width among the available resources, so that resources are used efficiently
    • Implementation: table of size core_number * resource_width; one PTT for each task type (in XiTAO: for each TAO type)
    Resource width := number of cores that execute a TAO
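The table described on slide 12 could look roughly like the sketch below: one row per core, one column per candidate width, and a lookup that returns the best (core, width) pair. The cost model (running time multiplied by width, as a proxy for efficient resource usage), the class name and its methods are assumptions of this example and should not be read as the actual XiTAO implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <utility>
#include <vector>

struct Choice {
    std::size_t core;
    std::size_t width;
};

// One table per task type: rows are cores, columns are candidate widths.
class PerformanceTraceTable {
public:
    PerformanceTraceTable(std::size_t cores, std::vector<std::size_t> widths)
        : cores_(cores), widths_(std::move(widths)),
          times_(cores_ * widths_.size(), 0.0) {}

    // Store the latest running time (seconds) of a task led by `core`
    // and executed with the width at column `w`.
    void record(std::size_t core, std::size_t w, double seconds) {
        assert(core < cores_ && w < widths_.size() && seconds > 0.0);
        times_[core * widths_.size() + w] = seconds;
    }

    // Assumed cost model: running time multiplied by the number of occupied
    // cores. Entries that were never measured (0.0) are skipped.
    // Returns false if nothing has been measured yet.
    bool best(Choice& out) const {
        double best_cost = std::numeric_limits<double>::infinity();
        bool found = false;
        for (std::size_t core = 0; core < cores_; ++core) {
            for (std::size_t w = 0; w < widths_.size(); ++w) {
                const double t = times_[core * widths_.size() + w];
                if (t <= 0.0) continue;
                const double cost = t * static_cast<double>(widths_[w]);
                if (cost < best_cost) {
                    best_cost = cost;
                    out = Choice{core, widths_[w]};
                    found = true;
                }
            }
        }
        return found;
    }

private:
    std::size_t cores_;
    std::vector<std::size_t> widths_;
    std::vector<double> times_;  // cores_ x widths_.size(), row-major
};
```

With this cost model, a task that takes 8 s on one core, 3 s on two cores and 2 s on four cores is best run at width 2, since 3 × 2 = 6 is lower than 8 × 1 and 2 × 4.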
  • 13. Random DAGs
    ● Runtime assessment of resource partitions + criticality-aware scheduling
    [Figure: two panels showing throughput (performance) in TAOs/s (scale 500 to 1500) as a function of task number (250, 500, 1000, 2000, 4000) and average DAG parallelism (1, 2, 4, 8, 16); panels: Performance-based Scheduler (PTT-based) and Homogeneous Scheduler (random work stealing)]
  • 14. Interference-awareness
    ● Detects interference episodes and migrates critical tasks
    [Figure: schedule of threads 0 to 9 over elapsed time (0 to 14 s), overlaid with the PTT value in ms (8 to 20) for core=0 & width=1; annotations: tasks with multiple resources, critical task schedules, interference episode, PTT evolution for core=0 & width=1]
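One way the PTT could drive the behavior on slide 14 is sketched below: each new measurement is blended into the stored value, so an interference episode raises the entry of the affected core and critical tasks drift to another core. The 4:1 weighting, the class `TraceColumn` and its methods are assumptions made for this illustration. The slides do not specify the update rule.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// PTT column for one width: smoothed running time (ms) per core.
class TraceColumn {
public:
    explicit TraceColumn(std::size_t cores) : ms_(cores, 0.0) {
        assert(cores > 0);
    }

    // Assumed update rule: weighted average, 4 parts history to 1 part
    // new sample. The first sample initializes the entry.
    void update(std::size_t core, double sample_ms) {
        assert(core < ms_.size() && sample_ms > 0.0);
        double& entry = ms_[core];
        entry = (entry == 0.0) ? sample_ms : (4.0 * entry + sample_ms) / 5.0;
    }

    double value(std::size_t core) const { return ms_[core]; }

    // Core with the lowest smoothed time; a critical task would be placed
    // here. Unmeasured cores (0.0) compare lowest, so they get explored first.
    std::size_t fastest_core() const {
        std::size_t best = 0;
        for (std::size_t c = 1; c < ms_.size(); ++c) {
            if (ms_[c] < ms_[best]) best = c;
        }
        return best;
    }

private:
    std::vector<double> ms_;
};
```

For example, if core 0 normally takes 10 ms and core 1 takes 12 ms, a single 30 ms sample on core 0 lifts its entry to (4 × 10 + 30) / 5 = 14 ms, and the next critical task would be placed on core 1.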
  • 15. Current directions: VGG-16
    ● Porting VGG-16 in the Darknet framework to XiTAO
    ● PTT automatically finds the best widths to execute VGG-16 on the dual-socket Intel platform (20 cores)
    [Figure: VGG-16 layer diagram with labels CONV3-64, CONV3-128, CONV3-256, CONV3-512, FC-4096, FC-1000, maxpool, softmax and GEMM, decomposed into TAO 0, TAO 1, ..., TAO N on XiTAO]
    Percentage of TAOs w.r.t. TAO width, by number of threads:
      Threads   Width 1   Width 2   Width 4   Width 8   Width 16
      2         69.06     30.94
      4         90.89      5.83      3.28
      8         66.67      3.38      0.74     29.21
      16        53.81      1.68     14.76     29.31      0.45
  • 16. Future Directions
    ● Front-ends for XiTAO
      – OmpSs to XiTAO
      – Array (tensor) programming
    ● Low-energy runtime optimizations
    ● Automatic DAG partitioning for generation of mixed-mode computations
  • 17. Thank you!
    Acknowledgements: The XiTAO team (Jing Chen, Pirah Noor, Mustafa Abduljabbar, Miquel Pericàs)