SlideShare une entreprise Scribd logo
1  sur  22
Krzysztof Rojek, CTO, byteLAKE
krojek@byteLAKE.com
6TH INTERNATIONAL EULAG USERS WORKSHOP
MAY 29, 2018, 14:00-14:45
 Area: Adaptation of a real-life scientific codes to the most advanced
computing architectures.
 Challenge: Device architectures are constantly changing. Current
architectures are very various. Our codes need to be very portable
and flexible.
 Goal: Take HPC to the "Industry 4.0" by implementing smart techniques
to optimize the codes in terms of performance and energy
consumption.
2
 Piz Daint (ranked 3-rd at top 500):
 GPU: NVIDIA Tesla P100 - PASCAL
 1xGPU per node
 Single GPU design
 5320 nodes (up to 36 used in this work)
 Calculation speed: float is 2x faster
than double
 MICLAB:
 GPU: NVIDIA Tesla K80 - KEPLER
 2xGPUs per node
 Dual GPU design
 2 nodes (remaining nodes with Intel
Xeon Phi)
 Calculation speed: float is 3x faster
than double
3
 Size of data transfer between nodes: 2x less using float than double
 No access to sudo user – it makes a problem when your code is based on DVFS
Expectation: Mixed precision arithmetic allows us to reduce the energy consumption
and execution time; It can be used in the real HPC platforms (without special access)
 Stencil-based algorithm for numerical simulation of
geophysical fluids flows on micro-to-planetary scales:
 7 stencils (compressed into 4 kernels) – each
depends on one or more others (343 flops-per-el.)
 Iterative algorithm – a single iteration represents one
time step
 11 matrices:
 x, xP – scalar quantity (i.e. temperature); input/output
matrices between time steps
 v1, v2, v3, v1P, v2P, v3P – velocity vectors in i, j, and k
directions
 h – density matrix
 cp, cn – temporary, intermediate matrices
4
xP
x
xPKernel 3
Kernel 2
Kernel 1
Kernel 0 xP
v1P v2P
cp
xP
cn
v3P
Outputs:
(x,v1,v2,v3,h)
(v1,v2,v3,h,xP)
(x,h,xP,v1P,v2P,v3P)
(x,h,v1P,v2P,v3P,cp,cn)
Inputs:
 Idea: Provide a highly parametrized code in order to easily map the algorithm onto GPU
 Mapping: Select the right values of given code parameters (configuration) in terms of
desired criterion (Energy consumption)
 How to: We build the search space of possible configurations and prune it using our
machine learning module (MLM)
 MLM: It is still the ongoing task. Here we propose to apply the modified random forest
algorithm.
5
 We can use different number of:
 Streams count (SPG)
 Nodes count (NDS)
 With different topologies:
 Topology of streams (TGP)
 Topology of nodes (TDS)
6
1x1 1x2
1x1
2x1
1x1
1x1
2x1
1x2
2x2
1x1 1x2 1x3 1x4
1x1
2x1
3x1
4x1
Stream/Node: 1
Topology: 1
Streams/Nodes: 2
Topologies: 2
Streams/Nodes: 4
Topologies: 3
7
Synchronize domain
Synchronize domain
Synchronize domain
Synchronize domain
Synchronize domain
Synchronize domain
Synchronize domain
Synchronize domain
Kernel 0
Kernel 1
Kernel 2
Kernel 3
Data transfer
Kernel 0
Kernel 1
Kernel 2
Kernel 3
Data transfer
vs.
 Each stream
works on
distributed
subdomains
 Computations
are
independent
within a single
time step
 Halo exchange
is required after
each time step
 Each stream
share the same
subdomain
 Computations
depends on the
neighboring
streams
 Halo exchange
is not required
within a single
node
 By selecting the right strategy of
halo exchanging we can focus
on more parallelism or less
operations
 We can use a different strategy
within node and between nodes
8
Halo
exchange
cudaMemcpy
Buffers
GPU direct No GPU direct
No-buffers
Kernels
Single kernel Two kernels
 We takes into consideration also some basic parameters:
 CUDA block sizes for each of 4 kernels
 CUDA blocks are of size X times Y, where
 X*Y mod 32 = 0
 X>=Y
 X mod 16 =0
 X*Y <= 1024
 X<=M and Y<=N, where NxMxL is a size of the grid
 Data alignment and padding within a range from 1 to 4096 B
 Align in: {1, 2, 4, 8, …, 4096}
9
 Assumption: We believe that we can find a really good configuration by testing about 5000
configurations from the search space (more that this is too expensive)
 We consider two possible approaches:
 Positive: Find good solutions and eliminate groups that seem to be worse than ours
 Risk: When we find a branch with a good solution we can eliminate other branches (also quite good) that should
be worse. In fact we can eliminate a branch containing the best solution.
 Negative: Find bad solutions and eliminate them
 Risk: When we find branches with bad solutions we can eliminate them although the worst one can be still in (the
best one also is there).
 Fact: We test random branches (we may not select the best or the worst one); we are searching
for the suboptimal solution.
10
 Precision:
 DOUBLE
 Diameter:
 28.0
 L2 norm:
 0.0746
 Diffusion error:
 1.7503
 Phase error:
 0.7576
11
Halo exchange
(xP or x)
 Precision: DOUBLE
 Diameter: 28.0
 L2 norm: 0.0746
 Diff. err.: 1.7503
 Phase err.: 0.7576
 Precision: FLOAT
 Diameter: 28.0
 L2 norm: 0.1301
 Diff. err.: 2.2439
 Phase err.: 7.5919
12
FloatDouble
 Goal: Reduce the energy consumption
 Condition: Keep the accuracy at a high level (1% loss is
acceptable)
 Assumptions:
 The proposed method is intended to iterative algorithms
 Dynamic approach, self adaptable to a particular
simulation
 Self adaptation is done based on the short training stage
(the first 11 time steps)
13
1. Change
the i-th
matrix from
DP to SP
2. Execute a
single time
step
3. Measure
Energy and
Accuracy
4. Restore
the i-th
matrix to DP
Training stage:
Traditional approach based on static selection of precision arithmetic
is less flexible and may be too restrictive for some simulations
14
0
10
20
30
40
50
60
70
80
90
100
0/4 1/4 2/4 3/4 4/4
ENERGYCONSUMPTION[%]
PART OF A SINGLE SOLID BODY ROTATION
Change the
precision of
the i-th matrix
from DP to SP
0
10
20
30
40
50
60
70
80
90
100
0/4 1/4 2/4 3/4 4/4
ACCURACYLOSS[%]
PART OF A SINGLE SOLID BODY ROTATION
DOUBLE
Change the
precision of
the i-th matrix
from DP to SP
Should be
minimizedShould be
maximized
ΔE
ΔA
15
 Assumptions:
 ΔE – should be maximized
 ΔA – should be minimized
 Conclusion:
 R=ΔE/ΔA – the higher, the better
 Method:
 We estimate the R ratio for each matrix and set
matrices with the highest R from double to float
 This step is repeated until the accuracy loss is
lower than 1%
Set the
matrices with
the highest R
to float until
the accuracy
loss<1%
Sort them
decreasing
by R
Calculate
R=ΔE/ΔA
for each
matrix
The higher,
the better
 Precision: DOUBLE
 Diameter: 28.0
 L2 norm: 0.0746
 Diff. err.: 1.7503
 Phase err.: 0.7576
 Precision: MIXED
 Diameter: 28.0
 L2 norm: 0.0749
 Diff. err.: 1.7504
 Phase err.: 0.7576
16
Float group: x, xP, v3, h, v1P,v3P
Double group: v1, v2, v2P, cp, cn
Double
17
Double Mixed The proposed
method was also
validated for the
other tests
 The difference
between L2 norms
for double and
mixed precision is
0.00001
 The phase is
44.2135 for both
cases
 Test: 512x512x512 – 3909 time steps
18
Double 1 335.48 44
Mixed 1 255.02 1.32 35 19.79
Double 32 27.52 71
Mixed 24 21.65 1.27 48 32.63
Conclusion: Energy consumption is reduced by 33%
Best
performance
0
1000
2000
3000
4000
5000
6000
7000
8000
9000
1 6 11 16 21 26 31 36
Gflop/s
# of nodes
Double Mixed
 Test: 512x512x512 – 3909 time steps
19
Double 1/2 533.65 80
Mixed 1/2 352.18 1.51 53 34.00
Double 4/8 144.83 87
Mixed 4/8 96.04 1.51 57 33.66
Conclusion: Energy consumption is reduced by 33%
Best
performance
0
200
400
600
800
1000
1200
1400
1600
1800
2000
1 2 3 4
Gflop/s
# of GPUs – 2 GPUs per node
Double Mixed
 The developed implementation of MPDATA is very flexible and portable
 The proposed method allows us to automate the code adaptation even for a very large
number of possible configurations
 Mixed precision arithmetic allows us to reduce the energy consumption and execution
time
 It can be used in the real HPC platforms without special access to the machine
 It has an effect on the computation speed, data transfer, and scalability of the
application
 The proposed method allows us to reduce the energy consumption by 33% without loss
in accuracy
 It has also improved the performance by the factor of 1.27 for Piz Daint and 1.51 for
MICLAB in relation to double precision arithmetic
20
We build Artificial Intelligence
software and integrate that into
products.
We port and optimize algorithms for
parallel, CPU+GPU architectures.
We design and optimize algorithms
for HPC supercomputers.
byteLAKE We are specialists in:
www.byteLAKE.com
Machine Learning
Deep Learning
Computer Vision
High Performance Computing
Heterogeneous Computing
Edge Computing
Our mission:
help industries transform for the era of Artificial Intelligence.
We combine business and academia. Our team consists of experts
schooled by Fortune 500 corporations as well as PhD researchers.
22
Areas of Our Expertise
Computer Vision
Deep Learning
Machine Learning
Learning Optimization
AI for Edge Devices
HPC
Expertise Consultancy
Proof of Concept

Contenu connexe

Tendances

Transfer Learning for Improving Model Predictions in Highly Configurable Soft...
Transfer Learning for Improving Model Predictions in Highly Configurable Soft...Transfer Learning for Improving Model Predictions in Highly Configurable Soft...
Transfer Learning for Improving Model Predictions in Highly Configurable Soft...Pooyan Jamshidi
 
Fuzzy Self-Learning Controllers for Elasticity Management in Dynamic Cloud Ar...
Fuzzy Self-Learning Controllers for Elasticity Management in Dynamic Cloud Ar...Fuzzy Self-Learning Controllers for Elasticity Management in Dynamic Cloud Ar...
Fuzzy Self-Learning Controllers for Elasticity Management in Dynamic Cloud Ar...Pooyan Jamshidi
 
Optimizing Deep Networks (D1L6 Insight@DCU Machine Learning Workshop 2017)
Optimizing Deep Networks (D1L6 Insight@DCU Machine Learning Workshop 2017)Optimizing Deep Networks (D1L6 Insight@DCU Machine Learning Workshop 2017)
Optimizing Deep Networks (D1L6 Insight@DCU Machine Learning Workshop 2017)Universitat Politècnica de Catalunya
 
Transfer Learning for Improving Model Predictions in Robotic Systems
Transfer Learning for Improving Model Predictions  in Robotic SystemsTransfer Learning for Improving Model Predictions  in Robotic Systems
Transfer Learning for Improving Model Predictions in Robotic SystemsPooyan Jamshidi
 
A Tale of Data Pattern Discovery in Parallel
A Tale of Data Pattern Discovery in ParallelA Tale of Data Pattern Discovery in Parallel
A Tale of Data Pattern Discovery in ParallelJenny Liu
 
Optimal Chain Matrix Multiplication Big Data Perspective
Optimal Chain Matrix Multiplication Big Data PerspectiveOptimal Chain Matrix Multiplication Big Data Perspective
Optimal Chain Matrix Multiplication Big Data Perspectiveপল্লব রায়
 
Incremental collaborative filtering via evolutionary co clustering
Incremental collaborative filtering via evolutionary co clusteringIncremental collaborative filtering via evolutionary co clustering
Incremental collaborative filtering via evolutionary co clusteringAllen Wu
 
Daniel Shank, Data Scientist, Talla at MLconf SF 2016
Daniel Shank, Data Scientist, Talla at MLconf SF 2016Daniel Shank, Data Scientist, Talla at MLconf SF 2016
Daniel Shank, Data Scientist, Talla at MLconf SF 2016MLconf
 
Beyond data and model parallelism for deep neural networks
Beyond data and model parallelism for deep neural networksBeyond data and model parallelism for deep neural networks
Beyond data and model parallelism for deep neural networksJunKudo2
 
Alex Smola, Director of Machine Learning, AWS/Amazon, at MLconf SF 2016
Alex Smola, Director of Machine Learning, AWS/Amazon, at MLconf SF 2016Alex Smola, Director of Machine Learning, AWS/Amazon, at MLconf SF 2016
Alex Smola, Director of Machine Learning, AWS/Amazon, at MLconf SF 2016MLconf
 
HRNET : Deep High-Resolution Representation Learning for Human Pose Estimation
HRNET : Deep High-Resolution Representation Learning for Human Pose EstimationHRNET : Deep High-Resolution Representation Learning for Human Pose Estimation
HRNET : Deep High-Resolution Representation Learning for Human Pose Estimationtaeseon ryu
 
Training and Inference for Deep Gaussian Processes
Training and Inference for Deep Gaussian ProcessesTraining and Inference for Deep Gaussian Processes
Training and Inference for Deep Gaussian ProcessesKeyon Vafa
 
MobileNet - PR044
MobileNet - PR044MobileNet - PR044
MobileNet - PR044Jinwon Lee
 
Enterprise Scale Topological Data Analysis Using Spark
Enterprise Scale Topological Data Analysis Using SparkEnterprise Scale Topological Data Analysis Using Spark
Enterprise Scale Topological Data Analysis Using SparkAlpine Data
 
IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...ijceronline
 
Training Deep Networks with Backprop (D1L4 Insight@DCU Machine Learning Works...
Training Deep Networks with Backprop (D1L4 Insight@DCU Machine Learning Works...Training Deep Networks with Backprop (D1L4 Insight@DCU Machine Learning Works...
Training Deep Networks with Backprop (D1L4 Insight@DCU Machine Learning Works...Universitat Politècnica de Catalunya
 
Convolutional Neural Network (CNN) presentation from theory to code in Theano
Convolutional Neural Network (CNN) presentation from theory to code in TheanoConvolutional Neural Network (CNN) presentation from theory to code in Theano
Convolutional Neural Network (CNN) presentation from theory to code in TheanoSeongwon Hwang
 

Tendances (20)

Chapter 3 pc
Chapter 3 pcChapter 3 pc
Chapter 3 pc
 
Transfer Learning for Improving Model Predictions in Highly Configurable Soft...
Transfer Learning for Improving Model Predictions in Highly Configurable Soft...Transfer Learning for Improving Model Predictions in Highly Configurable Soft...
Transfer Learning for Improving Model Predictions in Highly Configurable Soft...
 
Dg34662666
Dg34662666Dg34662666
Dg34662666
 
Fuzzy Self-Learning Controllers for Elasticity Management in Dynamic Cloud Ar...
Fuzzy Self-Learning Controllers for Elasticity Management in Dynamic Cloud Ar...Fuzzy Self-Learning Controllers for Elasticity Management in Dynamic Cloud Ar...
Fuzzy Self-Learning Controllers for Elasticity Management in Dynamic Cloud Ar...
 
Optimizing Deep Networks (D1L6 Insight@DCU Machine Learning Workshop 2017)
Optimizing Deep Networks (D1L6 Insight@DCU Machine Learning Workshop 2017)Optimizing Deep Networks (D1L6 Insight@DCU Machine Learning Workshop 2017)
Optimizing Deep Networks (D1L6 Insight@DCU Machine Learning Workshop 2017)
 
Transfer Learning for Improving Model Predictions in Robotic Systems
Transfer Learning for Improving Model Predictions  in Robotic SystemsTransfer Learning for Improving Model Predictions  in Robotic Systems
Transfer Learning for Improving Model Predictions in Robotic Systems
 
A Tale of Data Pattern Discovery in Parallel
A Tale of Data Pattern Discovery in ParallelA Tale of Data Pattern Discovery in Parallel
A Tale of Data Pattern Discovery in Parallel
 
deep CNN vs conventional ML
deep CNN vs conventional MLdeep CNN vs conventional ML
deep CNN vs conventional ML
 
Optimal Chain Matrix Multiplication Big Data Perspective
Optimal Chain Matrix Multiplication Big Data PerspectiveOptimal Chain Matrix Multiplication Big Data Perspective
Optimal Chain Matrix Multiplication Big Data Perspective
 
Incremental collaborative filtering via evolutionary co clustering
Incremental collaborative filtering via evolutionary co clusteringIncremental collaborative filtering via evolutionary co clustering
Incremental collaborative filtering via evolutionary co clustering
 
Daniel Shank, Data Scientist, Talla at MLconf SF 2016
Daniel Shank, Data Scientist, Talla at MLconf SF 2016Daniel Shank, Data Scientist, Talla at MLconf SF 2016
Daniel Shank, Data Scientist, Talla at MLconf SF 2016
 
Beyond data and model parallelism for deep neural networks
Beyond data and model parallelism for deep neural networksBeyond data and model parallelism for deep neural networks
Beyond data and model parallelism for deep neural networks
 
Alex Smola, Director of Machine Learning, AWS/Amazon, at MLconf SF 2016
Alex Smola, Director of Machine Learning, AWS/Amazon, at MLconf SF 2016Alex Smola, Director of Machine Learning, AWS/Amazon, at MLconf SF 2016
Alex Smola, Director of Machine Learning, AWS/Amazon, at MLconf SF 2016
 
HRNET : Deep High-Resolution Representation Learning for Human Pose Estimation
HRNET : Deep High-Resolution Representation Learning for Human Pose EstimationHRNET : Deep High-Resolution Representation Learning for Human Pose Estimation
HRNET : Deep High-Resolution Representation Learning for Human Pose Estimation
 
Training and Inference for Deep Gaussian Processes
Training and Inference for Deep Gaussian ProcessesTraining and Inference for Deep Gaussian Processes
Training and Inference for Deep Gaussian Processes
 
MobileNet - PR044
MobileNet - PR044MobileNet - PR044
MobileNet - PR044
 
Enterprise Scale Topological Data Analysis Using Spark
Enterprise Scale Topological Data Analysis Using SparkEnterprise Scale Topological Data Analysis Using Spark
Enterprise Scale Topological Data Analysis Using Spark
 
IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...
 
Training Deep Networks with Backprop (D1L4 Insight@DCU Machine Learning Works...
Training Deep Networks with Backprop (D1L4 Insight@DCU Machine Learning Works...Training Deep Networks with Backprop (D1L4 Insight@DCU Machine Learning Works...
Training Deep Networks with Backprop (D1L4 Insight@DCU Machine Learning Works...
 
Convolutional Neural Network (CNN) presentation from theory to code in Theano
Convolutional Neural Network (CNN) presentation from theory to code in TheanoConvolutional Neural Network (CNN) presentation from theory to code in Theano
Convolutional Neural Network (CNN) presentation from theory to code in Theano
 

Similaire à AI optimizing HPC simulations (presentation from 6th EULAG Workshop)

APSys Presentation Final copy2
APSys Presentation Final copy2APSys Presentation Final copy2
APSys Presentation Final copy2Junli Gu
 
Performance Analysis and Optimizations of CAE Applications (Case Study: STAR_...
Performance Analysis and Optimizations of CAE Applications (Case Study: STAR_...Performance Analysis and Optimizations of CAE Applications (Case Study: STAR_...
Performance Analysis and Optimizations of CAE Applications (Case Study: STAR_...Fisnik Kraja
 
Efficient de cvpr_2020_paper
Efficient de cvpr_2020_paperEfficient de cvpr_2020_paper
Efficient de cvpr_2020_papershanullah3
 
byteLAKE's expertise across NVIDIA architectures and configurations
byteLAKE's expertise across NVIDIA architectures and configurationsbyteLAKE's expertise across NVIDIA architectures and configurations
byteLAKE's expertise across NVIDIA architectures and configurationsbyteLAKE
 
Parallel Implementation of K Means Clustering on CUDA
Parallel Implementation of K Means Clustering on CUDAParallel Implementation of K Means Clustering on CUDA
Parallel Implementation of K Means Clustering on CUDAprithan
 
MetaPerturb: Transferable Regularizer for Heterogeneous Tasks and Architectures
MetaPerturb: Transferable Regularizer for Heterogeneous Tasks and ArchitecturesMetaPerturb: Transferable Regularizer for Heterogeneous Tasks and Architectures
MetaPerturb: Transferable Regularizer for Heterogeneous Tasks and ArchitecturesMLAI2
 
Accelerating Real Time Applications on Heterogeneous Platforms
Accelerating Real Time Applications on Heterogeneous PlatformsAccelerating Real Time Applications on Heterogeneous Platforms
Accelerating Real Time Applications on Heterogeneous PlatformsIJMER
 
Efficient Model Selection for Deep Neural Networks on Massively Parallel Proc...
Efficient Model Selection for Deep Neural Networks on Massively Parallel Proc...Efficient Model Selection for Deep Neural Networks on Massively Parallel Proc...
Efficient Model Selection for Deep Neural Networks on Massively Parallel Proc...inside-BigData.com
 
AI On the Edge: Model Compression
AI On the Edge: Model CompressionAI On the Edge: Model Compression
AI On the Edge: Model CompressionApache MXNet
 
Large-Scale Optimization Strategies for Typical HPC Workloads
Large-Scale Optimization Strategies for Typical HPC WorkloadsLarge-Scale Optimization Strategies for Typical HPC Workloads
Large-Scale Optimization Strategies for Typical HPC Workloadsinside-BigData.com
 
Understand and Harness the Capabilities of Intel® Xeon Phi™ Processors
Understand and Harness the Capabilities of Intel® Xeon Phi™ ProcessorsUnderstand and Harness the Capabilities of Intel® Xeon Phi™ Processors
Understand and Harness the Capabilities of Intel® Xeon Phi™ ProcessorsIntel® Software
 
Techniques in Deep Learning
Techniques in Deep LearningTechniques in Deep Learning
Techniques in Deep LearningSourya Dey
 
Colfax-Winograd-Summary _final (1)
Colfax-Winograd-Summary _final (1)Colfax-Winograd-Summary _final (1)
Colfax-Winograd-Summary _final (1)Sangamesh Ragate
 
IRJET - Design of a Low Power Serial- Parallel Multiplier with Low Transition...
IRJET - Design of a Low Power Serial- Parallel Multiplier with Low Transition...IRJET - Design of a Low Power Serial- Parallel Multiplier with Low Transition...
IRJET - Design of a Low Power Serial- Parallel Multiplier with Low Transition...IRJET Journal
 
DigitRecognition.pptx
DigitRecognition.pptxDigitRecognition.pptx
DigitRecognition.pptxruvex
 
Super Resolution with OCR Optimization
Super Resolution with OCR OptimizationSuper Resolution with OCR Optimization
Super Resolution with OCR OptimizationniveditJain
 
Paper id 25201467
Paper id 25201467Paper id 25201467
Paper id 25201467IJRAT
 
“Efficiently Map AI and Vision Applications onto Multi-core AI Processors Usi...
“Efficiently Map AI and Vision Applications onto Multi-core AI Processors Usi...“Efficiently Map AI and Vision Applications onto Multi-core AI Processors Usi...
“Efficiently Map AI and Vision Applications onto Multi-core AI Processors Usi...Edge AI and Vision Alliance
 
My Postdoctoral Research
My Postdoctoral ResearchMy Postdoctoral Research
My Postdoctoral ResearchPo-Ting Wu
 
PARTIAL PRODUCT ARRAY HEIGHT REDUCTION USING RADIX-16 FOR 64-BIT BOOTH MULTI...
PARTIAL PRODUCT ARRAY HEIGHT REDUCTION USING RADIX-16 FOR 64-BIT BOOTH MULTI...PARTIAL PRODUCT ARRAY HEIGHT REDUCTION USING RADIX-16 FOR 64-BIT BOOTH MULTI...
PARTIAL PRODUCT ARRAY HEIGHT REDUCTION USING RADIX-16 FOR 64-BIT BOOTH MULTI...Hari M
 

Similaire à AI optimizing HPC simulations (presentation from 6th EULAG Workshop) (20)

APSys Presentation Final copy2
APSys Presentation Final copy2APSys Presentation Final copy2
APSys Presentation Final copy2
 
Performance Analysis and Optimizations of CAE Applications (Case Study: STAR_...
Performance Analysis and Optimizations of CAE Applications (Case Study: STAR_...Performance Analysis and Optimizations of CAE Applications (Case Study: STAR_...
Performance Analysis and Optimizations of CAE Applications (Case Study: STAR_...
 
Efficient de cvpr_2020_paper
Efficient de cvpr_2020_paperEfficient de cvpr_2020_paper
Efficient de cvpr_2020_paper
 
byteLAKE's expertise across NVIDIA architectures and configurations
byteLAKE's expertise across NVIDIA architectures and configurationsbyteLAKE's expertise across NVIDIA architectures and configurations
byteLAKE's expertise across NVIDIA architectures and configurations
 
Parallel Implementation of K Means Clustering on CUDA
Parallel Implementation of K Means Clustering on CUDAParallel Implementation of K Means Clustering on CUDA
Parallel Implementation of K Means Clustering on CUDA
 
MetaPerturb: Transferable Regularizer for Heterogeneous Tasks and Architectures
MetaPerturb: Transferable Regularizer for Heterogeneous Tasks and ArchitecturesMetaPerturb: Transferable Regularizer for Heterogeneous Tasks and Architectures
MetaPerturb: Transferable Regularizer for Heterogeneous Tasks and Architectures
 
Accelerating Real Time Applications on Heterogeneous Platforms
Accelerating Real Time Applications on Heterogeneous PlatformsAccelerating Real Time Applications on Heterogeneous Platforms
Accelerating Real Time Applications on Heterogeneous Platforms
 
Efficient Model Selection for Deep Neural Networks on Massively Parallel Proc...
Efficient Model Selection for Deep Neural Networks on Massively Parallel Proc...Efficient Model Selection for Deep Neural Networks on Massively Parallel Proc...
Efficient Model Selection for Deep Neural Networks on Massively Parallel Proc...
 
AI On the Edge: Model Compression
AI On the Edge: Model CompressionAI On the Edge: Model Compression
AI On the Edge: Model Compression
 
Large-Scale Optimization Strategies for Typical HPC Workloads
Large-Scale Optimization Strategies for Typical HPC WorkloadsLarge-Scale Optimization Strategies for Typical HPC Workloads
Large-Scale Optimization Strategies for Typical HPC Workloads
 
Understand and Harness the Capabilities of Intel® Xeon Phi™ Processors
Understand and Harness the Capabilities of Intel® Xeon Phi™ ProcessorsUnderstand and Harness the Capabilities of Intel® Xeon Phi™ Processors
Understand and Harness the Capabilities of Intel® Xeon Phi™ Processors
 
Techniques in Deep Learning
Techniques in Deep LearningTechniques in Deep Learning
Techniques in Deep Learning
 
Colfax-Winograd-Summary _final (1)
Colfax-Winograd-Summary _final (1)Colfax-Winograd-Summary _final (1)
Colfax-Winograd-Summary _final (1)
 
IRJET - Design of a Low Power Serial- Parallel Multiplier with Low Transition...
IRJET - Design of a Low Power Serial- Parallel Multiplier with Low Transition...IRJET - Design of a Low Power Serial- Parallel Multiplier with Low Transition...
IRJET - Design of a Low Power Serial- Parallel Multiplier with Low Transition...
 
DigitRecognition.pptx
DigitRecognition.pptxDigitRecognition.pptx
DigitRecognition.pptx
 
Super Resolution with OCR Optimization
Super Resolution with OCR OptimizationSuper Resolution with OCR Optimization
Super Resolution with OCR Optimization
 
Paper id 25201467
Paper id 25201467Paper id 25201467
Paper id 25201467
 
“Efficiently Map AI and Vision Applications onto Multi-core AI Processors Usi...
“Efficiently Map AI and Vision Applications onto Multi-core AI Processors Usi...“Efficiently Map AI and Vision Applications onto Multi-core AI Processors Usi...
“Efficiently Map AI and Vision Applications onto Multi-core AI Processors Usi...
 
My Postdoctoral Research
My Postdoctoral ResearchMy Postdoctoral Research
My Postdoctoral Research
 
PARTIAL PRODUCT ARRAY HEIGHT REDUCTION USING RADIX-16 FOR 64-BIT BOOTH MULTI...
PARTIAL PRODUCT ARRAY HEIGHT REDUCTION USING RADIX-16 FOR 64-BIT BOOTH MULTI...PARTIAL PRODUCT ARRAY HEIGHT REDUCTION USING RADIX-16 FOR 64-BIT BOOTH MULTI...
PARTIAL PRODUCT ARRAY HEIGHT REDUCTION USING RADIX-16 FOR 64-BIT BOOTH MULTI...
 

Plus de byteLAKE

byteLAKE's AI Products (use cases) (short)
byteLAKE's AI Products (use cases) (short)byteLAKE's AI Products (use cases) (short)
byteLAKE's AI Products (use cases) (short)byteLAKE
 
byteLAKE's AI Products (use cases) - presentation
byteLAKE's AI Products (use cases) - presentationbyteLAKE's AI Products (use cases) - presentation
byteLAKE's AI Products (use cases) - presentationbyteLAKE
 
byteLAKE's AI Products for Industries (2024-02)
byteLAKE's AI Products for Industries (2024-02)byteLAKE's AI Products for Industries (2024-02)
byteLAKE's AI Products for Industries (2024-02)byteLAKE
 
byteLAKE's CFD Suite (AI-accelerated CFD) (2024-02)
byteLAKE's CFD Suite (AI-accelerated CFD) (2024-02)byteLAKE's CFD Suite (AI-accelerated CFD) (2024-02)
byteLAKE's CFD Suite (AI-accelerated CFD) (2024-02)byteLAKE
 
AI Solutions for Industries | Quality Inspection | Data Insights | Predictive...
AI Solutions for Industries | Quality Inspection | Data Insights | Predictive...AI Solutions for Industries | Quality Inspection | Data Insights | Predictive...
AI Solutions for Industries | Quality Inspection | Data Insights | Predictive...byteLAKE
 
Self-Checkout for Restaurants / AI Restaurants (2024-02)
Self-Checkout for Restaurants / AI Restaurants (2024-02)Self-Checkout for Restaurants / AI Restaurants (2024-02)
Self-Checkout for Restaurants / AI Restaurants (2024-02)byteLAKE
 
Self-Checkout (AI for Restautants) - case study by byteLAKE's partner: Simpra
Self-Checkout (AI for Restautants) - case study by byteLAKE's partner: SimpraSelf-Checkout (AI for Restautants) - case study by byteLAKE's partner: Simpra
Self-Checkout (AI for Restautants) - case study by byteLAKE's partner: SimprabyteLAKE
 
byteLAKE: Sztuczna Inteligencja dla Przemysłu i Usług
byteLAKE: Sztuczna Inteligencja dla Przemysłu i UsługbyteLAKE: Sztuczna Inteligencja dla Przemysłu i Usług
byteLAKE: Sztuczna Inteligencja dla Przemysłu i UsługbyteLAKE
 
Przegląd zastosowań sztucznej inteligencji (2024-01)
Przegląd zastosowań sztucznej inteligencji (2024-01)Przegląd zastosowań sztucznej inteligencji (2024-01)
Przegląd zastosowań sztucznej inteligencji (2024-01)byteLAKE
 
Przegląd zastosowań Sztucznej inteligencjI
Przegląd zastosowań Sztucznej inteligencjIPrzegląd zastosowań Sztucznej inteligencjI
Przegląd zastosowań Sztucznej inteligencjIbyteLAKE
 
AI Solutions for Industries
AI Solutions for IndustriesAI Solutions for Industries
AI Solutions for IndustriesbyteLAKE
 
AI-accelerated CFD (Computational Fluid Dynamics)
AI-accelerated CFD (Computational Fluid Dynamics)AI-accelerated CFD (Computational Fluid Dynamics)
AI-accelerated CFD (Computational Fluid Dynamics)byteLAKE
 
Advanced Quality Inspection and Data Insights (Artificial Intelligence)
Advanced Quality Inspection and Data Insights (Artificial Intelligence)Advanced Quality Inspection and Data Insights (Artificial Intelligence)
Advanced Quality Inspection and Data Insights (Artificial Intelligence)byteLAKE
 
AI Solutions for Industries (short)
AI Solutions for Industries (short)AI Solutions for Industries (short)
AI Solutions for Industries (short)byteLAKE
 
Self-Checkout (AI for Restautants)
Self-Checkout (AI for Restautants)Self-Checkout (AI for Restautants)
Self-Checkout (AI for Restautants)byteLAKE
 
Applying Industrial AI Models to Product Quality Inspection
Applying Industrial AI Models to Product Quality InspectionApplying Industrial AI Models to Product Quality Inspection
Applying Industrial AI Models to Product Quality InspectionbyteLAKE
 
byteLAKE and Intel Partnership
byteLAKE and Intel PartnershipbyteLAKE and Intel Partnership
byteLAKE and Intel PartnershipbyteLAKE
 
CFD Suite (AI-accelerated CFD) - Sztuczna Inteligencja Przyspiesza Symulacje ...
CFD Suite (AI-accelerated CFD) - Sztuczna Inteligencja Przyspiesza Symulacje ...CFD Suite (AI-accelerated CFD) - Sztuczna Inteligencja Przyspiesza Symulacje ...
CFD Suite (AI-accelerated CFD) - Sztuczna Inteligencja Przyspiesza Symulacje ...byteLAKE
 
byteLAKE's Scan&GO - Self-Check-Out Solution for Retail (EuroShop'23)
byteLAKE's Scan&GO - Self-Check-Out Solution for Retail (EuroShop'23)byteLAKE's Scan&GO - Self-Check-Out Solution for Retail (EuroShop'23)
byteLAKE's Scan&GO - Self-Check-Out Solution for Retail (EuroShop'23)byteLAKE
 
Empowering Industries with byteLAKE's High-Performance AI
Empowering Industries with byteLAKE's High-Performance AIEmpowering Industries with byteLAKE's High-Performance AI
Empowering Industries with byteLAKE's High-Performance AIbyteLAKE
 

Plus de byteLAKE (20)

byteLAKE's AI Products (use cases) (short)
byteLAKE's AI Products (use cases) (short)byteLAKE's AI Products (use cases) (short)
byteLAKE's AI Products (use cases) (short)
 
byteLAKE's AI Products (use cases) - presentation
byteLAKE's AI Products (use cases) - presentationbyteLAKE's AI Products (use cases) - presentation
byteLAKE's AI Products (use cases) - presentation
 
byteLAKE's AI Products for Industries (2024-02)
byteLAKE's AI Products for Industries (2024-02)byteLAKE's AI Products for Industries (2024-02)
byteLAKE's AI Products for Industries (2024-02)
 
byteLAKE's CFD Suite (AI-accelerated CFD) (2024-02)
byteLAKE's CFD Suite (AI-accelerated CFD) (2024-02)byteLAKE's CFD Suite (AI-accelerated CFD) (2024-02)
byteLAKE's CFD Suite (AI-accelerated CFD) (2024-02)
 
AI Solutions for Industries | Quality Inspection | Data Insights | Predictive...
AI Solutions for Industries | Quality Inspection | Data Insights | Predictive...AI Solutions for Industries | Quality Inspection | Data Insights | Predictive...
AI Solutions for Industries | Quality Inspection | Data Insights | Predictive...
 
Self-Checkout for Restaurants / AI Restaurants (2024-02)
Self-Checkout for Restaurants / AI Restaurants (2024-02)Self-Checkout for Restaurants / AI Restaurants (2024-02)
Self-Checkout for Restaurants / AI Restaurants (2024-02)
 
Self-Checkout (AI for Restautants) - case study by byteLAKE's partner: Simpra
Self-Checkout (AI for Restautants) - case study by byteLAKE's partner: SimpraSelf-Checkout (AI for Restautants) - case study by byteLAKE's partner: Simpra
Self-Checkout (AI for Restautants) - case study by byteLAKE's partner: Simpra
 
byteLAKE: Sztuczna Inteligencja dla Przemysłu i Usług
byteLAKE: Sztuczna Inteligencja dla Przemysłu i UsługbyteLAKE: Sztuczna Inteligencja dla Przemysłu i Usług
byteLAKE: Sztuczna Inteligencja dla Przemysłu i Usług
 
Przegląd zastosowań sztucznej inteligencji (2024-01)
Przegląd zastosowań sztucznej inteligencji (2024-01)Przegląd zastosowań sztucznej inteligencji (2024-01)
Przegląd zastosowań sztucznej inteligencji (2024-01)
 
Przegląd zastosowań Sztucznej inteligencjI
Przegląd zastosowań Sztucznej inteligencjIPrzegląd zastosowań Sztucznej inteligencjI
Przegląd zastosowań Sztucznej inteligencjI
 
AI Solutions for Industries
AI Solutions for IndustriesAI Solutions for Industries
AI Solutions for Industries
 
AI-accelerated CFD (Computational Fluid Dynamics)
AI-accelerated CFD (Computational Fluid Dynamics)AI-accelerated CFD (Computational Fluid Dynamics)
AI-accelerated CFD (Computational Fluid Dynamics)
 
Advanced Quality Inspection and Data Insights (Artificial Intelligence)
Advanced Quality Inspection and Data Insights (Artificial Intelligence)Advanced Quality Inspection and Data Insights (Artificial Intelligence)
Advanced Quality Inspection and Data Insights (Artificial Intelligence)
 
AI Solutions for Industries (short)
AI Solutions for Industries (short)AI Solutions for Industries (short)
AI Solutions for Industries (short)
 
Self-Checkout (AI for Restautants)
Self-Checkout (AI for Restautants)Self-Checkout (AI for Restautants)
Self-Checkout (AI for Restautants)
 
Applying Industrial AI Models to Product Quality Inspection
Applying Industrial AI Models to Product Quality InspectionApplying Industrial AI Models to Product Quality Inspection
Applying Industrial AI Models to Product Quality Inspection
 
byteLAKE and Intel Partnership
byteLAKE and Intel PartnershipbyteLAKE and Intel Partnership
byteLAKE and Intel Partnership
 
CFD Suite (AI-accelerated CFD) - Sztuczna Inteligencja Przyspiesza Symulacje ...
CFD Suite (AI-accelerated CFD) - Sztuczna Inteligencja Przyspiesza Symulacje ...CFD Suite (AI-accelerated CFD) - Sztuczna Inteligencja Przyspiesza Symulacje ...
CFD Suite (AI-accelerated CFD) - Sztuczna Inteligencja Przyspiesza Symulacje ...
 
byteLAKE's Scan&GO - Self-Check-Out Solution for Retail (EuroShop'23)
byteLAKE's Scan&GO - Self-Check-Out Solution for Retail (EuroShop'23)byteLAKE's Scan&GO - Self-Check-Out Solution for Retail (EuroShop'23)
byteLAKE's Scan&GO - Self-Check-Out Solution for Retail (EuroShop'23)
 
Empowering Industries with byteLAKE's High-Performance AI
Empowering Industries with byteLAKE's High-Performance AIEmpowering Industries with byteLAKE's High-Performance AI
Empowering Industries with byteLAKE's High-Performance AI
 

Dernier

Tamil Call Girls Bhayandar WhatsApp +91-9930687706, Best Service
Tamil Call Girls Bhayandar WhatsApp +91-9930687706, Best ServiceTamil Call Girls Bhayandar WhatsApp +91-9930687706, Best Service
Tamil Call Girls Bhayandar WhatsApp +91-9930687706, Best Servicemeghakumariji156
 
PE 459 LECTURE 2- natural gas basic concepts and properties
PE 459 LECTURE 2- natural gas basic concepts and propertiesPE 459 LECTURE 2- natural gas basic concepts and properties
PE 459 LECTURE 2- natural gas basic concepts and propertiessarkmank1
 
data_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdfdata_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdfJiananWang21
 
Standard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power PlayStandard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power PlayEpec Engineered Technologies
 
1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf
1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf
1_Introduction + EAM Vocabulary + how to navigate in EAM.pdfAldoGarca30
 
Employee leave management system project.
Employee leave management system project.Employee leave management system project.
Employee leave management system project.Kamal Acharya
 
AIRCANVAS[1].pdf mini project for btech students
AIRCANVAS[1].pdf mini project for btech studentsAIRCANVAS[1].pdf mini project for btech students
AIRCANVAS[1].pdf mini project for btech studentsvanyagupta248
 
Hospital management system project report.pdf
Hospital management system project report.pdfHospital management system project report.pdf
Hospital management system project report.pdfKamal Acharya
 
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXssuser89054b
 
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwait
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills KuwaitKuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwait
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwaitjaanualu31
 
Online food ordering system project report.pdf
Online food ordering system project report.pdfOnline food ordering system project report.pdf
Online food ordering system project report.pdfKamal Acharya
 
Moment Distribution Method For Btech Civil
Moment Distribution Method For Btech CivilMoment Distribution Method For Btech Civil
Moment Distribution Method For Btech CivilVinayVitekari
 
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptx
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptxS1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptx
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptxSCMS School of Architecture
 
Online electricity billing project report..pdf
Online electricity billing project report..pdfOnline electricity billing project report..pdf
Online electricity billing project report..pdfKamal Acharya
 
Computer Lecture 01.pptxIntroduction to Computers
Computer Lecture 01.pptxIntroduction to ComputersComputer Lecture 01.pptxIntroduction to Computers
Computer Lecture 01.pptxIntroduction to ComputersMairaAshraf6
 
GEAR TRAIN- BASIC CONCEPTS AND WORKING PRINCIPLE
GEAR TRAIN- BASIC CONCEPTS AND WORKING PRINCIPLEGEAR TRAIN- BASIC CONCEPTS AND WORKING PRINCIPLE
GEAR TRAIN- BASIC CONCEPTS AND WORKING PRINCIPLEselvakumar948
 
A CASE STUDY ON CERAMIC INDUSTRY OF BANGLADESH.pptx
A CASE STUDY ON CERAMIC INDUSTRY OF BANGLADESH.pptxA CASE STUDY ON CERAMIC INDUSTRY OF BANGLADESH.pptx
A CASE STUDY ON CERAMIC INDUSTRY OF BANGLADESH.pptxmaisarahman1
 
kiln thermal load.pptx kiln tgermal load
kiln thermal load.pptx kiln tgermal loadkiln thermal load.pptx kiln tgermal load
kiln thermal load.pptx kiln tgermal loadhamedmustafa094
 

Dernier (20)

Tamil Call Girls Bhayandar WhatsApp +91-9930687706, Best Service
Tamil Call Girls Bhayandar WhatsApp +91-9930687706, Best ServiceTamil Call Girls Bhayandar WhatsApp +91-9930687706, Best Service
Tamil Call Girls Bhayandar WhatsApp +91-9930687706, Best Service
 
PE 459 LECTURE 2- natural gas basic concepts and properties
PE 459 LECTURE 2- natural gas basic concepts and propertiesPE 459 LECTURE 2- natural gas basic concepts and properties
PE 459 LECTURE 2- natural gas basic concepts and properties
 
data_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdfdata_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdf
 
Standard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power PlayStandard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power Play
 
1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf
1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf
1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf
 
Employee leave management system project.
Employee leave management system project.Employee leave management system project.
Employee leave management system project.
 
AIRCANVAS[1].pdf mini project for btech students
AIRCANVAS[1].pdf mini project for btech studentsAIRCANVAS[1].pdf mini project for btech students
AIRCANVAS[1].pdf mini project for btech students
 
Hospital management system project report.pdf
Hospital management system project report.pdfHospital management system project report.pdf
Hospital management system project report.pdf
 
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
 
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwait
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills KuwaitKuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwait
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwait
 
Online food ordering system project report.pdf
Online food ordering system project report.pdfOnline food ordering system project report.pdf
Online food ordering system project report.pdf
 
Integrated Test Rig For HTFE-25 - Neometrix
Integrated Test Rig For HTFE-25 - NeometrixIntegrated Test Rig For HTFE-25 - Neometrix
Integrated Test Rig For HTFE-25 - Neometrix
 
Moment Distribution Method For Btech Civil
Moment Distribution Method For Btech CivilMoment Distribution Method For Btech Civil
Moment Distribution Method For Btech Civil
 
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptx
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptxS1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptx
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptx
 
FEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced Loads
FEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced LoadsFEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced Loads
FEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced Loads
 
Online electricity billing project report..pdf
Online electricity billing project report..pdfOnline electricity billing project report..pdf
Online electricity billing project report..pdf
 
Computer Lecture 01.pptxIntroduction to Computers
Computer Lecture 01.pptxIntroduction to ComputersComputer Lecture 01.pptxIntroduction to Computers
Computer Lecture 01.pptxIntroduction to Computers
 
GEAR TRAIN- BASIC CONCEPTS AND WORKING PRINCIPLE
GEAR TRAIN- BASIC CONCEPTS AND WORKING PRINCIPLEGEAR TRAIN- BASIC CONCEPTS AND WORKING PRINCIPLE
GEAR TRAIN- BASIC CONCEPTS AND WORKING PRINCIPLE
 
A CASE STUDY ON CERAMIC INDUSTRY OF BANGLADESH.pptx
A CASE STUDY ON CERAMIC INDUSTRY OF BANGLADESH.pptxA CASE STUDY ON CERAMIC INDUSTRY OF BANGLADESH.pptx
A CASE STUDY ON CERAMIC INDUSTRY OF BANGLADESH.pptx
 
kiln thermal load.pptx kiln tgermal load
kiln thermal load.pptx kiln tgermal loadkiln thermal load.pptx kiln tgermal load
kiln thermal load.pptx kiln tgermal load
 

AI optimizing HPC simulations (presentation from 6th EULAG Workshop)

  • 1. Krzysztof Rojek, CTO, byteLAKE krojek@byteLAKE.com 6TH INTERNATIONAL EULAG USERS WORKSHOP MAY 29, 2018, 14:00-14:45
  • 2.  Area: Adaptation of a real-life scientific codes to the most advanced computing architectures.  Challenge: Device architectures are constantly changing. Current architectures are very various. Our codes need to be very portable and flexible.  Goal: Take HPC to the "Industry 4.0" by implementing smart techniques to optimize the codes in terms of performance and energy consumption. 2
  • 3.  Piz Daint (ranked 3-rd at top 500):  GPU: NVIDIA Tesla P100 - PASCAL  1xGPU per node  Single GPU design  5320 nodes (up to 36 used in this work)  Calculation speed: float is 2x faster than double  MICLAB:  GPU: NVIDIA Tesla K80 - KEPLER  2xGPUs per node  Dual GPU design  2 nodes (remaining nodes with Intel Xeon Phi)  Calculation speed: float is 3x faster than double 3  Size of data transfer between nodes: 2x less using float than double  No access to sudo user – it makes a problem when your code is based on DVFS Expectation: Mixed precision arithmetic allows us to reduce the energy consumption and execution time; It can be used in the real HPC platforms (without special access)
  • 4.  Stencil-based algorithm for numerical simulation of geophysical fluids flows on micro-to-planetary scales:  7 stencils (compressed into 4 kernels) – each depends on one or more others (343 flops-per-el.)  Iterative algorithm – a single iteration represents one time step  11 matrices:  x, xP – scalar quantity (i.e. temperature); input/output matrices between time steps  v1, v2, v3, v1P, v2P, v3P – velocity vectors in i, j, and k directions  h – density matrix  cp, cn – temporary, intermediate matrices 4 xP x xPKernel 3 Kernel 2 Kernel 1 Kernel 0 xP v1P v2P cp xP cn v3P Outputs: (x,v1,v2,v3,h) (v1,v2,v3,h,xP) (x,h,xP,v1P,v2P,v3P) (x,h,v1P,v2P,v3P,cp,cn) Inputs:
  • 5.  Idea: Provide a highly parametrized code in order to easily map the algorithm onto GPU  Mapping: Select the right values of given code parameters (configuration) in terms of desired criterion (Energy consumption)  How to: We build the search space of possible configurations and prune it using our machine learning module (MLM)  MLM: It is still the ongoing task. Here we propose to apply the modified random forest algorithm. 5
  • 6.  We can use different number of:  Streams count (SPG)  Nodes count (NDS)  With different topologies:  Topology of streams (TGP)  Topology of nodes (TDS) 6 1x1 1x2 1x1 2x1 1x1 1x1 2x1 1x2 2x2 1x1 1x2 1x3 1x4 1x1 2x1 3x1 4x1 Stream/Node: 1 Topology: 1 Streams/Nodes: 2 Topologies: 2 Streams/Nodes: 4 Topologies: 3
  • 7. 7 Synchronize domain Synchronize domain Synchronize domain Synchronize domain Synchronize domain Synchronize domain Synchronize domain Synchronize domain Kernel 0 Kernel 1 Kernel 2 Kernel 3 Data transfer Kernel 0 Kernel 1 Kernel 2 Kernel 3 Data transfer vs.  Each stream works on distributed subdomains  Computations are independent within a single time step  Halo exchange is required after each time step  Each stream share the same subdomain  Computations depends on the neighboring streams  Halo exchange is not required within a single node
  • 8.  By selecting the right strategy of halo exchanging we can focus on more parallelism or less operations  We can use a different strategy within node and between nodes 8 Halo exchange cudaMemcpy Buffers GPU direct No GPU direct No-buffers Kernels Single kernel Two kernels
  • 9.  We takes into consideration also some basic parameters:  CUDA block sizes for each of 4 kernels  CUDA blocks are of size X times Y, where  X*Y mod 32 = 0  X>=Y  X mod 16 =0  X*Y <= 1024  X<=M and Y<=N, where NxMxL is a size of the grid  Data alignment and padding within a range from 1 to 4096 B  Align in: {1, 2, 4, 8, …, 4096} 9
  • 10.  Assumption: We believe that we can find a really good configuration by testing about 5000 configurations from the search space (more that this is too expensive)  We consider two possible approaches:  Positive: Find good solutions and eliminate groups that seem to be worse than ours  Risk: When we find a branch with a good solution we can eliminate other branches (also quite good) that should be worse. In fact we can eliminate a branch containing the best solution.  Negative: Find bad solutions and eliminate them  Risk: When we find branches with bad solutions we can eliminate them although the worst one can be still in (the best one also is there).  Fact: We test random branches (we may not select the best or the worst one); we are searching for the suboptimal solution. 10
  • 11.  Precision:  DOUBLE  Diameter:  28.0  L2 norm:  0.0746  Diffusion error:  1.7503  Phase error:  0.7576 11 Halo exchange (xP or x)
  • 12.  Precision: DOUBLE  Diameter: 28.0  L2 norm: 0.0746  Diff. err.: 1.7503  Phase err.: 0.7576  Precision: FLOAT  Diameter: 28.0  L2 norm: 0.1301  Diff. err.: 2.2439  Phase err.: 7.5919 12 FloatDouble
  • 13.  Goal: Reduce the energy consumption  Condition: Keep the accuracy at a high level (1% loss is acceptable)  Assumptions:  The proposed method is intended to iterative algorithms  Dynamic approach, self adaptable to a particular simulation  Self adaptation is done based on the short training stage (the first 11 time steps) 13 1. Change the i-th matrix from DP to SP 2. Execute a single time step 3. Measure Energy and Accuracy 4. Restore the i-th matrix to DP Training stage: Traditional approach based on static selection of precision arithmetic is less flexible and may be too restrictive for some simulations
  • 14. 14 0 10 20 30 40 50 60 70 80 90 100 0/4 1/4 2/4 3/4 4/4 ENERGYCONSUMPTION[%] PART OF A SINGLE SOLID BODY ROTATION Change the precision of the i-th matrix from DP to SP 0 10 20 30 40 50 60 70 80 90 100 0/4 1/4 2/4 3/4 4/4 ACCURACYLOSS[%] PART OF A SINGLE SOLID BODY ROTATION DOUBLE Change the precision of the i-th matrix from DP to SP Should be minimizedShould be maximized ΔE ΔA
  • 15. 15  Assumptions:  ΔE – should be maximized  ΔA – should be minimized  Conclusion:  R=ΔE/ΔA – the higher, the better  Method:  We estimate the R ratio for each matrix and set matrices with the highest R from double to float  This step is repeated until the accuracy loss is lower than 1% Set the matrices with the highest R to float until the accuracy loss<1% Sort them decreasing by R Calculate R=ΔE/ΔA for each matrix The higher, the better
  • 16.  Precision: DOUBLE  Diameter: 28.0  L2 norm: 0.0746  Diff. err.: 1.7503  Phase err.: 0.7576  Precision: MIXED  Diameter: 28.0  L2 norm: 0.0749  Diff. err.: 1.7504  Phase err.: 0.7576 16 Float group: x, xP, v3, h, v1P,v3P Double group: v1, v2, v2P, cp, cn Double
  • 17. 17 Double Mixed The proposed method was also validated for the other tests  The difference between L2 norms for double and mixed precision is 0.00001  The phase is 44.2135 for both cases
  • 18.  Test: 512x512x512 – 3909 time steps 18 Double 1 335.48 44 Mixed 1 255.02 1.32 35 19.79 Double 32 27.52 71 Mixed 24 21.65 1.27 48 32.63 Conclusion: Energy consumption is reduced by 33% Best performance 0 1000 2000 3000 4000 5000 6000 7000 8000 9000 1 6 11 16 21 26 31 36 Gflop/s # of nodes Double Mixed
  • 19.  Test: 512x512x512 – 3909 time steps 19 Double 1/2 533.65 80 Mixed 1/2 352.18 1.51 53 34.00 Double 4/8 144.83 87 Mixed 4/8 96.04 1.51 57 33.66 Conclusion: Energy consumption is reduced by 33% Best performance 0 200 400 600 800 1000 1200 1400 1600 1800 2000 1 2 3 4 Gflop/s # of GPUs – 2 GPUs per node Double Mixed
  • 20.  The developed implementation of MPDATA is very flexible and portable  The proposed method allows us to automate the code adaptation even for a very large number of possible configurations  Mixed precision arithmetic allows us to reduce the energy consumption and execution time  It can be used in the real HPC platforms without special access to the machine  It has an effect on the computation speed, data transfer, and scalability of the application  The proposed method allows us to reduce the energy consumption by 33% without loss in accuracy  It has also improved the performance by the factor of 1.27 for Piz Daint and 1.51 for MICLAB in relation to double precision arithmetic 20
  • 21. We build Artificial Intelligence software and integrate that into products. We port and optimize algorithms for parallel, CPU+GPU architectures. We design and optimize algorithms for HPC supercomputers. byteLAKE We are specialists in: www.byteLAKE.com Machine Learning Deep Learning Computer Vision High Performance Computing Heterogeneous Computing Edge Computing Our mission: help industries transform for the era of Artificial Intelligence. We combine business and academia. Our team consists of experts schooled by Fortune 500 corporations as well as PhD researchers.
  • 22. 22 Areas of Our Expertise Computer Vision Deep Learning Machine Learning Learning Optimization AI for Edge Devices HPC Expertise Consultancy Proof of Concept