Extreme Computing for Extreme Adaptive Optics:
the Key to Finding Life Outside our Solar System
H. Ltaief¹, D. Sukkari¹, O. Guyon²,³,⁴, and D. Keyes¹
¹Extreme Computing Research Center, KAUST, Saudi Arabia
²National Institutes of Natural Sciences, Tokyo, Japan
³Steward Observatory, University of Arizona, Tucson, USA
⁴National Astronomical Observatory of Japan, Subaru Telescope
HL, DS, OG, DK QDWH-Based Partial SVD 1 / 40
Outline
1 Motivation
2 State-of-the-art Approach
3 QDWH-Based Partial SVD
4 Environment Settings
5 Numerical Accuracy
6 Performance Results
7 Conclusion and Future Work
1 Motivation
The Subaru Telescope
A flagship telescope of the National Astronomical Observatory of Japan
Has an 8.2-meter (320 in) diameter primary mirror
Contains a high-contrast imaging system for directly imaging exoplanets
Operates perhaps the most advanced HPC facility in computational astronomy
Located at the Mauna Kea Observatories in Hawaii
(Perhaps) the Highest-Altitude GPU System on Record, at 14,000 Feet!
Atmospheric Turbulence and Optical Aberrations
The Astronomical Challenge
Turbulence in the atmosphere limits the performance of astronomical telescopes
Without active correction of such defects, images would be blurred to approximately one arcsecond
To recover the lost angular resolution, adaptive optics (AO) systems measure and correct atmospheric turbulence
In the absence of optical aberrations, the telescope should provide an angular resolution of $\lambda/D$ ($D$: telescope diameter, $\lambda$: wavelength)
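As a worked example of the $\lambda/D$ scale, assume Subaru's 8.2 m aperture observing at a near-infrared wavelength of 1.6 µm (the wavelength is our choice for illustration; $\lambda/D$ is the order-of-magnitude diffraction limit, and the Rayleigh criterion adds a factor of 1.22). The result is about 0.04 arcsec, roughly 25 times finer than the one-arcsecond blur caused by turbulence:

```python
import math

# Conversion factor from radians to arcseconds (~206264.8).
ARCSEC_PER_RAD = 180.0 / math.pi * 3600.0


def diffraction_limit_arcsec(wavelength_m, diameter_m):
    """Angular resolution lambda / D, converted from radians to arcseconds."""
    return wavelength_m / diameter_m * ARCSEC_PER_RAD


# Assumed example: 8.2 m aperture at 1.6 micrometres (near-infrared H band).
res = diffraction_limit_arcsec(1.6e-6, 8.2)
print(f"lambda/D = {res:.3f} arcsec, ~{1.0 / res:.0f}x sharper than 1 arcsec seeing")
```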
Adaptive Optics 101
Wavefront sensor(s) (WFS): measure the details of the blurring from a ‘guide star’ near the object you want to observe
Real-time controller (RTC): processes the WFS signals to compute the control matrix, based on the pseudo-inverse
Deformable mirror: light from both the guide star and the astronomical object is reflected from the deformable mirror, which removes the distortions
https://www.unige.ch/sciences/astro/index.php/download_file/view/34/168/
How Does AO Work?
And Here Comes the Linear Algebra...
Compute the pseudo-inverse $A^{+}$, which satisfies $A A^{+} A = A$, with $A \in \mathbb{R}^{m \times n}$ ($m \ge n$)
The challenges of computing the pseudo-inverse are twofold:
Numerical: dealing with a rectangular matrix may engender numerical instabilities; for instance, forming $A^{T} A = V \Lambda V^{T}$ explicitly squares the condition number
Computational: the algorithmic complexity is high, yet the computation must keep up with the overall throughput of the AO framework
Using the SVD $A = U \Sigma V^{T}$, then $A^{+} = V \Sigma^{-1} U^{T}$
Only the most significant singular values and their associated singular vectors are required (≈10%)
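The SVD route to $A^{+}$ fits in a few lines of NumPy. `numpy.linalg.svd` calls LAPACK's divide-and-conquer `gesdd` driver (here in double precision rather than the single precision used in the talk), so this full-SVD-then-truncate approach is essentially the baseline the rest of the talk seeks to beat. The helper `truncated_pinv` and its `frac` parameter are hypothetical: `frac=1.0` recovers the exact pseudo-inverse, while `frac=0.10` keeps only the leading ≈10% of singular triplets.

```python
import numpy as np


def truncated_pinv(A, frac=0.10):
    """A+ ~= V_k diag(1/sigma_k) U_k^T built from the k leading singular triplets."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)  # s is sorted in decreasing order
    k = max(1, int(round(frac * s.size)))
    # Divide column j of V_k by sigma_j, then multiply by U_k^T.
    return (Vt[:k].T / s[:k]) @ U[:, :k].T


rng = np.random.default_rng(0)
A = rng.standard_normal((200, 100))
A_pinv_full = truncated_pinv(A, frac=1.0)  # keeps all 100 triplets
print(np.allclose(A @ A_pinv_full @ A, A))  # Penrose condition A A+ A = A
print(np.allclose(A_pinv_full, np.linalg.pinv(A)))
A_pinv_10 = truncated_pinv(A, frac=0.10)  # keeps the 10 largest singular values
```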
2 State-of-the-art Approach
SVD Algorithm - Standard Approach (LAPACK)
SGESDD (divide-and-conquer algorithm) computes the SVD in LAPACK. Its first stage, the bidiagonal reduction (8/3 n³ flops), performs half of those flops (4/3 n³) in memory-bound Level-2 BLAS:
Level-2 BLAS (4/3 n³) share   Bidiagonal Reduction (8/3 n³)   SGESDD, Σ only (8/3 n³)   SGESDD, UΣVᵀ (22 n³)
of flops                      50%                             50%                       6%
of time                       90%                             85%                       30%
Hardware Trends: Energy Matters!
Energy per operation    2011       2018
DP FLOP                 100 pJ     10 pJ
DP DRAM read            4800 pJ    1920 pJ
Local interconnect      7500 pJ    2500 pJ
Cross system            9000 pJ    3500 pJ
Source: John Shalf, LBNL
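Dividing each row of the table by the energy of one double-precision flop makes the trend explicit: reading a word from DRAM costs about 48 flops' worth of energy in 2011 and about 192 in 2018, so data movement, not arithmetic, increasingly dominates. A few lines of Python over the tabulated values (the helper name is ours):

```python
# Energy per operation in picojoules, as tabulated on the slide (John Shalf, LBNL).
energy_pj = {
    "DP FLOP": {2011: 100, 2018: 10},
    "DP DRAM read": {2011: 4800, 2018: 1920},
    "Local interconnect": {2011: 7500, 2018: 2500},
    "Cross system": {2011: 9000, 2018: 3500},
}


def cost_relative_to_flop(year):
    """Energy of each operation expressed in units of one DP flop for that year."""
    flop = energy_pj["DP FLOP"][year]
    return {op: e[year] / flop for op, e in energy_pj.items()}


for year in (2011, 2018):
    ratios = cost_relative_to_flop(year)
    print(year, {op: round(r) for op, r in ratios.items()})
```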
The Big Picture (Similar with SVD)
[Figure; label: Cray LibSci 17.11.1]
3 QDWH-Based Partial SVD
What Is the Polar Decomposition?
The polar decomposition:
$A = U_p H$, $A \in \mathbb{R}^{m \times n}$ ($m \ge n$),
where $U_p$ has orthonormal columns (it is orthogonal when $m = n$) and $H = \sqrt{A^{T} A}$ is symmetric positive semidefinite
The polar decomposition is a critical numerical algorithm for various applications, including aerospace computations, chemistry, and factor analysis
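Since $A = U \Sigma V^{T}$ gives $U_p = U V^{T}$ and $H = V \Sigma V^{T}$, the eigenvectors of $H$ are exactly the right singular vectors of $A$ and its eigenvalues are the singular values. The short NumPy sketch below checks this numerically by building the polar factors from a thin SVD; it is a reference construction for illustration (the helper `polar_via_svd` is hypothetical), not how QDWH computes them.

```python
import numpy as np


def polar_via_svd(A):
    """Reference polar decomposition A = Up @ H from a thin SVD (not QDWH)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    Up = U @ Vt  # m x n, orthonormal columns
    H = (Vt.T * s) @ Vt  # n x n, V diag(s) V^T = sqrt(A^T A)
    return Up, H


rng = np.random.default_rng(0)
A = rng.standard_normal((8, 5))
Up, H = polar_via_svd(A)
print(np.allclose(Up @ H, A), np.allclose(Up.T @ Up, np.eye(5)))
```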
QDWH Polar Decomposition Algorithm
The QR-based Dynamically Weighted Halley (QDWH) iterations:
$X_0 = A/\alpha$,
$\begin{bmatrix} \sqrt{c_k} X_k \\ I \end{bmatrix} = \begin{bmatrix} Q_1 \\ Q_2 \end{bmatrix} R, \qquad X_{k+1} = \frac{b_k}{c_k} X_k + \frac{1}{\sqrt{c_k}} \left( a_k - \frac{b_k}{c_k} \right) Q_1 Q_2^{T}, \qquad k \ge 0$
The iterates converge to the polar factor $U_p$ in $A = U_p H$,
where $U_p^{T} U_p = I_n$ and $H$ is symmetric positive semidefinite
Backward stable algorithm for computing the polar decomposition
Based on conventional computational kernels, i.e., Cholesky/QR factorizations (≤ 6 iterations in double precision) and GEMM
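A minimal NumPy sketch of the QR-based iteration above. The slide does not give the weights; the formulas used here for $a_k$, $b_k$, $c_k$ and for the lower bound $\ell_k$ on the smallest singular value follow the dynamically weighted Halley method of Nakatsukasa, Bai and Gygi as we understand it. Further assumptions: every iteration is QR-based (the published method also uses cheaper Cholesky-based iterations once the iterate is well conditioned), $\alpha = \|A\|_F$ serves as a cheap upper bound on $\|A\|_2$, and `l0` is a user-supplied lower bound on $\sigma_{\min}(A)/\alpha$ rather than a condition-number estimate. The name `qdwh_polar` is hypothetical.

```python
import numpy as np


def qdwh_polar(A, l0=1e-8, maxiter=20):
    """Polar decomposition A = Up @ H via QR-based dynamically weighted Halley.

    Sketch only (assumptions): QR-based iterations throughout, alpha = ||A||_F,
    and l0 a lower bound on sigma_min(A) / alpha; a loose bound only costs iterations.
    """
    A = np.asarray(A, dtype=np.float64)
    m, n = A.shape
    if m < n:
        raise ValueError("expects m >= n")
    eps = np.finfo(np.float64).eps
    tol = (5 * eps) ** (1 / 3)
    X = A / np.linalg.norm(A, "fro")  # X_0 = A / alpha, singular values in (0, 1]
    I = np.eye(n)
    l = l0
    for _ in range(maxiter):
        # Dynamic weights a_k, b_k, c_k computed from the current lower bound l_k.
        l2 = l * l
        gamma = np.cbrt(4.0 * (1.0 - l2) / (l2 * l2))
        root = np.sqrt(1.0 + gamma)
        a = root + 0.5 * np.sqrt(8.0 - 4.0 * gamma + 8.0 * (2.0 - l2) / (l2 * root))
        b = (a - 1.0) ** 2 / 4.0
        c = a + b - 1.0
        # [sqrt(c_k) X_k; I] = [Q1; Q2] R  (thin QR of an (m + n) x n matrix).
        Q, _ = np.linalg.qr(np.vstack([np.sqrt(c) * X, I]))
        Q1, Q2 = Q[:m], Q[m:]
        X_next = (b / c) * X + (a - b / c) / np.sqrt(c) * (Q1 @ Q2.T)
        # Propagate the lower bound on the smallest singular value to X_{k+1}.
        l = min(1.0, l * (a + b * l2) / (1.0 + c * l2))
        step = np.linalg.norm(X_next - X, "fro")
        X = X_next
        if step <= tol and 1.0 - l <= 5 * eps:
            break
    Up = X
    H = Up.T @ A
    return Up, 0.5 * (H + H.T)  # symmetrize H to remove rounding asymmetry
```

On a random 50 × 30 matrix, `Up, H = qdwh_polar(A)` should reproduce the SVD-based factor $U V^{T}$ to near machine precision within a handful of iterations; passing a smaller `l0` than necessary is safe and only adds iterations.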
Numerical Algorithm
Algorithm 1: Pseudo-inverse using the QDWH-based partial SVD
1. Compute the polar decomposition A = U_p H using QDWH
2. Calculate [Q, R] = QR(U_p + I_d)
3. Find the index ind = min(find(abs(diag(R)) < threshold))
4. Extract Q̃ = Q(:, ind:end)
5. Reduce the original matrix problem: Ã = A × Q̃
6. Compute the SVD of the reduced matrix problem: Ã = U Σ Ṽᵀ
7. Compute the right singular vectors V = Q̃ × Ṽ
8. Calculate the pseudo-inverse A⁺ = V Σ⁻¹ Uᵀ
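A NumPy sketch of Algorithm 1, under assumptions we state explicitly. Step 2 does not fully survive extraction: adding the identity only makes sense for a square symmetric factor, so the sketch assumes the orthogonal polar factor $W$ is taken of the shifted matrix $\sigma I - H$ (as in QDWH-based spectral divide and conquer), with $\sigma$ a chosen split value. Then $W + I$ is twice the orthogonal projector onto the discarded right singular directions, its unpivoted QR generically has negligible diagonal entries from column `ind` on, and the trailing columns `Q[:, ind:]` span the retained, dominant subspace, matching step 4. This relies on the leading columns of $W + I$ being linearly independent, which holds generically; a column-pivoted QR would be more robust. The polar factors come from an SVD-based stand-in (`polar_factor`) rather than QDWH, and `partial_svd_pinv`, `sigma_split`, and `tol` are hypothetical names.

```python
import numpy as np


def polar_factor(M):
    """Orthogonal polar factor of M from a thin SVD (stand-in for QDWH)."""
    U, _, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ Vt


def partial_svd_pinv(A, sigma_split, tol=1e-8):
    """Pseudo-inverse from the singular triplets of A with sigma_i > sigma_split (sketch)."""
    m, n = A.shape
    I = np.eye(n)
    # Step 1: polar decomposition A = Up H, with H = V Sigma V^T.
    Up = polar_factor(A)
    H = Up.T @ A
    H = 0.5 * (H + H.T)
    # Step 2 (assumed shift): W = V sign(sigma_split I - Sigma) V^T, so W + I is twice
    # the projector onto the right singular vectors that will be discarded.
    W = polar_factor(sigma_split * I - H)
    Q, R = np.linalg.qr(W + I, mode="complete")
    # Step 3: first index whose |R_ii| falls below the threshold.
    small = np.flatnonzero(np.abs(np.diag(R)) < tol)
    ind = small[0] if small.size else n
    if ind == n:  # nothing above sigma_split: the truncated pseudo-inverse is zero
        return np.zeros((n, m))
    # Step 4: trailing columns of Q span the dominant right singular subspace.
    Qt = Q[:, ind:]  # n x s
    # Steps 5-6: reduce the problem and take a small SVD.
    U, s, Vt_small = np.linalg.svd(A @ Qt, full_matrices=False)
    # Step 7: lift the right singular vectors back, V = Qt @ V_tilde.
    V = Qt @ Vt_small.T  # n x s
    # Step 8: A+ ~= V Sigma^{-1} U^T
    return (V / s) @ U.T
```

For example, for a 60 × 30 matrix whose singular values run linearly from 10 down to 0.1, `partial_svd_pinv(A, 5.0)` keeps the 15 triplets above 5 and should match the truncated SVD pseudo-inverse to near machine precision.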
Algorithmic Complexity
                          Standard SVD    QDWH-based full SVD    QDWH-based partial SVD
Algorithmic complexity    22 N_n³         43 N_n³                QDWH: (4 + 1/3) N_n³ × #it_Chol
                                                                 QR and GEMM: 4/3 N_n³ + 2 s N_n² + 2 N_n s²
                                                                 SVD: 22 s³
where N_n is the matrix size and s is the number of selected singular values/vectors (s ≪ N_n)
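To give a feel for the terms, the sketch below evaluates the partial-SVD entries of the table exactly as printed, for $N_n = 20000$ (the largest synthetic size) and $s = 10\%$ of $N_n$. The Cholesky iteration count is a hypothetical input, and any QDWH terms that did not survive extraction are not included, so the totals should not be read as the authors' counts. What the numbers do show robustly is that the final small SVD costs $(N_n/s)^3 = 1000$ times less than a full $22 N_n^3$ SVD.

```python
def partial_svd_flops(n, s, it_chol):
    """Flop terms of the QDWH-based partial SVD, as printed on the slide."""
    return {
        "qdwh_chol": (4 + 1 / 3) * n**3 * it_chol,
        "qr_and_gemm": (4 / 3) * n**3 + 2 * s * n**2 + 2 * n * s**2,
        "small_svd": 22 * s**3,
    }


n = 20_000
s = n // 10  # ~10% of the singular triplets
it_chol = 2  # hypothetical iteration count, for illustration only
terms = partial_svd_flops(n, s, it_chol)
standard = 22 * n**3  # standard SVD
for name, f in terms.items():
    print(f"{name:12s} {f / 1e12:8.2f} Tflop")
print(f"{'standard':12s} {standard / 1e12:8.2f} Tflop")
```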
Environment Settings
Software:
GCC compilers
MAGMA v2.3 and CUDA v9.0 (including cuBLAS)
Single-precision (SP) arithmetic is used
Ill-conditioned matrices are generated using MAGMA's SLATMS routine
Hardware:
K80 GPU: 12 GB of memory, hosted on a two-socket 14-core Intel Broadwell system with 128 GB of main memory
P100 and V100 GPUs: 16 GB of memory, hosted on a two-socket 16-core Intel Haswell system with 128 GB of main memory
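The slides do not say which SLATMS singular-value distribution was used. As an assumption, the sketch below mimics mode 3 of LAPACK's xLATMS (geometrically spaced values $\sigma_i = \kappa^{-(i-1)/(n-1)}$, from 1 down to $1/\kappa$), draws the random orthogonal factors with NumPy's QR instead of LAPACK's own generator, and rounds to single precision as in the experiments. The helper `ill_conditioned` is hypothetical.

```python
import numpy as np


def ill_conditioned(m, n, cond, seed=0, dtype=np.float32):
    """Random m x n matrix (m >= n) with singular values cond**(-(i-1)/(n-1))."""
    rng = np.random.default_rng(seed)
    U, _ = np.linalg.qr(rng.standard_normal((m, n)))  # m x n, orthonormal columns
    V, _ = np.linalg.qr(rng.standard_normal((n, n)))  # n x n orthogonal
    i = np.arange(n)
    sigma = cond ** (-i / (n - 1))  # geometric: 1 down to 1/cond
    A = (U * sigma) @ V.T
    return A.astype(dtype), sigma


A, sigma = ill_conditioned(40, 20, 1e4)
print(A.dtype, sigma[0], sigma[-1])
```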
5 Numerical Accuracy
Synthetic Ill-Conditioned matrices, K80
[Figure (a) Singular Value Accuracy: accuracy of the singular values vs. matrix size; SGESDD vs. QDWHpartial at 13% SVD (with and without QR+PO), 10%, 7%, and 3% SVD]
[Figure (b) Backward Error: left and right residuals of the SVD vs. matrix size; SGESDD vs. QDWHpartial at 13% SVD (with and without QR+PO), 10%, 7%, and 3% SVD]
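The plots do not define their metrics. A common choice, assumed here, is the norm-wise relative error of the computed singular values against a reference, together with the left and right residuals $\|A^{T}U - V\Sigma\|_F/\|A\|_F$ and $\|AV - U\Sigma\|_F/\|A\|_F$ restricted to the computed triplets. The helper names are hypothetical.

```python
import numpy as np


def sv_accuracy(sigma_hat, sigma_ref):
    """Norm-wise relative error of computed singular values (assumed metric)."""
    return np.linalg.norm(sigma_hat - sigma_ref) / np.linalg.norm(sigma_ref)


def svd_residuals(A, U, s, V):
    """Left/right residuals of a (partial) SVD with A V ~ U diag(s), A^T U ~ V diag(s)."""
    nrm = np.linalg.norm(A)
    right = np.linalg.norm(A @ V - U * s) / nrm
    left = np.linalg.norm(A.T @ U - V * s) / nrm
    return left, right


rng = np.random.default_rng(0)
A = rng.standard_normal((30, 20))
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 5  # evaluate only the 5 leading triplets, as a partial SVD would return
left, right = svd_residuals(A, U[:, :k], s[:k], Vt[:k].T)
print(left, right)
```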
Real Observational Datasets, K80
[Figure (c) Singular Value Accuracy: accuracy of the singular values vs. matrix size (1161, 5805, 11610, 17415); SGESDD vs. QDWHpartial]
[Figure (d) Backward Error: left and right residuals of the SVD vs. matrix size (1161, 5805, 11610, 17415); SGESDD vs. QDWHpartial]
6 Performance Results
Synthetic Ill-Conditioned matrices, K80
[Figure (e) In Seconds: time (s) vs. matrix size; SGESDD vs. QDWHpartial at 13% SVD (with and without QR+PO), 10%, 7%, and 3% SVD]
[Figure (f) In Gflop/s: performance (Gflop/s) vs. matrix size, same configurations]
Up to 3X speedup, 1.8 Tflop/s, 45% of the theoretical peak performance
Synthetic Ill-Conditioned matrices, P100
[Figure (g) In Seconds: time (s) vs. matrix size; SGESDD vs. QDWHpartial at 13% SVD (with and without QR+PO), 10%, 7%, and 3% SVD]
[Figure (h) In Gflop/s: performance (Gflop/s) vs. matrix size, same configurations]
Up to 4X speedup, 7 Tflop/s, 75% of the theoretical peak performance
Synthetic Ill-Conditioned matrices, V100
[Figure (i) In Seconds: time (s) vs. matrix size; SGESDD vs. QDWHpartial at 13% SVD (with and without QR+PO), 10%, 7%, and 3% SVD]
[Figure (j) In Gflop/s: performance (Gflop/s) vs. matrix size, same configurations]
Up to 5X speedup, 9 Tflop/s, 65% of the theoretical peak performance
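Each summary line above pairs an achieved rate with a fraction of theoretical peak; dividing one by the other recovers the single-precision peak that each claim implies (about 4.0, 9.3, and 13.8 Tflop/s for the K80, P100, and V100). A small arithmetic check, keeping in mind that the slides round their figures, so the implied peaks are approximate:

```python
# Achieved rate (Tflop/s) and fraction of theoretical peak, as reported on the slides.
reported = {"K80": (1.8, 0.45), "P100": (7.0, 0.75), "V100": (9.0, 0.65)}


def implied_peak(achieved_tflops, fraction_of_peak):
    """Theoretical peak implied by an achieved rate and its fraction of peak."""
    return achieved_tflops / fraction_of_peak


for gpu, (rate, frac) in reported.items():
    print(f"{gpu}: {rate} Tflop/s at {frac:.0%} of peak -> peak ~ {implied_peak(rate, frac):.1f} Tflop/s")
```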
Real Observational Datasets, V100
[Figure (k) In Seconds: time (s) vs. matrix size (1161, 5805, 11610, 17415); SGESDD vs. QDWHpartial]
[Figure (l) In Gflop/s: performance (Gflop/s) vs. matrix size (1161, 5805, 11610, 17415); SGESDD vs. QDWHpartial]
Up to 4X speedup
Conclusion and Future Work
Comprehensive accuracy/performance analysis of a novel QDWH-based partial SVD algorithm
Significant performance improvement of the QDWH-based partial SVD over state-of-the-art implementations: up to 5X on synthetic ill-conditioned matrices and up to 4X on real datasets, across various hardware technologies
The pseudo-inverse simulation code has been deployed at the Subaru telescope and has been operating since June 24, 2018!
Future work includes:
Asynchronous task-based QDWH-based partial SVD implementation
Multi-GPU QDWH-based partial SVD implementation
Acknowledgments
Yuji Nakatsukasa, National Institute of Informatics @ Tokyo, Japan
NVIDIA GPU Research Center
Cray Center of Excellence
Intel Parallel Computing Center
The World’s Biggest Eye on The Sky
Credits: ESO (http://www.eso.org/public/teles-instr/e-elt/)
It will be the largest optical/near-infrared telescope in the world.
It will weigh about 2700 tons, with a main mirror 39 m in diameter.
Location: Chile, South America.
H. Ltaief et al., Real-Time Massively Distributed Multi-Object Adaptive Optics
Simulations for the European Extremely Large Telescope, IEEE IPDPS 2018: designing
one of the most challenging instruments (MOSAIC)
Exciting Times for Astronomy at KAUST/ECRC!
Supporting two major ground-based astronomy efforts worldwide:
the E-ELT and the Subaru Telescope
Bringing Astronomy Back Home ;-)
Courtesy of CEMSE Communications, KAUST
The Hourglass Revisited
@KAUST_ECRC
https://www.facebook.com/ecrckaust
Questions?
Moving Forward with Extreme AO
[Diagram: three control schemes, each a matrix-vector multiplication (MVM) mapping WFS measurements to the DM state (length m)]
Control matrix (m x n): last WFS measurement (n) → MVM → DM state (m)
Predictive control matrix (m x (N x n)): last N WFS measurements (N x n) → MVM → DM state (m)
Sensor fusion and predictive control matrix (m x (K x N x n)): last N WFS measurements of each of sensors 1 to K (N x n each) → MVM → DM state (m)
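Each scheme in the diagram is a single matrix-vector multiplication; only the width of the control matrix grows, from $n$ to $N \times n$ to $K \times N \times n$. A NumPy sketch of the shapes involved, with hypothetical sizes and random placeholder matrices (in practice the control matrices would be computed offline, e.g. via the pseudo-inverse discussed earlier), assuming measurements are stacked sensor by sensor in the order given. The helper `dm_command` is hypothetical.

```python
import numpy as np


def dm_command(control_matrix, wfs_history):
    """DM state = control matrix @ stacked WFS measurements (one MVM)."""
    x = np.concatenate([np.ravel(w) for w in wfs_history])
    if control_matrix.shape[1] != x.size:
        raise ValueError("control matrix width must match the stacked measurement length")
    return control_matrix @ x


m, n, N, K = 64, 128, 4, 3  # hypothetical sizes for illustration
rng = np.random.default_rng(0)
latest = rng.standard_normal(n)  # last measurement of one sensor
history = rng.standard_normal((N, n))  # last N measurements of one sensor
fused = rng.standard_normal((K, N, n))  # last N measurements of K sensors

cm = rng.standard_normal((m, n))  # control matrix, m x n
pcm = rng.standard_normal((m, N * n))  # predictive control matrix, m x (N x n)
fcm = rng.standard_normal((m, K * N * n))  # sensor fusion + predictive, m x (K x N x n)

print(dm_command(cm, [latest]).shape)
print(dm_command(pcm, list(history)).shape)
print(dm_command(fcm, list(fused)).shape)
```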
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...Zilliz
 

Dernier (20)

Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital Adaptability
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 

Extreme Computing for Extreme Adaptive Optics: The Key to Finding Life Outside our Solar System

  • 1. Extreme Computing for Extreme Adaptive Optics: the Key to Finding Life Outside our Solar System. H. Ltaief¹, D. Sukkari¹, O. Guyon²,³,⁴, and D. Keyes¹. ¹Extreme Computing Research Center, KAUST, Saudi Arabia; ²National Institutes of Natural Sciences, Tokyo, Japan; ³Steward Observatory, University of Arizona, Tucson, USA; ⁴National Astronomical Observatory of Japan, Subaru Telescope. HL, DS, OG, DK QDWH-Based Partial SVD 1 / 40
  • 2. Outline: 1 Motivation; 2 State-of-the-art Approach; 3 QDWH-Based Partial SVD; 4 Environment Settings; 5 Numerical Accuracy; 6 Performance Results; 7 Conclusion and Future Work
  • 4. The Subaru Telescope: a flagship telescope of the National Astronomical Observatory of Japan, with an 8.2-meter (320 in) diameter primary mirror. It hosts a high-contrast imaging system for directly imaging exoplanets and operates perhaps the most advanced HPC facility in computational astronomy. It is located at the Mauna Kea Observatories in Hawaii.
  • 5. The Subaru Telescope
  • 6. (Perhaps) the Highest-Altitude GPU System on Record: 14,000 feet!
  • 7. Atmospheric Turbulence and Optical Aberrations
  • 8. The Astronomical Challenge: Turbulence in the atmosphere limits the performance of astronomical telescopes. Without active correction of these defects, images would be blurred to approximately one arcsecond. To recover the lost angular resolution, adaptive optics (AO) systems measure and correct atmospheric turbulence. In the absence of optical aberrations, the telescope should provide an angular resolution of $\lambda/D$ ($D$: telescope diameter, $\lambda$: wavelength).
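As a worked example of the $\lambda/D$ limit, assume the 8.2 m Subaru aperture from slide 4 and an illustrative near-infrared wavelength of 1.6 µm (our choice; the slides do not state a wavelength). The diffraction limit is then about 0.04 arcsec, roughly 25 times finer than the one-arcsecond seeing blur:

```python
import math

RAD_TO_ARCSEC = 180.0 / math.pi * 3600.0  # about 206264.8 arcseconds per radian

def diffraction_limit_arcsec(wavelength_m: float, diameter_m: float) -> float:
    """Angular resolution lambda/D, converted from radians to arcseconds."""
    return wavelength_m / diameter_m * RAD_TO_ARCSEC

# Assumed values: 1.6 um wavelength (illustrative), 8.2 m Subaru aperture.
res = diffraction_limit_arcsec(1.6e-6, 8.2)
print(f"lambda/D = {res:.4f} arcsec")            # lambda/D = 0.0402 arcsec
print(f"seeing / (lambda/D) = {1.0 / res:.1f}")  # seeing / (lambda/D) = 24.8
```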
  • 9. Adaptive Optics 101: Wavefront sensors (WFS) measure the details of the blurring from a "guide star" near the object you want to observe. A real-time controller (RTC) processes the WFS signals to compute the control matrix, based on the pseudo-inverse. Light from both the guide star and the astronomical object is reflected from a deformable mirror, and the distortions are removed. Source: https://www.unige.ch/sciences/astro/index.php/download_file/view/34/168/
  • 10. How Does AO Work?
  • 11. And Here Comes the Linear Algebra... Compute the pseudo-inverse $A^+$, which satisfies $AA^+A = A$, with $A \in \mathbb{R}^{m\times n}$ $(m \ge n)$. The challenge of the pseudo-inverse is twofold. Numerical: the matrix is rectangular, and working with the normal equations $A^\top A = V\Lambda V^\top$ may engender numerical instabilities (forming $A^\top A$ squares the condition number). Computational: the algorithmic complexity is high, yet the computation must still keep up with the overall throughput of the AO framework. Using the SVD, $A = U\Sigma V^\top$, then $A^+ = V\Sigma^{-1}U^\top$. Only the most significant singular values and their associated singular vectors are required (≈10%).
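A minimal NumPy sketch of the SVD route to the pseudo-inverse, keeping only the leading singular triplets. The `keep_fraction` parameter is our illustrative assumption (the slides only say about 10% are needed), and the sketch uses a full LAPACK SVD, so it shows what is computed rather than how the partial SVD computes it:

```python
import numpy as np

def truncated_pinv(A: np.ndarray, keep_fraction: float = 0.1) -> np.ndarray:
    """Pseudo-inverse V_s Sigma_s^{-1} U_s^T built from the s largest singular triplets."""
    U, sigma, Vt = np.linalg.svd(A, full_matrices=False)  # singular values in descending order
    s = max(1, int(round(keep_fraction * sigma.size)))    # number of triplets kept
    return Vt[:s].T @ np.diag(1.0 / sigma[:s]) @ U[:, :s].T

rng = np.random.default_rng(0)
A = rng.standard_normal((200, 100))
# Keeping every triplet recovers the full Moore-Penrose pseudo-inverse.
A_plus = truncated_pinv(A, keep_fraction=1.0)
print(np.allclose(A @ A_plus @ A, A))           # True
print(np.allclose(A_plus, np.linalg.pinv(A)))   # True
```

With the default `keep_fraction=0.1`, the result is a rank-10 operator built only from the ten largest singular values.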
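  • 13. SVD Algorithm, Standard Approach (LAPACK): this pipeline forms SGESDD, the divide-and-conquer routine for computing the SVD. Costs: bidiagonal reduction, $\tfrac{8}{3}n^3$ flops; SGESDD computing singular values only ($\Sigma$), $\tfrac{8}{3}n^3$; SGESDD computing singular values and vectors ($U\Sigma V^\top$), $22n^3$. The memory-bound Level-2 BLAS part ($\tfrac{4}{3}n^3$) accounts for 50% of the flops and 90% of the time in the bidiagonal reduction, 50% of the flops and 85% of the time in SGESDD ($\Sigma$), and 6% of the flops and 30% of the time in SGESDD ($U\Sigma V^\top$).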
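The flop shares quoted above follow directly from the stated costs. A quick check with exact fractions (the helper name `level2_share` is ours):

```python
from fractions import Fraction

LEVEL2 = Fraction(4, 3)  # Level-2 BLAS flops, in units of n^3

stages = {
    "bidiagonal reduction": Fraction(8, 3),
    "SGESDD (Sigma)": Fraction(8, 3),
    "SGESDD (U Sigma V^T)": Fraction(22),
}

def level2_share(total: Fraction) -> Fraction:
    """Fraction of a stage's flops spent in Level-2 BLAS."""
    return LEVEL2 / total

for name, total in stages.items():
    print(f"{name}: {float(level2_share(total)):.0%} of flops")
# bidiagonal reduction: 50%, SGESDD (Sigma): 50%, SGESDD (U Sigma V^T): 6%
```

The time shares (90%, 85%, 30%) are measurements and cannot be derived this way; the gap between each flop share and its time share is what marks the Level-2 BLAS as the bottleneck.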
  • 14. Hardware Trends: Energy Matters! Energy per operation, 2011 vs. 2018 (John Shalf, LBNL): DP FLOP, 100 pJ vs. 10 pJ; DP DRAM read, 4800 pJ vs. 1920 pJ; local interconnect, 7500 pJ vs. 2500 pJ; cross-system, 9000 pJ vs. 3500 pJ.
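Dividing the table entries shows why data movement, not arithmetic, dominates the energy budget: a DRAM read costs as much energy as 48 DP FLOPs in 2011 and 192 in 2018. A small sketch of that arithmetic (the helper name is ours):

```python
# Energy per operation in picojoules, taken from the slide: (2011, 2018).
ENERGY_PJ = {
    "DP FLOP": (100, 10),
    "DP DRAM read": (4800, 1920),
    "Local interconnect": (7500, 2500),
    "Cross system": (9000, 3500),
}

def cost_in_flops(op: str, year_index: int) -> float:
    """Energy of one operation expressed as an equivalent number of DP FLOPs."""
    return ENERGY_PJ[op][year_index] / ENERGY_PJ["DP FLOP"][year_index]

for op in ENERGY_PJ:
    print(f"{op}: {cost_in_flops(op, 0):.0f} FLOPs (2011), {cost_in_flops(op, 1):.0f} FLOPs (2018)")
```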
  • 15. The Big Picture (Similar with SVD). Cray LibSci 17.11.1
  • 17. What Is the Polar Decomposition? The polar decomposition is $A = U_pH$, $A \in \mathbb{R}^{m\times n}$ $(m \ge n)$, where $U_p$ has orthonormal columns (it is orthogonal when $m = n$) and $H = \sqrt{A^\top A}$ is symmetric positive semidefinite. The polar decomposition is a critical numerical algorithm in various applications, including aerospace computations, chemistry, and factor analysis.
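For reference, the polar factors can be read off a thin SVD $A = W\Sigma V^\top$ as $U_p = WV^\top$ and $H = V\Sigma V^\top$. The NumPy sketch below uses this textbook construction only to illustrate the definition; it is not the QDWH algorithm discussed next:

```python
import numpy as np

def polar_via_svd(A: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Polar decomposition A = Up @ H of a tall matrix (m >= n) via the thin SVD."""
    W, sigma, Vt = np.linalg.svd(A, full_matrices=False)
    Up = W @ Vt                      # m x n, orthonormal columns
    H = Vt.T @ np.diag(sigma) @ Vt   # n x n, symmetric positive semidefinite
    return Up, H

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 4))
Up, H = polar_via_svd(A)
print(np.allclose(Up @ H, A), np.allclose(Up.T @ Up, np.eye(4)))  # True True
```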
  • 18. QDWH Polar Decomposition Algorithm. The QR-based Dynamically Weighted Halley (QDWH) iteration: $X_0 = A/\alpha$, $\begin{bmatrix}\sqrt{c_k}\,X_k \\ I\end{bmatrix} = \begin{bmatrix}Q_1 \\ Q_2\end{bmatrix}R$, $X_{k+1} = \frac{b_k}{c_k}X_k + \frac{1}{\sqrt{c_k}}\left(a_k - \frac{b_k}{c_k}\right)Q_1Q_2^\top$, $k \ge 0$. The iteration converges to the polar decomposition $A = U_pH$, where $U_p^\top U_p = I_n$ and $H$ is symmetric positive semidefinite. It is a backward stable algorithm for computing the polar decomposition, built on conventional computational kernels: Cholesky/QR factorizations (≤ 6 iterations for double precision) and GEMM.
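A compact NumPy sketch of the QR-based QDWH iteration above. The weights $a_k, b_k, c_k$ and the update of the lower bound $\ell_k$ on the smallest singular value follow the standard dynamically weighted Halley formulas from the QDWH literature (Nakatsukasa et al.); they are not spelled out on the slide. For brevity, $\alpha$ and $\ell_0$ are computed exactly with an SVD, whereas a real implementation would use cheap estimates, and the sketch always uses the QR form, never switching to a Cholesky-based iteration:

```python
import numpy as np

def qdwh_polar(A: np.ndarray, max_iter: int = 20) -> tuple[np.ndarray, np.ndarray]:
    """Polar decomposition A = Up @ H (m >= n) via QR-based QDWH iterations."""
    m, n = A.shape
    eps = np.finfo(A.dtype).eps
    sv = np.linalg.svd(A, compute_uv=False)  # sketch only: exact alpha and l0
    alpha = sv[0]
    l = sv[-1] / alpha                       # lower bound on sigma_min(X0)
    X = A / alpha
    for _ in range(max_iter):
        l2 = l * l
        # Dynamically weighted Halley parameters a_k, b_k, c_k.
        gamma = (4.0 * (1.0 - l2) / (l2 * l2)) ** (1.0 / 3.0)
        root = np.sqrt(1.0 + gamma)
        a = root + 0.5 * np.sqrt(8.0 - 4.0 * gamma + 8.0 * (2.0 - l2) / (l2 * root))
        b = (a - 1.0) ** 2 / 4.0
        c = a + b - 1.0
        # QR of the stacked matrix [sqrt(c) X; I] = [Q1; Q2] R.
        Q, _ = np.linalg.qr(np.vstack([np.sqrt(c) * X, np.eye(n)]))
        Q1, Q2 = Q[:m], Q[m:]
        X_new = (b / c) * X + (a - b / c) / np.sqrt(c) * (Q1 @ Q2.T)
        # Update the lower bound; clamp so rounding cannot push it above 1.
        l = min(1.0, l * (a + b * l2) / (1.0 + c * l2))
        diff = np.linalg.norm(X_new - X, "fro")
        X = X_new
        if diff < (5.0 * eps) ** (1.0 / 3.0) and 1.0 - l < 5.0 * eps:
            break
    Up = X
    H = Up.T @ A
    H = 0.5 * (H + H.T)                      # symmetrize the computed H
    return Up, H

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 30))
Up, H = qdwh_polar(A)
print(np.allclose(Up @ H, A))  # True
```

Because the iteration converges cubically, stopping once successive iterates differ by less than $(5\varepsilon)^{1/3}$ leaves the accepted iterate accurate to roughly machine precision.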
  • 19. Numerical Algorithm. Algorithm 1: pseudo-inverse using the QDWH-based partial SVD. (1) Compute the polar decomposition $A = U_pH$ using QDWH. (2) Calculate $[Q, R] = \mathrm{QR}(U_p + I)$. (3) Find the index ind = min(find(abs(diag(R)) < threshold)). (4) Extract $\tilde{Q} = Q(:, \mathrm{ind}:\mathrm{end})$. (5) Reduce the original problem: $\tilde{A} = A\tilde{Q}$. (6) Compute the SVD of the reduced problem: $\tilde{A} = U\Sigma\tilde{V}^\top$. (7) Compute the right singular vectors: $V = \tilde{Q}\tilde{V}$. (8) Calculate the pseudo-inverse: $A^+ = V\Sigma^{-1}U^\top$.
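A NumPy sketch of the overall structure of Algorithm 1, under several stated simplifications. The polar factors come from a scaled Newton iteration, a simpler stand-in for QDWH. For steps 2 to 4 we swap in a different, standard way of extracting the subspace: the polar factor of the shifted symmetric matrix $H - \tau I$ equals $\mathrm{sign}(H - \tau I)$, so $P = (\mathrm{sign}(H - \tau I) + I)/2$ is the orthogonal projector onto the right singular vectors with $\sigma > \tau$, and a QR of $P\Omega$ with a random $\Omega$ yields $\tilde{Q}$. The threshold `tau` and the function names are hypothetical, and $A$ is assumed to have full column rank:

```python
import numpy as np

def polar_newton(X: np.ndarray, tol: float = 1e-12, max_iter: int = 100) -> np.ndarray:
    """Orthogonal polar factor of a square nonsingular X (scaled Newton iteration)."""
    for _ in range(max_iter):
        X_inv = np.linalg.inv(X)
        mu = np.sqrt(np.linalg.norm(X_inv, "fro") / np.linalg.norm(X, "fro"))  # scaling
        X_new = 0.5 * (mu * X + X_inv.T / mu)
        if np.linalg.norm(X_new - X, "fro") <= tol * np.linalg.norm(X_new, "fro"):
            return X_new
        X = X_new
    return X

def partial_pinv(A: np.ndarray, tau: float, seed: int = 0):
    """Pseudo-inverse restricted to singular values above tau (hypothetical helper)."""
    _, n = A.shape
    # Step 1: polar decomposition A = Up H (only H is needed), via a QR of A.
    _, R_a = np.linalg.qr(A)                   # A = Q_a R_a, R_a is n x n
    H = polar_newton(R_a).T @ R_a              # R_a = U_r H, so H = U_r^T R_a
    H = 0.5 * (H + H.T)                        # eigenvalues of H = singular values of A
    # Steps 2-4 (variant): projector onto right singular vectors with sigma > tau.
    P = 0.5 * (polar_newton(H - tau * np.eye(n)) + np.eye(n))
    s = int(round(np.trace(P)))                # rank of the projector
    omega = np.random.default_rng(seed).standard_normal((n, s))
    Q_tilde, _ = np.linalg.qr(P @ omega)       # n x s orthonormal basis
    # Step 5: reduce the problem.
    A_tilde = A @ Q_tilde                      # m x s
    # Step 6: small SVD of the reduced problem.
    U, sigma, Vt_tilde = np.linalg.svd(A_tilde, full_matrices=False)
    # Step 7: lift the right singular vectors back to R^n.
    V = Q_tilde @ Vt_tilde.T
    # Step 8: pseudo-inverse from the selected triplets.
    return V @ np.diag(1.0 / sigma) @ U.T, sigma
```

Only the small $m \times s$ SVD in step 6 uses a conventional SVD routine, which matches the $22s^3$ term in the complexity slide.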
  • 20. Algorithmic Complexity ($N_n$ is the matrix size and $s$ is the number of selected singular values/vectors, $s \ll N_n$). Standard full SVD: $22N_n^3$. QDWH-based full SVD: $43N_n^3$. QDWH-based partial SVD: QDWH, $(4 + \tfrac{1}{3})N_n^3 \times \#it_{\mathrm{Chol}}$; QR and GEMM, $\tfrac{4}{3}N_n^3 + 2sN_n^2 + 2N_ns^2$; SVD, $22s^3$.
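Plugging numbers into these formulas shows where the savings come from. The sketch below evaluates the three columns for a given $N_n$, $s$, and number of Cholesky-based QDWH iterations; the iteration count and the example sizes are illustrative assumptions, and any work not listed on the slide is ignored:

```python
def flops_standard_svd(n: int) -> float:
    return 22.0 * n**3

def flops_qdwh_full_svd(n: int) -> float:
    return 43.0 * n**3

def flops_qdwh_partial_svd(n: int, s: int, it_chol: int) -> float:
    qdwh = (4.0 + 1.0 / 3.0) * n**3 * it_chol          # Cholesky-based QDWH iterations
    qr_gemm = 4.0 / 3.0 * n**3 + 2.0 * s * n**2 + 2.0 * n * s**2
    small_svd = 22.0 * s**3                             # SVD of the reduced problem
    return qdwh + qr_gemm + small_svd

n, s, it_chol = 10_000, 1_000, 3  # illustrative: s = 10% of n, 3 Cholesky iterations
ratio = flops_standard_svd(n) / flops_qdwh_partial_svd(n, s, it_chol)
print(f"standard / partial flop ratio: {ratio:.2f}")  # standard / partial flop ratio: 1.51
```

Under these assumptions the flop saving alone is modest; the larger measured speedups are consistent with the partial SVD running on compute-bound Level-3 kernels (QR, Cholesky, GEMM) rather than the memory-bound Level-2 BLAS that dominates SGESDD.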
  • 22. Environment Settings. Software: GCC compilers, MAGMA v2.3, and CUDA v9.0 (including cuBLAS). Single-precision (SP) arithmetic is used. Ill-conditioned matrices are generated with the SLATMS routine from MAGMA. Hardware: the K80 GPU (12 GB of memory) is paired with a two-socket 14-core Intel Broadwell system with 128 GB of main memory; the P100 and V100 GPUs (16 GB of memory) are paired with two-socket 16-core Intel Haswell systems with 128 GB of main memory.
  • 24. Synthetic Ill-Conditioned Matrices, K80. Figure: (a) singular value accuracy and (b) backward error (residual of the SVD for the left and right singular vectors) versus matrix size (1000 to 20000), for SGESDD and for QDWHpartial computing 13% (with and without QR+PO), 10%, 7%, and 3% of the singular values.
  • 25. Real Observational Datasets, K80. Figure: (c) singular value accuracy and (d) backward error (residual of the SVD for the left and right singular vectors) versus matrix size (1161, 5805, 11610, 17415), for SGESDD and QDWHpartial.
  • 27. Synthetic Ill-Conditioned Matrices, K80. Figure: (e) time in seconds and (f) performance in Gflop/s versus matrix size (1000 to 20000), for SGESDD and for QDWHpartial computing 13% (with and without QR+PO), 10%, 7%, and 3% of the singular values. Up to 3X speedup, reaching 1.8 Tflop/s, 45% of the theoretical peak performance.
  • 28. Synthetic Ill-Conditioned Matrices, P100. Figure: (g) time in seconds and (h) performance in Gflop/s versus matrix size (1000 to 20000), for SGESDD and for QDWHpartial computing 13% (with and without QR+PO), 10%, 7%, and 3% of the singular values. Up to 4X speedup, reaching 7 Tflop/s, 75% of the theoretical peak performance.
  • 29. Synthetic Ill-Conditioned Matrices, V100. Figure: (i) time in seconds and (j) performance in Gflop/s versus matrix size (1000 to 20000), for SGESDD and for QDWHpartial computing 13% (with and without QR+PO), 10%, 7%, and 3% of the singular values. Up to 5X speedup, reaching 9 Tflop/s, 65% of the theoretical peak performance.
  • 30. Real Observational Datasets, V100. Figure: (k) time in seconds and (l) performance in Gflop/s versus matrix size (1161, 5805, 11610, 17415), for SGESDD and QDWHpartial. Up to 4X speedup.
  • 32. Conclusion and Future Work. A comprehensive accuracy/performance analysis of a novel QDWH-based partial SVD algorithm. Significant performance improvement of the QDWH-based partial SVD: up to 5X and 4X against state-of-the-art implementations on synthetic ill-conditioned matrices and real datasets, respectively, across various hardware technologies. The pseudo-inverse simulation code has been deployed at the Subaru Telescope and has been in operation since June 24, 2018! Future work includes an asynchronous task-based QDWH-based partial SVD implementation and a multi-GPU QDWH-based partial SVD implementation.
  • 33. Acknowledgments: Yuji Nakatsukasa, National Institute of Informatics, Tokyo, Japan; NVIDIA GPU Research Center; Cray Center of Excellence; Intel Parallel Computing Center.
  • 34. The World’s Biggest Eye on the Sky. Credits: ESO (http://www.eso.org/public/teles-instr/e-elt/)
  • 35. The World’s Biggest Eye on the Sky. Credits: ESO (http://www.eso.org/public/teles-instr/e-elt/). It will be the largest optical/near-infrared telescope in the world, weighing about 2700 tons, with a main mirror 39 m in diameter. Location: Chile, South America. H. Ltaief et al., Real-Time Massively Distributed Multi-Object Adaptive Optics Simulations for the European Extremely Large Telescope, IEEE IPDPS 2018: designing one of the most challenging instruments (MOSAIC).
  • 36. Exciting Times for Astronomy at KAUST/ECRC! Supporting two major worldwide ground-based astronomy efforts: the E-ELT and the Subaru Telescope.
  • 37. Bringing Astronomy Back Home ;-) Courtesy of CEMSE Communications, KAUST
  • 39. Questions?
  • 40. Moving Forward with Extreme AO. Each deformable mirror (DM) state (length $m$) is produced by a matrix-vector multiplication (MVM): (1) standard control: control matrix ($m \times n$) times the last WFS measurement (length $n$); (2) predictive control: predictive control matrix ($m \times (N \times n)$) times the last $N$ WFS measurements ($N \times n$); (3) sensor fusion and predictive control: matrix ($m \times (K \times N \times n)$) times the last $N$ WFS measurements from each of sensors 1 to $K$ ($N \times n$ each).
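The three control schemes differ only in how much measurement history the MVM consumes. A NumPy sketch with small, made-up sizes ($m$ actuators, $n$ measurements per sensor, $N$ past frames, $K$ sensors, all hypothetical) and random matrices standing in for real control matrices:

```python
import numpy as np

m, n, N, K = 4, 6, 3, 2  # hypothetical sizes
rng = np.random.default_rng(0)

# (1) Standard control: DM state from the last WFS measurement only.
control = rng.standard_normal((m, n))
last = rng.standard_normal(n)
dm_standard = control @ last                       # shape (m,)

# (2) Predictive control: last N measurements stacked into one vector of length N*n.
history = rng.standard_normal((N, n))
predictive = rng.standard_normal((m, N * n))
dm_predictive = predictive @ history.reshape(-1)   # shape (m,)

# (3) Sensor fusion + predictive control: N measurements from each of K sensors.
histories = rng.standard_normal((K, N, n))
fusion = rng.standard_normal((m, K * N * n))
dm_fusion = fusion @ histories.reshape(-1)         # shape (m,)

print(dm_standard.shape, dm_predictive.shape, dm_fusion.shape)  # (4,) (4,) (4,)
```

The matrix width, and hence the per-frame MVM cost, grows by a factor of $N$ for predictive control and $K \times N$ for sensor fusion.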