Distinguishing signal from noise in an SVD of simulation data

David F. Gleich, Purdue University, Computer Science Department
Paul G. Constantine, Stanford University

ICASSP
Large-scale nonlinear, time-dependent heat transfer problem
10^5 nodes, 10^3 time steps
30 minutes on 16 cores
~1 GB of data per run

Questions
What is the probability of failure?
Which input values cause failure?
Insight and confidence require multiple runs, and multiple runs hit the curse of dimensionality.

The problem
A simulation run is time-consuming!

Our solution
Use "big-data" techniques and platforms.
We store a few runs ... and build an interpolant from the data for computational steering.

Supercomputer: run 100-1000 simulations.
Data computing cluster: store them on the MapReduce cluster.
Engineer: run 10,000-100,000 interpolated simulations for approximate statistics.
The Database

Input parameters s (5-10 of them) map to the time history f of a simulation:
s1 -> f1, s2 -> f2, ..., sk -> fk    ("a few gigabytes" each)

The simulation as a vector: f(s) stacks the state q(x_i, t_j, s), one entry per node x_i at each time step t_j,

    f(s) = [ q(x_1, t_1, s); ...; q(x_n, t_1, s); q(x_1, t_2, s); ...; q(x_n, t_2, s); ...; q(x_n, t_k, s) ]

The database as a matrix:

    X = [ f(s_1)  f(s_2)  ...  f(s_p) ],    100 GB - 100 TB
One-dimensional test problem

    X_{i,j} = f(x_i, s_j),    f(x, s) = (1/(8s)) log[1 + 4s(x² − x)]

[Figure: the columns f_1, ..., f_5 of X plotted as functions of x ("plot(X)") and the whole matrix shown as an image ("imagesc(X)").]
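A minimal sketch of assembling this test matrix X in Python/NumPy (an assumption; the slides only show MATLAB-style plot/imagesc calls), evaluating f(x, s) on a space grid for a handful of parameter samples:

    import numpy as np

    def f(x, s):
        # One-dimensional test problem: f(x, s) = log(1 + 4*s*(x**2 - x)) / (8*s)
        return np.log(1.0 + 4.0 * s * (x**2 - x)) / (8.0 * s)

    x = np.linspace(0.0, 1.0, 200)           # "space-time" grid
    s_samples = np.linspace(0.1, 0.9, 5)     # hypothetical parameter samples s_1, ..., s_5
                                             # (range chosen to keep the log argument positive)

    # Column j of X is the full solution f(., s_j) for one parameter value,
    # matching X_{i,j} = f(x_i, s_j) on the slide.
    X = np.column_stack([f(x, s) for s in s_samples])
    print(X.shape)   # (200, 5): rows index space, columns index simulation runs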
The interpolant

Motivation
Let the data give you the basis. (This idea was inspired by the success of other reduced-order models like POD, and by Paul's residual-minimizing idea.)

    X = [ f(s_1)  f(s_2)  ...  f(s_p) ]

Then find the right combination

    f(s) ≈ Σ_{j=1}^{r} u_j α_j(s)

where the u_j are the left singular vectors of X.
Why the SVD? It splits "space-time" from "parameters"

Here x is the "space-time" index and s is a general parameter.

    f(x_i, s_j) = Σ_{ℓ=1}^{r} U_{i,ℓ} σ_ℓ V_{j,ℓ} = Σ_{ℓ=1}^{r} u_ℓ(x_i) σ_ℓ v_ℓ(s_j)

Treat each right singular vector as samples of an unknown basis function, and split x and s:

    f(x_i, s) = Σ_{ℓ=1}^{r} u_ℓ(x_i) σ_ℓ v_ℓ(s),    v_ℓ(s) ≈ Σ_{j=1}^{p} v_ℓ(s_j) φ_j^{(ℓ)}(s)

Interpolate v any way you wish ... and it has a "smoothness" property.
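A sketch of this SVD-based surrogate in NumPy/SciPy (the talk's own pipeline runs the SVD on a MapReduce cluster; here everything is in memory, and the cubic-spline interpolation of v_ℓ(s) is just one illustrative choice):

    import numpy as np
    from scipy.interpolate import CubicSpline

    def build_surrogate(X, s_samples, r):
        """Truncated SVD of the snapshot matrix X = [f(s_1) ... f(s_p)],
        plus one interpolant per right singular vector v_ell(s)."""
        U, sigma, Vt = np.linalg.svd(X, full_matrices=False)
        U, sigma, V = U[:, :r], sigma[:r], Vt[:r, :].T          # keep r modes
        v_interp = [CubicSpline(s_samples, V[:, l]) for l in range(r)]
        return U, sigma, v_interp

    def evaluate_surrogate(U, sigma, v_interp, s):
        # f(s) ~ sum_l u_l * sigma_l * v_l(s), i.e. alpha_l(s) = sigma_l * v_l(s)
        alpha = np.array([sigma[l] * v_interp[l](s) for l in range(len(sigma))])
        return U @ alpha

    # Example with the 1-D test matrix from the previous sketch (hypothetical sizes).
    s_samples = np.linspace(0.1, 0.9, 5)
    x = np.linspace(0.0, 1.0, 200)
    X = np.column_stack([np.log(1 + 4*s*(x**2 - x)) / (8*s) for s in s_samples])
    U, sigma, v_interp = build_surrogate(X, s_samples, r=3)
    f_new = evaluate_surrogate(U, sigma, v_interp, s=0.5)   # approximate run at a new parameter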
MapReduce and Interpolation

The Database (s1 -> f1, s2 -> f2, ..., sk -> fk), on the MapReduce cluster: use the SVD on the MapReduce cluster to get the singular vector basis.

The Surrogate, on just one machine: sample the interpolant to get coefficients at new parameter values.

New Samples (sa -> fa, sb -> fb, sc -> fc), on the MapReduce cluster: form a linear combination of the singular vectors.
A quiz!
Which section would you rather try to interpolate, A or B?

[Figure: two highlighted sections, A and B, of the plotted data.]
How predictable is a singular vector?

Folk theorem (O'Leary 2011)
The singular vectors of a matrix of "smooth" data become more oscillatory as the index increases.

Implication
The gradient of the singular vectors increases as the index increases.

v_1(s), v_2(s), ..., v_t(s): predictable signal
v_{t+1}(s), ..., v_r(s): unpredictable noise

Once we have determined the predictable bases, we interpolate them using the procedures discussed above to create the α_ℓ(s).

[Fig. 1 from the paper: the singular-vector functions v_1, v_2, v_3, v_7 for the example in Section 3, interpreted as functions v_ℓ(s). We might have some confidence in an interpolation of v_1(s) and v_2(s), but interpolating v_3(s) for s near ±1 is problematic, and interpolating v_7(s) anywhere is dubious. Fig. 2 shows a finer discretization of the same functions, confirming that interpolating v_7(s) is difficult.]
A refined method with an error model

Don't even try to interpolate the unpredictable modes; model them as noise instead.

    f(s) ≈ Σ_{j=1}^{t(s)} u_j α_j(s)  +  Σ_{j=t(s)+1}^{r} u_j σ_j η_j,    η_j ~ N(0, 1)

The first sum is the predictable part; the second is the unpredictable part.

    Variance[f] = diag( Σ_{j=t(s)+1}^{r} σ_j² u_j u_jᵀ )

But now, how do we choose t(s)?
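A sketch of how this refined model could be sampled, in the same NumPy setting as above (U, sigma, the interpolated coefficients alpha, and the cutoff t_s are taken as given; the only new ingredient is drawing eta_j ~ N(0,1) for the unpredictable modes):

    import numpy as np

    def sample_refined_model(U, sigma, alpha, t_s, rng=np.random.default_rng()):
        """f(s) ~ sum_{j<=t(s)} u_j alpha_j(s) + sum_{j>t(s)} u_j sigma_j eta_j, eta_j ~ N(0,1)."""
        r = len(sigma)
        predictable = U[:, :t_s] @ alpha[:t_s]
        eta = rng.standard_normal(r - t_s)
        unpredictable = U[:, t_s:r] @ (sigma[t_s:r] * eta)
        return predictable + unpredictable

    def variance_of_f(U, sigma, t_s):
        # Variance[f] = diag( sum_{j>t(s)} sigma_j^2 u_j u_j^T ),
        # computed entrywise without forming the full matrix.
        return (U[:, t_s:] ** 2) @ (sigma[t_s:] ** 2)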
Our current approach to choosing the predictability

t(s) is the largest τ such that the average gradient magnitude of the first τ singular-vector functions near s stays below a threshold:

    (1/τ) Σ_{i=1}^{τ} |∂v_i/∂s| < threshold

Better ideas? Come talk to me!

[Figure: the singular-vector functions v_1, v_2, v_3, v_7 again, with gradients near s marked. We can use more of the black gradients than of the red gradients, so the error will be higher for red.]
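A sketch of one way to implement this selection rule, assuming finite differences of the sampled right singular vectors stand in for ∂v_i/∂s (the exact normalization on the slide is not fully legible, so this simply averages local gradient magnitudes of the first τ singular-vector functions near s):

    import numpy as np

    def choose_t(V, s_samples, s, threshold):
        """Return the largest tau such that the average gradient magnitude of
        v_1, ..., v_tau near s stays below the threshold."""
        grads = np.gradient(V, s_samples, axis=0)     # finite-difference dv_ell/ds, shape (p, r)
        j = np.argmin(np.abs(s_samples - s))          # sample closest to s
        local = np.abs(grads[j, :])                   # |dv_ell/ds| near s
        t = 0
        for tau in range(1, V.shape[1] + 1):
            if local[:tau].mean() < threshold:
                t = tau
            else:
                break
        return t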
An experimental test case

A heat equation problem with two parameters that control the material properties.
Where the error is the worst

[Figure: the error of our reduced-order model against the truth, with a histogram of errors; the errors range from roughly 10^-3 to 10^-2.]
A Large Scale Example

Nonlinear heat transfer model
80k nodes, 300 time steps
104 basis runs
SVD of a 24M x 104 data matrix
500x reduction in wall-clock time (100x including the SVD)
SVD from QR: R-SVD

Old algorithm ...

Let A = QR and compute the SVD of the small factor, R = U_R Σ_R V_Rᵀ.
Then A = (Q U_R) Σ_R V_Rᵀ.

... helps when A is tall and skinny.
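A minimal in-memory sketch of the R-SVD idea in NumPy (the talk's version computes R with MapReduce TSQR, described on the following slides):

    import numpy as np

    def rsvd_from_qr(A):
        """SVD of a tall-and-skinny A via its QR factorization:
        A = Q R, R = U_R Sigma_R V_R^T, so A = (Q U_R) Sigma_R V_R^T."""
        Q, R = np.linalg.qr(A, mode='reduced')
        U_R, Sigma_R, VtR = np.linalg.svd(R)
        return Q @ U_R, Sigma_R, VtR

    A = np.random.randn(100000, 50)            # tall and skinny
    U, S, Vt = rsvd_from_qr(A)
    print(np.allclose(A, (U * S) @ Vt))        # True up to round-off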
Intro to MapReduce

Originated at Google for indexing web pages and computing PageRank.

The idea: bring the computations to the data.
Express algorithms in data-local operations.
Implement one type of communication: the shuffle.
The shuffle moves all data with the same key to the same reducer.

Data scalable: [diagram of map tasks feeding a shuffle, which feeds reduce tasks].

Fault tolerance by design: the input is stored in triplicate, map output is persisted to disk before the shuffle, and reduce input/output is on disk.
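A tiny illustration of the shuffle step in plain Python (no Hadoop involved; the example data are made up, and the point is only that records sharing a key are grouped before any reducer runs):

    from collections import defaultdict

    # Map phase: each mapper emits (key, value) pairs from its local chunk of input.
    mapped = [("row_block_1", 3.0), ("row_block_2", 1.5), ("row_block_1", 2.5)]

    # Shuffle: all values with the same key are routed to the same reducer.
    shuffled = defaultdict(list)
    for key, value in mapped:
        shuffled[key].append(value)

    # Reduce phase: one reducer call per key.
    reduced = {key: sum(values) for key, values in shuffled.items()}
    print(reduced)   # {'row_block_1': 5.5, 'row_block_2': 1.5}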
MapReduce TSQR summary

MapReduce is great for TSQR!

Data: a tall-and-skinny (TS) matrix, stored by rows
Map: QR factorization of local rows
Reduce: QR factorization of local rows

Demmel et al. showed that this construction computes a QR factorization with minimal communication.

Input: a 500,000,000-by-100 matrix
Each record: a 1-by-100 row
HDFS size: 423.3 GB
Time to compute the norm of each column: 161 sec.
Time to compute R in the QR factorization: 387 sec.

(On a 64-node Hadoop cluster with 4x2TB disks, one Core i7-920, and 12 GB RAM per node.)
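A sketch of the TSQR pattern in NumPy, simulating the map and reduce steps locally (the runs reported above use a Hadoop cluster; here the "mappers" are just row blocks of an in-memory array):

    import numpy as np

    def tsqr_R(A, n_blocks=4):
        """R factor of a tall-and-skinny A: QR each row block (map),
        stack the local R factors, and QR the stack (reduce)."""
        blocks = np.array_split(A, n_blocks, axis=0)
        local_Rs = [np.linalg.qr(block, mode='r') for block in blocks]   # map
        R = np.linalg.qr(np.vstack(local_Rs), mode='r')                  # reduce
        return R

    A = np.random.randn(200000, 100)
    R = tsqr_R(A)
    # Agrees with a direct QR up to the sign of each row of R.
    print(np.allclose(np.abs(R), np.abs(np.linalg.qr(A, mode='r'))))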
Key Limitations

Computes only R and not Q.

We can get Q via Q = A R⁺ with another MapReduce iteration (we currently use this for computing the SVD), but that Q is not numerically orthogonal; iterative refinement helps.

We are working on better ways to compute Q (with Austin Benson and Jim Demmel).
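A sketch of the Q = A R⁺ reconstruction and one refinement pass, in NumPy (the talk does this as a second MapReduce iteration; the triangular-solve form of the pseudoinverse and the specific refinement step are assumptions here, not the talk's exact scheme):

    import numpy as np
    from scipy.linalg import solve_triangular

    def q_from_r(A, R):
        """Recover Q = A R^{-1} (i.e. A R^+ for invertible R) from the TSQR R factor."""
        return solve_triangular(R.T, A.T, lower=True).T   # solves Q R = A

    def refine(Q):
        """One refinement step: re-orthogonalize Q via a second (small) QR pass."""
        R2 = np.linalg.qr(Q, mode='r')
        return q_from_r(Q, R2)

    A = np.random.randn(100000, 50)
    R = np.linalg.qr(A, mode='r')
    Q = q_from_r(A, R)
    print(np.linalg.norm(Q.T @ Q - np.eye(50)))   # orthogonality error before refinement
    Q = refine(Q)
    print(np.linalg.norm(Q.T @ Q - np.eye(50)))   # typically smaller after refinement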
Our vision
To enable analysts and engineers to hypothesize from data computations instead of expensive HPC computations.

[Photo: Paul G. Constantine]
