Principal Component Analysis for
                  Novelty Detection
A journal article submitted to and accepted by Pattern Recognition Letters



                                                 Jordan McBain, P.Eng.
                                            Markus Timusk, PhD, P.Eng.
Condition Monitoring
   Maintenance technique
       Maintenance undertaken when some indicator of health is
        flagged
       Advanced technique employed when cost-benefit analysis
        justifies the expense of monitoring equipment
       Alternative to run-to-failure maintenance and statistically
        determined time-based maintenance
   Employ pattern recognition to automate diagnosis
       Expert system employed to replicate a technician's
        maintenance insight
           A computer and sensors replace the technician with a screwdriver
            set atop a vibrating machine – the nature of the vibration is
            used to discern machine state
Pattern Recognition
   Equality is an insufficient means of classifying real-world
    members of a class (noise, variance, etc.)
   Pattern recognition
       Real-world signals presumed representative of a class are
        reduced to n-dimensional feature vectors
       Plotted in n-dimensional space
       Decision boundary generated with pattern recognition
        techniques
           Employed as classification rule
       Problems
           Choice of features
               How representative?
               Maximize number of features?
               Curse of dimensionality
           Imbalance of data
Principal Component Analysis
   One technique used to find “optimal” set of features
       Finds the axes of normally distributed data
       Select the largest axes and omit smaller ones to define
        new basis
       Project data onto basis to reduce dimensionality of
        problem space
   Each feature presumed to be normally distributed
   N-dimensional scattering of features presumed
    independent
   Combined probability:

     $$P(A \cap B) = P(A)\,P(B)$$

     $$p(\mathbf{x}) = \prod_{i=1}^{d} p(x_i)
       = \prod_{i=1}^{d} \frac{1}{\sqrt{2\pi}\,\sigma_i}\,
         e^{-\frac{1}{2}\left(\frac{x_i-\mu_i}{\sigma_i}\right)^2}
       = \frac{1}{(2\pi)^{d/2}\prod_{i=1}^{d}\sigma_i}\,
         e^{-\frac{1}{2}\sum_{i=1}^{d}\left(\frac{x_i-\mu_i}{\sigma_i}\right)^2}
       = \frac{1}{(2\pi)^{d/2}\,|\Sigma|^{1/2}}\,
         e^{-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^t\,\Sigma^{-1}\,(\mathbf{x}-\boldsymbol{\mu})}$$
   Find principal components (i.e. axes of the hyper-ellipsoidal
    distribution)
   Select maximum variance (largest axes)
   Eigenvalue problem
       Eigenvectors – principal components
       Eigenvalues – size of each axis
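As a concrete illustration, a minimal NumPy sketch of this eigen-decomposition route (variable names and the choice of k are illustrative assumptions, not from the paper):

```python
import numpy as np

def pca_basis(X, k):
    """Return the mean and the top-k principal components of the rows of X.

    X : (n, d) array of feature vectors
    k : number of largest-variance axes to keep
    """
    mean = X.mean(axis=0)
    cov = np.cov(X - mean, rowvar=False)      # (d, d) covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)    # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]     # k largest axes
    return mean, eigvecs[:, order]            # columns are the components

def project(X, mean, basis):
    """Project data onto the reduced basis."""
    return (X - mean) @ basis

# Usage: reduce 10-D features to 3-D
X = np.random.randn(200, 10)
mean, basis = pca_basis(X, k=3)
Z = project(X, mean, basis)
```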
Novelty Detection
   Deals with imbalance of data between classes
   Fault detection in machinery
       Easy to collect data representative of healthy state
       Difficult to collect data representative of faulted states
           Costly to break machinery
           Operationally unacceptable
           Poor database of faults kept
           Can never capture them all!
   Model healthy data with decision boundary
       If test patterns fall outside, classify as a fault!
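The healthy-data boundary idea can be illustrated with a simple Mahalanobis-distance detector; this is an illustrative stand-in, not the paper's classifier:

```python
import numpy as np

class MahalanobisNoveltyDetector:
    """Flag test patterns that fall outside a boundary fit to healthy data."""

    def fit(self, X_healthy, quantile=0.99):
        self.mean = X_healthy.mean(axis=0)
        self.cov_inv = np.linalg.pinv(np.cov(X_healthy, rowvar=False))
        d = self._distance(X_healthy)
        self.threshold = np.quantile(d, quantile)  # boundary from healthy data only
        return self

    def _distance(self, X):
        diff = X - self.mean
        # Mahalanobis distance of each row from the healthy mean
        return np.sqrt(np.einsum('ij,jk,ik->i', diff, self.cov_inv, diff))

    def predict(self, X):
        # True -> outside the healthy boundary -> classify as a fault
        return self._distance(X) > self.threshold
```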
Problem
   PCA selects the subspace that best represents the data
   In pattern recognition, we seek to discriminate
    between classes
   The objectives of most feature reduction techniques are
    not optimized for novelty detection
Feature Reduction Techniques
   Feature Selection vs. Feature Extraction
   Selection
       Choosing small subsets of features that are adequate to
        describe classes
       E.g. “Search”
           Examines all subsets of feature combinations to find the one which
            maximizes some objective function
           May employ classifier error as objective function
           Exponential explosion
                Heuristics are possible to mitigate this (see the search
                 sketch after this list)
            If computationally feasible, gives the best results
   Extraction
        Computes a small number of new features from the set of old
         features
       E.g. PCA
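The "Search" selection strategy can be sketched as below: enumerate feature subsets and score each against a classifier-error objective. The classifier and scoring choices here are assumptions for illustration:

```python
import itertools
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

def exhaustive_feature_search(X, y, subset_size):
    """Examine all subsets of `subset_size` features; return the subset
    maximizing cross-validated accuracy (i.e. minimizing classifier error).
    Cost grows combinatorially -- the 'exponential explosion' above."""
    best_score, best_subset = -np.inf, None
    for subset in itertools.combinations(range(X.shape[1]), subset_size):
        score = cross_val_score(KNeighborsClassifier(),
                                X[:, subset], y, cv=5).mean()
        if score > best_score:
            best_score, best_subset = score, subset
    return best_subset, best_score
```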
Principal Component Analysis
   Seeks a subspace in which the data representation
    error is minimal
   Development
       For a set of n vectors in d-dimensional space
           seek the equation of a hyperplane onto which the data may be
            projected with minimal representation error
           Hyperplane fixed at the data's mean, m
           Hyperplane's orientation defined by direction vector, w (the
            normal definition of a plane)



           Derive error function
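The slide's equation image is lost; Duda's squared-error criterion for projecting each sample onto a line through the mean m with direction e is (a reconstruction from the cited source):

$$J(a_1,\ldots,a_n,\mathbf{e}) = \sum_{k=1}^{n}\left\|(\mathbf{m} + a_k\mathbf{e}) - \mathbf{x}_k\right\|^2, \qquad a_k = \mathbf{e}^t(\mathbf{x}_k - \mathbf{m})$$

Minimizing $J$ over unit vectors $\mathbf{e}$ is equivalent to maximizing $\mathbf{e}^t S\,\mathbf{e}$ for the scatter matrix $S = \sum_k (\mathbf{x}_k - \mathbf{m})(\mathbf{x}_k - \mathbf{m})^t$, which leads to the eigenvalue problem noted next.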
   The optimization problem is a well-known eigenvalue
    problem
   Resultant feature space is linear
       May not represent non-linear and changing data well
       Kernel PCA and Dynamic PCA
   Techniques only suitable for representing data, not
    discriminating between classes




                 Source: Duda, 2000
Multiple Discriminant Analysis
   Seeks to find efficient subspaces for discrimination
    rather than representation
   Development
       Two class problem with d-dimensional set of n-vectors
        grouped into D1 and D2
       Projected onto some direction vector w to give
        $y = \mathbf{w}^t \mathbf{x}$
       Consequently grouped into subsets Y1 and Y2
       Find the direction vector w such that the distance
        between projected sample means m1 and m2 is
        maximized
           Normalize the distance against the scatter of the
            projected samples
   Reduces to

     $$J(\mathbf{w}) = \frac{|\tilde{m}_1 - \tilde{m}_2|^2}{\tilde{s}_1^{\,2} + \tilde{s}_2^{\,2}}
       = \frac{\mathbf{w}^t S_B\,\mathbf{w}}{\mathbf{w}^t S_W\,\mathbf{w}}$$
   Solution is described as “analogous to the well known
    Rayleigh quotient:”

     $$\mathbf{w} = S_W^{-1}(\mathbf{m}_1 - \mathbf{m}_2)$$

   Technique extended for problems with n-classes
       Objective to maximize the spread between all classes in the
        projected space




                                                    Source: Duda, 2000
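A minimal NumPy sketch of the two-class Fisher direction above (names are illustrative; the n-class extension solves a generalized eigenproblem instead):

```python
import numpy as np

def fisher_direction(X1, X2):
    """Two-class MDA/Fisher discriminant: w = Sw^{-1} (m1 - m2)."""
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    # Within-class scatter Sw: sum of the two class scatters
    S1 = (X1 - m1).T @ (X1 - m1)
    S2 = (X2 - m2).T @ (X2 - m2)
    Sw = S1 + S2
    w = np.linalg.solve(Sw, m1 - m2)   # direction maximizing class separation
    return w / np.linalg.norm(w)

# Projected values y = w^t x then group into the subsets Y1 and Y2
```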
Extraction for Novelty Detection
Development
   Objective: distinguish between normal and abnormal
    classes
       KFDA inappropriate (assumes classes group into well-separated
        clusters)
       Novelty detection – classes may cluster well, but abnormal
        classes are expected to orbit the normal data
           Means could overlap, eliminating the previous objective
            functions
   Approach: find the subspace maximizing difference
    between average spread of the normal class and
    average spread of the abnormal class measured
    from the mean of the normal class
   Mathematically, for an outlier class containing b
    elements and a target class containing a elements
    with mean m_t




   To simplify, introduce outlier scatter matrix, O, for
    outlier data centered at m_t

   Reducing to
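The equation images on this slide are lost. Under the stated definitions, with the target scatter $S_t = \frac{1}{a}\sum_{\mathbf{x}\in D_t}(\mathbf{x}-\mathbf{m}_t)(\mathbf{x}-\mathbf{m}_t)^t$ and the outlier scatter $O = \frac{1}{b}\sum_{\mathbf{x}\in D_o}(\mathbf{x}-\mathbf{m}_t)(\mathbf{x}-\mathbf{m}_t)^t$ (both centered at the target mean), the objective plausibly reduces to

$$J(\mathbf{w}) = \mathbf{w}^t\,(S_t - O)\,\mathbf{w}$$

consistent with the next slide's instruction to take the eigenvectors of $S_t - O$; the averaging convention is an assumption.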
   Maximize this objective function
       Find the eigenvectors and eigenvalues of the matrix S_t − O
        (a sketch follows this slide)
   Select the first k largest eigenvalues and use
    corresponding eigenvectors as new basis
   Project data onto new basis
   Proceed with classification
   Limitations
       Still dependent on the assumption of normally distributed data
           (as are other PCA techniques)
       Assumption: normal data scatter somewhat circularly and
        outlier data orbit nicely without intruding
           (as with PCA and MDA)
       Machinery vibration data are generally not Gaussian (a heuristic
        observation)
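A sketch of the full recipe above, under the reading given earlier (the scatter normalization and the choice of k are assumptions):

```python
import numpy as np

def novelty_pca_basis(X_target, X_outlier, k):
    """Basis maximizing (target spread - outlier spread) about the target mean."""
    m_t = X_target.mean(axis=0)
    D_t = X_target - m_t
    D_o = X_outlier - m_t                    # outliers centered at the TARGET mean
    S_t = D_t.T @ D_t / len(X_target)        # target scatter
    O = D_o.T @ D_o / len(X_outlier)         # outlier scatter matrix
    eigvals, eigvecs = np.linalg.eigh(S_t - O)
    order = np.argsort(eigvals)[::-1][:k]    # first k largest eigenvalues
    return m_t, eigvecs[:, order]

# Project, then proceed with any novelty classifier in the reduced space:
# Z = (X - m_t) @ basis
```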
Validation: Artificial Data
   Artificial 3-d data set
       Normal distribution:
           spherical (radius 50) centered at origin
       Outlier distribution:
           randomly generated spherical distribution (radius 100)
           Not permitted to fall within cylinder concentric with the normal
            data’s sphere and oriented with length parallel to [1,1,1]
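A hedged sketch of generating such a validation set; the radii and the rejection rule follow the slide, but the sampling details and the cylinder radius are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
axis = np.array([1.0, 1.0, 1.0]) / np.sqrt(3)   # cylinder axis along [1,1,1]

def sphere(n, radius):
    """Points scattered within a sphere of the given radius."""
    v = rng.normal(size=(n, 3))
    v /= np.linalg.norm(v, axis=1, keepdims=True)
    return v * radius * rng.random((n, 1)) ** (1 / 3)

normal = sphere(1000, 50)                        # normal class, centered at origin

# Outliers: radius-100 sphere, rejecting points inside the concentric
# cylinder (assumed radius 50 here) oriented along [1, 1, 1]
candidates = sphere(5000, 100)
radial = np.linalg.norm(candidates - np.outer(candidates @ axis, axis), axis=1)
outliers = candidates[radial > 50][:1000]
```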
Validation: Artificial Data
   Results (reduced to 2 dimensions)
       Subspace’s normal vector only 7 degrees off from
        expected [1,1,1]
Experimental Methodology
Apparatus
   SpectraQuest gear dynamics simulator
       3-hp motor
       Magnetic particle brake loading
       National Instruments PXI data acquisition and control
       Accelerometers (sampled at 4 kHz)
Faults
   4 motors employed
       healthy
       Combo bearing faults
       Broken rotor bars
       Rotor unbalance
   Gearbox faults
       Fault-free conditions
       Missing tooth gear
       Chipped tooth
       Bearing with outer race faults
       Bearings with inner and outer race faults
Feature Extraction
   Autoregressive model
       a model of a statistical process obtained by regressing the
        process on its own previous values
       A compact representation of the sampled signal that best
        reproduces the original samples
       Order 10
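An order-10 AR fit can be done with a plain least-squares regression of each sample on its 10 predecessors, with the coefficients serving as the feature vector for a segment. This is a sketch; the paper's estimation method is not specified here:

```python
import numpy as np

def ar_features(signal, order=10):
    """Fit an AR(order) model by least squares; return the coefficients
    as a feature vector for one vibration segment."""
    # Each row regresses x[t] on [x[t-1], ..., x[t-order]]
    X = np.column_stack([signal[order - i - 1: len(signal) - i - 1]
                         for i in range(order)])
    y = signal[order:]
    coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coeffs
```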

Segmentation
   Vibration data segmented into groups based on
    intervals with a constant number of shaft rotations
           Gaussian Window
           70% overlap between segments
Results: Proposed Algorithm
Results: Kernel PCA
Results: Kernel FDA
   N.B. Potential for singular matrices
Results: Exhaustive Feature Search
Feature Extraction in the Absence of Outliers
Motivation and Development
   The above violates an assumption of novelty detection
       Only limited data are available from the fault classes
   In the case where we know nothing of the outlier
    classes
       Work with what we have: normal data
           Minimize variance of normal data
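When no outlier data exist, O vanishes and the same machinery applies to the normal data alone; under the slide's "minimize variance" reading, one keeps the directions of smallest normal-class variance. A hedged interpretation:

```python
import numpy as np

def minimum_variance_basis(X_normal, k):
    """Keep the k directions along which the normal data vary least,
    so normal patterns stay tightly clustered after projection."""
    m = X_normal.mean(axis=0)
    S = np.cov(X_normal - m, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(S)      # ascending eigenvalues
    return m, eigvecs[:, :k]                  # k SMALLEST-variance axes
```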
Results: Novelty Reduction (Outlier Absence)
Conclusions
   Reduce a large feature space to a smaller one
       Mitigate the curse of dimensionality
   Objective function tweaked for novelty detection
   Similar to MDA but modified to accommodate the case
    where the normal and outlier means are close together
   Results good for artificial and machinery data
   Future work
       Extend technique with kernels
           Difficult problem due to the need for an explicit mean
   Thanks
       CEMI
       Dr. Mechefske, Queen's University
