F. Serdio, E. Lughofer, K. Pichler, T. Buchegger and H. Efendic, Data-Driven Residual-Based Fault Detection for Condition Monitoring in Rolling Mills, Proceedings of the IFAC Conference on Manufacturing Modeling, Management and Control, MIM '2013, St. Petersburg, Russia, 2013, pp. 1546-1551. (Winner of MIM 2013 Best paper award)
By mistake, the paper is indexed under its preliminary title "Condition Monitoring at Rolling Mills with Data-Driven Residual-Based Fault Detection".
1. Condition Monitoring at Rolling Mills with Data-Driven Residual-Based Fault Detection
Francisco Serdio Fernández
Francisco Serdio, Edwin Lughofer, Kurt Pichler, Thomas Buchegger, Hajrudin Efendic
Department of Knowledge-Based Mathematical Systems
Johannes Kepler University
Linz - Austria
francisco.serdio@jku.at
http://www.flll.jku.at/staff/francisco
2. Index
• Residual Based Approach
• Framework
» Data Cleaning
» System Identification
» Model Training
» Model Testing
• Reference Method
» Principal Component Analysis – PCA
» Multi Scale Principal Component Analysis – MSPCA
3. Index
• Current Challenges
» Global approaches
» Fixed thresholds
• Artificial faults
» Constant Failure
» Drift Failure
• Results
» ROC Curves
» Detection Rates
• Conclusions
• Outlook
4. Basic Idea of Residual-Based Approach
Increasing the dimensionality of the joint channel space decreases the likelihood that a
fault affects all channels with the same intensity and direction!
[Figure labels: Fault; No fault, but non-smooth pattern of signal; Joint channel space (smooth dependency)]
6. Framework
• Off-line stage
» Data cleaning
– Produces a new dataset to be used in the following step
» System identification
– Iterative process; identifies which channels explain others
» Model training
– Produces a model for each previously identified system
• On-line stage
» Model testing
– Determines when there is a fault in the running system
7. Framework – Data cleaning
• Remove duplicated channels
» Duplicated? R² greater than 0.95
• Remove outliers
» Outlier? Pairwise distance in the training data → outlier degree
• Downsample data set
» Keep the shape of the channel
• Remove constant channels
» Constant?
• Remove binary channels
» Binary?
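The channel-removal steps of this slide can be sketched as follows. This is only an illustration: the slide does not state the tests for "constant" and "binary", so the variance tolerance and the "at most two distinct values" rule below are assumptions, as are the function names. Only the R² > 0.95 duplicate criterion comes from the slide. Outlier removal and downsampling are not covered.

```python
from statistics import mean, pvariance


def r_squared(x, y):
    """Squared Pearson correlation between two equally long channels."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy * sxy / (sxx * syy)


def clean_channels(channels, r2_limit=0.95, var_tol=1e-12):
    """channels: dict name -> list of floats. Returns the names that are kept."""
    kept = []
    for name, values in channels.items():
        if pvariance(values) <= var_tol:   # assumed test for "constant"
            continue
        if len(set(values)) <= 2:          # assumed test for "binary"
            continue
        if any(r_squared(values, channels[k]) > r2_limit for k in kept):
            continue                       # duplicate of a channel already kept
        kept.append(name)
    return kept
```

With this rule the first channel of a duplicated pair is the one that survives, which is an arbitrary choice of the sketch.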
8. Framework – System Identification
• Identify channel dependencies
» Forward selection with orthogonalization
– Achieves channel ranking according to their importance level
for explaining target (most important first)
» GA based feature selection (included in Box-Cox)
– Outputs individuals with 1’s and 0’s indicating whether a
variable is included or not
• Determine optimal number of dimensions in ranking
scheme
» Find a knee in the cumulative quality sum curve
– Automatically determined by means of the gradient
– Keeps the inputs modelling the useful information
– Discards the inputs modelling the noise
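A minimal sketch of the ranking and knee search described above, under stated assumptions: the quality measure is taken to be the share of target variance explained by each channel after orthogonalization, and the knee rule (stop when the gain falls below a fraction of the largest gain) is a simple stand-in for the gradient criterion, which the slide does not spell out. The function names and the `rel_tol` parameter are hypothetical.

```python
def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def _center(v):
    m = sum(v) / len(v)
    return [a - m for a in v]


def forward_orthogonal_ranking(inputs, target, eps=1e-12):
    """Rank channels by explained target variance, most important first.

    inputs: dict name -> list of floats; target: list of floats.
    Returns [(name, gain)], gain being the variance share explained by the
    channel after orthogonalization against the channels ranked before it.
    """
    remaining = {name: _center(v) for name, v in inputs.items()}
    residual = _center(target)
    total = _dot(residual, residual)
    if total < eps:                        # constant target: nothing to explain
        return [(name, 0.0) for name in inputs]
    ranking = []
    while remaining:
        gains = {}
        for name, v in remaining.items():
            norm = _dot(v, v)
            gains[name] = 0.0 if norm < eps else _dot(v, residual) ** 2 / (norm * total)
        best = max(gains, key=gains.get)
        v = remaining.pop(best)
        ranking.append((best, gains[best]))
        norm = _dot(v, v)
        if norm < eps:
            continue
        # Remove the selected direction from the residual and the other candidates.
        c = _dot(v, residual) / norm
        residual = [r - c * a for r, a in zip(residual, v)]
        remaining = {
            name: [b - (_dot(w, v) / norm) * a for b, a in zip(w, v)]
            for name, w in remaining.items()
        }
    return ranking


def knee_index(quality_gains, rel_tol=0.05):
    """Number of ranked inputs to keep (assumed rule: gain >= rel_tol * max gain)."""
    if not quality_gains:
        return 0
    limit = rel_tol * max(quality_gains)
    for i, gain in enumerate(quality_gains):
        if gain < limit:
            return i
    return len(quality_gains)
```

The gains are the first differences of the cumulative quality sum curve, so thresholding them is one way to read "determined by means of the gradient".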
9. Framework – System Identification
• Determine optimal number of dimensions
10. Framework – Model Training
• Models applied, with stepwise increasing degree of non-linearity
» Ridge Regression (linear)
"T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning: Data Mining, Inference
and Prediction - Second Edition. Springer, New York Berlin Heidelberg, 2009"
– Global MLR with Tikhonov regularization included
» Genetic Box-Cox (slightly non-linear)
"R.M. Sakia. The Box-Cox transformation technique: a review. The Statistician, 41:168--178, 1992."
– Combining original Box-Cox with GA
- Transform the inputs to introduce slight non-linearities
- Use linear regression over the transformed inputs
- The transformations are learnt using a GA
» SparseFIS (highly non-linear)
"E. Lughofer and S. Kindermann. SparseFIS: Data-driven learning of fuzzy systems with sparsity
constraints. IEEE Transactions on Fuzzy Systems, 18(2): 396--411, 2010."
– Top down fuzzy modeling approach applying numerical sparsity constraints
optimization, out-weighting unimportant rules and parameters
– Employing iterative VQ, projected gradient descent and Semi-Smooth Newton
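The Genetic Box-Cox idea (transform the inputs, then fit a linear model on the transformed inputs) can be illustrated with a single input. This is a sketch only: the search over a small candidate list of exponents replaces the genetic algorithm used in the presentation, and `fit_line`, `best_lambda` and the candidate values are hypothetical names and choices.

```python
import math


def box_cox(x, lam):
    """Standard Box-Cox transform of a positive value x with exponent lam."""
    if x <= 0:
        raise ValueError("Box-Cox requires strictly positive inputs")
    return math.log(x) if lam == 0 else (x ** lam - 1.0) / lam


def fit_line(u, y):
    """Least-squares line y ~ intercept + slope * u; returns (slope, intercept, mse)."""
    n = len(u)
    mu, my = sum(u) / n, sum(y) / n
    suu = sum((a - mu) ** 2 for a in u)
    slope = sum((a - mu) * (b - my) for a, b in zip(u, y)) / suu
    intercept = my - slope * mu
    mse = sum((intercept + slope * a - b) ** 2 for a, b in zip(u, y)) / n
    return slope, intercept, mse


def best_lambda(x, y, candidates=(-1.0, 0.0, 0.5, 1.0, 2.0)):
    """Pick the exponent whose transformed input gives the lowest training MSE.

    Stand-in for the GA: a plain search over a fixed candidate list.
    """
    return min(
        candidates,
        key=lambda lam: fit_line([box_cox(v, lam) for v in x], y)[2],
    )
```

In the presentation the model quality is assessed by 10-fold cross-validation; the training MSE is used here only to keep the example short.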
11. Example of Input Transformations
12. Overview of Training Methods
Method             Type                 Training effort  System Identification  Model Training
Linear Regression  Linear               Low              Forward selection      10-fold cv with mse
Box-Cox            Slightly non-linear  Medium           Genetic algorithm      10-fold cv with mse
SparseFIS          Highly non-linear    High             Forward selection      10-fold cv with mse + grid search
13. On-line Analysis of Residual Signals
• Computation of residuals
» The residuals are the differences between the observed values and the predicted ones
• Computation of error bars
» Two types: global and local
– Global: based on the CV model error; the same value is provided for every point in
the testing data set
– Local: based on adaptive confidence intervals according to the variation
in the data distribution over space; a different value is provided for each point
in the testing data set
• Combine residuals and error bars
» The error bars are used to normalize the residuals
– The residuals are now expressed in error bar units
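The normalization step can be written in a few lines. The sketch below assumes that a global error bar is passed as a single number and local error bars as one value per sample; the function name is hypothetical, and how the error bars themselves are obtained is outside its scope.

```python
def normalized_residuals(observed, predicted, error_bars):
    """Residuals (observed - predicted) expressed in error bar units.

    error_bars: one number (global error bar, shared by all samples) or a
    list with one value per sample (local error bars).
    """
    if isinstance(error_bars, (int, float)):
        error_bars = [error_bars] * len(observed)
    return [(o - p) / e for o, p, e in zip(observed, predicted, error_bars)]
```

For example, `normalized_residuals([10, 12, 9], [9, 12, 11], 2.0)` gives `[0.5, 0.0, -1.0]`: the third sample lies one error bar below its prediction.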
14. On-line Analysis of Residual Signals
• On-line tracking of the residuals
» The average μ and the standard deviation σ are tracked
– A window of time is used
– Values out of the tolerance band trigger a fault alarm and do not update the tracking
[Formula annotations: current residual at time instance k generated from the i-th model; incremental / decremental μ and σ over a sliding window of size T]
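The tracking scheme can be sketched as a small class. The tolerance band is assumed here to be μ ± n_sigma·σ, since the exact formula is not recoverable from the slide text; the class name, the `warmup` period and the default values are also assumptions of this sketch.

```python
from collections import deque
import math


class ResidualTracker:
    """Tracks mean and standard deviation of one model's residuals over a sliding window."""

    def __init__(self, window=50, n_sigma=3.0, warmup=10):
        self.window = window      # window size T
        self.n_sigma = n_sigma    # assumed width of the tolerance band
        self.warmup = warmup      # samples collected before alarms are raised
        self.values = deque()
        self.total = 0.0
        self.total_sq = 0.0

    def stats(self):
        n = len(self.values)
        mu = self.total / n
        var = max(self.total_sq / n - mu * mu, 0.0)
        return mu, math.sqrt(var)

    def update(self, residual):
        """Process one residual; return True if it raises a fault alarm."""
        if len(self.values) >= self.warmup:
            mu, sigma = self.stats()
            if abs(residual - mu) > self.n_sigma * sigma:
                return True       # alarm: the statistics are not updated
        # Incremental update with the new value.
        self.values.append(residual)
        self.total += residual
        self.total_sq += residual * residual
        # Decremental update once the window is full.
        if len(self.values) > self.window:
            old = self.values.popleft()
            self.total -= old
            self.total_sq -= old * old
        return False
```

One tracker instance would be kept per model, fed with that model's normalized residual at each time instance k.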
15. Dynamic Residual Signals Analysis - Example
[Figure: residual signal examples; panels: Fault with 50% level, Fault with 10% level]
16. Reference methods
• Principal Components Analysis – PCA
» State of the art in fault detection
– D. Garcia-Alvarez. Fault detection using principal component analysis (PCA) in a
wastewater treatment plant (WWTP). In Proceedings of the 62nd Int. Student's Scientific
Conference, 13-17, Saint-Petersburg, Russia, 2009.
– P.F. Odgaard, B. Lin, and S.B. Jorgensen. Observer and data-driven-model-based
fault detection in power plant coal mills. IEEE Transactions on Energy Conversion,
23(2): 659-668, 2008.
» The monitoring can be reduced to two variables (T²
and Q) characterizing two orthogonal subsets of the
original space
– T-Hotelling (T²) represents the major variation in the data
– Q represents the random noise in the data
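For reference, the two monitoring statistics can be computed as in the following NumPy sketch. It uses the textbook definitions (T² in the retained principal subspace, Q as the squared reconstruction error in the residual subspace); the function names are hypothetical, and the alarm thresholds, which the reference methods derive from training data, are not shown.

```python
import numpy as np


def fit_pca(X, n_components):
    """Fit PCA on training data X (samples x channels)."""
    mean = X.mean(axis=0)
    Xc = X - mean
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:n_components].T                           # loadings (channels x components)
    eigvals = s[:n_components] ** 2 / (X.shape[0] - 1)
    return mean, P, eigvals


def t2_and_q(x, mean, P, eigvals):
    """Hotelling T2 and Q (squared prediction error) for one sample x."""
    xc = x - mean
    scores = xc @ P
    t2 = float(np.sum(scores ** 2 / eigvals))         # variation inside the PCA model
    residual = xc - scores @ P.T
    q = float(residual @ residual)                    # variation the model does not capture
    return t2, q
```

A sample that breaks the correlation structure of the training data mainly raises Q, while a sample that follows the structure but lies far from the training mean raises T².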
17. Reference method (cont’d)
• Multi Scale Principal Components Analysis – MSPCA
» State of the art in process monitoring
– B.R. Bakshi. Multiscale PCA with application to multivariate statistical process
monitoring. AIChE Journal, 44, 1596-1610, 1998.
» It uses wavelets to reconstruct the original signal
– Reconstruction attempts to remove useless information from the
signal, mainly noise
» Monitoring uses the same statistics as in PCA
– T-Hotelling (T²) represents the major variation in the data
– Q represents the random noise in the data
18. Current Challenges
• Global approaches
» PCA and MSPCA use the dataset as a whole
» When channels are added to or removed from the system, the
method should be trained again
– Low cascadability
• Fixed thresholds
» PCA and MSPCA use a fixed threshold based on training data
– Does not take into account train and test dataset differences
– When train and test differ considerably, the approach becomes
useless
» It’s a rigid approach
– The threshold remains unchanged during the online operation of the
system
19. Artificial Faults
• Artificial faults were introduced in the data
» Regions where channel values are zero were ignored
• Different fault types with different intensities
» Fault types
– Constant failure
– Means a jump in the original signal
– Drift failure
– Means a progressive increase in the original signal
– Different slopes → different shapes
» From exponential to logarithmic
» Fault intensities (% added to the original signal)
– 5%, 10%, 20%, 50%, 100%
• Introduction of faults was shuffled 10 times to avoid unlucky
situations (due to a bad coverage of faulty channels)
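The two fault types can be illustrated as follows. This is a sketch under assumptions: the fault is added as a fraction of the original value (matching "% added to the original signal"), and the drift shape is controlled by a power exponent as a simple stand-in for the exponential-to-logarithmic family of shapes, whose exact form is not given. Function and parameter names are hypothetical.

```python
def add_constant_fault(signal, start, intensity):
    """Jump: from index start on, add intensity (a fraction, e.g. 0.1 for 10%) of the original value."""
    return [v + intensity * v if i >= start else v for i, v in enumerate(signal)]


def add_drift_fault(signal, start, intensity, shape=1.0):
    """Progressive increase that reaches the full intensity at the last sample.

    shape = 1 gives a linear ramp, shape > 1 a slow start (exponential-like),
    shape < 1 a fast start (logarithmic-like). Assumed parametrization.
    """
    n = len(signal) - start
    out = list(signal)
    for i in range(start, len(signal)):
        progress = (i - start + 1) / n
        out[i] = signal[i] * (1.0 + intensity * progress ** shape)
    return out
```

Because both faults scale with the original value, samples that are exactly zero stay zero, so a fault placed in a zero-valued region would not be visible in the signal.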
21. Results
• ROC Curves
» For sensitivity analysis facing true positives vs. false
positives: Detection vs. Overdetection
» Depict the following useful information
– How much the detection rate influences the overdetection rate
– How sensitive the method is to its parameters
– Which method is best
– A higher AUC (Area Under the Curve) points to a better
method, as higher detection rates (y-axis, values far from x-axis)
can be achieved with lower false alarm rates (x-axis, values
close to y-axis).
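A ROC curve of this kind can be obtained by sweeping the alarm threshold over the fault scores. The sketch below assumes one score per test case (for example the largest normalized residual) and a label of 1 for faulty and 0 for fault-free cases; both the scoring and the function names are assumptions, not the evaluation code behind the reported results.

```python
def roc_points(scores, labels):
    """Return (overdetection rate, detection rate) pairs for all thresholds.

    labels: 1 for faulty cases, 0 for fault-free cases; a case raises an
    alarm when its score is at or above the threshold.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for s, l in zip(scores, labels) if s >= thr and l == 1)
        fp = sum(1 for s, l in zip(scores, labels) if s >= thr and l == 0)
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2.0
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )
```

An AUC of 1.0 corresponds to a method that separates faulty and fault-free cases perfectly, and 0.5 to one that does no better than chance.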
22. Results – Multi Scale PCA
• Proves to be useless for our problem
» The wavelet reconstruction is not able to reconstruct
the signals properly
– Poor channel reconstruction
– The percentage of channels reconstructed using the wavelets
with accuracy greater than or equal to 90% is around 55% to 65%
of the total number of channels for all the datasets
– Noise is introduced during the channel reconstruction, even in
the channels reconstructed with good quality
» Unacceptable overdetection rates in all the datasets
– The method is not able to operate below a 10% overdetection
rate → useless for our problem
27. Results – Detection Rates - Scenario 1
28. Results – Detection Rates - Scenario 2
29. Results – Detection Rates - Scenario 4
30. Statistical preference of methods
• Two statistical tests using
» (i) Rankings / (ii) Absolute detection rates
– Plus denotes significant superiority over the other methods
– Minus denotes inferiority to the other methods
– 0 indicates no difference
– na indicates not applicable
31. Conclusions
• MSPCA is not applicable in our problem
• PCA is either not applicable or outperformed by our residual-based
approach
• In the pessimistic (real-world) case, Box-Cox showed best
performance, thus favoring slight non-linearities in the models
• A significant performance boost over pessimistic case could be
recognized for all models times
» Fault misses can be largely explained by not having a (good) model
available for a channel where a fault occurs!
32. Outlook
• Deal with the non-stable behaviour of the residuals
(enhanced pattern analysis, model update schemes)
• Deal with the data from different products
(probably operator’s feedback required)
33. Thanks a lot for your attention!