IFAC MIM 2013

Condition Monitoring
at Rolling Mills
with Data-Driven
Residual-Based Fault Detection
Francisco Serdio Fernández
Francisco Serdio, Edwin Lughofer, Kurt Pichler,
Thomas Buchegger, Hajrudin Efendic
Department of Knowledge-Based Mathematical Systems
Johannes Kepler University
Linz - Austria
francisco.serdio@jku.at
http://www.flll.jku.at/staff/francisco

Index
• Residual Based Approach
• Framework
» Data Cleaning
» System Identification
» Model Training
» Model Testing
• Reference Method
» Principal Component Analysis – PCA
» Multi Scale Principal Component Analysis – MSPCA

Index
• Current Challenges
» Global approaches
» Fixed thresholds
• Artificial faults
» Constant Failure
» Drift Failure
• Results
» ROC Curves
» Detection Rates
• Conclusions
• Outlook

Basic Idea of Residual-Based Approach
Increasing the dimensionality of the joint channel space decreases the likelihood that a
fault affects all channels with the same intensity and direction!
[Figure: joint channel space (smooth dependency); regions labelled "Fault" and "No fault!, but non-smooth pattern of signal"]

Framework
• Off-line stage
» Data cleaning
– Produces a new dataset to be used in the following step
» System identification
– Iterative process → Identifies which channels explain others
» Model training
– Produces a model for each previously identified system
• On-line stage
» Model testing
– Determines when there is a fault in the running system

Framework – Data cleaning
• Remove duplicated channels
» Duplicated? → R² greater than 0.95
• Remove outliers
» Outlier? → pairwise distance in the training data → outlier degree
• Downsample data set
» Keep the shape of the channel
• Remove constant channels
» Constant? 
• Remove binary channels
» Binary? 
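The duplicated-channel rule above can be sketched as follows. This is only an illustration: it assumes that R² means the squared Pearson correlation between two channels, and the function names and the toy channels are hypothetical, not taken from the slides.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two equally long channels (assumed R² definition)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # constant channel: correlation undefined, treated as unrelated
    return (sxy * sxy) / (sxx * syy)


def drop_duplicated_channels(channels, threshold=0.95):
    """Keep a channel only if its R² with every already kept channel is <= threshold."""
    kept = []
    for name, values in channels.items():
        if all(r_squared(values, channels[k]) <= threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical toy data: "speed_copy" is almost a scaled copy of "speed".
channels = {
    "speed": [1.0, 2.0, 3.0, 4.0, 5.0],
    "speed_copy": [2.1, 4.0, 6.1, 8.0, 10.1],
    "force": [5.0, 1.0, 4.0, 2.0, 3.0],
}
kept = drop_duplicated_channels(channels)
```

The first channel of each duplicated group is kept here; which member of a group should survive is a design choice the slides do not specify.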

Framework – System Identification
• Identify channel dependencies
» Forward selection with orthogonalization
– Achieves channel ranking according to their importance level
for explaining target (most important first)
» GA based feature selection (included in Box-Cox)
– Outputs individuals with 1’s and 0’s indicating whether a
variable is included or not
• Determine optimal number of dimensions in ranking
scheme
» Find a knee in the cumulative quality sum curve
– Automatically determined by means of the gradient
– Keeps the inputs modelling the useful information
– Discards the inputs modelling the noise
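A minimal sketch of forward selection with orthogonalization, assuming a greedy scheme in which the quality of a channel is the share of target variance it explains after the previously chosen channels have been projected out. The function name, the quality measure and the toy data are assumptions for illustration; the authors' exact procedure may differ.

```python
import numpy as np


def forward_selection(X, y, n_select):
    """Rank input channels greedily; returns (ranking, quality per selected channel)."""
    X = np.asarray(X, dtype=float)
    X = X - X.mean(axis=0)              # centered working copy
    r = np.asarray(y, dtype=float)
    r = r - r.mean()                    # centered target, later the unexplained part
    total = float(r @ r)
    remaining = list(range(X.shape[1]))
    ranking, quality = [], []
    for _ in range(min(n_select, len(remaining))):
        norms = np.array([X[:, j] @ X[:, j] for j in remaining])
        projections = np.array([X[:, j] @ r for j in remaining])
        # share of the target variance each candidate would explain
        gains = projections ** 2 / np.maximum(norms, 1e-12) / total
        best = int(np.argmax(gains))
        if gains[best] <= 1e-12:
            break                       # nothing useful left to add
        j = remaining.pop(best)
        q = X[:, j] / np.sqrt(norms[best])   # unit vector of the chosen channel
        ranking.append(j)
        quality.append(float(gains[best]))
        r = r - (q @ r) * q             # remove the explained part from the target
        for k in remaining:             # orthogonalize the remaining candidates
            X[:, k] -= (q @ X[:, k]) * q
    return ranking, quality


# Hypothetical toy data: three mutually orthogonal channels, target uses 0 and 2.
x0 = np.array([1, 1, 1, 1, -1, -1, -1, -1], dtype=float)
x1 = np.array([1, 1, -1, -1, 1, 1, -1, -1], dtype=float)
x2 = np.array([1, -1, 1, -1, 1, -1, 1, -1], dtype=float)
target = 3.0 * x0 + 0.5 * x2
ranking, quality = forward_selection(np.column_stack([x0, x1, x2]), target, n_select=2)
```

The returned quality values are the increments of a cumulative quality curve, so their running sum can be inspected for a knee.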

Framework – System Identification
• Determine optimal number of dimensions
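One way to locate the knee of a cumulative quality curve by its gradient is sketched below. The stopping rule (cut where the gradient falls below a fraction `rel_tol` of the first gradient) and the example values are assumptions, since the slides only state that the gradient is used.

```python
from itertools import accumulate


def knee_index(quality, rel_tol=0.05):
    """Return how many ranked inputs to keep before the cumulative curve flattens."""
    cumulative = list(accumulate(quality))
    # discrete gradient of the cumulative curve
    gradients = [cumulative[0]] + [b - a for a, b in zip(cumulative, cumulative[1:])]
    limit = rel_tol * gradients[0]
    for k, g in enumerate(gradients):
        if g < limit:
            return k            # inputs from position k onwards mostly model noise
    return len(quality)


# Hypothetical quality contributions of five ranked inputs.
quality = [0.60, 0.25, 0.08, 0.01, 0.005]
n_inputs = knee_index(quality)
```

With these values the first three inputs are kept and the last two are discarded as modelling noise.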

Framework – Model Training
• Models applied, stepwise increasing degree of non-linearity
» Ridge Regression (linear)
“T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning: Data Mining, Inference
and Prediction - Second Edition. Springer, New York Berlin Heidelberg, 2009”
– Global MLR with Tikhonov regularization included
» Genetic Box-Cox (slightly non-linear)
“R.M. Sakia. The Box-Cox transformation technique: a review. The Statistician, 41:168--178, 1992.”
– Combining original Box-Cox with GA
- Transform the inputs to introduce slight non-linearities
- Use linear regression over the transformed inputs
- The transformations are learnt using a GA
» SparseFIS (highly non-linear)
“E. Lughofer and S. Kindermann. SparseFIS: Data-driven learning of fuzzy systems with sparsity
constraints. IEEE Transactions on Fuzzy Systems, 18(2): 396--411, 2010.”
– Top down fuzzy modeling approach applying numerical sparsity constraints
optimization, out-weighting unimportant rules and parameters
– Employing iterative VQ, projected gradient descent and Semi-Smooth Newton
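As an illustration of the linear model, ridge regression (multiple linear regression with Tikhonov regularization) has a closed-form solution. The sketch below assumes centered data with an unpenalized intercept; the function names, the regularization parameter `lam` and the toy data are hypothetical.

```python
import numpy as np


def ridge_fit(X, y, lam=1.0):
    """Closed-form ridge regression: w = (Xc'Xc + lam*I)^-1 Xc'yc, intercept from the means."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    mean_x, mean_y = X.mean(axis=0), y.mean()
    Xc, yc = X - mean_x, y - mean_y
    A = Xc.T @ Xc + lam * np.eye(X.shape[1])
    w = np.linalg.solve(A, Xc.T @ yc)
    b = mean_y - mean_x @ w
    return w, b


def ridge_predict(X, w, b):
    """Predict the target channel from the input channels."""
    return np.asarray(X, dtype=float) @ w + b


# Hypothetical toy data: y = 2*x + 1.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = 2.0 * X[:, 0] + 1.0
w_plain, b_plain = ridge_fit(X, y, lam=0.0)   # ordinary least squares
w_ridge, b_ridge = ridge_fit(X, y, lam=5.0)   # slope shrunk towards zero
```

In practice `lam` would be chosen by cross-validation, in line with the 10-fold CV mentioned in the training overview.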

Example of Input Transformations
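The standard Box-Cox transformation of a positive input x with parameter λ is (x^λ − 1)/λ for λ ≠ 0 and ln(x) for λ = 0. The sketch below shows only the transformation itself; how the GA searches for one λ per input is not reproduced here, and the function name is hypothetical.

```python
import math


def box_cox(x, lam):
    """Box-Cox transform of a strictly positive value x with parameter lam."""
    if x <= 0:
        raise ValueError("Box-Cox is only defined for positive inputs")
    if abs(lam) < 1e-12:
        return math.log(x)          # limiting case lam -> 0
    return (x ** lam - 1.0) / lam


# lam = 1 only shifts the input; lam = 0.5 and lam = 0 bend it slightly.
transformed = [box_cox(4.0, lam) for lam in (1.0, 0.5, 0.0)]
```

A linear regression fitted on such transformed inputs stays linear in its parameters while becoming slightly non-linear in the original channels.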

Overview of Training Methods
Method             Type                 Training effort  System Identification  Model Training
Linear Regression  Linear               Low              Forward selection      10-fold cv with mse
Box-Cox            Slightly non-linear  Medium           Genetic algorithm      10-fold cv with mse
SparseFIS          Highly non-linear    High             Forward selection      10-fold cv with mse + grid search

On-line Analysis of Residual Signals
• Computation of residuals
» The residuals are the differences between the observed values
and the predicted ones
• Computation of error bars
» Two types: global and local
– Global: based on CV model error → the same value is provided for every point in
the testing data set
– Local: based on adaptive confidence intervals according to variation
in the data distribution over space → a different value for each point
in the testing data set is provided
• Combine residuals and error bars
» The error bars are used to normalize the residuals
– The residuals are now expressed in error bar units
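Normalizing residuals by error bars can be sketched as below, assuming a signed residual (observed minus predicted) divided by the error bar of the same point. The function name and the numbers are hypothetical; a global error bar is simply repeated for every point, while local error bars vary from point to point.

```python
def normalized_residuals(observed, predicted, error_bars):
    """Express each residual (observed - predicted) in units of its error bar."""
    return [(o - p) / e for o, p, e in zip(observed, predicted, error_bars)]


observed = [10.0, 12.0, 15.0]
predicted = [10.0, 11.0, 11.0]

global_bar = 2.0                                  # assumed CV model error
res_global = normalized_residuals(observed, predicted, [global_bar] * len(observed))

local_bars = [1.0, 2.0, 4.0]                      # assumed local confidence widths
res_local = normalized_residuals(observed, predicted, local_bars)
```

The last point shows the difference between the two types: the same deviation of 4 counts as 2 error bar units globally but only 1 unit locally, where the model is known to be less certain.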

On-line Analysis of Residual Signals
• On-line tracking of the residuals
» The average μ and the standard deviation σ are tracked
– A window of time is used → values out of the tolerance band trigger
a fault alarm and do not update the tracking
[Equation annotations: current residual at time instance k generated from the i-th model;
incremental / decremental μ and σ over a sliding window with size T]
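The tracking rule can be sketched as follows. Several details are assumptions: the tolerance band is taken as μ ± n_sigma·σ, alarms are only raised once the window is full, and μ and σ are recomputed over the window instead of being updated incrementally and decrementally. The class name and the example residuals are hypothetical.

```python
from collections import deque
from statistics import mean, pstdev


class ResidualTracker:
    """Tracks mean and standard deviation of one model's residuals over a sliding window."""

    def __init__(self, window_size, n_sigma=3.0):
        self.window = deque(maxlen=window_size)   # last T accepted residuals
        self.n_sigma = n_sigma                    # assumed width of the tolerance band

    def update(self, residual):
        """Return True if the residual raises a fault alarm, False otherwise."""
        if len(self.window) == self.window.maxlen:
            mu = mean(self.window)
            sigma = pstdev(self.window)
            if abs(residual - mu) > self.n_sigma * sigma:
                return True       # alarm: the value does not update the tracking
        self.window.append(residual)
        return False


tracker = ResidualTracker(window_size=5)
stream = [1.0, 1.2, 0.8, 1.1, 0.9, 1.05, 5.0]
alarms = [tracker.update(r) for r in stream]
```

Only the last value leaves the tolerance band, and because it is not appended, a fault cannot widen the band that is supposed to detect it.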

Dynamic Residual Signals Analysis - Example
Fault with 50% level
Fault with 10% level

Reference methods
• Principal Components Analysis – PCA
» State of the art in fault detection
– D. Garcia-Alvarez. Fault detection using principal component analysis (pca) in a
wastewater treatment plant (wwtp). In Proceedings of the 62-th Int. Student's Scientific
Conference, 13-17, Saint-Petersburg, Russia, 2009.
– P.F. Odgaard, B. Lin, and S.B. Jorgensen. Observer and data-driven-model-based
fault detection in power plant coal mills. IEEE Transactions on Energy Conversion,
23(2): 659-668, 2008.
» The monitoring can be reduced to two variables (T²
and Q) characterizing two orthogonal subsets of the
original space
– T-Hotelling (T²) represents the major variation in the data
– Q represents the random noise in the data
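A minimal sketch of the two PCA monitoring statistics, assuming the usual definitions: T² is the sum of squared scores scaled by the retained eigenvalues, and Q is the squared reconstruction error in the discarded subspace. Function names and the toy data are hypothetical, and the thresholds on T² and Q are not covered.

```python
import numpy as np


def fit_pca(X, n_components):
    """Return the mean, the loading matrix P and the retained eigenvalues."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    cov = np.cov(X - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)            # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_components]  # largest first
    return mean, eigvecs[:, order], eigvals[order]


def t2_and_q(x, mean, P, lambdas):
    """Hotelling's T² (retained subspace) and Q statistic (residual subspace)."""
    xc = np.asarray(x, dtype=float) - mean
    scores = P.T @ xc
    t2 = float(np.sum(scores ** 2 / lambdas))
    residual = xc - P @ scores
    q = float(residual @ residual)
    return t2, q


# Hypothetical training data lying close to the line x2 = x1.
t = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
s = np.array([1.0, -1.0, 0.0, -1.0, 1.0])
X_train = np.column_stack([t + 0.1 * s, t - 0.1 * s])
mean, P, lambdas = fit_pca(X_train, n_components=1)

t2_on, q_on = t2_and_q([3.0, 3.0], mean, P, lambdas)      # far along the main direction
t2_off, q_off = t2_and_q([1.0, -1.0], mean, P, lambdas)   # breaks the correlation
```

The first sample moves only T², the second only Q, which shows why both statistics are monitored.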

Reference method (cont’d)
• Multi Scale Principal Components Analysis – MSPCA
» State of the art in process monitoring
– B.R. Bakshi. Multiscale pca with application to multivariate statistical process
monitoring. AIChE Journal, 44, 1596-1610, 1998.
» It uses wavelets to reconstruct the original signal
– Reconstruction attempts to remove useless information from the
signal, mainly noise
» Monitoring uses the same statistics as in PCA
– T-Hotelling (T²) represents the major variation in the data
– Q represents the random noise in the data

Current Challenges
• Global approaches
» PCA and MSPCA use the dataset as a whole
» When channels are added to or removed from the system, the
method should be trained again
– Low cascadability
» It’s a rigid approach
• Fixed thresholds
» PCA and MSPCA use a fixed threshold based on training data
– Does not take into account train and test dataset differences
– When train and test differ considerably, the approach becomes
useless
– The threshold remains unchanged during the online operation of the
system

Artificial Faults
• Artificial faults were introduced in the data
» Regions where channel values are zero were ignored
• Different fault types with different intensities
» Fault types
– Constant failure
– Means a jump in the original signal
– Drift failure
– Means a progressive increase in the original signal
– Different slopes → different shapes
» From exponential to logarithmic
» Fault intensities (% added to the original signal)
– 5%, 10%, 20%, 50%, 100%
• Introduction of faults was shuffled 10 times to avoid unlucky
situations (due to a bad coverage of faulty channels)
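The two fault types can be sketched as below, assuming that the intensity is applied as a percentage of the original signal value. The power-law ramp controlled by `shape` is only a stand-in for the "exponential to logarithmic" drift shapes; function and parameter names are hypothetical.

```python
def constant_fault(signal, start, level):
    """Jump: add level (e.g. 0.5 for 50%) of the original value from index start on."""
    return [v * (1.0 + level) if i >= start else v for i, v in enumerate(signal)]


def drift_fault(signal, start, level, shape=1.0):
    """Progressive increase that reaches level at the last sample.

    shape > 1 gives a slow start (exponential-like), shape < 1 a fast start
    (logarithm-like); this parametrization is an assumption.
    """
    n = len(signal) - start
    faulty = list(signal[:start])
    for i in range(n):
        progress = ((i + 1) / n) ** shape       # grows from 1/n to 1
        faulty.append(signal[start + i] * (1.0 + level * progress))
    return faulty


signal = [10.0] * 6
jump = constant_fault(signal, start=2, level=0.5)
drift = drift_fault(signal, start=2, level=0.5)
```

Because both faults scale the original value, samples that are zero remain zero.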

Artificial Faults Examples

Results
• ROC Curves
» For sensitivity analysis facing true positives vs. false
positives → Detection vs. Overdetection
» Depict the following useful information
– How much the detection rate influences the overdetection rate
– How sensitive the method is to its parameters
– Which method is best
– A higher AUC (Area Under the Curve) points to a better
method, as higher detection rates (y-axis, values far from x-axis)
can be achieved with lower false alarm rates (x-axis, values
close to y-axis).
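How a ROC curve and its AUC follow from detection and overdetection rates can be sketched as below. The sketch assumes that each sample has a fault score and a true label (1 = fault, 0 = no fault) and that the curve is obtained by sweeping the alarm threshold; names and data are hypothetical.

```python
def roc_points(scores, labels):
    """Return (overdetection rate, detection rate) pairs for all alarm thresholds."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, l in zip(scores, labels) if s >= threshold and l == 1)
        fp = sum(1 for s, l in zip(scores, labels) if s >= threshold and l == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
labels = [1, 1, 0, 1, 0, 0]
points = roc_points(scores, labels)
area = auc(points)
```

An AUC of 1 would mean that every fault can be detected without any false alarm, while 0.5 corresponds to random guessing.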

Results – Multi Scale PCA
• Proves to be useless for our problem
» The wavelet reconstruction is not able to reconstruct
the signals properly
– Poor channel reconstruction
– The percentage of channels reconstructed using the wavelets
with accuracy greater than or equal to 90% is around 55% to 65%
of the total number of channels for all the datasets
– Noise is introduced during the channel reconstruction, even in
the channels reconstructed with good quality
» Unacceptable overdetection rates in all the datasets
– The method is not able to operate below 10% overdetection
rate → useless in our problem

Results – Multi Scale PCA

Results – ROC Curves – Scenario 1

Results – Detection Rates - Scenario 1

2

4

Statistical preference of methods
• Two statistical tests using
» (i) Rankings / (ii) Absolute detection rates
– Plus denotes significant superiority over the other methods
– Minus denotes inferiority to the other methods
– 0 indicates no difference
– na indicates not applicable

Conclusions
• MSPCA is not applicable in our problem
• PCA is either not applicable or outperformed by our residual-based
approach
• In the pessimistic (real-world) case, Box-Cox showed best
performance, thus favoring slight non-linearities in the models
• A significant performance boost over the pessimistic case could be
recognized for all models times
» Fault misses can be largely explained by not having a (good) model
available for a channel where a fault occurs!

Outlook
• Deal with the non-stable behaviour of the residuals
(enhanced pattern analysis, model update schemes)
• Deal with the data from different products
(probably operator’s feedback required)

Thanks a lot for your attention!
