FILTERING OF FREQUENCY COMPONENTS FOR PRIVACY-PRESERVING FACIAL
RECOGNITION
Artur Filipowicz, Thee Chanyaswad, S.Y. Kung
Princeton University
Princeton, NJ, USA
ABSTRACT
This paper examines the use of signal processing and feature engineering techniques to design a facial recognition system with image-reconstruction privacy protection. The Fast Fourier Transform (FFT) and Wavelet Transform (WT) are used to derive features from face images in the Yale and Olivetti datasets. Then, the features are selected by a filter. We propose several filters that fall into three categories – conventional filters (rectangular and triangular), an unsupervised-learning filter (variance), and supervised-learning filters (SNR, FDR, SD, and t-test). Furthermore, we investigate the role of FFT phase removal as a possible tool for image-reconstruction privacy protection.
The results show that both filtering and FFT phase removal can prevent privacy-compromising reconstruction of the original images without sacrificing recognition accuracy. Among the filters, we found the SNR and t-test filters to yield the best recognition accuracies while preserving image-reconstruction privacy. This work presents great promise for signal processing and feature engineering as tools toward building privacy-preserving facial recognition systems.
1. INTRODUCTION
Growth of the Internet and digital sensors has led to the generation of massive amounts of personal information. Videos, photos, emails, banking transactions, browsing history, GPS tracks, and other data are continuously collected and stored in the cloud. The data may be circulated around the Internet and servers without the data owner's knowledge. The value and scale of the data create the risk of privacy leakage. To this end, the data owner should control the privacy of the data before uploading it to the cloud.
As noted in [1], the 9/11 attacks, the Edward Snowden incident, and other events have resulted in both a demand for recognition systems and a concern for privacy violation by such systems. A critical component of such systems is biometric identification, mainly by facial recognition, which can be performed without consent and at a distance by surveillance cameras. While a lot of research has gone into improving facial recognition systems ([2], [3], [4], [5], [6] among others), relatively little research has been done on incorporating privacy into such systems, with some examples being [7], [8], and [9].
The primary approach to privacy in facial recognition is cryptography. In [7], an Eigenfaces recognition system is performed on homomorphically encrypted data. In this cryptographic system, the client wants the server to identify a face without revealing the image to the server. The server also does not want to reveal the contents of its database. Thus, the data are quantized for the Paillier encryption, and both the server and the client share the computation needed for matching faces. Experimentally, 96% accuracy was achieved in [7] on the ORL Database of Faces. [8] improved the algorithm presented in [7] by reducing the time and space complexity with the use of garbled circuits.
Along a different line of research, [9] used Helper Data
Systems to provide privacy. The approach generates binary
feature vectors by determining reliable bits based on statistics
from sample data. This approach, however, inevitably leaves
privacy up to the cloud server and relies on a level of trust in
the cloud.
Our approach to privacy-preserving facial recognition rests on the idea that privacy is compromised when an image can be visually inspected by an unauthorized individual. Therefore, with our definition, as long as the reconstruction of an image from a feature vector is not meaningful to human eyes, privacy is maintained. Additionally, we believe that privacy control of the data should be determined by the data owner. Therefore, our approach puts control over privacy solely in the hands of the data owner.
Facial recognition systems often utilize the Fast Fourier Transform (FFT) or Wavelet Transform (WT) as a part of feature engineering. For example, [3], [4], [5], and [6] use FFT or WT. In recognition systems like these, it is possible to alter the output of FFT or WT to reduce the quality of the reconstructed image without sacrificing accuracy of the classification. Therefore, our approach explores how different filtering methods can best select the features derived from FFT and WT for the purpose of privacy-preserving facial recognition. Seven filters are studied in our work. They belong to three categories – conventional filters (rectangular and triangular), an unsupervised-learning filter (variance), and supervised-learning filters (SNR, FDR, SD, and t-test).
The results on the Yale and Olivetti datasets show that the t-test and SNR filters can provide image-reconstruction privacy, with a recognition accuracy cost of 0.6% on the Olivetti dataset and an improvement in accuracy of 0.3% to 1.1% on the Yale dataset, surpassing accuracies reported for these datasets in the literature. Moreover, for FFT, phase removal is found to be a complementary method that can effectively protect privacy with a recognition accuracy cost of at most 0.6%. Overall, with our methods, privacy can be preserved without compromising facial recognition performance.
2. MACHINE LEARNING SYSTEMS
Fig. 1. Block diagrams of the machine learning systems used.
We can break down a facial recognition system into two components – feature engineering and classification. Both components can be made up of several parts, for example, several sequential WT transforms. We limit our feature engineering to one transform to keep image reconstruction simple; moreover, this may be a justified constraint in time- or power-sensitive settings such as mobile applications. Additionally, more complicated systems from the literature do not perform better.
2.1. Feature Engineering
As depicted in Figure 1, our classification systems for this
investigation begin with an application of FFT or WT to an
image. The output from FFT and WT serves as the features
in our systems. The FFT/WT features are vectorized to form
the feature vector. Then, a filter is applied as a part of feature
selection.
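As an illustration, the FFT branch of this pipeline can be sketched as follows (a minimal NumPy sketch; the function name, array shapes, and the magnitude-then-phase ordering are our own assumptions, not prescribed by the system):

```python
import numpy as np

def fft_features(image):
    """Derive FFT features from a 2-D grayscale image and vectorize
    them: the magnitudes of all coefficients followed by the phases."""
    spectrum = np.fft.fft2(image)          # complex FFT coefficients
    magnitude = np.abs(spectrum).ravel()   # amplitude of each frequency
    phase = np.angle(spectrum).ravel()     # phase of each frequency
    return np.concatenate([magnitude, phase])

# An h x w image yields a 2*h*w-dimensional feature vector,
# ready for the filtering step of feature selection.
```

The WT branch is analogous, with the vectorized wavelet coefficients taking the place of the FFT magnitudes and phases.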
2.2. Classification
Classification is accomplished via Support Vector Machines (SVM) with both linear and RBF kernels, selected using a parameter grid search.
3. FREQUENCY FEATURE FILTERING
The core of our scheme is the use of filters for feature selection of FFT and WT frequencies. A filter is a binary matrix which selects features based on some selecting function. Let $F_{x,d} \in \{0,1\}^{d \times m}$ be a filter, where $m$ is the number of features in each example, $x$ is one of the selecting functions, and $d$ is the number of features selected by the filter.
$F_{x,d,i,j} = 1$ if $W_x(j)$, the value assigned by the selecting function to feature $j$, is one of the $d$ largest values of $W_x$; otherwise $F_{x,d,i,j} = 0$. Thus, $m$-dimensional feature vectors are compressed to $d$-dimensional feature vectors, based on the ordering of $W_x$.
Let $X$ be the feature vector for an example:

$$X_{\mathrm{filtered}} = F_{x,d} X \quad (1)$$
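For concreteness, the selection in Eq. (1) can be sketched as follows (NumPy; the explicit binary matrix mirrors the definition of $F_{x,d}$, though in practice one would simply index the top-$d$ features):

```python
import numpy as np

def make_filter(weights, d):
    """Build the binary filter F in {0,1}^(d x m) selecting the d
    features j with the largest selecting-function values W_x(j)."""
    m = len(weights)
    top = np.argsort(weights)[::-1][:d]  # indices of the d largest weights
    F = np.zeros((d, m))
    F[np.arange(d), top] = 1.0           # one selected feature per row
    return F

weights = np.array([0.1, 0.9, 0.4, 0.7])   # toy W_x values
F = make_filter(weights, d=2)
x = np.array([10.0, 20.0, 30.0, 40.0])     # toy m-dimensional example
x_filtered = F @ x                         # Eq. (1): d-dimensional output
```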
Selection functions assign a weight to each feature and divide filters into three categories: conventional, unsupervised, and supervised. For the following definitions, let $\mu_j^+$, $\sigma_j^+$, and $N_j^+$ be the mean, standard deviation, and number of examples for a target individual, respectively, and let $\mu_j^-$, $\sigma_j^-$, and $N_j^-$ be the mean, standard deviation, and number of examples for all other people in the training set, respectively.
3.1. Conventional Filter
Conventional filters do not utilize training data when determining which features should be selected. In our investigation, the rectangular and triangular filters are the conventional filters, as they select a predefined region of features.
3.1.1. Rectangular
Let $J$ be a rectangular region of features,

$$W_{\mathrm{Rect}}(j) = \begin{cases} 1 & \text{if } j \in J \\ 0 & \text{otherwise} \end{cases} \quad (2)$$
Fig. 2. The rectangular filter. White pixels represent the selected features. The region is intended to select amplitudes of low frequencies in the FFT transform.
3.1.2. Triangular
This filter is inspired by results in [3], where the FFT features with high variance are the amplitudes of low frequencies, and they form a triangular region. Let $J$ be a triangular region of features,

$$W_{\mathrm{Tri}}(j) = \begin{cases} 1 & \text{if } j \in J \\ 0 & \text{otherwise} \end{cases} \quad (3)$$
Fig. 3. The triangular filter. White pixels represent the selected features. The region is intended to select amplitudes of low frequencies in the FFT transform.
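A sketch of how such predefined regions might be constructed as boolean masks over the 2-D grid of FFT amplitudes (NumPy; anchoring the low-frequency corner at the top-left of the unshifted FFT grid is our assumption):

```python
import numpy as np

def rectangular_mask(h, w, rh, rw):
    """Rectangular region J: an rh x rw block of low-frequency
    amplitudes at the corner of the (unshifted) FFT grid."""
    mask = np.zeros((h, w), dtype=bool)
    mask[:rh, :rw] = True
    return mask

def triangular_mask(h, w, size):
    """Triangular region J: positions with i + j < size, also
    covering the low-frequency corner."""
    i, j = np.indices((h, w))
    return (i + j) < size
```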
3.2. Unsupervised Filter
The variance-based filter is unsupervised in the sense that it
does not consider labels of the training examples. Feature
selection is done purely based on the variance of each feature
across the training set. The idea for this filter originates in [3].
3.2.1. Variance
$$W_{\mathrm{VAR}}(j) = \sigma_j^2 \quad (4)$$
3.3. Supervised Filter
We devised four supervised filters based on the signal-to-noise ratio, Fisher discriminant ratio, symmetric divergence, and t-test statistics. The supervised filters require labels for positive and negative classes. During training, for each individual in the dataset, the training examples are divided into two classes: the positive class contains the pictures of the target individual, and the negative class contains all other pictures. The signal-to-noise ratio, Fisher discriminant ratio, symmetric divergence, and t-test are computed based on those two classes for each feature, and the final weight for a feature is the mean of its weights across all of the individuals in the training set.
3.3.1. Signal-to-Noise Ratio
$$W_{\mathrm{SNR}}(j) = \frac{|\mu_j^+ - \mu_j^-|}{\sigma_j^+ + \sigma_j^-} \quad (5)$$
3.3.2. Fisher Discriminant Ratio
$$W_{\mathrm{FDR}}(j) = \frac{(\mu_j^+ - \mu_j^-)^2}{(\sigma_j^+)^2 + (\sigma_j^-)^2} \quad (6)$$
3.3.3. Symmetric Divergence
$$W_{\mathrm{SD}}(j) = \frac{1}{2}\left(\frac{(\mu_j^+)^2}{(\mu_j^-)^2} + \frac{(\mu_j^-)^2}{(\mu_j^+)^2}\right) + \frac{1}{2}\,\frac{(\mu_j^+ - \mu_j^-)^2}{(\sigma_j^+)^2 + (\sigma_j^-)^2} - 1 \quad (7)$$
3.3.4. T-test
$$W_{\mathrm{T}}(j) = \frac{\mu_j^+ - \mu_j^-}{\sqrt{\dfrac{(\sigma_j^+)^2}{N_j^+} + \dfrac{(\sigma_j^-)^2}{N_j^-}}} \quad (8)$$
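Under the definitions above, the four supervised weight functions (Eqs. (5)–(8)) for a single target individual can be sketched as follows (NumPy; the averaging over all individuals described in Section 3.3 is omitted, and the function name is ours):

```python
import numpy as np

def supervised_weights(X_pos, X_neg):
    """Per-feature weights for one target individual.  X_pos holds the
    positive-class examples as rows, X_neg the negative class."""
    mu_p, mu_n = X_pos.mean(axis=0), X_neg.mean(axis=0)
    sd_p, sd_n = X_pos.std(axis=0), X_neg.std(axis=0)
    n_p, n_n = len(X_pos), len(X_neg)
    snr = np.abs(mu_p - mu_n) / (sd_p + sd_n)                        # Eq. (5)
    fdr = (mu_p - mu_n) ** 2 / (sd_p ** 2 + sd_n ** 2)               # Eq. (6)
    sd = (0.5 * (mu_p ** 2 / mu_n ** 2 + mu_n ** 2 / mu_p ** 2)
          + 0.5 * (mu_p - mu_n) ** 2 / (sd_p ** 2 + sd_n ** 2) - 1)  # Eq. (7)
    t = (mu_p - mu_n) / np.sqrt(sd_p ** 2 / n_p + sd_n ** 2 / n_n)   # Eq. (8)
    return snr, fdr, sd, t
```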
4. EXPERIMENTAL SETUP
4.1. Datasets
We evaluate our machine learning systems on two datasets – Yale and Olivetti. The Yale dataset consists of 165 grayscale images of 15 individuals, with 11 images each. The original dataset, presented in [10], has images which, along with the face, show the hair and clothing of the individual. To ensure that our system classifies individuals solely based on facial features, we used a modified dataset from [11] which contains cropped images. The Olivetti dataset consists of 400 grayscale images of 40 individuals, with 10 images each. Both datasets have images of the same size which only contain the face.
4.2. Classification
Ten-fold cross-validation is used to test classification accuracies. Grid search is used to optimize the SVM parameters for each fold.
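This evaluation protocol can be sketched as follows (NumPy; the function name and seeding are ours — the point is only that filter training and the SVM grid search must be confined to the training folds):

```python
import numpy as np

def ten_fold_indices(n, seed=0):
    """One randomized instance of 10-fold cross-validation: a list of
    (train, test) index arrays.  Supervised/unsupervised filters and
    the SVM parameters are fit on the train indices only."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n)
    folds = np.array_split(order, 10)
    return [(np.concatenate(folds[:k] + folds[k + 1:]), folds[k])
            for k in range(10)]
```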
One hundred randomized instances of cross-validation are performed, and the training of the supervised and unsupervised filters is constrained to the appropriate folds. Since the rectangular and triangular filters are parameterized by a region, they are used to determine d for all other filters. For FFT, the filters are only applied to the amplitudes of frequencies, whereas for WT, the filters are applied to all components.
4.3. Reconstruction
To reconstruct original images from filtered feature vectors, we set the values of all features which were filtered out to zero. The operation can be viewed as multiplication by the transpose of the filter:

$$\tilde{X} = F_{x,d}^{T} X_{\mathrm{filtered}} \quad (9)$$
Then, the inverse of the FFT or WT is applied to $\tilde{X}$. For FFT, since only the magnitude is filtered in our experiment, we put the original phase back before performing the inverse FFT. This assumes the worst-case scenario in which the attacker can obtain the phase information from the vectorized FFT feature vector. As mentioned above, only one transform is used at a time to keep this process simple, and more complicated systems from the literature do not perform better.

Filter        Yale    Olivetti
No filter     0.741   0.975
Rectangular   0.737   0.967
Variance      0.742   0.970
SNR           0.752   0.968
FDR           0.748   0.965
SD            0.686   0.952
T-test        0.75    0.968

Table 1. Accuracies of the FFT with different filters. The top 399 features are selected. The first row (No filter) shows the baseline accuracies.
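The attacker's reconstruction pipeline for FFT features can be sketched as follows (NumPy; it combines Eq. (9) with the worst-case assumption that the phase is recovered, and the function name and argument layout are ours):

```python
import numpy as np

def reconstruct(x_filtered, F, phase, shape):
    """Attacker's view: re-insert the surviving magnitudes via F^T
    (Eq. 9), restore the phase, and invert the FFT."""
    magnitude = F.T @ x_filtered  # X~; filtered-out features stay zero
    spectrum = magnitude.reshape(shape) * np.exp(1j * phase.reshape(shape))
    return np.real(np.fft.ifft2(spectrum))
```

With the identity filter (no feature removed) this recovers the image exactly; the privacy protection comes entirely from the zeros that $F^T$ re-inserts.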
5. EXPERIMENTAL RESULTS
5.1. FFT-Baseline
The baseline method serves as a basis for comparison between the system without regard for privacy and our privacy-preserving systems. For FFT, our baseline method uses both the magnitude and phase information as the features. Once the features derived from FFT are vectorized, we feed them directly into the SVM to perform training/classification.
The baseline accuracy with FFT is 0.741 for the Yale dataset and 0.975 for the Olivetti dataset.
5.2. FFT with Filters
Based on the accuracies in Table 1, the SNR filter has the best or near-best accuracy across both datasets. Figure 4 shows that it also preserves privacy. Among the filters studied, the face image reconstructed from the t-test-filtered image shows the highest level of privacy. The t-test filter, therefore, presents a very attractive option for privacy-preserving facial recognition.
5.3. WT-Baseline
Similar to FFT, our baseline method derives WT features from the image and feeds them directly into the SVM to perform training/classification. With WT, the baseline accuracy is 0.82 for the Yale dataset and 0.963 for the Olivetti dataset.
5.4. WT with Filters
Using the result of the wavelet transform, the t-test filter also produces the best accuracies across the two datasets (Table 2).
Fig. 4. Reconstructed images after an FFT transform and a filter with d = 399. The filter name and accuracy on the Yale dataset listed under each image: Rectangular 0.73, Variance 0.742, SNR 0.752, T-test 0.75.
Fig. 5. Reconstructed images after a WT transform and a filter with d = 399. The filter name and accuracy on the Yale dataset listed under each image: Variance 0.755, T-test 0.823.
As seen in Figure 5, the t-test filter also preserves privacy. The reconstructions from the variance and t-test filters are similar to the reconstructions from the other filters. The privacy protection appears to be a property of the WT itself rather than of any particular filter.
Filter      Yale    Olivetti
No filter   0.82    0.975
Variance    0.755   0.941
SNR         0.821   0.968
FDR         0.813   0.969
SD          0.71    0.883
T-test      0.823   0.969

Table 2. Accuracies of the WT with different filters. The top 399 features are selected. The first row (No filter) shows the baseline accuracies.
6. PRIVACY PROTECTION VIA FFT PHASE
REMOVAL
The Fast Fourier Transform (FFT) decomposes the input signal into a combination of complex sinusoids of ordered discrete frequencies. Each frequency of the complex sinusoid has an associated complex coefficient as derived by FFT. In our context, these complex (frequency-ordered) coefficients are considered the output of FFT.
Since the coefficients are complex, they may be represented in polar coordinate form, which consists of the magnitude and the phase. Therefore, in addition to our investigation of the effects of filtering, we explore how the information embedded in the magnitude and the phase of FFT coefficients can affect privacy and recognition performance.
6.1. Experimental Results on FFT Phase Removal
If we only use the magnitude information to build the facial recognition classifier, there is no need for the phase information to be transmitted between the client and the cloud at all. Therefore, the attacker would only have the magnitude information with which to reconstruct the images. The result is a faceless image, as seen in Figure 6. This is therefore desirable for the purpose of privacy protection.
On the other hand, since this system design assumes that the phase information is not revealed to the cloud, we test the facial recognition accuracy when using the FFT magnitude only. The results show that the cost in accuracy is insignificant. The accuracy drops by only 0.006 to 0.735 for the Yale dataset and by only 0.001 to 0.974 for the Olivetti dataset. Hence, this presents a desirable scenario in our application, since privacy is protected while recognition accuracy is not significantly compromised.
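The magnitude-only reconstruction of Figure 6 can be sketched as follows (NumPy; zeroing the phase is one natural model of what the attacker is left with, and the function name is ours):

```python
import numpy as np

def magnitude_only_reconstruction(image):
    """What an attacker holding only the FFT magnitudes can recover:
    the inverse FFT with the phase set to zero."""
    magnitude = np.abs(np.fft.fft2(image))
    return np.real(np.fft.ifft2(magnitude))

# With the phase gone, the result is forced to be centro-symmetric
# about the origin, which is one reason it no longer resembles a face.
```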
Fig. 6. Result of reconstructing an FFT-transformed image with the phase removed.
Method                 Yale     Olivetti
(1) Filtering (FFT)    +1.1%    -0.6%
(2) Filtering (WT)     +0.3%    +0.6%
(3) Phase removal      -0.6%    -0.1%

Table 3. Change in mean accuracy relative to the baseline accuracy for the three methods: (1) FFT + SNR filter vs. FFT baseline, (2) WT + t-test filter vs. WT baseline, (3) FFT phase removal vs. FFT baseline. A positive sign means an accuracy gain compared to the baseline, while a negative sign means an accuracy loss.
7. PRIVACY PROTECTION VIA FEATURE
ENGINEERING
There is a lot of potential for privacy protection with proper alterations in the feature engineering process. Across all of the methods, the cost, in the form of accuracy loss, is very small (Table 3). In fact, for the wavelet transform, the accuracy increases when filters are applied. Additionally, the reconstructions from wavelet transform features yield much better privacy by completely removing the eyes and mouth in the image. It appears that the wavelet transform forms a very good basis for privacy-preserving feature vectors.
While the FFT-based systems lost accuracy under all of these methods, it is easier to see why the filters preserve privacy. The supervised statistical filters select both high and low frequencies, as seen in Figure 7. Therefore, the reconstruction is not just a blurred image, as is the case when selecting only the low frequencies. As fewer of these frequencies are selected, the image becomes more blurred; however, the accuracy is preserved by the high-frequency components. The mixture of low-frequency and high-frequency components prevents details of the image from being fully reconstructed.
Regarding which filters should be applied for performance improvement, SNR and t-test are the most useful for privacy protection. For both FFT and WT, these two filters allow a large dimensionality reduction with almost no accuracy loss.
Fig. 7. FFT amplitudes selected by the filters (marked in white): Rectangular, Variance, SNR, and T-test.
In the case of FFT-derived features, phase removal provides a great option for privacy protection while preserving the recognition accuracy, as summarized in Table 3. Once the phase is removed, filters can also be applied to the magnitude to provide another level of privacy protection, or even for recognition performance improvement and computational cost saving.
Our system outperforms other systems mentioned in the literature for the Yale dataset and is very close to the top accuracy for the Olivetti dataset. The best accuracy for the Yale dataset is 96.36% from [6]. The system used in [6] is three WTs followed by an FFT. However, that paper uses the full-size images from the Yale dataset; when the same system is used with the cropped images, the accuracy falls to 70%. The best accuracy for Olivetti is 98% from [3]. In comparison, our best accuracies are 82.3% for Yale and 97% for Olivetti.
Regarding more complicated systems, [5] uses FFT followed by PCA for an accuracy of 93.42% on Yale. This result is based on the larger Yale images; using their system on the cropped images produces an accuracy of 79%. This accuracy is close to our best accuracy, and the system preserves privacy (see Figure 8). For completeness, the same system with WT produces an accuracy of 78.9%.
Fig. 8. Result of reconstructing an FFT- and PCA-transformed image, based on [5].
8. CONCLUSION AND FUTURE WORK
Our work shows that signal processing (FFT and WT) and feature engineering (filtering) techniques hold good promise for providing privacy-preserving systems, the need for which is growing with the rise of cloud-based services and mass surveillance.
Several research directions may spring from this work. First, other filter designs may be able to further improve the performance of the facial recognition. Consecutive search methods can be utilized to produce such filters. For FFT in particular, the fact that phase removal can provide great privacy protection already allows the feature engineering step to focus more on recognition and computational performance.
For WT, our image reconstruction shows how the filtered WT features can effectively hide some specific parts of the face. Specifically, the filtered WT features appear to remove edge-rich areas of the image, such as the eyes and mouth. Therefore, it is interesting to study whether this selective hiding can be controlled by the filtering process, as it may have an application in contextual privacy.
Acknowledgement
This material is based upon work supported in part by the Brandeis Program of the Defense Advanced Research Projects Agency (DARPA) and Space and Naval Warfare Systems Center Pacific (SSC Pacific) under Contract No. 66001-15-C-4068.
9. REFERENCES
[1] Kevin W Bowyer, “Face recognition technology: secu-
rity versus privacy,” Technology and Society Magazine,
IEEE, vol. 23, no. 1, pp. 9–19, 2004.
[2] Anissa Bouzalmat, Jamal Kharroubi, and Arsalane Zarghili, "Comparative study of PCA, ICA, LDA using SVM classifier," Journal of Emerging Technologies in Web Intelligence, vol. 6, no. 1, pp. 64–68, 2014.
[3] Hagen Spies and Ian Ricketts, "Face recognition in Fourier space," in Vision Interface, 2000, vol. 2000, pp. 38–44.
[4] Anissa Bouzalmat, Arsalane Zarghili, and Jamal Kharroubi, "Facial face recognition method using Fourier transform filters Gabor and R-LDA," IJCA Special Issue on Intelligent Systems and Data Processing, pp. 18–24, 2011.
[5] Zhang Dehai, Ding Da, Li Jin, and Liu Qing, "A PCA-based face recognition method by applying fast Fourier transform in pre-processing," in 3rd International Conference on Multimedia Technology (ICMT-13). Atlantis Press, 2013.
[6] Ahmed Shabaan Samra, Salah Gad El Taweel Gad Allah, and Rehab Mahmoud Ibrahim, "Face recognition using wavelet transform, fast Fourier transform and discrete cosine transform," in Circuits and Systems, 2003 IEEE 46th Midwest Symposium on. IEEE, 2003, vol. 1, pp. 272–275.
[7] Zekeriya Erkin, Martin Franz, Jorge Guajardo, Ste-
fan Katzenbeisser, Inald Lagendijk, and Tomas Toft,
“Privacy-preserving face recognition,” in Privacy En-
hancing Technologies. Springer, 2009, pp. 235–253.
[8] Ahmad-Reza Sadeghi, Thomas Schneider, and Immo
Wehrenberg, “Efficient privacy-preserving face recog-
nition,” in Information, Security and Cryptology–ICISC
2009, pp. 229–244. Springer, 2010.
[9] Tom AM Kevenaar, Geert Jan Schrijen, Michiel van der
Veen, Anton HM Akkermans, and Fei Zuo, “Face
recognition with renewable and privacy preserving bi-
nary templates,” in Automatic Identification Advanced
Technologies, 2005. Fourth IEEE Workshop on. IEEE,
2005, pp. 21–26.
[10] Kuang-Chih Lee, Jeffrey Ho, and David J Kriegman,
“Acquiring linear subspaces for face recognition under
variable lighting,” IEEE Transactions on pattern anal-
ysis and machine intelligence, vol. 27, no. 5, pp. 684–
698, 2005.
[11] Deng Cai, Xiaofei He, Yuxiao Hu, Jiawei Han, and
Thomas Huang, “Learning a spatially smooth subspace
for face recognition,” in 2007 IEEE Conference on
Computer Vision and Pattern Recognition. IEEE, 2007,
pp. 1–7.