An automatic attendance system has two stages: face detection and face recognition. Although many efficient algorithms exist for frontal face detection and recognition, most of them fail when large pose variation comes into the picture. In this presentation I combine two state-of-the-art methods to build an automatic attendance system.
2. OUTLINE
▪ Motivation
▪ Objective
▪ System Requirements
▪ Design Details
▪ Tried methods
▪ Inspiration
▪ Main design
▪ Status so far
▪ Future work
3. MOTIVATION
▪ Taking attendance in large classes is:
▪ Cumbersome
▪ Repetitive
▪ Consumes valuable class time
▪ What if we make an efficient face detection and recognition system for
this task?
4. OBJECTIVES
▪ Automatic user identification via face detection and recognition.
▪ Develop and implement an efficient face detection and recognition
system.
▪ End-to-end face recognition system using deep learning.
5. DIFFICULTIES
▪ Large pose variation
▪ Hidden faces & tiny faces
▪ Different illumination conditions, occlusions
6. SYSTEM REQUIREMENTS
▪ Hardware:
▪ A camera
▪ PC or Raspberry pi
▪ Software:
▪ Matlab 2013+
▪ Python 2.7
▪ Lasagne API
8. GOING DEEP INTO FACE RECOGNITION
▪ Various methods are employed to recognize a person in the wild.
▪ Compared with traditional handcrafted features such as high-dimensional
LBP, the Active Appearance Model (AAM), the Active Shape Model (ASM),
Bayesian face, or Gaussian face, deep features learnt automatically from
personal identity are more advantageous.
▪ In most deep-learning-based face recognition methods, the inputs to the
deep model are aligned face images.
9. TRIED METHODS
▪ Tal Hassner, Shai Harel, Eran Paz, Roee Enbar, "Effective Face Frontalization
in Unconstrained Images", CVPR 2015
11. TRIED METHODS
▪ Zhu, Xiangyu, et al. "Face Alignment Across Large Poses: A 3D
Solution", CVPR 2016
14. TRIED METHODS
▪ I have tried to implement some more papers, but they fail when we
are dealing with large pose.
▪ Instead of AAM or a fitted 3D model (3D frontalisation does not show
significant improvement over simple 2D alignment*), we used deep
learning techniques to recognize a face using only personal identity
cues.
* Banerjee, Sandipan, et al. "To Frontalize or Not To Frontalize: Do We Really Need Elaborate Pre-Processing to
Improve Face Recognition Performance?", arXiv:1610.04823 (2016).
15. INSPIRATION
▪ Our work is inspired by some of the state-of-the-art papers.
▪ DeepFace: Closing the Gap to Human-Level Performance in Face Verification, CVPR-2014
▪ FaceNet: A Unified Embedding for Face Recognition and Clustering, CVPR-2015
▪ DeepID3: Face Recognition with Very Deep Neural Networks, CVPR-2015
▪ Supervised Transformer Network for Efficient Face Detection, ECCV-2016
▪ Towards End-to-End Face Recognition through Alignment Learning, arXiv-2017
▪ Spatial transformer networks. NIPS-2015
▪ Finding Tiny Faces. arXiv-2016
16. MAIN DESIGN
▪ The complete architecture has two stages▪ The complete architecture has two stages
▪ Face Detection
▪ Face Recognition
17.
18. ARCHITECTURE FOR DETECTION
▪ The authors provide an in-depth analysis of image resolution, object
scale, and spatial context for the purpose of finding small faces.
▪ A detailed study of the paper is still pending, as I found it only very
recently; I briefly describe their architecture on the next slide.
22. WHERE IT FAILS?
▪ The proposed method works well for out-of-plane rotation, but its
accuracy drops when in-plane (2D) rotation comes into the picture.
▪ Some of the failure cases are shown on the next slide.
30. SPATIAL TRANSFORMER NETWORK
▪ According to the original DeepMind paper, the spatial transformer can
be used to implement any parametrizable transformation, including
translation, scaling, affine, and projective.
▪ Suppose that for the i-th target point p^t_i = (x^t_i, y^t_i, 1) in the output image,
a grid generator generates its source coordinates (x^s_i, y^s_i, 1) in the input
image according to the transformation parameters.
▪ Projective transformation equation (with eight parameters θ1…θ8):
x^s_i = (θ1 x^t_i + θ2 y^t_i + θ3) / (θ7 x^t_i + θ8 y^t_i + 1)
y^s_i = (θ4 x^t_i + θ5 y^t_i + θ6) / (θ7 x^t_i + θ8 y^t_i + 1)
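The grid-generator step above can be sketched in numpy; the parameter ordering θ1…θ8 is an assumption, chosen to match the standard eight-parameter projective (homography) form with θ9 fixed to 1:

```python
import numpy as np

def projective_grid(theta, height, width):
    """Map every target pixel (x_t, y_t) of an H x W output image to its
    source coordinates (x_s, y_s) under a projective transform.
    theta: 8 parameters [t1..t8]; returns an array of shape (H, W, 2)."""
    t1, t2, t3, t4, t5, t6, t7, t8 = theta
    ys, xs = np.meshgrid(np.arange(height), np.arange(width), indexing="ij")
    denom = t7 * xs + t8 * ys + 1.0
    x_src = (t1 * xs + t2 * ys + t3) / denom
    y_src = (t4 * xs + t5 * ys + t6) / denom
    return np.stack([x_src, y_src], axis=-1)

# Identity parameters leave the sampling grid unchanged.
grid = projective_grid([1, 0, 0, 0, 1, 0, 0, 0], 4, 4)
```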
33. SPATIAL TRANSFORMER NETWORK
▪ We use the bilinear kernel, so that
V_i = Σ_n Σ_m U_nm · max(0, 1 − |x^s_i − m|) · max(0, 1 − |y^s_i − n|)
▪ So the overall transformer model will be:
This is equivalent to convolving a sampling kernel
k with the source image of H × W dimension.
▪ All the blocks should be differentiable.
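A minimal numpy sketch of the bilinear sampling described above (the function name and the tiny 2x2 test image are illustrative, not from the slides):

```python
import numpy as np

def bilinear_sample(img, x, y):
    """Sample img (H x W) at real-valued (x, y) with a bilinear kernel:
    V = sum_{n,m} U[n, m] * max(0, 1-|x-m|) * max(0, 1-|y-n|).
    Only the four neighbouring pixels have nonzero weight."""
    H, W = img.shape
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    val = 0.0
    for n in (y0, y0 + 1):
        for m in (x0, x0 + 1):
            if 0 <= n < H and 0 <= m < W:
                val += img[n, m] * max(0, 1 - abs(x - m)) * max(0, 1 - abs(y - n))
    return val

img = np.array([[0.0, 1.0], [2.0, 3.0]])
# The midpoint of the 2x2 image averages all four pixels: (0+1+2+3)/4 = 1.5.
assert abs(bilinear_sample(img, 0.5, 0.5) - 1.5) < 1e-9
```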
37. SPATIAL TRANSFORMER NETWORK
During backward propagation, we need to calculate the gradient
of V_i with respect to each of the eight transformation parameters.
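As a sanity check on that differentiability, a hedged numpy sketch: for a pure horizontal translation parameter t1 (so x^s = x^t + t1), the gradient of V_i with respect to t1 equals dV/dx^s, and the analytic bilinear gradient can be compared against a finite difference:

```python
import numpy as np

def bilinear(img, x, y):
    """Bilinear sampling of img (H x W) at real-valued (x, y)."""
    H, W = img.shape
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    v = 0.0
    for n in (y0, y0 + 1):
        for m in (x0, x0 + 1):
            if 0 <= n < H and 0 <= m < W:
                v += img[n, m] * max(0, 1 - abs(x - m)) * max(0, 1 - abs(y - n))
    return v

img = np.array([[0.0, 1.0], [2.0, 3.0]])
x, y, eps = 0.4, 0.25, 1e-6

# Numerical gradient dV/dx via central finite difference.
numeric = (bilinear(img, x + eps, y) - bilinear(img, x - eps, y)) / (2 * eps)
# Analytic gradient of the bilinear interpolant for 0 < x < 1, 0 < y < 1.
analytic = (1 - y) * (img[0, 1] - img[0, 0]) + y * (img[1, 1] - img[1, 0])
assert abs(numeric - analytic) < 1e-4
```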
44. SPATIAL TRANSFORMER NETWORK
▪ The similarity transformation is defined here as
x^s = λ(x^t cos α − y^t sin α) + t1,  y^s = λ(x^t sin α + y^t cos α) + t2
in which α is the rotation angle, λ is the scaling factor, and t1, t2 are the horizontal and
vertical translation displacements respectively. Analogously, the gradients of V_i with respect to α
and λ are shown below:
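The similarity transform above can be sketched directly (function name and test point are illustrative):

```python
import numpy as np

def similarity(alpha, lam, t1, t2, pts):
    """Apply the similarity transform x^s = lam * R(alpha) x^t + t.
    pts: (N, 2) target coordinates; returns (N, 2) source coordinates."""
    R = np.array([[np.cos(alpha), -np.sin(alpha)],
                  [np.sin(alpha),  np.cos(alpha)]])
    return lam * pts @ R.T + np.array([t1, t2])

# A 90-degree rotation (no scale, no shift) sends (1, 0) to (0, 1).
src = similarity(np.pi / 2, 1.0, 0.0, 0.0, np.array([[1.0, 0.0]]))
```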
47. STATUS SO FAR
▪ STN is implemented and tested on the Labeled Faces in the Wild (LFW) dataset.
▪ Out of 5423 classes, we took only 1000 because of computational
limitations.
▪ During training we did data augmentation with random 2D-affine transformations
on the face data to increase the training set size.
▪ We had 15399 training images, 3501 test images, and 2100 validation images
during training.
▪ We introduced a CNN architecture to extract deep features from the
transformed face.
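The random 2D-affine augmentation mentioned above might be sketched with scipy.ndimage; the parameter ranges (rotation, scale, shift) are assumptions, since the slides do not give them:

```python
import numpy as np
from scipy.ndimage import affine_transform

def random_affine(img, rng, max_angle=10.0, max_scale=0.1, max_shift=2.0):
    """Apply a small random rotation, scale, and shift to a 2D image.
    Parameter ranges are illustrative, not taken from the slides."""
    angle = np.deg2rad(rng.uniform(-max_angle, max_angle))
    scale = 1.0 + rng.uniform(-max_scale, max_scale)
    A = scale * np.array([[np.cos(angle), -np.sin(angle)],
                          [np.sin(angle),  np.cos(angle)]])
    # Rotate about the image centre, then apply a random shift.
    centre = (np.array(img.shape) - 1) / 2.0
    shift = rng.uniform(-max_shift, max_shift, size=2)
    offset = centre - A @ centre + shift
    # order=1 keeps values inside the original intensity range.
    return affine_transform(img, A, offset=offset, order=1, mode="nearest")

rng = np.random.RandomState(0)
face = rng.rand(32, 32)          # stand-in for a face crop
aug = random_affine(face, rng)
```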
48. STATUS SO FAR
▪ Output of the STN network
49. STATUS SO FAR
[Figure: STN architecture — stacked Conv layers with Pool & Actv stages, followed by three Dense layers and a final activation]
[Figure: Recognition architecture — Conv and Pool & Actv layers, followed by three Dense layers and activations]
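The slides do not give kernel sizes or strides, so here is a hedged sketch assuming 3x3 convolutions (padding 1) and 2x2 max-pooling, tracing how the spatial size shrinks through Conv/Conv/Pool blocks like those in the architecture above:

```python
def conv_out(size, kernel=3, pad=1, stride=1):
    """Spatial output size of a convolution layer."""
    return (size + 2 * pad - kernel) // stride + 1

def pool_out(size, kernel=2, stride=2):
    """Spatial output size of a max-pooling layer."""
    return (size - kernel) // stride + 1

size = 64  # assumed input resolution, not stated on the slides
for block in range(2):            # two Conv -> Conv -> Pool blocks
    size = conv_out(conv_out(size))
    size = pool_out(size)
print(size)  # 16
```

With these assumptions, 3x3 convs with padding 1 preserve spatial size, and each pooling stage halves it, so two blocks take 64 down to 16 before the Dense layers.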
51. FUTURE WORK
▪ Try to validate the architecture on real data (taken from a classroom)
▪ Without training a new CNN model, compare recognition accuracy with
ImageNet-winning pre-trained models.
▪ Add 2D-rotation-invariant face detection to the current model.