(Research Note) Delving deeper into convolutional neural networks for camera relocalization

Euler angles and gimbal lock
2017/12/13 Delving deeper into convolutional neural networks for camera relocalization

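Loss of a degree of freedom with Euler angles
When $\beta = \frac{\pi}{2}$, then $\cos\frac{\pi}{2} = 0$ and $\sin\frac{\pi}{2} = 1$, so the first and third rotations act about the same axis and only a combination of their two angles affects the result: one degree of freedom is lost.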
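The loss of a degree of freedom can be checked numerically. A minimal sketch, assuming the Z-Y-X convention $R = R_z(\alpha)\,R_y(\beta)\,R_x(\gamma)$ (the slides do not state which convention is used). Under this assumption, at $\beta = \pi/2$ the product depends only on $\gamma - \alpha$, so two different angle triples give the same matrix; the helper names are hypothetical.

```python
import math

def matmul(a, b):
    """Multiply two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def rot_x(angle):
    c, s = math.cos(angle), math.sin(angle)
    return [[1, 0, 0], [0, c, -s], [0, s, c]]

def rot_y(angle):
    c, s = math.cos(angle), math.sin(angle)
    return [[c, 0, s], [0, 1, 0], [-s, 0, c]]

def rot_z(angle):
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0], [s, c, 0], [0, 0, 1]]

def euler_zyx(alpha, beta, gamma):
    """R = Rz(alpha) Ry(beta) Rx(gamma) (assumed Z-Y-X convention)."""
    return matmul(rot_z(alpha), matmul(rot_y(beta), rot_x(gamma)))

def same_matrix(a, b, tol=1e-9):
    """Element-wise comparison with a small floating-point tolerance."""
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(3) for j in range(3))

# At beta = pi/2 only gamma - alpha matters: both triples have gamma - alpha = 0.2.
locked_1 = euler_zyx(0.3, math.pi / 2, 0.5)
locked_2 = euler_zyx(0.1, math.pi / 2, 0.3)
# Away from the singularity the same alpha/gamma pairs give different rotations.
free_1 = euler_zyx(0.3, 0.4, 0.5)
free_2 = euler_zyx(0.1, 0.4, 0.3)

print(same_matrix(locked_1, locked_2))  # True
print(same_matrix(free_1, free_2))      # False
```

With another Euler convention the surviving combination may be $\alpha + \gamma$ instead of $\gamma - \alpha$, but one degree of freedom is lost either way.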

Resolving gimbal lock (loss of a degree of freedom)
1. Change $\beta$ (avoid $\beta = \frac{\pi}{2}$)
2. Use a different orientation representation
=> quaternion
Rotations don't commute:
$R_x R_y \neq R_y R_x$
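Non-commutation is easy to see with two 90-degree rotations, where $\cos = 0$ and $\sin = 1$ so the matrices can be written with exact integers. A minimal sketch (the constant names are illustrative):

```python
def matmul(a, b):
    """Multiply two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

# 90-degree rotations about the x and y axes.
RX_90 = [[1, 0, 0], [0, 0, -1], [0, 1, 0]]
RY_90 = [[0, 0, 1], [0, 1, 0], [-1, 0, 0]]

xy = matmul(RX_90, RY_90)  # R_x R_y
yx = matmul(RY_90, RX_90)  # R_y R_x
print(xy)  # [[0, 0, 1], [1, 0, 0], [0, 1, 0]]
print(yx)  # [[0, 1, 0], [0, 0, -1], [-1, 0, 0]]
```

The two products differ, so the order in which Euler angles are applied is part of the representation.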

Quaternion
The history
Complex numbers
https://www.youtube.com/watch?v=mHVwd8gYLnI&t=2s
Extending complex numbers
What is $bc\,ij$?
How should $ij$ be defined?

Forget about $ij$: what if we define another unit, $k$?
[Figure: the imaginary units $i$, $j$, $k$]
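Hamilton's rules $i^2 = j^2 = k^2 = ijk = -1$ force $ij = k$ but $ji = -k$, so quaternion multiplication is not commutative, just like rotations. A minimal sketch of the resulting Hamilton product, storing a quaternion as a tuple $(w, x, y, z)$ (the layout and the name `qmul` are our choices):

```python
def qmul(p, q):
    """Hamilton product of quaternions p and q, each given as (w, x, y, z)."""
    w1, x1, y1, z1 = p
    w2, x2, y2, z2 = q
    return (
        w1 * w2 - x1 * x2 - y1 * y2 - z1 * z2,
        w1 * x2 + x1 * w2 + y1 * z2 - z1 * y2,
        w1 * y2 - x1 * z2 + y1 * w2 + z1 * x2,
        w1 * z2 + x1 * y2 - y1 * x2 + z1 * w2,
    )

I = (0, 1, 0, 0)
J = (0, 0, 1, 0)
K = (0, 0, 0, 1)

print(qmul(I, J))           # (0, 0, 0, 1)   ij = k
print(qmul(J, I))           # (0, 0, 0, -1)  ji = -k
print(qmul(qmul(I, J), K))  # (-1, 0, 0, 0)  ijk = -1
```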

Double cover of quaternions
There are two distinct quaternions for each distinct orientation frame in 3D space.
The belt trick reflects this double-valued relationship, distinguishing a one-circuit 360-degree rotation
from the equivalent two-circuit 720-degree rotation.*
When regressing on similar images,
we may get distinct quaternions ($q$ and $-q$) for the same orientation.
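The double cover can be checked numerically: a unit quaternion $q$ and its negation $-q$ yield the same rotation matrix, yet they are at Euclidean distance 2 from each other, so a loss on raw quaternion components can heavily penalize a correct orientation. A minimal sketch using the standard unit-quaternion-to-matrix formula with $(w, x, y, z)$ ordering; `quat_to_matrix`, `negate` and `l2` are hypothetical helper names.

```python
import math

def quat_to_matrix(q):
    """Rotation matrix of a unit quaternion q = (w, x, y, z)."""
    w, x, y, z = q
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]

def negate(q):
    return tuple(-c for c in q)

def l2(p, q):
    """Euclidean distance between two quaternions viewed as 4-vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# Every matrix entry is quadratic in q, so q and -q give the same rotation.
q_slide = (0.0, 1.0, 0.0, 0.0)    # the [0, 1, 0, 0] example used later in the deck
q_other = (0.5, 0.5, 0.5, 0.5)    # 120-degree rotation about (1, 1, 1)
for q in (q_slide, q_other):
    print(quat_to_matrix(q) == quat_to_matrix(negate(q)))  # True
    print(l2(q, negate(q)))                                # 2.0
```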
* Andrew J. Hanson (6 February 2006). Visualizing Quaternions. Elsevier. pp. 114–. ISBN 978-0-08-047477-9.

National Chung Cheng University, Taiwan
Robot Vision Laboratory
2017/12/03
Jacky Liu
(Research Note)
Delving deeper into convolutional neural networks for camera relocalization

About this work
Delving deeper into convolutional neural networks for camera relocalization
Wu, Jian (1); Ma, Liwei (2); Hu, Xiaolin (1)
ICRA2017 - IEEE International Conference on Robotics and Automation
1. Tsinghua National Laboratory for Information Science and Technology (TNList), Department of Computer Science and Technology, Tsinghua University, 100084, Beijing, China
2. Intel Labs China, Intel Corporation, 100090, Beijing, China liwei.ma@intel.com

Contributions
1. A rotation representation that solves the double-cover problem of the quaternion (used by PoseNet)
=> Euler6
2. Camera poses in the training set are always very sparse in the whole pose space
=> pose synthesis
3. Regressing orientation and translation together might not be optimal
=> BranchNet

Related work
Camera relocalization
Keypoints: SIFT, ORB, SCoRe
Keyframes: G. Klein 2008, A. P. Gee 2012
However, these methods provide only a coarse estimate of the camera pose because the poses in the training set are sparse.
Related topics: camera relocalization, multi-task CNNs

Related work
Camera relocalization - CNN
PoseNet (keyframes-based approach)
• Encodes the keyframes of the training set into the model parameters.
SE3-Net
• Its reliance on point-cloud data limits the algorithm to RGB-D input
• The number of predicted objects must be specified at training time

Related work
Multi-task CNNs
TCDCN
1. Facial landmark detection
2. Appearance attribute and expression
HyperFace
1. Face detection
2. Landmark localization
3. Head pose
4. Gender
• Sharing lower layer for low level
common knowledge
• Separate higher layer for specific
predictions
R-CNN
1. Human pose estimation
2. Action detection
MCNNs
1. Attribute relationships
2. Attribute classifiers

Related work
[Figure: one input shared by two task outputs, Task1 and Task2]

Method
Summary
A. Orientation Representation
B. Pose Synthesis
C. Multi-task CNN for Camera Relocalization

Method - Orientation Representation
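Predicted: $Q = [0, 1, 0, 0]$
Ground truth: $Q' = [0, -1, 0, 0]$
$Q$ and $Q'$ describe the same orientation ($Q' = -Q$), so even when the predicted orientation is right, the quaternion error is still large: $\|Q - Q'\|_2 = 2$.
[Figure: quaternion vs. Euler6 (panel labels: translation, orientation)]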
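The slides do not spell out the Euler6 encoding. A minimal sketch, assuming (our reading of the paper, not confirmed by these slides) that Euler6 stores each of the three Euler angles as a $(\sin, \cos)$ pair, giving six numbers. Such an encoding has no $q$ versus $-q$ ambiguity and stays continuous across the $\pm\pi$ wrap-around of a raw angle; `to_euler6` and `from_euler6` are hypothetical names.

```python
import math

def to_euler6(phi, theta, psi):
    """Assumed Euler6 layout: [sin phi, cos phi, sin theta, cos theta, sin psi, cos psi]."""
    out = []
    for angle in (phi, theta, psi):
        out += [math.sin(angle), math.cos(angle)]
    return out

def from_euler6(v):
    """Recover the angles in [-pi, pi]. atan2 also accepts (sin, cos) pairs
    that a network outputs without unit norm."""
    return tuple(math.atan2(v[2 * i], v[2 * i + 1]) for i in range(3))

def l2(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Two nearly identical yaw angles on either side of the +/-pi wrap-around.
a = (0.0, 0.0, math.pi - 0.01)
b = (0.0, 0.0, -math.pi + 0.01)
print(l2(a, b))                          # about 6.26: raw angles look far apart
print(l2(to_euler6(*a), to_euler6(*b)))  # about 0.02: Euler6 vectors are close
```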

Pose Synthesis
Overfitting on sparse trajectory

How to resolve overfitting?
(Hint: 2 methods)

Method
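Multi-task CNN for Camera Relocalization
To quantify the relationship between orientation and translation, correlations are computed between the six pose components:
$6\text{DoF} = [X, Y, Z, \phi, \theta, \psi]$ (translation: $X, Y, Z$; rotation: $\phi, \theta, \psi$)
Intra-group correlations (self-correlations excluded)
• Orientation: 0.391
• Translation: 0.293
Inter-group correlations
• 0.256
Both intra-group values exceed the inter-group value: components within each group are more related to each other than to the other group.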
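The slides report the three numbers but not the exact statistic. A minimal sketch, assuming each value is the mean absolute Pearson correlation over the relevant pairs of pose components (3 orientation pairs, 3 translation pairs, 9 cross pairs, no self-pairs); `group_correlations` is a hypothetical helper, and in practice it would be applied to the training-set poses.

```python
import math
from itertools import combinations

TRANSLATION = (0, 1, 2)   # column indices of X, Y, Z
ORIENTATION = (3, 4, 5)   # column indices of phi, theta, psi

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def group_correlations(poses):
    """poses: rows of [X, Y, Z, phi, theta, psi].
    Returns (intra-orientation, intra-translation, inter-group) mean |r|,
    excluding self-correlations (a column with itself)."""
    cols = list(zip(*poses))

    def mean_abs(pairs):
        values = [abs(pearson(cols[i], cols[j])) for i, j in pairs]
        return sum(values) / len(values)

    intra_orientation = mean_abs(combinations(ORIENTATION, 2))
    intra_translation = mean_abs(combinations(TRANSLATION, 2))
    inter = mean_abs([(i, j) for i in TRANSLATION for j in ORIENTATION])
    return intra_orientation, intra_translation, inter

# Toy data: translation columns move together, orientation columns move together,
# and the two groups are uncorrelated with each other.
t = [1.0, 2.0, 3.0, 4.0]
s = [1.0, -1.0, -1.0, 1.0]
toy = [[ti, ti, ti, si, si, si] for ti, si in zip(t, s)]
print(group_correlations(toy))  # (1.0, 1.0, 0.0)
```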

Method
Multi-task CNN for Camera Relocalization
Lessons from the statistics
• In the extreme case, regressing orientation and translation separately with two individual networks may also give better results.
High computation cost of individual networks
• But regressing orientation and translation individually significantly increases the computing cost.
Balance: branching
[Figure: translation and orientation outputs, two separate networks vs. one branching network]
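The trade-off between one joint network, a branching network and two separate networks can be illustrated by counting parameters. A minimal sketch with hypothetical layer widths and fully connected layers only; the networks on these slides are Inception (GoogLeNet) based and their cost is dominated by convolutions, so the numbers show only the trend. `dense_params` and the width constants are illustrative names.

```python
def dense_params(sizes):
    """Weights + biases of a stack of fully connected layers with these widths."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))

# Hypothetical widths, for illustration only.
TRUNK = [1024, 512, 256]      # shared lower layers
JOINT_HEAD = [256, 128, 9]    # one head for 3 translation + 6 Euler6 outputs
TRANS_HEAD = [256, 128, 3]    # translation branch: X, Y, Z
ORIENT_HEAD = [256, 128, 6]   # orientation branch: Euler6

def single_network():
    """Translation and orientation regressed together by one head."""
    return dense_params(TRUNK) + dense_params(JOINT_HEAD)

def branch_network():
    """Shared trunk, then separate translation and orientation branches."""
    return dense_params(TRUNK) + dense_params(TRANS_HEAD) + dense_params(ORIENT_HEAD)

def separate_networks():
    """Two independent networks, each with its own copy of the trunk."""
    return (dense_params(TRUNK) + dense_params(TRANS_HEAD)
            + dense_params(TRUNK) + dense_params(ORIENT_HEAD))

for name, count in [("single", single_network()),
                    ("branch", branch_network()),
                    ("separate", separate_networks())]:
    print(name, count)
# single 690185
# branch 723081
# separate 1379209
```

Under these assumptions branching adds about 5% over the joint network, while duplicating the trunk roughly doubles the size, which matches the "balance" argument on the slide.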

Experiment
Dataset: 7Scenes
• Each sequence (seq-XX.zip) consists of 500-1000 frames
• RGB-D: 640×480 => 343×256
• Initial learning rate $10^{-5}$ (reduced by 90% every 10,000 iterations)
• Training ends at iteration 45,000
Hardware
• 2× NVIDIA Titan X GPUs
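The learning-rate setting above is a step schedule. A minimal sketch, assuming each drop happens exactly at a multiple of 10,000 iterations; it has the same form as Caffe's `step` learning-rate policy ($\text{base\_lr} \cdot \gamma^{\lfloor \text{iter} / \text{stepsize} \rfloor}$), although the slides do not say which policy was used for training.

```python
BASE_LR = 1e-5       # initial learning rate from the slide
GAMMA = 0.1          # "reduced by 90%" means keeping 10% at each drop
STEP_SIZE = 10_000   # iterations between drops
MAX_ITER = 45_000    # training ends here

def learning_rate(iteration):
    """Step schedule: BASE_LR * GAMMA ** floor(iteration / STEP_SIZE)."""
    return BASE_LR * GAMMA ** (iteration // STEP_SIZE)

for it in (0, 9_999, 10_000, 20_000, 30_000, 40_000, MAX_ITER - 1):
    print(it, learning_rate(it))
```

With these settings four drops are applied, so the last 5,000 iterations (40,000 to 44,999) run at about $10^{-9}$.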

Network design

Inception network (GoogLeNet)

PoseNet / BranchNet

Euler6

Data augmentation

Branch

Pretrain
Surprisingly, pretraining on ImageNet increased the error.

Do the FC layers help?

Efficiency of the BranchNet
Storing the weights of BranchNet-Euler6 takes 46 MB.
Branching the network increased the forward-pass time from 5 ms to 6 ms per frame on an NVIDIA Titan X GPU.
BranchNet-Euler6 was also run on the GPU of an Intel NUC mobile platform (Intel Core™ i5-6260U) with clCaffe [24] and reached 43 fps, which meets the real-time requirements of many robotic applications.
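Converting the reported per-frame times into frame rates makes the cost of branching easier to compare with the NUC figure. A worked conversion, assuming one frame per forward pass and ignoring image loading and preprocessing:

```latex
\text{fps} = \frac{1000\ \text{ms/s}}{t_{\text{frame}}}:\qquad
\frac{1000}{5} = 200\ \text{fps},\qquad
\frac{1000}{6} \approx 167\ \text{fps},\qquad
t_{\text{frame}} = \frac{1000}{43} \approx 23.3\ \text{ms at } 43\ \text{fps}
```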

Conclusion
CNN-based camera relocalization
1. A new orientation representation, Euler6.
2. Pose synthesis for data augmentation.
3. BranchNet for multi-task regression.
Experiments showed that each of these techniques improved the relocalization accuracy and that together they reduced the error of previous methods by a significant margin.

Conclusion
• Works well on monocular images; with RGB-D input, SCoRe Forests [2] still perform better.
• The authors tried to use the depth information by simply adding the depth image as a fourth channel to the RGB input, but did not obtain much better results than with RGB alone.
• How to use depth information to improve the performance of CNNs remains an open problem.

Recap
1. Euler => Quaternion => Euler6
2. Correlation analysis => important for multi-task CNNs
3. Separate networks / branching => efficiency
4. Data augmentation (pose synthesis)
5. Do we need FC layers (or other layers)?
6. Does pretraining on another dataset always help?
