https://telecombcn-dl.github.io/2018-dlcv/
Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of large-scale annotated datasets and affordable GPU hardware has allowed the training of neural networks for data analysis tasks which were previously addressed with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks and Q-nets for reinforcement learning have shaped a brand new scenario in signal processing. This course will cover the basic principles and applications of deep learning to computer vision problems, such as image classification, object detection or image captioning.
3. bit.ly/DLCV2018
#DLUPCIntroduction
3D Reconstruction is the process of obtaining the geometric properties of a scene by processing and
combining visual cues from a set of views.
● Surface
● Camera poses
● Albedo (percentage of reflected
radiation or base color)
● Illumination
● ...
Multi-view Stereo: A tutorial. - Carlos Hernández, Google Inc.
4. bit.ly/DLCV2018
#DLUPCMotivation
There’s a bunch of applications which may benefit from 3D reconstruction algorithms.
architecture
autonomous
driving
robotics
3D
reconstruction
3D
printing
medical
...
7. bit.ly/DLCV2018
#DLUPCClassical methods: Limitations
When do they fail?
● Not enough images with enough overlap.
● Featureless or reflecting surfaces.
● Pure rotations.
● Repeated structures.
● Thin structures.
● Non-Lambertian surfaces.
Deep Learning:
● More descriptive, robust
and problem specific
image representations.
● Encoding of prior
knowledge.
8. bit.ly/DLCV2018
#DLUPCDeep learning approaches
Goal: Estimate an irregular surface and its properties from a set of RGB images.
Irregular surfaces do not lie on an Euclidean Space and, in principle, we cannot use the standard
convolution to model them.
11. bit.ly/DLCV2018
#DLUPCVolumetric grids: 3D-R2N2
3D-R2N2: A Unified Approach for Single and Multi-view 3D Object Reconstruction [ECCV 2016]
Reconstruction quality with respect
the texture strength, averaging
results from 20, 30 and 40 views.
Reconstruction quality with respect
the number of views, averaging on
all texture strengths.
20 views
MVS
3D-R2N2
30 views 40 views
12. bit.ly/DLCV2018
#DLUPCVolumetric grids: 3D-R2N2
3D-R2N2: A Unified Approach for Single and Multi-view 3D Object Reconstruction [ECCV 2016]
Limitations
● Inefficient use of the representation space.
● Limited resolution 323
due to memory constraints (cubic growth).
● Modeling views as sequences instead of sets.
● Requires 3D supervision.
● Requires post-processing.
15. bit.ly/DLCV2018
#DLUPCPoint clouds: PointOutNet
A Point Set Generation Network for 3D Object Reconstruction from a Single Image [CVPR 2017]
Data terms (not both at the same time):
● Chamfer distance
● Earth Mover’s distance
Statistical term (not both at the same time):
● Minimum-of-N (MoN)
● Conditional VAE
17. bit.ly/DLCV2018
#DLUPCPoint clouds: PointOutNet
A Point Set Generation Network for 3D Object Reconstruction from a Single Image [CVPR 2017]
Limitations
● Lack of details.
● Single view.
● Poor performance on unseen categories.
● Struggling with compositionality.
● Requires post-processing.
19. bit.ly/DLCV2018
#DLUPCMeshes: Pixel2Mesh
Perceptual Feature Pooling
Uses vertex position ci-1
and known camera
pose to project to feature space and estimate
features using bilinear interpolation.
Pixel2Mesh: Generating 3D Mesh Models from Single RGB Images [arXiv 2018]