STEREO MATCHING BY DEEP LEARNING
Yu Huang
Yu.huang07@gmail.com
Sunnyvale, California
Outline
◦ Self-Supervised Learning for Stereo Matching with Self-Improving Ability
◦ Unsupervised Learning of Stereo Matching
◦ Pyramid Stereo Matching Network
◦ Learning for Disparity Estimation through Feature Constancy
◦ SegStereo: Exploiting Semantic Information for Disparity Estimation
◦ Deep Material-aware Cross-spectral Stereo Matching
◦ DispSegNet: Leveraging Semantics for End-to-End Learning of Disparity Estimation from Stereo Imagery
◦ Group-wise Correlation Stereo Network
Self-Supervised Learning for Stereo Matching with Self-Improving Ability
◦ A simple CNN architecture that learns to compute dense disparity maps directly from the stereo inputs.
◦ Training is performed end-to-end, without the need for ground-truth disparity maps.
◦ The idea is to use the image warping error (instead of disparity-map residuals) as the loss function that drives learning, so the network seeks the disparity map that minimizes the warping error.
◦ The network adapts itself to unseen imagery as well as to different camera settings.
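The warping-error idea can be illustrated with a minimal sketch. It assumes rectified grayscale images and a left-view disparity, so left pixel x matches right pixel x - d. It also assumes linear interpolation along rows and a plain L1 photometric error. The function names are hypothetical, and the loss used in the paper may combine additional terms.

```python
import numpy as np


def warp_right_to_left(right: np.ndarray, disparity: np.ndarray) -> np.ndarray:
    """Reconstruct the left view by sampling the right image at x - d.

    right: (H, W) grayscale image; disparity: (H, W) left-view disparity in pixels.
    Sampling is linear along each row; coordinates outside the image are clamped.
    """
    h, w = right.shape
    xs = np.clip(np.arange(w)[None, :] - disparity, 0, w - 1)  # source columns in the right image
    x0 = np.floor(xs).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    frac = xs - x0
    rows = np.arange(h)[:, None]
    return (1 - frac) * right[rows, x0] + frac * right[rows, x1]


def warping_loss(left: np.ndarray, right: np.ndarray, disparity: np.ndarray) -> float:
    """Mean absolute (L1) photometric error between the left image and the warped right image."""
    return float(np.mean(np.abs(left - warp_right_to_left(right, disparity))))
```

The loss depends only on the two input images, so no ground-truth disparity is needed. In a trained network, the gradient of this error would flow back through `disparity` into the CNN.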
The self-supervised deep stereo matching network architecture. The network consists of five modules: feature extraction, cross feature volume, 3D feature matching, soft-argmin, and warping loss evaluation.
Feature Volume Construction. The cross feature volume is constructed by concatenating the learned features extracted from the left and right images at corresponding disparity offsets. The blue rectangle represents a feature map from the left image. The stacked orange rectangles represent the right feature maps traversed from disparity 0 up to a preset disparity range D. Different intensities correspond to different disparity levels. Note that the left feature map is copied D + 1 times to match the traversed right feature maps.
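The concatenation-based volume described above can be sketched in NumPy. This sketch assumes (C, H, W) feature maps and stores the volume as (2C, D+1, H, W). Positions where x - d falls outside the right image are left as zeros. That choice is borrowed from common implementations and is not a detail stated on the slide. The function name is hypothetical.

```python
import numpy as np


def concat_feature_volume(feat_l: np.ndarray, feat_r: np.ndarray, max_disp: int) -> np.ndarray:
    """Build a (2C, D+1, H, W) volume pairing each left feature with right features shifted by d.

    feat_l, feat_r: (C, H, W) feature maps. Positions where x - d < 0 have no match and stay zero.
    """
    c, h, w = feat_l.shape
    volume = np.zeros((2 * c, max_disp + 1, h, w), dtype=feat_l.dtype)
    for d in range(max_disp + 1):
        # The left feature map is copied once per disparity level...
        volume[:c, d, :, d:] = feat_l[:, :, d:]
        # ...and paired with the right feature map traversed (shifted) by d pixels.
        volume[c:, d, :, d:] = feat_r[:, :, : w - d]
    return volume
```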
Diagram of the res-TDM module for 3D feature matching with learned regularization. It takes the cross feature volume as input and applies a series of 3D convolutions and deconvolutions. The output of this module is a 3D disparity volume of dimension H × W × (D + 1).
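The soft-argmin step listed in the architecture turns the disparity volume into a sub-pixel disparity map. It takes the expectation of d under softmax(-cost), that is, d̂ = Σ_d d · softmax(-c)_d. The minimal sketch below assumes the volume holds matching costs, where lower means a better match. It also assumes the volume is stored as (D+1, H, W) rather than H × W × (D + 1).

```python
import numpy as np


def soft_argmin(cost: np.ndarray) -> np.ndarray:
    """Differentiable disparity estimate from a (D+1, H, W) cost volume.

    Costs become a probability over disparities via softmax(-cost);
    the result is the expected disparity sum_d d * p_d for every pixel.
    """
    neg = -cost
    neg = neg - neg.max(axis=0, keepdims=True)  # subtract the max for numerical stability
    prob = np.exp(neg)
    prob /= prob.sum(axis=0, keepdims=True)
    disps = np.arange(cost.shape[0], dtype=cost.dtype).reshape(-1, 1, 1)
    return (prob * disps).sum(axis=0)
```

Unlike a hard argmin, this expectation is differentiable and can return non-integer disparities. This is what allows the warping loss to train the whole network end-to-end.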
[Figure: results on KITTI 2012]
[Figure: results on KITTI 2015]
Unsupervised Learning of Stereo Matching
◦ A framework for learning stereo matching costs without human supervision.
◦ The method updates network parameters iteratively.
◦ It starts with a randomly initialized network.
◦ A left-right check is adopted to guide the training.
◦ Suitable matches are selected and used as training data in subsequent iterations.
◦ The system eventually converges to a stable state.
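The left-right check that guides training can be sketched as a consistency mask. A left pixel is kept when the disparity at its matching location in the right disparity map agrees with its own disparity within a threshold. The 1-pixel default threshold, nearest-pixel rounding and function name are assumptions for illustration, not values from the paper.

```python
import numpy as np


def left_right_check(disp_l: np.ndarray, disp_r: np.ndarray, threshold: float = 1.0) -> np.ndarray:
    """Boolean (H, W) mask of left-view pixels whose disparity agrees with the right view.

    A left pixel (y, x) with disparity d maps to (y, x - d) in the right image; it is kept
    when that position lies inside the image and |d_L(y, x) - d_R(y, x - d)| <= threshold.
    """
    h, w = disp_l.shape
    xs = np.arange(w)[None, :] - np.round(disp_l).astype(int)
    inside = xs >= 0                      # matches falling off the left border are rejected
    xs = np.clip(xs, 0, w - 1)
    rows = np.arange(h)[:, None]
    consistent = np.abs(disp_l - disp_r[rows, xs]) <= threshold
    return inside & consistent
```

Pixels that pass the mask are candidates for the training data used in the next iteration. Pixels that fail are treated as unreliable and skipped.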
The learning network takes stereo images as input and generates a disparity map. The architecture has two branches: the first computes the cost volume and the second jointly filters the volume.
Configuration of each component of the network: the cost-volume branch (CVB), the image feature branch (IFB), and the joint filtering branch (JF). Convolutional layers are specified in Torch notation (channels, kernel, stride).
The iterative unsupervised training framework consists of four parts: disparity
prediction, confidence map estimation, training data selection and network training.
[Figure: results on KITTI 2015]
Pyramid Stereo Matching Network
◦ Current architectures rely on patch-based Siamese networks, which lack the means to exploit context information for finding correspondences in ill-posed regions.
◦ To tackle this problem, PSMNet, a pyramid stereo matching network, consists of two main modules: spatial pyramid pooling (SPP) and a 3D CNN.
◦ The spatial pyramid pooling module exploits global context information by aggregating context at different scales and locations to form a cost volume.
◦ The 3D CNN learns to regularize the cost volume using multiple stacked hourglass networks in conjunction with intermediate supervision.
◦ Code for PSMNet: https://github.com/JiaRenChang/PSMNet.
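Spatial pyramid pooling can be illustrated with a toy sketch. It assumes a single (C, H, W) feature map whose sides are divisible by every window size. Each branch average-pools with a fixed window, upsamples back to the input resolution, and all branches are concatenated along the channel axis. The window sizes (8, 4, 2) are hypothetical and chosen for a small example. A real SPP module also applies learned convolutions and bilinear upsampling. Here those are replaced by the identity and nearest-neighbour repetition.

```python
import numpy as np


def avg_pool(x: np.ndarray, k: int) -> np.ndarray:
    """Non-overlapping k x k average pooling on a (C, H, W) map (H and W divisible by k)."""
    c, h, w = x.shape
    return x.reshape(c, h // k, k, w // k, k).mean(axis=(2, 4))


def spatial_pyramid_pooling(x: np.ndarray, bin_sizes=(8, 4, 2)) -> np.ndarray:
    """Pool at several window sizes, upsample each result back, and concatenate with the input.

    Output has C * (1 + len(bin_sizes)) channels; np.repeat provides nearest-neighbour upsampling.
    """
    branches = [x]
    for k in bin_sizes:
        pooled = avg_pool(x, k)
        branches.append(np.repeat(np.repeat(pooled, k, axis=1), k, axis=2))
    return np.concatenate(branches, axis=0)
```

When the window equals the map size, a branch reduces to global average pooling. That branch supplies the widest context in the pyramid.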
Architecture overview of the proposed PSMNet. The left and right input stereo images are fed into two weight-sharing pipelines. Each pipeline consists of a CNN for computing feature maps, an SPP module that harvests features by concatenating representations from sub-regions of different sizes, and a convolution layer for feature fusion. The left and right image features are then used to form a 4D cost volume. This volume is fed into a 3D CNN for cost volume regularization and disparity regression.
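With intermediate supervision, each stacked hourglass produces its own disparity prediction, and every prediction contributes to the training loss. The minimal sketch below assumes a smooth L1 regression loss and a boolean mask of pixels with valid ground truth, since KITTI ground truth is sparse. The per-output weights are hypothetical defaults that emphasize the final output. The function names are illustrative.

```python
import numpy as np


def smooth_l1(pred: np.ndarray, target: np.ndarray) -> np.ndarray:
    """Elementwise smooth L1 (Huber with delta = 1): quadratic below 1, linear above."""
    diff = np.abs(pred - target)
    return np.where(diff < 1.0, 0.5 * diff ** 2, diff - 0.5)


def intermediate_supervision_loss(preds, target, weights=(0.5, 0.7, 1.0), valid=None) -> float:
    """Weighted sum of smooth L1 losses, one per hourglass output (weights are hypothetical)."""
    if len(preds) != len(weights):
        raise ValueError("need one weight per prediction")
    if valid is None:
        valid = np.ones(target.shape, dtype=bool)
    total = 0.0
    for weight, pred in zip(weights, preds):
        # Only pixels with ground truth contribute to each output's loss.
        total += weight * float(smooth_l1(pred[valid], target[valid]).mean())
    return total
```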
Table 1. Parameters of the proposed PSMNet architecture. Residual blocks are given in brackets together with the number of stacked blocks. Downsampling is performed by conv0_1 and conv2_1 with a stride of 2. Batch normalization and ReLU are used as in ResNet, except that PSMNet does not apply ReLU after summation.
[Figure: results on KITTI 2015]
[Figure: results on KITTI 2012]
Learning for Disparity Estimation through Feature Constancy
◦ A network architecture that incorporates all four steps of stereo matching: matching cost calculation, matching cost aggregation, disparity calculation, and disparity refinement.
◦ The network consists of three parts.
◦ 1) The first part computes multi-scale shared features.
◦ 2) The second part performs matching cost calculation, matching cost aggregation, and disparity calculation to estimate the initial disparity from the shared features.
◦ Note: The initial disparity and the shared features are used to calculate the feature constancy, which measures the correctness of the correspondence between the two input images.
◦ 3) The initial disparity and the feature constancy are then fed into a sub-network that refines the initial disparity.
◦ Source code: http://github.com/leonzfa/iResNet.
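Feature constancy can be illustrated by one plausible ingredient, a feature reconstruction error. The right features are warped to the left view with the initial disparity, and the per-pixel difference from the left features is measured. This simplified sketch assumes nearest-pixel warping and an L1 error summed over channels. The function name is hypothetical. The actual network may combine this error with other measures, such as feature correlation.

```python
import numpy as np


def feature_constancy(feat_l: np.ndarray, feat_r: np.ndarray, disparity: np.ndarray) -> np.ndarray:
    """Per-pixel L1 error between left features and right features warped by the disparity.

    feat_l, feat_r: (C, H, W); disparity: (H, W) initial left-view disparity.
    Small values suggest the correspondence given by `disparity` is likely correct.
    """
    c, h, w = feat_l.shape
    # Nearest source column x - d in the right view, clamped to the image.
    xs = np.clip(np.round(np.arange(w)[None, :] - disparity).astype(int), 0, w - 1)
    rows = np.arange(h)[:, None]
    warped = feat_r[:, rows, xs]  # (C, H, W) via advanced indexing
    return np.abs(feat_l - warped).sum(axis=0)
```

Large values of this map flag pixels where the initial disparity is probably wrong. This is the kind of signal the refinement sub-network can use to decide where to correct the estimate.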
The architecture incorporates all four steps of stereo matching into a single network. Note that the skip connections between the encoder and decoder at different scales are omitted here for clarity.
[Table: comparison with other state-of-the-art methods on the KITTI 2015 dataset]
SegStereo: Exploiting Semantic Information for Disparity Estimation
◦ Appropriately incorporating semantic cues can substantially correct the predictions of commonly used disparity estimation frameworks.
◦ This method performs semantic feature embedding and uses semantic cues as a regularizing loss term to improve disparity learning.
◦ The unified model SegStereo employs semantic features from segmentation and introduces a semantic softmax loss, which helps improve the accuracy of the predicted disparity maps.
◦ The semantic cues work well in both unsupervised and supervised settings.
Intermediate features are extracted from the stereo input, and the cost volume is calculated via the correlation operator. The left segmentation feature map is aggregated into the disparity branch as a semantic feature embedding. The right segmentation feature map is warped to the left view for per-pixel semantic prediction, regularized by a softmax loss. Both steps incorporate semantic information to improve disparity estimation. The SegStereo framework supports both unsupervised and supervised learning, using a photometric loss or a disparity regression loss, respectively.
[Figure: results of unsupervised SegStereo models]
[Figure: results of supervised SegStereo models]
Deep Material-aware Cross-spectral Stereo Matching
◦ Cross-spectral imaging provides benefits for recognition and detection tasks.
◦ Stereo matching also provides an opportunity to obtain depth without an
active projector source.
◦ Matching images from different spectral bands is challenging because of
large appearance variations.
◦ A deep learning framework simultaneously transforms images across spectral bands and estimates disparity.
◦ A material-aware loss function is incorporated into the disparity prediction network to handle regions with unreliable matching, such as light sources, glass windshields, and glossy surfaces.
◦ No depth supervision is required.
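A material-aware loss can be pictured as a per-pixel reprojection error weighted by the recognized material. Unreliable regions such as light sources, glass and glossy surfaces then influence training less. The material ids and weights below are hypothetical and only illustrate the idea. They are not the paper's formulation.

```python
import numpy as np

# Hypothetical material ids and per-material weights (not taken from the paper).
OTHER, LIGHT, GLASS, GLOSSY = 0, 1, 2, 3
WEIGHTS = np.array([1.0, 0.1, 0.1, 0.3])


def material_aware_loss(reproj_error: np.ndarray, material_ids: np.ndarray, weights: np.ndarray = WEIGHTS) -> float:
    """Weighted mean of a per-pixel reprojection error; weights are looked up by material id."""
    w = weights[material_ids]  # (H, W) weight map
    return float((w * reproj_error).sum() / w.sum())
```

Normalizing by the total weight keeps the loss on the same scale as an unweighted mean. Down-weighting a region therefore shifts emphasis toward reliable pixels rather than merely shrinking the loss.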
The disparity prediction network (DPN) predicts left-right disparity for an RGB-NIR stereo input. The spectral translation network (STN) converts the left RGB image into a pseudo-NIR image. The two networks are trained simultaneously with a reprojection error. The symmetric CNN in (b) prevents the STN from learning disparity.
Intermediate results. (a) Left image. (b) Material recognition from DeepLab. (c) RGB-to-NIR filters corrected for exposure and white balance. The R, G, B values represent the weights of the R, G, B channels.
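Each RGB-to-NIR filter is described by weights on the R, G and B channels, so applying one amounts to a weighted channel sum. The minimal sketch below assumes channel-last (H, W, 3) images. It accepts either one global weight triple or a per-pixel weight map, for example one triple per recognized material. The function name is hypothetical.

```python
import numpy as np


def rgb_to_pseudo_nir(rgb: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Pseudo-NIR image as a weighted sum of the R, G, B channels.

    rgb: (H, W, 3); weights: (3,) global weights or (H, W, 3) per-pixel weights (broadcast).
    """
    return (rgb * weights).sum(axis=-1)
```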
DispSegNet: Leveraging Semantics for End-to-End Learning of Disparity Estimation from Stereo Imagery
◦ A CNN architecture that improves the quality and accuracy of disparity estimation with the help of semantic segmentation.
◦ A network structure in which these two tasks are highly coupled.
◦ A two-stage refinement process.
◦ Initial disparity estimates are refined with an embedding learned from the semantic segmentation branch of the network.
◦ The model is trained in an unsupervised manner: one image of the stereo pair is warped and compared against the other.
◦ A single network outputs both disparity estimates and semantic labels.
◦ Leveraging the embedding learned from semantic segmentation improves the performance of disparity estimation.
Architecture. The pipeline consists of: (a) rectified input stereo images; (b) feature extraction from the input stereo images; (c) a cost volume formed by concatenating corresponding features from both views; (d) an initial disparity estimated from the cost volume using 3D convolution; and (e) refinement of the initial disparity by fusing the segmentation embedding. In step (e), PSP (pyramid scene parsing) incorporates more context information for the semantic segmentation task. (f) The model outputs estimated disparity and semantic segmentation for both the left and right views.
[Figure: disparity prediction results]
[Figure: 3D semantic results]
Group-wise Correlation Stereo Network
◦ This method constructs the cost volume by group-wise correlation.
◦ The left and right features are divided into groups along the channel dimension, and correlation maps are computed within each group to obtain multiple matching cost proposals, which are then packed into a cost volume.
◦ Group-wise correlation provides efficient representations for measuring feature similarity and, unlike full correlation, does not lose too much information.
◦ It also maintains better performance when the number of parameters is reduced.
◦ The code is available at https://github.com/xy-guo/GwcNet.
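For a disparity d and group g, group-wise correlation averages the channel-wise products of that group's left and right features: C(d, x, y, g) = (N_g / N_c) ⟨f_l^g(x, y), f_r^g(x - d, y)⟩. Here N_c is the number of channels and N_g the number of groups. The NumPy sketch below assumes (C, H, W) features and disparities from 0 to max_disp - 1. Positions where x - d falls outside the image are left as zeros. The function name is hypothetical.

```python
import numpy as np


def groupwise_correlation_volume(feat_l: np.ndarray, feat_r: np.ndarray, max_disp: int, num_groups: int) -> np.ndarray:
    """Build a (G, D, H, W) group-wise correlation cost volume.

    Channels are split into `num_groups` groups; for each group and disparity d the cost is the
    mean over that group's channels of f_l(x, y) * f_r(x - d, y). Unmatched positions stay zero.
    """
    c, h, w = feat_l.shape
    if c % num_groups:
        raise ValueError("channel count must be divisible by num_groups")
    cpg = c // num_groups  # channels per group
    gl = feat_l.reshape(num_groups, cpg, h, w)
    gr = feat_r.reshape(num_groups, cpg, h, w)
    volume = np.zeros((num_groups, max_disp, h, w), dtype=feat_l.dtype)
    for d in range(max_disp):
        # Average the per-channel products inside each group (inner product scaled by N_g / N_c).
        volume[:, d, :, d:] = (gl[..., d:] * gr[..., : w - d]).mean(axis=1)
    return volume
```

Setting `num_groups=1` collapses the volume to a single full-correlation map, while `num_groups=C` keeps one map per channel. Group-wise correlation sits between these two extremes.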
The pipeline of the proposed group-wise correlation network. The whole network consists of four parts: unary feature extraction, cost volume construction, 3D convolution aggregation, and disparity prediction. The cost volume is divided into two parts, a concatenation volume (Cat) and a group-wise correlation volume (Gwc). The concatenation volume is built by concatenating the compressed left and right features.
The structure of the 3D aggregation network. The network consists of a pre-hourglass module (four convolutions at the beginning) and three stacked 3D hourglass networks. Compared with PSMNet, the shortcut connections between different hourglass modules and output modules are removed, so output modules 0, 1 and 2 can be dropped during inference to save time. 1×1×1 3D convolutions are added to the shortcut connections within the hourglass modules.
Table: Structure details of the modules. H and W denote the height and width of the input image. S1/2 denotes a convolution stride of 1 or 2. Unless otherwise specified, each 3D convolution is followed by batch normalization and ReLU.
* denotes that ReLU is not included.
** denotes convolution only.
Thanks

 
Fisheye/Omnidirectional View in Autonomous Driving V
Fisheye/Omnidirectional View in Autonomous Driving VFisheye/Omnidirectional View in Autonomous Driving V
Fisheye/Omnidirectional View in Autonomous Driving V
 
Prediction,Planninng & Control at Baidu
Prediction,Planninng & Control at BaiduPrediction,Planninng & Control at Baidu
Prediction,Planninng & Control at Baidu
 
Cruise AI under the Hood
Cruise AI under the HoodCruise AI under the Hood
Cruise AI under the Hood
 
LiDAR in the Adverse Weather: Dust, Snow, Rain and Fog (2)
LiDAR in the Adverse Weather: Dust, Snow, Rain and Fog (2)LiDAR in the Adverse Weather: Dust, Snow, Rain and Fog (2)
LiDAR in the Adverse Weather: Dust, Snow, Rain and Fog (2)
 
Scenario-Based Development & Testing for Autonomous Driving
Scenario-Based Development & Testing for Autonomous DrivingScenario-Based Development & Testing for Autonomous Driving
Scenario-Based Development & Testing for Autonomous Driving
 
How to Build a Data Closed-loop Platform for Autonomous Driving?
How to Build a Data Closed-loop Platform for Autonomous Driving?How to Build a Data Closed-loop Platform for Autonomous Driving?
How to Build a Data Closed-loop Platform for Autonomous Driving?
 
Annotation tools for ADAS & Autonomous Driving
Annotation tools for ADAS & Autonomous DrivingAnnotation tools for ADAS & Autonomous Driving
Annotation tools for ADAS & Autonomous Driving
 
Simulation for autonomous driving at uber atg
Simulation for autonomous driving at uber atgSimulation for autonomous driving at uber atg
Simulation for autonomous driving at uber atg
 
Multi sensor calibration by deep learning
Multi sensor calibration by deep learningMulti sensor calibration by deep learning
Multi sensor calibration by deep learning
 
Prediction and planning for self driving at waymo
Prediction and planning for self driving at waymoPrediction and planning for self driving at waymo
Prediction and planning for self driving at waymo
 
Jointly mapping, localization, perception, prediction and planning
Jointly mapping, localization, perception, prediction and planningJointly mapping, localization, perception, prediction and planning
Jointly mapping, localization, perception, prediction and planning
 
Data pipeline and data lake for autonomous driving
Data pipeline and data lake for autonomous drivingData pipeline and data lake for autonomous driving
Data pipeline and data lake for autonomous driving
 
Open Source codes of trajectory prediction & behavior planning
Open Source codes of trajectory prediction & behavior planningOpen Source codes of trajectory prediction & behavior planning
Open Source codes of trajectory prediction & behavior planning
 

Recently uploaded

AKTU Computer Networks notes --- Unit 3.pdf
AKTU Computer Networks notes ---  Unit 3.pdfAKTU Computer Networks notes ---  Unit 3.pdf
AKTU Computer Networks notes --- Unit 3.pdfankushspencer015
 
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...ranjana rawat
 
UNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its PerformanceUNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its Performancesivaprakash250
 
SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )Tsuyoshi Horigome
 
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSHARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSRajkumarAkumalla
 
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...Call Girls in Nagpur High Profile
 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSSIVASHANKAR N
 
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escortsranjana rawat
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Dr.Costas Sachpazis
 
KubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlyKubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlysanyuktamishra911
 
result management system report for college project
result management system report for college projectresult management system report for college project
result management system report for college projectTonystark477637
 
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...Christo Ananth
 
UNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular ConduitsUNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular Conduitsrknatarajan
 
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Dr.Costas Sachpazis
 
UNIT-III FMM. DIMENSIONAL ANALYSIS
UNIT-III FMM.        DIMENSIONAL ANALYSISUNIT-III FMM.        DIMENSIONAL ANALYSIS
UNIT-III FMM. DIMENSIONAL ANALYSISrknatarajan
 
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
Extrusion Processes and Their Limitations
Extrusion Processes and Their LimitationsExtrusion Processes and Their Limitations
Extrusion Processes and Their Limitations120cr0395
 
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINEMANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINESIVASHANKAR N
 
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSAPPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSKurinjimalarL3
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCall Girls in Nagpur High Profile
 

Recently uploaded (20)

AKTU Computer Networks notes --- Unit 3.pdf
AKTU Computer Networks notes ---  Unit 3.pdfAKTU Computer Networks notes ---  Unit 3.pdf
AKTU Computer Networks notes --- Unit 3.pdf
 
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
 
UNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its PerformanceUNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its Performance
 
SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )
 
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSHARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
 
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
 
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
 
KubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlyKubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghly
 
result management system report for college project
result management system report for college projectresult management system report for college project
result management system report for college project
 
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
 
UNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular ConduitsUNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular Conduits
 
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
 
UNIT-III FMM. DIMENSIONAL ANALYSIS
UNIT-III FMM.        DIMENSIONAL ANALYSISUNIT-III FMM.        DIMENSIONAL ANALYSIS
UNIT-III FMM. DIMENSIONAL ANALYSIS
 
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
Extrusion Processes and Their Limitations
Extrusion Processes and Their LimitationsExtrusion Processes and Their Limitations
Extrusion Processes and Their Limitations
 
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINEMANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
 
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSAPPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
 

Stereo Matching by Deep Learning

  • 1. STEREO MATCHING BY DEEP LEARNING Yu Huang Yu.huang07@gmail.com Sunnyvale, California
  • 2. Outline
    ◦ Self-Supervised Learning for Stereo Matching with Self-Improving Ability
    ◦ Unsupervised Learning of Stereo Matching
    ◦ Pyramid Stereo Matching Network
    ◦ Learning for Disparity Estimation through Feature Constancy
    ◦ Deep Material-aware Cross-spectral Stereo Matching
    ◦ SegStereo: Exploiting Semantic Information for Disparity Estimation
    ◦ DispSegNet: Leveraging Semantics for End-to-End Learning of Disparity Estimation from Stereo Imagery
    ◦ Group-wise Correlation Stereo Network
  • 3. Self-Supervised Learning for Stereo Matching with Self-Improving Ability
    ◦ A simple CNN architecture learns to compute dense disparity maps directly from the stereo inputs.
    ◦ Training is performed end-to-end, without ground-truth disparity maps.
    ◦ The image warping error (instead of disparity-map residuals) is used as the loss function that drives learning, so the network seeks the disparity map that minimizes the warping error.
    ◦ The network adapts itself to unseen imagery as well as to different camera settings.
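The warping-error objective on this slide can be illustrated on single scanlines. This is a minimal sketch and not the authors' implementation. It makes several assumptions: rectified grayscale images stored as nested Python lists, a disparity map defined on the left view (left pixel x matches right pixel x - d), linear interpolation, and border clamping. The names `warp_row` and `warping_loss` are hypothetical.

```python
def warp_row(right_row, disp_row):
    """Reconstruct a left-view scanline by sampling the right scanline at x - d(x).

    Linear interpolation is used between the two nearest right pixels;
    source coordinates outside the image are clamped to the border.
    """
    width = len(right_row)
    warped = []
    for x, d in enumerate(disp_row):
        src = min(max(x - d, 0.0), width - 1.0)  # source coordinate in the right image
        x0 = int(src)                  # left neighbour (src >= 0, so int() floors)
        x1 = min(x0 + 1, width - 1)    # right neighbour, clamped at the border
        w = src - x0                   # linear interpolation weight
        warped.append((1.0 - w) * right_row[x0] + w * right_row[x1])
    return warped


def warping_loss(left_img, right_img, disp):
    """Mean absolute photometric error between the left image and the warped right image."""
    total, count = 0.0, 0
    for left_row, right_row, disp_row in zip(left_img, right_img, disp):
        for l, r in zip(left_row, warp_row(right_row, disp_row)):
            total += abs(l - r)
            count += 1
    return total / count


left = [[10, 10, 20, 30, 40]]
right = [[10, 20, 30, 40, 50]]
print(warping_loss(left, right, [[1.0] * 5]))  # correct disparity
print(warping_loss(left, right, [[0.0] * 5]))  # wrong disparity
```

A correct disparity drives the error toward zero, which is what lets the network learn without ground-truth disparity. Practical systems compute this on batched tensors and may add further terms, such as smoothness.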
  • 4. Self-Supervised Learning for Stereo Matching with Self-Improving Ability The self-supervised deep stereo matching network architecture. The network consists of five modules: feature extraction, cross feature volume, 3D feature matching, soft-argmin, and warping loss evaluation.
  • 5. Self-Supervised Learning for Stereo Matching with Self-Improving Ability Feature Volume Construction. The cross feature volume is constructed by concatenating the learned features of the left image with those of the right image at each candidate disparity. The blue rectangle represents a feature map from the left image. The stacked orange rectangles represent right feature maps shifted by disparities from 0 up to a preset maximum D. Different intensities correspond to different disparity levels. Note that the left feature map is copied D + 1 times to match the shifted right feature maps.
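The concatenation-based cross feature volume can be sketched for one scanline. The sketch makes three assumptions: per-pixel feature vectors are stored as lists, positions where the shifted right feature falls outside the image are zero-filled, and the function is given the hypothetical name `concat_cost_volume`. A network typically builds the same structure for all rows at once as a tensor with 2C channels, D + 1 disparity levels, and H × W spatial positions.

```python
def concat_cost_volume(left_feats, right_feats, max_disp):
    """Build a concatenation volume for one scanline.

    left_feats, right_feats: lists of length W, each entry a length-C feature vector.
    Returns volume[d][x] = left_feats[x] + right_feats[x - d] (list concatenation,
    length 2C) for d = 0..max_disp. Shifted positions outside the image are zero-filled.
    """
    width = len(left_feats)
    channels = len(left_feats[0])
    zeros = [0.0] * channels
    volume = []
    for d in range(max_disp + 1):          # D + 1 disparity levels, 0..D inclusive
        level = []
        for x in range(width):
            right = right_feats[x - d] if x - d >= 0 else zeros
            level.append(list(left_feats[x]) + list(right))  # left copy + shifted right
        volume.append(level)
    return volume


vol = concat_cost_volume([[1.0], [2.0], [3.0]], [[4.0], [5.0], [6.0]], max_disp=2)
print(vol[1][2])  # left feature at x=2 paired with right feature at x=1
```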
  • 6. Self-Supervised Learning for Stereo Matching with Self-Improving Ability Diagram of our res-TDM module for 3D feature matching with learned regularization. It takes the cross feature volume as input and applies a series of 3D convolutions and deconvolutions. The output of this module is a 3D disparity volume of dimension H × W × (D + 1).
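The soft-argmin module listed in the architecture turns each pixel's D + 1 values in this volume into a single, differentiable disparity. This is a minimal sketch. It assumes the volume stores matching costs (lower is better), so the softmax is taken over negated costs. The name `soft_argmin` is hypothetical.

```python
import math


def soft_argmin(costs):
    """Differentiable disparity estimate from matching costs c_0..c_D for one pixel.

    Returns sum_d d * softmax(-c)_d, where a lower cost means a better match.
    """
    neg = [-c for c in costs]
    m = max(neg)                                  # subtract the max for numerical stability
    weights = [math.exp(v - m) for v in neg]
    total = sum(weights)
    return sum(d * w for d, w in enumerate(weights)) / total


print(soft_argmin([0.0, 0.0, 100.0]))  # two equally good candidates at d=0 and d=1
```

Because the result is a probability-weighted mean, two equally good candidates at 0 and 1 give 0.5. This yields sub-pixel estimates, but it can also average across distinct modes.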
  • 7. Self-Supervised Learning for Stereo Matching with Self-Improving Ability KITTI 2012
  • 8. Self-Supervised Learning for Stereo Matching with Self-Improving Ability KITTI 2015
  • 9. Unsupervised Learning of Stereo Matching
    ◦ A framework for learning stereo matching costs without human supervision.
    ◦ The method updates network parameters iteratively, starting from a randomly initialized network.
    ◦ A left-right consistency check guides the training.
    ◦ Reliable matches are selected and used as training data in subsequent iterations.
    ◦ The system eventually converges to a stable state.
  • 10. Unsupervised Learning of Stereo Matching The learning network takes stereo images as input and generates a disparity map. The architecture has two branches: the first computes the cost volume, and the second jointly filters it.
  • 11. Unsupervised Learning of Stereo Matching Configuration of each component of our network: the cost-volume branch (CVB), the image feature branch (IFB), and the joint filtering branch (JF). Convolutional layers are defined in Torch notation (channels, kernel, stride).
  • 12. Unsupervised Learning of Stereo Matching The iterative unsupervised training framework consists of four parts: disparity prediction, confidence map estimation, training data selection and network training.
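The left-right check and the training-data selection step can be sketched per scanline. This sketch uses a binary consistency test with a 1-pixel threshold as a simple stand-in for the confidence map. The threshold, the rounding to the nearest right pixel, and the names `left_right_check` and `select_training_pixels` are assumptions, not details given on the slides.

```python
def left_right_check(disp_left, disp_right, threshold=1.0):
    """Return a 0/1 mask of left pixels whose disparity agrees with the right view.

    A left pixel x with disparity d matches right pixel x - d. It is kept when the
    match lies inside the image and |d_left(x) - d_right(round(x - d))| <= threshold.
    """
    width = len(disp_left)
    mask = []
    for x, d in enumerate(disp_left):
        xr = int(round(x - d))                    # corresponding pixel in the right view
        ok = 0 <= xr < width and abs(d - disp_right[xr]) <= threshold
        mask.append(1 if ok else 0)
    return mask


def select_training_pixels(disp_left, disp_right, threshold=1.0):
    """Keep (x, disparity) pairs that pass the check, for use as pseudo ground truth."""
    mask = left_right_check(disp_left, disp_right, threshold)
    return [(x, d) for x, (d, m) in enumerate(zip(disp_left, mask)) if m]


print(select_training_pixels([0, 5, 2, 2], [2, 9, 0, 0]))
```

In the iterative scheme described above, pixels selected this way would supervise the next round of training, and the re-trained network would then produce new disparities to check.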
  • 13. Unsupervised Learning of Stereo Matching KITTI 2015
  • 14. Pyramid Stereo Matching Network
    ◦ Current architectures rely on patch-based Siamese networks and lack the means to exploit context information for finding correspondences in ill-posed regions.
    ◦ To tackle this problem, PSMNet, a pyramid stereo matching network, is proposed. It consists of two main modules: spatial pyramid pooling and a 3D CNN.
    ◦ The spatial pyramid pooling module exploits global context information by aggregating context at different scales and locations to form a cost volume.
    ◦ The 3D CNN learns to regularize the cost volume using multiple stacked hourglass networks together with intermediate supervision.
    ◦ Code for PSMNet: https://github.com/JiaRenChang/PSMNet.
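The pyramid pooling idea can be sketched on a toy single-channel feature map. The map is average-pooled at several window sizes, each pooled map is upsampled back to full resolution, and the results are stacked per pixel. The window sizes, the nearest-neighbour upsampling, and the function names are assumptions chosen for a 4 × 4 example. A practical implementation works on multi-channel tensors and fuses the pooled features with convolutions.

```python
def avg_pool(fmap, k):
    """Non-overlapping k x k average pooling; assumes H and W are divisible by k."""
    h, w = len(fmap), len(fmap[0])
    return [[sum(fmap[i + a][j + b] for a in range(k) for b in range(k)) / (k * k)
             for j in range(0, w, k)]
            for i in range(0, h, k)]


def upsample_nearest(fmap, k):
    """Repeat every value k times along both axes."""
    return [[v for v in row for _c in range(k)] for row in fmap for _r in range(k)]


def spatial_pyramid_pool(fmap, bins=(2, 4)):
    """Return a per-pixel vector: [original value, pooled value at each window size]."""
    levels = [fmap] + [upsample_nearest(avg_pool(fmap, k), k) for k in bins]
    h, w = len(fmap), len(fmap[0])
    return [[[lvl[i][j] for lvl in levels] for j in range(w)] for i in range(h)]


fmap = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
print(spatial_pyramid_pool(fmap)[0][0])  # local value, 2x2 context, global context
```

Each pixel ends up carrying both its local value and progressively wider context. This is the property the slide credits for better matching in ill-posed regions.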
  • 15. Pyramid Stereo Matching Network Architecture overview of the proposed PSMNet. The left and right input stereo images are fed into two weight-sharing pipelines. Each pipeline consists of a CNN for computing feature maps, an SPP module for harvesting features by concatenating representations from sub-regions of different sizes, and a convolution layer for feature fusion. The left and right image features are then used to form a 4D cost volume, which is fed into a 3D CNN for cost volume regularization and disparity regression.
  • 16. Pyramid Stereo Matching Network Table 1. Parameters of the proposed PSMNet architecture. Residual blocks are shown in brackets together with the number of stacked blocks. Downsampling is performed by conv0_1 and conv2_1 with a stride of 2. Batch normalization and ReLU are used as in ResNet, except that PSMNet does not apply ReLU after summation.
  • 17. Pyramid Stereo Matching Network KITTI 2015
  • 18. Pyramid Stereo Matching Network KITTI 2012
  • 19. Learning for Disparity Estimation through Feature Constancy
    ◦ A network architecture that incorporates all four steps of stereo matching: matching cost calculation, matching cost aggregation, disparity calculation, and disparity refinement.
    ◦ The network consists of three parts.
    ◦ 1) The first part computes multi-scale shared features.
    ◦ 2) The second part performs matching cost calculation, matching cost aggregation, and disparity calculation to estimate the initial disparity from the shared features.
    ◦ Note: The initial disparity and the shared features are used to calculate the feature constancy, which measures the correctness of the correspondence between the two input images.
    ◦ 3) The third part, a refinement sub-network, takes the initial disparity and the feature constancy as input and refines the initial disparity.
    ◦ Source code: http://github.com/leonzfa/iResNet.
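One way to picture a feature-constancy measure is as a per-pixel reconstruction error in feature space. The right features are sampled at the position given by the initial disparity and compared with the left features. The slides do not give the exact formula, so this sketch assumes a simplified form: nearest-neighbour sampling and mean absolute difference. The function name `feature_constancy` is hypothetical. Small values suggest a correct correspondence, while large values flag pixels the refinement sub-network should correct.

```python
def feature_constancy(left_feats, right_feats, disp):
    """Per-pixel reconstruction error in feature space for one scanline.

    Each right feature vector is sampled at x - round(d(x)) (nearest neighbour) and
    compared with the left feature vector at x by mean absolute difference.
    Pixels whose match falls outside the image get None (no evidence).
    """
    width = len(left_feats)
    errors = []
    for x, d in enumerate(disp):
        xr = x - int(round(d))
        if not 0 <= xr < width:
            errors.append(None)
            continue
        fl, fr = left_feats[x], right_feats[xr]
        errors.append(sum(abs(a - b) for a, b in zip(fl, fr)) / len(fl))
    return errors


left = [[1, 0], [0, 1], [1, 1]]
right = [[0, 1], [1, 1], [5, 5]]
print(feature_constancy(left, right, [1, 1, 1]))  # consistent initial disparity
print(feature_constancy(left, right, [0, 0, 0]))  # inconsistent initial disparity
```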
  • 20. Learning for Disparity Estimation through Feature Constancy The architecture incorporates all four steps of stereo matching into a single network. Note that the skip connections between the encoder and decoder at different scales are omitted here for clarity.
  • 21. Learning for Disparity Estimation through Feature Constancy
  • 22. Learning for Disparity Estimation through Feature Constancy Comparison with other state-of-the-art methods on the KITTI 2015 dataset.
  • 23. SegStereo: Exploiting Semantic Information for Disparity Estimation
    ◦ Appropriately incorporating semantic cues can substantially correct the predictions of commonly used disparity estimation frameworks.
    ◦ The method embeds semantic features and uses semantic cues as a loss term to regularize disparity learning.
    ◦ The unified model, SegStereo, employs semantic features from segmentation and introduces a semantic softmax loss, which helps improve the accuracy of the predicted disparity maps.
    ◦ The semantic cues work well in both unsupervised and supervised settings.
  • 24. SegStereo: Exploiting Semantic Information for Disparity Estimation Intermediate features are extracted from the stereo input, and the cost volume is calculated with the correlation operator. The left segmentation feature map is aggregated into the disparity branch as a semantic feature embedding. The right segmentation feature map is warped to the left view for per-pixel semantic prediction with softmax loss regularization. Both steps incorporate semantic information to improve disparity estimation. The SegStereo framework supports both unsupervised and supervised learning, using a photometric loss or a disparity regression loss, respectively.
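The per-pixel semantic softmax loss on the warped right segmentation map can be sketched as a standard softmax cross-entropy. The sketch assumes two things: the class scores have already been warped to the left view with the predicted disparity, and left-view class labels are available. The slide does not say where those labels come from. The name `semantic_softmax_loss` is hypothetical.

```python
import math


def semantic_softmax_loss(warped_logits, labels):
    """Mean per-pixel softmax cross-entropy.

    warped_logits: per-pixel class-score vectors, assumed already warped from the
                   right view to the left view using the predicted disparity.
    labels: integer class ids for the same left-view pixels.
    """
    total = 0.0
    for scores, y in zip(warped_logits, labels):
        m = max(scores)                                    # log-sum-exp stabilisation
        log_z = m + math.log(sum(math.exp(s - m) for s in scores))
        total += log_z - scores[y]                         # -log softmax(scores)[y]
    return total / len(labels)


print(semantic_softmax_loss([[10.0, 0.0]], [0]))  # confident and correct: small loss
print(semantic_softmax_loss([[10.0, 0.0]], [1]))  # confident and wrong: large loss
```

A wrong disparity warps the right view's class scores onto the wrong left pixels, which raises this loss. That is how the semantic term can feed back into disparity learning.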
  • 25. SegStereo: Exploiting Semantic Information for Disparity Estimation Unsupervised SegStereo models
  • 26. SegStereo: Exploiting Semantic Information for Disparity Estimation Supervised learning
  • 27. Deep Material-aware Cross-spectral Stereo Matching
    ◦ Cross-spectral imaging provides benefits for recognition and detection tasks.
    ◦ Stereo matching also offers a way to obtain depth without an active projector source.
    ◦ Matching images from different spectral bands is challenging because of large appearance variations.
    ◦ A deep learning framework simultaneously transforms images across spectral bands and estimates disparity.
    ◦ A material-aware loss function is incorporated into the disparity prediction network to handle regions where matching is unreliable, such as light sources, glass windshields, and glossy surfaces.
    ◦ No depth supervision is required.
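A simple way to make a reprojection loss material-aware is to weight each pixel's error by a per-material confidence, so that light sources, glass, and glossy surfaces contribute less. This is an illustrative assumption, not the paper's loss. The material labels, the weight values in `MATERIAL_WEIGHTS`, and the function name are all hypothetical.

```python
# Hypothetical per-material weights: unreliable materials contribute less to the loss.
MATERIAL_WEIGHTS = {"default": 1.0, "light_source": 0.0, "glass": 0.1, "glossy": 0.5}


def material_aware_loss(errors, materials, weights=MATERIAL_WEIGHTS):
    """Weighted mean of per-pixel reprojection errors.

    errors: per-pixel reprojection error (e.g. |left - warped right|).
    materials: per-pixel material label, e.g. from a material-recognition network.
    Labels missing from `weights` fall back to the "default" weight.
    """
    num = den = 0.0
    for e, mat in zip(errors, materials):
        w = weights.get(mat, weights["default"])
        num += w * e
        den += w
    return num / den if den > 0 else 0.0


print(material_aware_loss([1.0, 9.0, 3.0], ["road", "light_source", "glossy"]))
```

In this example the large error on the light source is ignored entirely, so a bright lamp that looks different in RGB and NIR does not pull the disparity network toward a wrong match.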
  • 28. Deep Material-aware Cross-spectral Stereo Matching The disparity prediction network (DPN) predicts left-right disparity for an RGB-NIR stereo input. The spectral translation network (STN) converts the left RGB image into a pseudo-NIR image. The two networks are trained simultaneously with the reprojection error. The symmetric CNN in (b) prevents the STN from learning disparity.
  • 29. Deep Material-aware Cross-spectral Stereo Matching Intermediate results. (a) Left image. (b) material recognition from DeepLab. (c) RGB-to-NIR filters corrected by exposure and white balancing. The R,G,B values represent the weights of R,G,B channels.
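The RGB-to-NIR filters in panel (c) can be read as linear weights on the R, G, and B channels. This is a minimal sketch. It assumes one (w_R, w_G, w_B) triple per material class, as the material recognition in panel (b) suggests. The filter values and the function name `rgb_to_pseudo_nir` are hypothetical. In the described system, the spectral translation network learns its mapping during training rather than reading a fixed table.

```python
def rgb_to_pseudo_nir(rgb_pixels, materials, filters):
    """Map each RGB pixel to a pseudo-NIR intensity with a per-material linear filter.

    rgb_pixels: list of (R, G, B) tuples.
    materials: per-pixel material label.
    filters: dict mapping material -> (w_R, w_G, w_B); values here are hypothetical.
    """
    out = []
    for (r, g, b), mat in zip(rgb_pixels, materials):
        w_r, w_g, w_b = filters[mat]
        out.append(w_r * r + w_g * g + w_b * b)   # weighted sum of the three channels
    return out


filters = {"skin": (0.5, 0.25, 0.25), "fabric": (1.0, 0.0, 0.0)}
print(rgb_to_pseudo_nir([(100, 40, 20), (10, 200, 30)], ["skin", "fabric"], filters))
```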
  • 31. DispSegNet: Leveraging Semantics for End-to-End Learning of Disparity Estimation from Stereo Imagery
    ◦ A CNN architecture that improves the quality and accuracy of disparity estimation with the help of semantic segmentation.
    ◦ The two tasks are tightly coupled within the network structure.
    ◦ A two-stage refinement process: initial disparity estimates are refined with an embedding learned from the semantic segmentation branch of the network.
    ◦ The model is trained in an unsupervised manner: images from one side of the stereo pair are warped and compared against images from the other side.
    ◦ A single network outputs both disparity estimates and semantic labels.
    ◦ Leveraging the embedding learned from semantic segmentation improves the performance of disparity estimation.
  • 32. DispSegNet: Leveraging Semantics for End-to-End Learning of Disparity Estimation from Stereo Imagery Architecture. The pipeline consists of: (a) rectified input stereo images; (b) feature extraction from the input stereo images; (c) a cost volume formed by concatenating corresponding features from both sides; (d) initial disparity estimation from the cost volume using 3D convolutions; (e) refinement of the initial disparity by fusing the segmentation embedding, where the PSP (pyramid scene parsing) module incorporates more context information for the semantic segmentation task; (f) estimated disparity and semantic segmentation for both the left and right views, generated by the model.
  • 33. DispSegNet: Leveraging Semantics for End-to-End Learning of Disparity Estimation from Stereo Imagery disparity prediction
  • 34. DispSegNet: Leveraging Semantics for End-to-End Learning of Disparity Estimation from Stereo Imagery 3D semantic results
  • 35. Group-wise Correlation Stereo Network
    ◦ This method constructs the cost volume by group-wise correlation.
    ◦ The left and right features are divided into groups along the channel dimension. Correlation maps are computed within each group to obtain multiple matching cost proposals, which are then packed into a cost volume.
    ◦ Group-wise correlation provides efficient representations for measuring feature similarities and does not lose as much information as full correlation.
    ◦ It also maintains better performance when the number of parameters is reduced.
    ◦ The code is available at https://github.com/xy-guo/GwcNet.
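The group-wise correlation described on this slide can be sketched for one scanline. The C channels are split into G groups, the mean channel-wise product is taken within each group, and this is repeated for every candidate disparity. The function names, the contiguous grouping of channels, and the zero-filling outside the image are assumptions of this sketch. A practical implementation operates on batched 4D tensors instead.

```python
def groupwise_correlation(f_left, f_right, num_groups):
    """Group-wise correlation between two feature vectors.

    The C channels are split into num_groups contiguous groups of C / num_groups
    channels, and each group yields the mean of its channel-wise products.
    num_groups = 1 reduces to full (single-value) correlation.
    """
    c = len(f_left)
    if c % num_groups:
        raise ValueError("channel count must be divisible by num_groups")
    size = c // num_groups
    return [sum(f_left[i] * f_right[i] for i in range(g * size, (g + 1) * size)) / size
            for g in range(num_groups)]


def gwc_volume_row(left_feats, right_feats, max_disp, num_groups):
    """volume[d][x] = group-wise correlation of left x and right x - d (zeros off-image)."""
    width = len(left_feats)
    return [[groupwise_correlation(left_feats[x], right_feats[x - d], num_groups)
             if x - d >= 0 else [0.0] * num_groups
             for x in range(width)]
            for d in range(max_disp + 1)]


print(groupwise_correlation([1, 2, 3, 4], [1, 1, 2, 2], num_groups=2))
print(groupwise_correlation([1, 2, 3, 4], [1, 1, 2, 2], num_groups=1))
```

Setting `num_groups=1` reduces each entry to a single full-correlation value. Larger G keeps more per-channel information, which is the trade-off the slide describes.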
  • 36. Group-wise Correlation Stereo Network The pipeline of the proposed group-wise correlation network. The whole network consists of four parts: unary feature extraction, cost volume construction, 3D convolution aggregation, and disparity prediction. The cost volume is divided into two parts: a concatenation volume (Cat) and a group-wise correlation volume (Gwc). The concatenation volume is built by concatenating the compressed left and right features.
  • 37. Group-wise Correlation Stereo Network The structure of the 3D aggregation network. The network consists of a pre-hourglass module (four convolutions at the beginning) and three stacked 3D hourglass networks. Compared with PSMNet, the shortcut connections between different hourglass modules and output modules are removed, so output modules 0, 1, and 2 can be dropped during inference to save time. 1×1×1 3D convolutions are added to the shortcut connections within the hourglass modules.
  • 39. Group-wise Correlation Stereo Network Table: Structure details of the modules. H and W denote the height and width of the input image. S1/2 denotes the convolution stride. Unless otherwise specified, each 3D convolution is followed by batch normalization and ReLU. * denotes that ReLU is not included. ** denotes convolution only.