3D Interpretation from Stereo Images
for Autonomous Driving
Yu Huang
Yu.huang07@gmail.com
Sunnyvale, California
Outline
• Object-Centric Stereo Matching for 3D Object Detection
• Triangulation Learning Network: from Monocular to Stereo 3D Object Detection
• Pseudo-LiDAR from Visual Depth Estimation: Bridging the Gap in 3D Object
Detection for Autonomous Driving
• Stereo R-CNN based 3D Object Detection for Autonomous Driving
• 3D Object Proposals for Accurate Object Class Detection
Object-Centric Stereo Matching for 3D
Object Detection
• The current SoA for stereo 3D object detection takes an existing PSMNet stereo matching
network, unmodified, converts the estimated disparities into a 3D point cloud, and feeds this
point cloud into a LiDAR-based 3D object detector.
• The issue with existing stereo matching networks is that they are designed for disparity
estimation, not 3D object detection; the shape and accuracy of object point clouds are not
their focus.
• Stereo matching networks commonly suffer from inaccurate depth estimates at object
boundaries, an artifact this method calls streaking, because background and foreground
points are estimated jointly.
• Existing networks also penalize disparity error rather than the estimated 3D position of object
point clouds in their loss functions (a numerical sketch of why this matters follows below).
• To address these two issues, it proposes a 2D box association step and an object-centric
stereo matching method that estimates disparities only for the objects of interest.
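As a hedged aside (not from the paper), a small numerical sketch of why a disparity loss and a position loss are not equivalent: with depth z = f * b / d, the same 1-pixel disparity error translates into a depth error that grows roughly quadratically with distance. The focal length and baseline below are typical KITTI-like values, assumed purely for illustration.

```python
# Illustrative sketch (not from the paper): how a fixed disparity error maps to depth error.
# Assumed KITTI-like values: focal length f = 721 px, stereo baseline b = 0.54 m.
f, b = 721.0, 0.54

def depth_from_disparity(d):
    """Depth (m) from disparity (px) via z = f * b / d."""
    return f * b / d

for z_true in (10.0, 30.0, 60.0):             # true depths in meters
    d_true = f * b / z_true                    # true disparity in pixels
    z_est = depth_from_disparity(d_true - 1)   # the same 1-px disparity error at every range
    print(f"true depth {z_true:5.1f} m -> depth error {abs(z_est - z_true):5.2f} m")
```

At 10 m the 1-px error costs about 0.26 m of depth, at 60 m it costs more than 10 m, which is why penalizing disparity alone under-weights distant objects.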
Object-Centric Stereo Matching for 3D
Object Detection
First, a 2D detector generates 2D boxes in the left and right images I_l and I_r. Next, a box association algorithm matches object detections
across both images. Each matched detection pair is passed into the object-centric stereo network, which jointly
produces a disparity map and instance segmentation mask for each object. Together, these form a disparity
map containing only the objects of interest. Lastly, the disparity map is transformed into a point cloud that can
be used by any LiDAR-based 3D object detection network to predict the 3D bounding boxes.
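As one illustrative way to implement the box association step (a simple heuristic sketch, not necessarily the paper's algorithm): in a rectified stereo pair, corresponding boxes span nearly the same image rows, so a greedy match on vertical overlap already gives a reasonable association. The associate_boxes helper and the 0.5 threshold below are hypothetical.

```python
import numpy as np

# Greedy left-right box association sketch (hypothetical, not the paper's algorithm):
# in rectified stereo, corresponding boxes share the same image rows, so match boxes
# by vertical overlap and take the best remaining candidate for each left box.
def associate_boxes(boxes_l, boxes_r):
    """boxes_*: arrays of [x1, y1, x2, y2]; returns a list of (left_idx, right_idx)."""
    def v_iou(a, b):
        inter = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
        union = (a[3] - a[1]) + (b[3] - b[1]) - inter
        return inter / union if union > 0 else 0.0

    pairs, used = [], set()
    for i, bl in enumerate(boxes_l):
        scores = [(v_iou(bl, br), j) for j, br in enumerate(boxes_r) if j not in used]
        if scores:
            s, j = max(scores)
            if s > 0.5:                     # assumed overlap threshold
                pairs.append((i, j))
                used.add(j)
    return pairs

boxes_l = np.array([[100, 120, 180, 200], [300, 150, 360, 230]])
boxes_r = np.array([[ 90, 118, 170, 198], [280, 152, 340, 232]])
print(associate_boxes(boxes_l, boxes_r))    # expect [(0, 0), (1, 1)]
```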
Object-Centric Stereo Matching for 3D
Object Detection
Qualitative results on KITTI. Ground truth and predictions are in red and green,
respectively. Colored points are predicted by the stereo matching network,
while LiDAR points are shown in black for visualization purposes only.
Triangulation Learning Network: from
Monocular to Stereo 3D Object Detection
• For 3D object detection from stereo images, the key challenge is how to effectively utilize
stereo information.
• Different from previous methods that use pixel-level depth maps, this method employs 3D
anchors to explicitly construct object-level correspondences between the RoIs in the stereo
images, from which the DNN learns to detect and triangulate the targeted object in 3D space.
• It introduces a cost-efficient channel reweighting strategy that enhances representational
features and weakens noisy signals to facilitate the learning process.
• All of these are flexibly integrated into a solid baseline detector that uses monocular images.
• It is demonstrated that both the monocular baseline and the stereo triangulation learning
network outperform the prior SoA in 3D object detection and localization on the KITTI dataset.
Triangulation Learning Network: from
Monocular to Stereo 3D Object Detection
Overview of the 3D detection pipeline. The baseline monocular network is indicated with a
blue background, and can be easily extended to stereo inputs by duplicating the baseline
and further integrating it with the TLNet (Triangulation Learning Network).
Triangulation Learning Network: from
Monocular to Stereo 3D Object Detection
• The baseline network, which takes a monocular image as input, is composed of a backbone and
three subsequent modules: front-view anchor generation, 3D box proposal, and refinement.
• The three-stage pipeline progressively narrows the search space by selecting confident
anchors, which greatly reduces computational complexity.
• Stereo 3D detection is performed by integrating a triangulation learning network (TLNet)
into the baseline model.
• In classical geometry, triangulation refers to localizing 3D points from multi-view images;
here the objective is to localize a 3D object and estimate its size and orientation from
stereo images.
• To achieve this, an anchor triangulation scheme is introduced, in which the network uses 3D
anchors as references to triangulate the targets.
Triangulation Learning Network: from
Monocular to Stereo 3D Object Detection
Front-view anchor generation. Potential anchors are those with high objectness in the front view.
Only these potential anchors are fed into the RPN, reducing the search space and saving computational cost.
Triangulation Learning Network: from
Monocular to Stereo 3D Object Detection
Anchor triangulation. Projecting the 3D
anchor box onto the stereo images yields a
pair of RoIs. The left RoI establishes a geometric
correspondence with the right one via the
anchor box. The nearby target is present in
both RoIs with slight positional differences.
The TLNet takes the RoI pair as input and
uses the 3D anchor as a reference to localize
the targeted object.
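A minimal numeric sketch of the projection that produces the RoI pair, assuming a rectified pinhole stereo rig with focal length f, principal point (cu, cv), and baseline b (KITTI-like values chosen for illustration; the helper names are hypothetical): the anchor's corners project into the right image shifted by the disparity f * b / z.

```python
import numpy as np

# Illustrative sketch: project a 3D anchor's corners into both rectified cameras.
# f, cu, cv, b are assumed KITTI-like calibration values, not taken from the paper.
f, cu, cv, b = 721.0, 609.6, 172.9, 0.54

def project(points, baseline=0.0):
    """Project Nx3 camera-frame points (x right, y down, z forward) into an image."""
    x, y, z = points[:, 0] - baseline, points[:, 1], points[:, 2]
    return np.stack([f * x / z + cu, f * y / z + cv], axis=1)

def anchor_corners(cx, cy, cz, l, w, h):
    """Eight corners of an axis-aligned 3D anchor box centered at (cx, cy, cz)."""
    dx, dy, dz = l / 2, h / 2, w / 2
    signs = np.array([[sx, sy, sz] for sx in (-1, 1) for sy in (-1, 1) for sz in (-1, 1)])
    return np.array([cx, cy, cz]) + signs * np.array([dx, dy, dz])

corners = anchor_corners(2.0, 1.0, 20.0, l=3.9, w=1.6, h=1.5)
uv_left = project(corners)                  # corners in the left image
uv_right = project(corners, baseline=b)     # same anchor, shifted by disparity f*b/z
print(uv_left.min(0), uv_left.max(0))       # tight 2D RoI bounds (left)
print(uv_right.min(0), uv_right.max(0))     # tight 2D RoI bounds (right)
```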
Triangulation Learning Network: from
Monocular to Stereo 3D Object Detection
The TLNet takes as input a pair of left-
right RoI features F_l and F_r, each with
C_roi channels and spatial size
H_roi × W_roi, obtained via RoIAlign by
projecting the same 3D anchor onto the
left and right frames. Left-right coherence
scores are used to reweight each channel.
The reweighted features are fused by
element-wise addition and passed to
task-specific fully-connected layers to
predict the objectness confidence and
3D bounding box offsets, i.e., the 3D
geometric difference between the anchor
and the target.
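A minimal numpy sketch of coherence-based channel reweighting, assuming the coherence score is the per-channel cosine similarity between the flattened left and right RoI features (TLNet's exact scoring function may differ; reweight_and_fuse is a hypothetical helper):

```python
import numpy as np

def reweight_and_fuse(feat_l, feat_r, eps=1e-8):
    """feat_l, feat_r: (C, H, W) RoI features obtained from the same 3D anchor.

    Each channel is scaled by the cosine similarity of its left and right
    responses (the assumed coherence score), then the two feature maps are
    fused by element-wise addition, as described above.
    """
    C = feat_l.shape[0]
    fl = feat_l.reshape(C, -1)
    fr = feat_r.reshape(C, -1)
    coherence = (fl * fr).sum(1) / (
        np.linalg.norm(fl, axis=1) * np.linalg.norm(fr, axis=1) + eps)
    w = coherence[:, None, None]               # one weight per channel
    return w * feat_l + w * feat_r              # reweighted, fused feature

fl = np.random.rand(256, 7, 7).astype(np.float32)
fr = fl + 0.05 * np.random.rand(256, 7, 7).astype(np.float32)   # nearly coherent pair
fused = reweight_and_fuse(fl, fr)
print(fused.shape)                              # (256, 7, 7)
```

Channels that look alike in the two views get weights near 1, while channels dominated by view-dependent noise are suppressed before fusion.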
Triangulation Learning Network: from
Monocular to Stereo 3D Object Detection
Orange bounding boxes are detection results, while green boxes are ground truths. For the main method, the
projected 3D bounding boxes are also visualized in the image, i.e., the first and fourth rows. The LiDAR point
clouds are visualized for reference but are not used in either training or evaluation. The triangulation learning
method is shown to reduce missed detections and improve depth prediction in distant regions.
Pseudo-LiDAR from Visual Depth Estimation: Bridging the
Gap in 3D Object Detection for Autonomous Driving
• Approaches based on cheaper monocular or stereo imagery data have, until now, resulted in
drastically lower accuracies — a gap that is commonly attributed to poor image-based
depth estimation.
• However, it is not the quality of the data but its representation that accounts for the majority
of the difference.
• Taking the inner workings of CNNs into consideration, image-based depth maps are converted to
a pseudo-LiDAR representation, essentially mimicking the LiDAR signal.
• With this representation, different existing LiDAR-based detection algorithms can be applied.
• On the popular KITTI benchmark, this approach achieves impressive improvements over the
existing state-of-the-art in image-based performance — raising the detection accuracy of
objects within the 30m range from the previous state-of-the-art of 22% to an
unprecedented 74%.
Pseudo-LiDAR from Visual Depth Estimation: Bridging the
Gap in 3D Object Detection for Autonomous Driving
The pipeline for image-based 3D object detection. Given stereo or monocular images, the depth map is first predicted,
then back-projected into a 3D point cloud in the LiDAR coordinate system. This representation is referred to as
pseudo-LiDAR and is processed exactly like LiDAR: any LiDAR-based detection algorithm can be applied.
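A minimal numpy sketch of this back-projection under the usual pinhole model: depth follows from disparity via z = f * b / d, and each pixel (u, v) with depth z is lifted to a 3D point. The calibration constants are assumed KITTI-like values and the helper names are hypothetical; the paper's pipeline additionally transforms the points into the LiDAR coordinate frame using the camera-LiDAR extrinsics.

```python
import numpy as np

# Illustrative back-projection of a depth map into a pseudo-LiDAR point cloud.
# f, cu, cv, b are assumed KITTI-like calibration values, not from the paper.
f, cu, cv, b = 721.0, 609.6, 172.9, 0.54

def disparity_to_depth(disparity, eps=1e-6):
    """z = f * b / d, guarding against zero disparity."""
    return f * b / np.maximum(disparity, eps)

def depth_to_pseudo_lidar(depth):
    """Lift every pixel of an (H, W) depth map to a 3D point in camera coordinates."""
    H, W = depth.shape
    v, u = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    z = depth
    x = (u - cu) * z / f            # right
    y = (v - cv) * z / f            # down
    # The paper's pipeline then rotates/translates these into the LiDAR frame.
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)   # (H*W, 3) point cloud

disparity = np.full((375, 1242), 10.0)           # dummy disparity map
points = depth_to_pseudo_lidar(disparity_to_depth(disparity))
print(points.shape, points[:, 2].mean())         # (465750, 3), depth ~ 38.9 m
```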
Pseudo-LiDAR from Visual Depth Estimation: Bridging the
Gap in 3D Object Detection for Autonomous Driving
A single 2D convolution with a uniform kernel is applied to the frontal-view depth map (top-left). The resulting depth
map (top-right), after being back-projected into pseudo-LiDAR and displayed from the bird's-eye view (bottom-right),
reveals a large depth distortion in comparison to the original pseudo-LiDAR representation (bottom-left),
especially for far-away objects. Points of each car instance are marked with a color. The boxes are superimposed
and contain all points of the green and cyan cars, respectively.
Pseudo-LiDAR from Visual Depth Estimation: Bridging the
Gap in 3D Object Detection for Autonomous Driving
Qualitative comparison. Compare AVOD with LiDAR, pseudo-LiDAR, and frontal-view (stereo).
Ground-truth boxes are in red, predicted boxes in green; the observer in the pseudo-LiDAR
plots (bottom row) is on the very left side looking to the right. The frontal-view approach
(right) even miscalculates the depths of nearby objects and misses far-away objects entirely.
Stereo R-CNN based 3D Object Detection
for Autonomous Driving
• A 3D object detection method for autonomous driving that fully exploits the sparse and
dense, semantic and geometric information in stereo imagery.
• This method, called Stereo R-CNN, extends Faster R-CNN to stereo inputs to
simultaneously detect and associate objects in the left and right images.
• Extra branches are added after the stereo Region Proposal Network (RPN) to predict sparse
keypoints, viewpoints, and object dimensions, which are combined with the 2D left-right boxes
to calculate a coarse 3D object bounding box.
• The accurate 3D bounding box is then recovered by region-based photometric alignment
using the left and right RoIs (a simplified sketch follows this list).
• This method requires neither depth input nor 3D position supervision, yet it
outperforms all existing fully supervised image-based methods.
• Code released at https://github.com/HKUST-Aerial-Robotics/Stereo-RCNN.
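A simplified sketch of the idea behind the alignment step, under strong assumptions (rectified grayscale images, a purely horizontal shift, a brute-force search over candidate depths): keep the depth whose induced disparity d = f * b / z minimizes the photometric error between the left RoI and the correspondingly shifted right RoI. The real Stereo R-CNN module aligns only valid object pixels and solves for the object center depth, so this is an illustration of the principle, not the paper's implementation; all helper names below are hypothetical.

```python
import numpy as np

f, b = 721.0, 0.54      # assumed focal length (px) and baseline (m), KITTI-like

def align_depth(img_l, img_r, box, depths):
    """Return the candidate depth minimizing mean photometric error over the RoI.

    img_l, img_r: (H, W) grayscale arrays; box: (x1, y1, x2, y2) left-image RoI;
    depths: iterable of candidate object depths in meters.
    """
    x1, y1, x2, y2 = box
    patch_l = img_l[y1:y2, x1:x2].astype(np.float64)
    best = (np.inf, None)
    for z in depths:
        d = int(round(f * b / z))                    # disparity induced by depth z
        patch_r = img_r[y1:y2, x1 - d:x2 - d].astype(np.float64)
        err = np.mean((patch_l - patch_r) ** 2)      # photometric (SSD) error
        best = min(best, (err, z))
    return best[1]

# Toy example: a flat 40x40 "object" at depth 15 m, shifted by its true disparity.
H, W, z_true = 200, 600, 15.0
img_l = np.zeros((H, W)); img_r = np.zeros((H, W))
img_l[80:120, 300:340] = 1.0
d_true = int(round(f * b / z_true))
img_r[80:120, 300 - d_true:340 - d_true] = 1.0
print(align_depth(img_l, img_r, (300, 80, 340, 120), depths=np.arange(5.0, 40.0, 0.5)))
# prints 15.0: the depth whose induced disparity best explains the right image
```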
Stereo R-CNN based 3D Object Detection
for Autonomous Driving
The stereo R-CNN outputs stereo boxes, keypoints, dimensions, and the viewpoint angle,
followed by the 3D box estimation and the dense 3D box alignment module.
Stereo R-CNN based 3D Object Detection
for Autonomous Driving
Relations between the object orientation θ,
azimuth β, and viewpoint θ + β. Only identical
viewpoints lead to identical projections.
Different target assignments for RPN classification and
regression.
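For reference, writing α for the viewpoint and assuming the usual azimuth convention for an object centered at lateral offset x and depth z (a convention assumed here, not quoted from the paper):

```latex
\beta = \arctan\!\left(\frac{x}{z}\right), \qquad
\alpha = \theta + \beta, \qquad
\theta = \alpha - \beta .
```

As a worked example, an object at (x, z) = (2 m, 10 m) with θ = 30° has β ≈ 11.3° and α ≈ 41.3°; an object at (-3 m, 15 m) with θ ≈ 52.6° has β ≈ -11.3° and the same α ≈ 41.3°, hence the same RoI appearance despite a 22.6° difference in orientation. This is why the network regresses the viewpoint α from the RoI and recovers θ afterwards from the known β.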
Stereo R-CNN based 3D Object Detection
for Autonomous Driving
3D semantic keypoints, the 2D perspective keypoint, and boundary keypoints.
Stereo R-CNN based 3D Object Detection
for Autonomous Driving
Sparse constraints for the 3D box estimation
Stereo R-CNN based 3D Object Detection
for Autonomous Driving
From top to bottom: detections on left image, right image, and bird’s eye view image.