3D Interpretation from Single 2D Image
for Autonomous Driving V
Yu Huang
Yu.huang07@gmail.com
Sunnyvale, California
Outline
• MonoRUn: Monocular 3D Object Detection by Reconstruction and
Uncertainty Propagation
• M3DSSD: Monocular 3D Single Stage Object Detector
• Delving into Localization Errors for Monocular 3D Object Detection
• GrooMeD-NMS: Grouped Mathematically Differentiable NMS for
Monocular 3D Object Detection
• Objects are Different: Flexible Monocular 3D Object Detection
MonoRUn: Monocular 3D Object Detection by Reconstruction
and Uncertainty Propagation
• MonoRUn: self-supervised; learns dense correspondences and geometry;
• Robust KL loss: minimizes the uncertainty-weighted projection error (see the loss sketch after this list);
• Uncertainty-aware region reconstruction network for 3D object coordinate regression;
• Uncertainty-driven PnP for object pose and covariance matrix estimation;
• Code: https://github.com/tjiiv-cprg/MonoRUn.
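A minimal sketch of the idea behind an uncertainty-weighted reprojection loss (heteroscedastic regression with a predicted log-variance); the function name and the exact form of MonoRUn's robust KL loss are assumptions, not the paper's implementation.

```python
import torch

def uncertainty_weighted_reproj_loss(proj_err, log_var):
    """Hedged sketch: weight squared reprojection residuals by predicted
    per-point uncertainty.
    proj_err: (N, 2) pixel reprojection residuals
    log_var:  (N, 2) predicted log-variance per residual component"""
    # exp(-log_var) down-weights points with high predicted uncertainty;
    # the +log_var term keeps the network from inflating the variance.
    return (torch.exp(-log_var) * proj_err.pow(2) + log_var).mean()
```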
M3DSSD: Monocular 3D Single Stage
Object Detector
• Monocular 3D Single Stage object Detector (M3DSSD) with feature alignment and
asymmetric non-local attention.
• Current anchor-based monocular 3D object detection methods suffer from feature
mismatching.
• To overcome this, propose a two-step feature alignment approach.
• In the first step, the shape alignment is performed to enable the receptive field of
the feature map to focus on the pre-defined anchors with high confidence scores.
• In the second step, the center alignment is used to align the features at 2D/3D
centers.
• Further, it is often difficult to learn global information and capture long-range
relationships, which are important for the depth prediction of objects.
• An asymmetric non-local attention block with multi-scale sampling is used to extract depth-wise features.
• The code is released at https://github.com/mumianyuxin/M3DSSD.
M3DSSD: Monocular 3D Single Stage
Object Detector
The architecture of M3DSSD. (a) The backbone of the framework, which is modified from DLA-102. (b) The two-step feature alignment, classification head, 2D/3D center regression heads, and ANAB, specially designed for predicting the depth z_3D. (c) Other regression heads.
M3DSSD: Monocular 3D Single Stage
Object Detector
The architecture of shape alignment
and the outcome of shape alignment
on objects. The yellow squares indicate
the sampling location of the AlignConv,
and the anchors are in red.
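A minimal sketch of shape alignment as an anchor-conditioned deformable convolution, using torchvision's deform_conv2d; the offset source (a 1x1 conv on an anchor-conditioned feature map) and the class name are illustrative assumptions, not the paper's AlignConv.

```python
import torch
import torch.nn as nn
from torchvision.ops import deform_conv2d

class AlignConvSketch(nn.Module):
    """Hedged sketch: sample features with a deformable 3x3 convolution whose
    offsets follow the confident anchor shapes."""
    def __init__(self, channels):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(channels, channels, 3, 3) * 0.01)
        # 2 offset coordinates per kernel tap (3x3 taps) -> 18 offset channels
        self.offset_pred = nn.Conv2d(channels, 18, kernel_size=1)

    def forward(self, feat, anchor_feat):
        # In the paper the offsets come from anchor shapes; here they are
        # predicted from an anchor-conditioned feature map (assumption).
        offsets = self.offset_pred(anchor_feat)
        return deform_conv2d(feat, offsets, self.weight, padding=1)
```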
M3DSSD: Monocular 3D Single Stage
Object Detector
The architectures of center alignment and the
outcome of the center alignment. When
applying center alignment to objects, the
sampling locations on the foreground regions (in
white) all concentrate on the centers of objects
(in yellow) after center alignment, which are
near to the true centers of objects (in red).
M3DSSD: Monocular 3D Single Stage
Object Detector
Asymmetric Non-local Attention Block. The
key and query branches share the same
attention maps, which forces the key and
value to focus on the same place. Bottom:
Pyramid Average Pooling with Attention
(PA2) that generates different level
descriptors in various resolutions.
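A minimal sketch of non-local attention with pyramid average pooling on the key/value path, which keeps the attention map small while gathering global context; the pooling sizes and module layout are assumptions and do not reproduce the exact ANAB/PA² design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PyramidPoolAttentionSketch(nn.Module):
    """Hedged sketch: keys/values are pooled to a few coarse resolutions
    (pyramid average pooling) before the non-local attention."""
    def __init__(self, channels, pool_sizes=(1, 3, 6)):
        super().__init__()
        self.q = nn.Conv2d(channels, channels, 1)
        self.k = nn.Conv2d(channels, channels, 1)
        self.v = nn.Conv2d(channels, channels, 1)
        self.pool_sizes = pool_sizes

    def forward(self, x):
        n, c, h, w = x.shape
        q = self.q(x).flatten(2).transpose(1, 2)        # (N, HW, C)
        ks, vs = [], []
        for s in self.pool_sizes:                       # pyramid of coarse descriptors
            p = F.adaptive_avg_pool2d(x, s)
            ks.append(self.k(p).flatten(2))             # (N, C, s*s)
            vs.append(self.v(p).flatten(2))
        k = torch.cat(ks, dim=2)                        # (N, C, M)
        v = torch.cat(vs, dim=2).transpose(1, 2)        # (N, M, C)
        attn = torch.softmax(q @ k / c ** 0.5, dim=-1)  # (N, HW, M)
        out = (attn @ v).transpose(1, 2).reshape(n, c, h, w)
        return x + out                                  # residual connection
```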
Delving into Localization Errors for
Monocular 3D Object Detection
• Quantify the impact introduced by each sub-task and find that the localization error is the vital factor restricting monocular 3D detection.
• Besides, investigate the underlying reasons behind localization errors, analyze the issues they might bring, and propose three strategies.
• First, the misalignment between the center of the 2D bounding box and the projected center of the 3D object is a vital factor leading to low localization accuracy.
• Second, accurately localizing distant objects with existing technologies is almost impossible, and such samples mislead the learned network; they are removed from the training set to improve the overall performance of the detector.
• Lastly, a 3D IoU-oriented loss is used for the size estimation of the object, which is not affected by the localization error.
• Code: https://github.com/xinzhuma/monodle.
Delving into Localization Errors for
Monocular 3D Object Detection
• Coupled with the errors accumulated by other tasks such as depth
estimation, it becomes an almost impossible task to accurately estimate the
3D bounding box of distant objects from a single monocular image, unless
the depth estimation is accurate enough (not achieved to date).
• For estimating the coarse center c: 1) use the projected 3D center c_w as the ground truth for the branch estimating the coarse center c, and 2) force the model to learn features from 2D detection simultaneously.
• Adopting the projected 3D center c_w as the ground truth makes the coarse-center branch aware of 3D geometry and more relevant to estimating the 3D object center, which is the key to the localization problem.
• 2D detection serves as an auxiliary task to learn better 3D-aware features.
Delving into Localization Errors for
Monocular 3D Object Detection
• Two schemes are proposed for generating the object-level training weight for each sample:
• Hard coding: discard all samples beyond a certain distance.
• Soft coding: generate the weight with a reverse sigmoid-like function (see the sketch below).
• An IoU-oriented optimization for 3D size estimation: specifically, assume all prediction items except the 3D size s = [h, w, l]_3D are completely correct, and derive the loss from the resulting 3D IoU.
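A minimal sketch of the soft-coding weight as a reverse sigmoid-like function of object depth; the thresholds and steepness below are illustrative, not the paper's values.

```python
import math

def soft_distance_weight(depth, t1=40.0, t2=60.0, k=0.2):
    """Hedged sketch of the 'soft coding' training weight: close objects keep
    full weight, distant objects are smoothly down-weighted."""
    if depth <= t1:
        return 1.0
    # reverse sigmoid centered between the two (assumed) distance thresholds
    return 1.0 / (1.0 + math.exp(k * (depth - 0.5 * (t1 + t2))))
```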
GrooMeD-NMS: Grouped Mathematically Differentiable
NMS for Monocular 3D Object Detection
• While there have been attempts to include NMS in the training pipeline for tasks such as 2D object detection, they have been less widely adopted due to a non-mathematical expression of the NMS.
• This work integrates GrooMeD-NMS – a Grouped Mathematically Differentiable NMS – into monocular 3D object detection, such that the network is trained end-to-end with a loss on the boxes after NMS.
• First formulate NMS as a matrix operation, then group and mask the boxes in an unsupervised manner to obtain a simple closed-form expression of the NMS.
• GrooMeD-NMS addresses the mismatch between training and inference pipelines and, therefore, forces the network to select the best 3D box in a differentiable manner.
GrooMeD-NMS: Grouped Mathematically Differentiable
NMS for Monocular 3D Object Detection
(a) Conventional object detection has a mismatch between training and inference, as it uses NMS only in inference. (b) To address this, a novel GrooMeD-NMS layer is proposed, such that the network is trained end-to-end with NMS applied. s and r denote the scores of boxes B before and after NMS, respectively. O denotes the matrix containing IoU_2D overlaps of B. L_before denotes the losses before the NMS, while L_after denotes the loss after the NMS. (c) The GrooMeD-NMS layer calculates r in a differentiable manner, giving gradients from L_after when the best-localized box corresponding to an object is not selected after NMS.
GrooMeD-NMS: Grouped Mathematically Differentiable
NMS for Monocular 3D Object Detection
The rescores r are written in a matrix formulation and compacted into a closed-form expression in which P, called the Prune Matrix, is obtained when the pruning function p operates element-wise on O. To avoid recursion, a closed-form solution is used. Boxes in an image are clustered in an unsupervised manner based on IoU_2D overlaps to obtain the groups G. Grouping thus mimics the grouping of the classical NMS, but does not rescore the boxes; the rescoring is then rewritten group-wise (a soft rescoring sketch is given below).
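A minimal sketch of a differentiable, soft NMS-style rescoring in the spirit of the matrix formulation above: higher-scored boxes suppress overlapping boxes through an element-wise pruning function. This is not the paper's exact closed-form or grouped expression.

```python
import torch

def soft_rescore(scores, iou, iou_thresh=0.4, temperature=0.1):
    """Hedged sketch of differentiable rescoring.
    scores: (N,) box scores s; iou: (N, N) pairwise IoU_2D overlaps."""
    order = torch.argsort(scores, descending=True)
    s = scores[order]
    o = iou[order][:, order]
    # soft pruning function p(o): a sigmoid around the IoU threshold
    p = torch.sigmoid((o - iou_thresh) / temperature)
    # only strictly higher-scored boxes (earlier in the order) may suppress
    mask = torch.tril(torch.ones_like(p), diagonal=-1)
    keep_prob = torch.prod(1.0 - p * mask, dim=1)
    r = s * keep_prob
    # undo the sorting so rescores align with the input order
    out = torch.empty_like(r)
    out[order] = r
    return out
```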
GrooMeD-NMS: Grouped Mathematically Differentiable
NMS for Monocular 3D Object Detection
Classical NMS considers the IoU_2D of the top-scored box with other boxes. This is equivalent to keeping only the column of O corresponding to the top box while setting the remaining columns to zero. It is implemented by masking P_Gk: let M_Gk denote the binary mask corresponding to group G_k, whose entries are 1 in the column corresponding to the top-scored box and 0 elsewhere. Because the resulting masked matrix is a Frobenius matrix, the expression simplifies further to a closed form.
GrooMeD-NMS: Grouped Mathematically Differentiable
NMS for Monocular 3D Object Detection
pruning function
GrooMeD-NMS: Grouped Mathematically Differentiable
NMS for Monocular 3D Object Detection
The method builds on M3D-RPN and uses binning and self-balancing confidence. The boxes' self-balancing confidences are used as the scores s, which pass through the GrooMeD-NMS layer to obtain the rescores r. The rescores signal the network when the best box has not been selected for a particular object, and the target assignment is based on gIoU_3D.
For calculating gIoU_3D, first calculate the volume V and the hull volume V_hull of the 3D boxes. V_hull is the product of the hull in Bird's Eye View (BEV), ignoring rotations, and the hull of the Y dimension.
If the best boxes are correctly ranked in one image but not in a second, the gradients should only affect the boxes of the second image; this motivates the image-wise AP-Loss modification. The modified AP-Loss is used as the loss after NMS since AP-Loss does not suffer from class imbalance.
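A minimal sketch of the gIoU recipe for the simplified, axis-aligned case; the paper's gIoU_3D handles rotated boxes via the BEV hull times the Y-dimension hull, and the box format here is an illustrative assumption.

```python
def giou_3d_axis_aligned(box_a, box_b):
    """Hedged sketch of generalized IoU for axis-aligned 3D boxes given as
    (x1, y1, z1, x2, y2, z2)."""
    def vol(b):
        return max(b[3] - b[0], 0) * max(b[4] - b[1], 0) * max(b[5] - b[2], 0)
    # intersection box and enclosing (hull) box
    inter = [max(box_a[i], box_b[i]) for i in range(3)] + \
            [min(box_a[i], box_b[i]) for i in range(3, 6)]
    hull = [min(box_a[i], box_b[i]) for i in range(3)] + \
           [max(box_a[i], box_b[i]) for i in range(3, 6)]
    v_inter, v_hull = vol(inter), vol(hull)
    v_union = vol(box_a) + vol(box_b) - v_inter
    iou = v_inter / v_union if v_union > 0 else 0.0
    # gIoU = IoU - (hull volume not covered by the union) / hull volume
    return iou - (v_hull - v_union) / v_hull if v_hull > 0 else iou
```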
Objects are Different: Flexible Monocular 3D
Object Detection
• Most existing methods adopt the same approach for all objects regardless of
diverse distributions, leading to limited performance for truncated objects.
• A flexible framework for monocular 3D object detection which explicitly decouples
the truncated objects and adaptively combines multiple approaches for object
depth estimation.
• Specifically, decouple the edge of the feature map for predicting long-tail truncated
objects so that the optimization of normal objects is not influenced.
• Furthermore, formulate the object depth estimation as an uncertainty-guided
ensemble of directly regressed object depth and solved depths from different
groups of keypoints.
• Code to be released at https://github.com/zhangyp15/MonoFlex.
Objects are Different: Flexible Monocular 3D
Object Detection
• The framework is extended from CenterNet, where objects are identified by their
representative points and predicted by peaks of the heatmap.
• First, the CNN backbone extracts feature maps from the monocular image as the
input for multiple prediction heads.
• Multiple prediction branches are deployed on the shared backbone to regress
objects’ properties, including the 2D bounding box, dimension, orientation,
keypoints, and depth.
• The image-level localization involves the heatmap and offsets, where the edge
fusion modules are used to decouple the feature learning and prediction of
truncated objects.
• The final depth estimation is an uncertainty-guided combination of the regressed depth and the depths computed from estimated keypoints and dimensions.
• The adaptive depth ensemble adopts four methods for depth estimation and simultaneously predicts their uncertainties, which are used to form an uncertainty-weighted prediction (see the sketch below).
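A minimal sketch of the uncertainty-guided depth ensemble, weighting candidate depths by their inverse predicted standard deviation; the exact weighting used in the paper may differ.

```python
import torch

def uncertainty_weighted_depth(depths, log_sigmas):
    """Hedged sketch of the adaptive depth ensemble.
    depths:     (K,) candidate depths (direct regression + keypoint solutions)
    log_sigmas: (K,) predicted log standard deviations"""
    inv_sigma = torch.exp(-log_sigmas)          # 1 / sigma per candidate
    weights = inv_sigma / inv_sigma.sum()       # normalize to a convex combination
    return (weights * depths).sum()
```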
Objects are Different: Flexible Monocular 3D
Object Detection
The dimension and orientation can be directly inferred from appearance-based cues, while the 3D location is converted to the projected 3D center x_c = (u_c, v_c) and the object depth z via the camera projection.
Existing methods utilize a unified representation x_r, the center of the 2D bounding box x_b, for every object. In such cases, the offset δ_c = x_c - x_b is regressed to derive the projected 3D center x_c.
(a) The 3D location is converted to the projected center and the object depth. (b) The distribution of the offsets δ_c from 2D centers to projected 3D centers. Inside and outside objects exhibit entirely different distributions.
Divide objects into two groups depending on whether
their projected 3D centers are inside or outside the
image.
The joint learning of δ_c can suffer from long-tail offsets; therefore the representations and the offset learning of inside and outside objects are decoupled.
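A minimal sketch of the conversion in (a), assuming a pinhole camera with intrinsics K: the 3D object center maps to the projected center x_c and the object depth z.

```python
import numpy as np

def project_center(center_3d, K):
    """Hedged sketch: a 3D object center (x, y, z) in camera coordinates maps
    to the projected center x_c = (u_c, v_c) and the depth z through the
    pinhole intrinsics K (3x3)."""
    x, y, z = center_3d
    u, v, w = K @ np.array([x, y, z])
    return np.array([u / w, v / w]), z   # projected 2D center, object depth
```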
Objects are Different: Flexible Monocular 3D
Object Detection
Inside Objects’ discretization offset error
Outside Objects’ discretization offset error
(a) The intersection x_I between the image edge and the line from x_b to x_c is used to represent the truncated object. (b) The edge heatmap is generated with a 1D Gaussian distribution. (c) The edge intersection x_I (cyan) is a better representation than the 2D center x_b (green) for heavily truncated objects. Since 2D bounding boxes only capture the inside-image part of objects, the visual locations of x_b can be confusing and may even fall on other objects. By contrast, the intersection x_I disentangles the edge area of the heatmap to focus on outside objects and offers a strong boundary prior that simplifies localization.
Objects are Different: Flexible Monocular 3D
Object Detection
• Edge fusion module to further decouple the feature learning and prediction of
outside objects;
• The module first extracts the four boundaries of the feature map and concatenates them into an edge feature vector in clockwise order, which is then processed by two 1D convolutional layers to learn unique features for truncated objects.
• Finally, the processed vector is remapped to the four boundaries and added to the input feature map (see the sketch below).
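A minimal sketch of the edge fusion idea: gather the four feature-map boundaries clockwise into a 1D sequence, process them with two 1D convolutions, and add the result back onto the boundaries; the channel sizes and kernel widths are illustrative assumptions.

```python
import torch
import torch.nn as nn

class EdgeFusionSketch(nn.Module):
    """Hedged sketch of an edge fusion module over feature-map boundaries."""
    def __init__(self, channels):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv1d(channels, channels, kernel_size=3, padding=1),
        )

    def forward(self, feat):
        n, c, h, w = feat.shape
        top = feat[:, :, 0, :]                      # (N, C, W)
        right = feat[:, :, :, -1]                   # (N, C, H)
        bottom = feat[:, :, -1, :].flip(-1)         # flipped to keep clockwise order
        left = feat[:, :, :, 0].flip(-1)
        edge = torch.cat([top, right, bottom, left], dim=-1)   # (N, C, 2H+2W)
        edge = self.conv(edge)                      # 1D convs along the boundary
        out = feat.clone()
        out[:, :, 0, :] = out[:, :, 0, :] + edge[..., :w]
        out[:, :, :, -1] = out[:, :, :, -1] + edge[..., w:w + h]
        out[:, :, -1, :] = out[:, :, -1, :] + edge[..., w + h:2 * w + h].flip(-1)
        out[:, :, :, 0] = out[:, :, :, 0] + edge[..., 2 * w + h:].flip(-1)
        return out
```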
Objects are Different: Flexible Monocular 3D
Object Detection
Relation of global orientation, local
orientation, and the viewing angle.
Keypoints include the projections of the eight vertices, the top center, and the bottom center of the 3D bounding box. The depth of a supporting line of the 3D bounding box can be computed from the object height and the line's pixel height. The ten keypoints are split into three groups, each of which can produce the center depth independently (see the sketch below).
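A minimal sketch of the geometry behind keypoint-based depth: a vertical edge of the 3D box with known metric height that spans h pixels lies at depth z = f·H/h under a pinhole camera (the edge is assumed roughly parallel to the image plane).

```python
def depth_from_pixel_height(object_height_m, pixel_height, focal_length_px):
    """Hedged sketch: recover depth from the metric height H of a box edge,
    its image span h in pixels, and the focal length f in pixels."""
    return focal_length_px * object_height_m / pixel_height
```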
3-d interpretation from single 2-d image V

Contenu connexe

Tendances

Depth Fusion from RGB and Depth Sensors II
Depth Fusion from RGB and Depth Sensors IIDepth Fusion from RGB and Depth Sensors II
Depth Fusion from RGB and Depth Sensors IIYu Huang
 
Fisheye Omnidirectional View in Autonomous Driving
Fisheye Omnidirectional View in Autonomous DrivingFisheye Omnidirectional View in Autonomous Driving
Fisheye Omnidirectional View in Autonomous DrivingYu Huang
 
3-d interpretation from single 2-d image III
3-d interpretation from single 2-d image III3-d interpretation from single 2-d image III
3-d interpretation from single 2-d image IIIYu Huang
 
3-d interpretation from single 2-d image for autonomous driving
3-d interpretation from single 2-d image for autonomous driving3-d interpretation from single 2-d image for autonomous driving
3-d interpretation from single 2-d image for autonomous drivingYu Huang
 
3-d interpretation from single 2-d image IV
3-d interpretation from single 2-d image IV3-d interpretation from single 2-d image IV
3-d interpretation from single 2-d image IVYu Huang
 
LiDAR-based Autonomous Driving III (by Deep Learning)
LiDAR-based Autonomous Driving III (by Deep Learning)LiDAR-based Autonomous Driving III (by Deep Learning)
LiDAR-based Autonomous Driving III (by Deep Learning)Yu Huang
 
Deep learning for 3-D Scene Reconstruction and Modeling
Deep learning for 3-D Scene Reconstruction and Modeling Deep learning for 3-D Scene Reconstruction and Modeling
Deep learning for 3-D Scene Reconstruction and Modeling Yu Huang
 
Multi sensor calibration by deep learning
Multi sensor calibration by deep learningMulti sensor calibration by deep learning
Multi sensor calibration by deep learningYu Huang
 
Deep vo and slam iii
Deep vo and slam iiiDeep vo and slam iii
Deep vo and slam iiiYu Huang
 
Lidar for Autonomous Driving II (via Deep Learning)
Lidar for Autonomous Driving II (via Deep Learning)Lidar for Autonomous Driving II (via Deep Learning)
Lidar for Autonomous Driving II (via Deep Learning)Yu Huang
 
Anchor free object detection by deep learning
Anchor free object detection by deep learningAnchor free object detection by deep learning
Anchor free object detection by deep learningYu Huang
 
Fisheye Omnidirectional View in Autonomous Driving II
Fisheye Omnidirectional View in Autonomous Driving IIFisheye Omnidirectional View in Autonomous Driving II
Fisheye Omnidirectional View in Autonomous Driving IIYu Huang
 
Deep Learning’s Application in Radar Signal Data
Deep Learning’s Application in Radar Signal DataDeep Learning’s Application in Radar Signal Data
Deep Learning’s Application in Radar Signal DataYu Huang
 
Stereo Matching by Deep Learning
Stereo Matching by Deep LearningStereo Matching by Deep Learning
Stereo Matching by Deep LearningYu Huang
 
Deep learning for image video processing
Deep learning for image video processingDeep learning for image video processing
Deep learning for image video processingYu Huang
 
Deep Learning’s Application in Radar Signal Data II
Deep Learning’s Application in Radar Signal Data IIDeep Learning’s Application in Radar Signal Data II
Deep Learning’s Application in Radar Signal Data IIYu Huang
 
Pedestrian behavior/intention modeling for autonomous driving V
Pedestrian behavior/intention modeling for autonomous driving VPedestrian behavior/intention modeling for autonomous driving V
Pedestrian behavior/intention modeling for autonomous driving VYu Huang
 
Deep vo and slam ii
Deep vo and slam iiDeep vo and slam ii
Deep vo and slam iiYu Huang
 
Depth Fusion from RGB and Depth Sensors III
Depth Fusion from RGB and Depth Sensors  IIIDepth Fusion from RGB and Depth Sensors  III
Depth Fusion from RGB and Depth Sensors IIIYu Huang
 
Camera-based road Lane detection by deep learning III
Camera-based road Lane detection by deep learning IIICamera-based road Lane detection by deep learning III
Camera-based road Lane detection by deep learning IIIYu Huang
 

Tendances (20)

Depth Fusion from RGB and Depth Sensors II
Depth Fusion from RGB and Depth Sensors IIDepth Fusion from RGB and Depth Sensors II
Depth Fusion from RGB and Depth Sensors II
 
Fisheye Omnidirectional View in Autonomous Driving
Fisheye Omnidirectional View in Autonomous DrivingFisheye Omnidirectional View in Autonomous Driving
Fisheye Omnidirectional View in Autonomous Driving
 
3-d interpretation from single 2-d image III
3-d interpretation from single 2-d image III3-d interpretation from single 2-d image III
3-d interpretation from single 2-d image III
 
3-d interpretation from single 2-d image for autonomous driving
3-d interpretation from single 2-d image for autonomous driving3-d interpretation from single 2-d image for autonomous driving
3-d interpretation from single 2-d image for autonomous driving
 
3-d interpretation from single 2-d image IV
3-d interpretation from single 2-d image IV3-d interpretation from single 2-d image IV
3-d interpretation from single 2-d image IV
 
LiDAR-based Autonomous Driving III (by Deep Learning)
LiDAR-based Autonomous Driving III (by Deep Learning)LiDAR-based Autonomous Driving III (by Deep Learning)
LiDAR-based Autonomous Driving III (by Deep Learning)
 
Deep learning for 3-D Scene Reconstruction and Modeling
Deep learning for 3-D Scene Reconstruction and Modeling Deep learning for 3-D Scene Reconstruction and Modeling
Deep learning for 3-D Scene Reconstruction and Modeling
 
Multi sensor calibration by deep learning
Multi sensor calibration by deep learningMulti sensor calibration by deep learning
Multi sensor calibration by deep learning
 
Deep vo and slam iii
Deep vo and slam iiiDeep vo and slam iii
Deep vo and slam iii
 
Lidar for Autonomous Driving II (via Deep Learning)
Lidar for Autonomous Driving II (via Deep Learning)Lidar for Autonomous Driving II (via Deep Learning)
Lidar for Autonomous Driving II (via Deep Learning)
 
Anchor free object detection by deep learning
Anchor free object detection by deep learningAnchor free object detection by deep learning
Anchor free object detection by deep learning
 
Fisheye Omnidirectional View in Autonomous Driving II
Fisheye Omnidirectional View in Autonomous Driving IIFisheye Omnidirectional View in Autonomous Driving II
Fisheye Omnidirectional View in Autonomous Driving II
 
Deep Learning’s Application in Radar Signal Data
Deep Learning’s Application in Radar Signal DataDeep Learning’s Application in Radar Signal Data
Deep Learning’s Application in Radar Signal Data
 
Stereo Matching by Deep Learning
Stereo Matching by Deep LearningStereo Matching by Deep Learning
Stereo Matching by Deep Learning
 
Deep learning for image video processing
Deep learning for image video processingDeep learning for image video processing
Deep learning for image video processing
 
Deep Learning’s Application in Radar Signal Data II
Deep Learning’s Application in Radar Signal Data IIDeep Learning’s Application in Radar Signal Data II
Deep Learning’s Application in Radar Signal Data II
 
Pedestrian behavior/intention modeling for autonomous driving V
Pedestrian behavior/intention modeling for autonomous driving VPedestrian behavior/intention modeling for autonomous driving V
Pedestrian behavior/intention modeling for autonomous driving V
 
Deep vo and slam ii
Deep vo and slam iiDeep vo and slam ii
Deep vo and slam ii
 
Depth Fusion from RGB and Depth Sensors III
Depth Fusion from RGB and Depth Sensors  IIIDepth Fusion from RGB and Depth Sensors  III
Depth Fusion from RGB and Depth Sensors III
 
Camera-based road Lane detection by deep learning III
Camera-based road Lane detection by deep learning IIICamera-based road Lane detection by deep learning III
Camera-based road Lane detection by deep learning III
 

Similaire à 3-d interpretation from single 2-d image V

3-d interpretation from stereo images for autonomous driving
3-d interpretation from stereo images for autonomous driving3-d interpretation from stereo images for autonomous driving
3-d interpretation from stereo images for autonomous drivingYu Huang
 
Deep learning for 3 d point clouds presentation
Deep learning for 3 d point clouds presentationDeep learning for 3 d point clouds presentation
Deep learning for 3 d point clouds presentationVijaylaxmiNagurkar
 
“Understanding DNN-Based Object Detectors,” a Presentation from Au-Zone Techn...
“Understanding DNN-Based Object Detectors,” a Presentation from Au-Zone Techn...“Understanding DNN-Based Object Detectors,” a Presentation from Au-Zone Techn...
“Understanding DNN-Based Object Detectors,” a Presentation from Au-Zone Techn...Edge AI and Vision Alliance
 
Point-GNN: Graph Neural Network for 3D Object Detection in a Point Cloud
Point-GNN: Graph Neural Network for 3D Object Detection in a Point CloudPoint-GNN: Graph Neural Network for 3D Object Detection in a Point Cloud
Point-GNN: Graph Neural Network for 3D Object Detection in a Point CloudNuwan Sriyantha Bandara
 
Large Scale Image Retrieval 2022.pdf
Large Scale Image Retrieval 2022.pdfLarge Scale Image Retrieval 2022.pdf
Large Scale Image Retrieval 2022.pdfSamuCerezo
 
FastV2C-HandNet - ICICC 2020
FastV2C-HandNet - ICICC 2020FastV2C-HandNet - ICICC 2020
FastV2C-HandNet - ICICC 2020RohanLekhwani
 
Introduction to 3D Computer Vision and Differentiable Rendering
Introduction to 3D Computer Vision and Differentiable RenderingIntroduction to 3D Computer Vision and Differentiable Rendering
Introduction to 3D Computer Vision and Differentiable RenderingPreferred Networks
 
10.1109@ICCMC48092.2020.ICCMC-000167.pdf
10.1109@ICCMC48092.2020.ICCMC-000167.pdf10.1109@ICCMC48092.2020.ICCMC-000167.pdf
10.1109@ICCMC48092.2020.ICCMC-000167.pdfmokamojah
 
Real-time 3D Object Detection on LIDAR Point Cloud using Complex- YOLO V4
Real-time 3D Object Detection on LIDAR Point Cloud using Complex- YOLO V4Real-time 3D Object Detection on LIDAR Point Cloud using Complex- YOLO V4
Real-time 3D Object Detection on LIDAR Point Cloud using Complex- YOLO V4IRJET Journal
 
Review : PolarMask: Single Shot Instance Segmentation with Polar Representati...
Review : PolarMask: Single Shot Instance Segmentation with Polar Representati...Review : PolarMask: Single Shot Instance Segmentation with Polar Representati...
Review : PolarMask: Single Shot Instance Segmentation with Polar Representati...Dongmin Choi
 
BEV Semantic Segmentation
BEV Semantic SegmentationBEV Semantic Segmentation
BEV Semantic SegmentationYu Huang
 
Parallel wisard object tracker a rambased tracking system
Parallel wisard object tracker a rambased tracking systemParallel wisard object tracker a rambased tracking system
Parallel wisard object tracker a rambased tracking systemcseij
 
NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis
NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis
NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis taeseon ryu
 
Object detection in images based on homogeneous region segmentation
Object detection in images based on homogeneous region segmentationObject detection in images based on homogeneous region segmentation
Object detection in images based on homogeneous region segmentationAbdeslamAmrane2
 
AUTO AI 2021 talk Real world data augmentations for autonomous driving : B Ra...
AUTO AI 2021 talk Real world data augmentations for autonomous driving : B Ra...AUTO AI 2021 talk Real world data augmentations for autonomous driving : B Ra...
AUTO AI 2021 talk Real world data augmentations for autonomous driving : B Ra...Ravi Kiran B.
 
OBJECT DETECTION FOR SERVICE ROBOT USING RANGE AND COLOR FEATURES OF AN IMAGE
OBJECT DETECTION FOR SERVICE ROBOT USING RANGE AND COLOR FEATURES OF AN IMAGEOBJECT DETECTION FOR SERVICE ROBOT USING RANGE AND COLOR FEATURES OF AN IMAGE
OBJECT DETECTION FOR SERVICE ROBOT USING RANGE AND COLOR FEATURES OF AN IMAGEIJCSEA Journal
 
Object Detection for Service Robot Using Range and Color Features of an Image
Object Detection for Service Robot Using Range and Color Features of an ImageObject Detection for Service Robot Using Range and Color Features of an Image
Object Detection for Service Robot Using Range and Color Features of an ImageIJCSEA Journal
 

Similaire à 3-d interpretation from single 2-d image V (20)

3-d interpretation from stereo images for autonomous driving
3-d interpretation from stereo images for autonomous driving3-d interpretation from stereo images for autonomous driving
3-d interpretation from stereo images for autonomous driving
 
Deep learning for 3 d point clouds presentation
Deep learning for 3 d point clouds presentationDeep learning for 3 d point clouds presentation
Deep learning for 3 d point clouds presentation
 
[DL輪読会]ClearGrasp
[DL輪読会]ClearGrasp[DL輪読会]ClearGrasp
[DL輪読会]ClearGrasp
 
“Understanding DNN-Based Object Detectors,” a Presentation from Au-Zone Techn...
“Understanding DNN-Based Object Detectors,” a Presentation from Au-Zone Techn...“Understanding DNN-Based Object Detectors,” a Presentation from Au-Zone Techn...
“Understanding DNN-Based Object Detectors,” a Presentation from Au-Zone Techn...
 
Point-GNN: Graph Neural Network for 3D Object Detection in a Point Cloud
Point-GNN: Graph Neural Network for 3D Object Detection in a Point CloudPoint-GNN: Graph Neural Network for 3D Object Detection in a Point Cloud
Point-GNN: Graph Neural Network for 3D Object Detection in a Point Cloud
 
Large Scale Image Retrieval 2022.pdf
Large Scale Image Retrieval 2022.pdfLarge Scale Image Retrieval 2022.pdf
Large Scale Image Retrieval 2022.pdf
 
FastV2C-HandNet - ICICC 2020
FastV2C-HandNet - ICICC 2020FastV2C-HandNet - ICICC 2020
FastV2C-HandNet - ICICC 2020
 
Introduction to 3D Computer Vision and Differentiable Rendering
Introduction to 3D Computer Vision and Differentiable RenderingIntroduction to 3D Computer Vision and Differentiable Rendering
Introduction to 3D Computer Vision and Differentiable Rendering
 
10.1109@ICCMC48092.2020.ICCMC-000167.pdf
10.1109@ICCMC48092.2020.ICCMC-000167.pdf10.1109@ICCMC48092.2020.ICCMC-000167.pdf
10.1109@ICCMC48092.2020.ICCMC-000167.pdf
 
Mmpaper draft10
Mmpaper draft10Mmpaper draft10
Mmpaper draft10
 
Mmpaper draft10
Mmpaper draft10Mmpaper draft10
Mmpaper draft10
 
Real-time 3D Object Detection on LIDAR Point Cloud using Complex- YOLO V4
Real-time 3D Object Detection on LIDAR Point Cloud using Complex- YOLO V4Real-time 3D Object Detection on LIDAR Point Cloud using Complex- YOLO V4
Real-time 3D Object Detection on LIDAR Point Cloud using Complex- YOLO V4
 
Review : PolarMask: Single Shot Instance Segmentation with Polar Representati...
Review : PolarMask: Single Shot Instance Segmentation with Polar Representati...Review : PolarMask: Single Shot Instance Segmentation with Polar Representati...
Review : PolarMask: Single Shot Instance Segmentation with Polar Representati...
 
BEV Semantic Segmentation
BEV Semantic SegmentationBEV Semantic Segmentation
BEV Semantic Segmentation
 
Parallel wisard object tracker a rambased tracking system
Parallel wisard object tracker a rambased tracking systemParallel wisard object tracker a rambased tracking system
Parallel wisard object tracker a rambased tracking system
 
NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis
NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis
NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis
 
Object detection in images based on homogeneous region segmentation
Object detection in images based on homogeneous region segmentationObject detection in images based on homogeneous region segmentation
Object detection in images based on homogeneous region segmentation
 
AUTO AI 2021 talk Real world data augmentations for autonomous driving : B Ra...
AUTO AI 2021 talk Real world data augmentations for autonomous driving : B Ra...AUTO AI 2021 talk Real world data augmentations for autonomous driving : B Ra...
AUTO AI 2021 talk Real world data augmentations for autonomous driving : B Ra...
 
OBJECT DETECTION FOR SERVICE ROBOT USING RANGE AND COLOR FEATURES OF AN IMAGE
OBJECT DETECTION FOR SERVICE ROBOT USING RANGE AND COLOR FEATURES OF AN IMAGEOBJECT DETECTION FOR SERVICE ROBOT USING RANGE AND COLOR FEATURES OF AN IMAGE
OBJECT DETECTION FOR SERVICE ROBOT USING RANGE AND COLOR FEATURES OF AN IMAGE
 
Object Detection for Service Robot Using Range and Color Features of an Image
Object Detection for Service Robot Using Range and Color Features of an ImageObject Detection for Service Robot Using Range and Color Features of an Image
Object Detection for Service Robot Using Range and Color Features of an Image
 

Plus de Yu Huang

Application of Foundation Model for Autonomous Driving
Application of Foundation Model for Autonomous DrivingApplication of Foundation Model for Autonomous Driving
Application of Foundation Model for Autonomous DrivingYu Huang
 
The New Perception Framework in Autonomous Driving: An Introduction of BEV N...
The New Perception Framework  in Autonomous Driving: An Introduction of BEV N...The New Perception Framework  in Autonomous Driving: An Introduction of BEV N...
The New Perception Framework in Autonomous Driving: An Introduction of BEV N...Yu Huang
 
Data Closed Loop in Simulation Test of Autonomous Driving
Data Closed Loop in Simulation Test of Autonomous DrivingData Closed Loop in Simulation Test of Autonomous Driving
Data Closed Loop in Simulation Test of Autonomous DrivingYu Huang
 
Techniques and Challenges in Autonomous Driving
Techniques and Challenges in Autonomous DrivingTechniques and Challenges in Autonomous Driving
Techniques and Challenges in Autonomous DrivingYu Huang
 
BEV Joint Detection and Segmentation
BEV Joint Detection and SegmentationBEV Joint Detection and Segmentation
BEV Joint Detection and SegmentationYu Huang
 
BEV Object Detection and Prediction
BEV Object Detection and PredictionBEV Object Detection and Prediction
BEV Object Detection and PredictionYu Huang
 
Fisheye based Perception for Autonomous Driving VI
Fisheye based Perception for Autonomous Driving VIFisheye based Perception for Autonomous Driving VI
Fisheye based Perception for Autonomous Driving VIYu Huang
 
Fisheye/Omnidirectional View in Autonomous Driving V
Fisheye/Omnidirectional View in Autonomous Driving VFisheye/Omnidirectional View in Autonomous Driving V
Fisheye/Omnidirectional View in Autonomous Driving VYu Huang
 
Fisheye/Omnidirectional View in Autonomous Driving IV
Fisheye/Omnidirectional View in Autonomous Driving IVFisheye/Omnidirectional View in Autonomous Driving IV
Fisheye/Omnidirectional View in Autonomous Driving IVYu Huang
 
Prediction,Planninng & Control at Baidu
Prediction,Planninng & Control at BaiduPrediction,Planninng & Control at Baidu
Prediction,Planninng & Control at BaiduYu Huang
 
Cruise AI under the Hood
Cruise AI under the HoodCruise AI under the Hood
Cruise AI under the HoodYu Huang
 
LiDAR in the Adverse Weather: Dust, Snow, Rain and Fog (2)
LiDAR in the Adverse Weather: Dust, Snow, Rain and Fog (2)LiDAR in the Adverse Weather: Dust, Snow, Rain and Fog (2)
LiDAR in the Adverse Weather: Dust, Snow, Rain and Fog (2)Yu Huang
 
Scenario-Based Development & Testing for Autonomous Driving
Scenario-Based Development & Testing for Autonomous DrivingScenario-Based Development & Testing for Autonomous Driving
Scenario-Based Development & Testing for Autonomous DrivingYu Huang
 
How to Build a Data Closed-loop Platform for Autonomous Driving?
How to Build a Data Closed-loop Platform for Autonomous Driving?How to Build a Data Closed-loop Platform for Autonomous Driving?
How to Build a Data Closed-loop Platform for Autonomous Driving?Yu Huang
 
Simulation for autonomous driving at uber atg
Simulation for autonomous driving at uber atgSimulation for autonomous driving at uber atg
Simulation for autonomous driving at uber atgYu Huang
 
Prediction and planning for self driving at waymo
Prediction and planning for self driving at waymoPrediction and planning for self driving at waymo
Prediction and planning for self driving at waymoYu Huang
 
Jointly mapping, localization, perception, prediction and planning
Jointly mapping, localization, perception, prediction and planningJointly mapping, localization, perception, prediction and planning
Jointly mapping, localization, perception, prediction and planningYu Huang
 
Data pipeline and data lake for autonomous driving
Data pipeline and data lake for autonomous drivingData pipeline and data lake for autonomous driving
Data pipeline and data lake for autonomous drivingYu Huang
 
Open Source codes of trajectory prediction & behavior planning
Open Source codes of trajectory prediction & behavior planningOpen Source codes of trajectory prediction & behavior planning
Open Source codes of trajectory prediction & behavior planningYu Huang
 
Lidar in the adverse weather: dust, fog, snow and rain
Lidar in the adverse weather: dust, fog, snow and rainLidar in the adverse weather: dust, fog, snow and rain
Lidar in the adverse weather: dust, fog, snow and rainYu Huang
 

Plus de Yu Huang (20)

Application of Foundation Model for Autonomous Driving
Application of Foundation Model for Autonomous DrivingApplication of Foundation Model for Autonomous Driving
Application of Foundation Model for Autonomous Driving
 
The New Perception Framework in Autonomous Driving: An Introduction of BEV N...
The New Perception Framework  in Autonomous Driving: An Introduction of BEV N...The New Perception Framework  in Autonomous Driving: An Introduction of BEV N...
The New Perception Framework in Autonomous Driving: An Introduction of BEV N...
 
Data Closed Loop in Simulation Test of Autonomous Driving
Data Closed Loop in Simulation Test of Autonomous DrivingData Closed Loop in Simulation Test of Autonomous Driving
Data Closed Loop in Simulation Test of Autonomous Driving
 
Techniques and Challenges in Autonomous Driving
Techniques and Challenges in Autonomous DrivingTechniques and Challenges in Autonomous Driving
Techniques and Challenges in Autonomous Driving
 
BEV Joint Detection and Segmentation
BEV Joint Detection and SegmentationBEV Joint Detection and Segmentation
BEV Joint Detection and Segmentation
 
BEV Object Detection and Prediction
BEV Object Detection and PredictionBEV Object Detection and Prediction
BEV Object Detection and Prediction
 
Fisheye based Perception for Autonomous Driving VI
Fisheye based Perception for Autonomous Driving VIFisheye based Perception for Autonomous Driving VI
Fisheye based Perception for Autonomous Driving VI
 
Fisheye/Omnidirectional View in Autonomous Driving V
Fisheye/Omnidirectional View in Autonomous Driving VFisheye/Omnidirectional View in Autonomous Driving V
Fisheye/Omnidirectional View in Autonomous Driving V
 
Fisheye/Omnidirectional View in Autonomous Driving IV
Fisheye/Omnidirectional View in Autonomous Driving IVFisheye/Omnidirectional View in Autonomous Driving IV
Fisheye/Omnidirectional View in Autonomous Driving IV
 
Prediction,Planninng & Control at Baidu
Prediction,Planninng & Control at BaiduPrediction,Planninng & Control at Baidu
Prediction,Planninng & Control at Baidu
 
Cruise AI under the Hood
Cruise AI under the HoodCruise AI under the Hood
Cruise AI under the Hood
 
LiDAR in the Adverse Weather: Dust, Snow, Rain and Fog (2)
LiDAR in the Adverse Weather: Dust, Snow, Rain and Fog (2)LiDAR in the Adverse Weather: Dust, Snow, Rain and Fog (2)
LiDAR in the Adverse Weather: Dust, Snow, Rain and Fog (2)
 
Scenario-Based Development & Testing for Autonomous Driving
Scenario-Based Development & Testing for Autonomous DrivingScenario-Based Development & Testing for Autonomous Driving
Scenario-Based Development & Testing for Autonomous Driving
 
How to Build a Data Closed-loop Platform for Autonomous Driving?
How to Build a Data Closed-loop Platform for Autonomous Driving?How to Build a Data Closed-loop Platform for Autonomous Driving?
How to Build a Data Closed-loop Platform for Autonomous Driving?
 
Simulation for autonomous driving at uber atg
Simulation for autonomous driving at uber atgSimulation for autonomous driving at uber atg
Simulation for autonomous driving at uber atg
 
Prediction and planning for self driving at waymo
Prediction and planning for self driving at waymoPrediction and planning for self driving at waymo
Prediction and planning for self driving at waymo
 
Jointly mapping, localization, perception, prediction and planning
Jointly mapping, localization, perception, prediction and planningJointly mapping, localization, perception, prediction and planning
Jointly mapping, localization, perception, prediction and planning
 
Data pipeline and data lake for autonomous driving
Data pipeline and data lake for autonomous drivingData pipeline and data lake for autonomous driving
Data pipeline and data lake for autonomous driving
 
Open Source codes of trajectory prediction & behavior planning
Open Source codes of trajectory prediction & behavior planningOpen Source codes of trajectory prediction & behavior planning
Open Source codes of trajectory prediction & behavior planning
 
Lidar in the adverse weather: dust, fog, snow and rain
Lidar in the adverse weather: dust, fog, snow and rainLidar in the adverse weather: dust, fog, snow and rain
Lidar in the adverse weather: dust, fog, snow and rain
 

Dernier

Transport layer issues and challenges - Guide
Transport layer issues and challenges - GuideTransport layer issues and challenges - Guide
Transport layer issues and challenges - GuideGOPINATHS437943
 
complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...asadnawaz62
 
Arduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.pptArduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.pptSAURABHKUMAR892774
 
An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...Chandu841456
 
System Simulation and Modelling with types and Event Scheduling
System Simulation and Modelling with types and Event SchedulingSystem Simulation and Modelling with types and Event Scheduling
System Simulation and Modelling with types and Event SchedulingBootNeck1
 
Main Memory Management in Operating System
Main Memory Management in Operating SystemMain Memory Management in Operating System
Main Memory Management in Operating SystemRashmi Bhat
 
Virtual memory management in Operating System
Virtual memory management in Operating SystemVirtual memory management in Operating System
Virtual memory management in Operating SystemRashmi Bhat
 
Introduction to Machine Learning Unit-3 for II MECH
Introduction to Machine Learning Unit-3 for II MECHIntroduction to Machine Learning Unit-3 for II MECH
Introduction to Machine Learning Unit-3 for II MECHC Sai Kiran
 
Industrial Safety Unit-IV workplace health and safety.ppt
Industrial Safety Unit-IV workplace health and safety.pptIndustrial Safety Unit-IV workplace health and safety.ppt
Industrial Safety Unit-IV workplace health and safety.pptNarmatha D
 
Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...VICTOR MAESTRE RAMIREZ
 
Instrumentation, measurement and control of bio process parameters ( Temperat...
Instrumentation, measurement and control of bio process parameters ( Temperat...Instrumentation, measurement and control of bio process parameters ( Temperat...
Instrumentation, measurement and control of bio process parameters ( Temperat...121011101441
 
Energy Awareness training ppt for manufacturing process.pptx
Energy Awareness training ppt for manufacturing process.pptxEnergy Awareness training ppt for manufacturing process.pptx
Energy Awareness training ppt for manufacturing process.pptxsiddharthjain2303
 
Input Output Management in Operating System
Input Output Management in Operating SystemInput Output Management in Operating System
Input Output Management in Operating SystemRashmi Bhat
 
Risk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdfRisk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdfROCENODodongVILLACER
 
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTION
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTIONTHE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTION
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTIONjhunlian
 
IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024Mark Billinghurst
 
Work Experience-Dalton Park.pptxfvvvvvvv
Work Experience-Dalton Park.pptxfvvvvvvvWork Experience-Dalton Park.pptxfvvvvvvv
Work Experience-Dalton Park.pptxfvvvvvvvLewisJB
 

Dernier (20)

Transport layer issues and challenges - Guide
Transport layer issues and challenges - GuideTransport layer issues and challenges - Guide
Transport layer issues and challenges - Guide
 
complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...
 
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
 
young call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Service
young call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Serviceyoung call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Service
young call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Service
 
Arduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.pptArduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.ppt
 
An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...
 
System Simulation and Modelling with types and Event Scheduling
System Simulation and Modelling with types and Event SchedulingSystem Simulation and Modelling with types and Event Scheduling
System Simulation and Modelling with types and Event Scheduling
 
Main Memory Management in Operating System
Main Memory Management in Operating SystemMain Memory Management in Operating System
Main Memory Management in Operating System
 
Virtual memory management in Operating System
Virtual memory management in Operating SystemVirtual memory management in Operating System
Virtual memory management in Operating System
 
Introduction to Machine Learning Unit-3 for II MECH
Introduction to Machine Learning Unit-3 for II MECHIntroduction to Machine Learning Unit-3 for II MECH
Introduction to Machine Learning Unit-3 for II MECH
 
Industrial Safety Unit-IV workplace health and safety.ppt
Industrial Safety Unit-IV workplace health and safety.pptIndustrial Safety Unit-IV workplace health and safety.ppt
Industrial Safety Unit-IV workplace health and safety.ppt
 
Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...
 
Instrumentation, measurement and control of bio process parameters ( Temperat...
Instrumentation, measurement and control of bio process parameters ( Temperat...Instrumentation, measurement and control of bio process parameters ( Temperat...
Instrumentation, measurement and control of bio process parameters ( Temperat...
 
Energy Awareness training ppt for manufacturing process.pptx
Energy Awareness training ppt for manufacturing process.pptxEnergy Awareness training ppt for manufacturing process.pptx
Energy Awareness training ppt for manufacturing process.pptx
 
Input Output Management in Operating System
Input Output Management in Operating SystemInput Output Management in Operating System
Input Output Management in Operating System
 
POWER SYSTEMS-1 Complete notes examples
POWER SYSTEMS-1 Complete notes  examplesPOWER SYSTEMS-1 Complete notes  examples
POWER SYSTEMS-1 Complete notes examples
 
Risk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdfRisk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdf
 
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTION
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTIONTHE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTION
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTION
 
IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024
 
Work Experience-Dalton Park.pptxfvvvvvvv
Work Experience-Dalton Park.pptxfvvvvvvvWork Experience-Dalton Park.pptxfvvvvvvv
Work Experience-Dalton Park.pptxfvvvvvvv
 

3-d interpretation from single 2-d image V

  • 1. 3D Interpretation from Single 2D Image for Autonomous Driving V Yu Huang Yu.huang07@gmail.com Sunnyvale, California
  • 2. Outline • MonoRUn: Monocular 3D Object Detection by Reconstruction and Uncertainty Propagation • M3DSSD: Monocular 3D Single Stage Object Detector • Delving into Localization Errors for Monocular 3D Object Detection • GrooMeD-NMS: Grouped Mathematically Differentiable NMS for Monocular 3D Object Detection • Objects are Different: Flexible Monocular 3D Object Detection
  • 3. MonoRUn: Monocular 3D Object Detection by Reconstruction and Uncertainty Propagation • MonoRUn: Self supervised, learn dense correspondences and geometry; • Robust KL loss: minimize the uncertainty weighted projection error; • Uncertainty aware region reconstruction network for 3-d object coordinate regression; • uncertainty-driven PnP for object pose and covariance matrix estimation; • Codes: https://github.com/tjiiv- cprg/MonoRUn.
  • 4. MonoRUn: Monocular 3D Object Detection by Reconstruction and Uncertainty Propagation
  • 5. MonoRUn: Monocular 3D Object Detection by Reconstruction and Uncertainty Propagation
  • 6. MonoRUn: Monocular 3D Object Detection by Reconstruction and Uncertainty Propagation
  • 7. M3DSSD: Monocular 3D Single Stage Object Detector • Monocular 3D Single Stage object Detector (M3DSSD) with feature alignment and asymmetric non-local attention. • Current anchor-based monocular 3D object detection methods suffer from feature mismatching. • To overcome this, propose a two-step feature alignment approach. • In the first step, the shape alignment is performed to enable the receptive field of the feature map to focus on the pre-defined anchors with high confidence scores. • In the second step, the center alignment is used to align the features at 2D/3D centers. • Further, it is often difficult to learn global information and capture long-range relationships, which are important for the depth prediction of objects. • Asymmetric non-local attention block with multiscale sampling to extract depth- wise features. • The code is released at https://github.com/mumianyuxin/M3DSSD.
  • 8. M3DSSD: Monocular 3D Single Stage Object Detector The architecture of M3DSSD. (a) The backbone of the framework, which is modified from DLA-102. (b) The two-step feature alignment, classification head, 2D/3D center regression heads, and ANAB especially designed for predicting the depth z3d. (c) Other regression heads.
  • 9. M3DSSD: Monocular 3D Single Stage Object Detector The architecture of shape alignment and the outcome of shape alignment on objects. The yellow squares indicate the sampling location of the AlignConv, and the anchors are in red.
  • 10. M3DSSD: Monocular 3D Single Stage Object Detector The architectures of center alignment and the outcome of the center alignment. When applying center alignment to objects, the sampling locations on the foreground regions (in white) all concentrate on the centers of objects (in yellow) after center alignment, which are near to the true centers of objects (in red).
  • 11. M3DSSD: Monocular 3D Single Stage Object Detector Asymmetric Non-local Attention Block. The key and query branches share the same attention maps, which forces the key and value to focus on the same place. Bottom: Pyramid Average Pooling with Attention (PA2) that generates different level descriptors in various resolutions.
  • 12. M3DSSD: Monocular 3D Single Stage Object Detector
  • 13. M3DSSD: Monocular 3D Single Stage Object Detector
  • 14. M3DSSD: Monocular 3D Single Stage Object Detector
  • 15. Delving into Localization Errors for Monocular 3D Object Detection • Quantify the impact introduced by each sub-task and found the ‘localization error’ is the vital factor in restricting monocular 3D detection. • Besides, investigate the underlying reasons behind localization errors, analyze the issues they might bring, and propose three strategies. • First, misalignment between the center of the 2D bounding box and the projected center of the 3D object, which is a vital factor leading to low localization accuracy. • Second, we observe that accurately localizing distant objects with existing technologies is almost impossible, while those samples will mislead the learned network. To remove such samples from the training set for improving the overall performance of the detector. • Lastly, 3D IoU oriented loss for the size estimation of the object, which is not affected by ‘localization error’. • Codes: https://github.com/xinzhuma/monodle.
  • 16. Delving into Localization Errors for Monocular 3D Object Detection
  • 17. Delving into Localization Errors for Monocular 3D Object Detection • Coupled with the errors accumulated by other tasks such as depth estimation, it becomes an almost impossible task to accurately estimate the 3D bounding box of distant objects from a single monocular image, unless the depth estimation is accurate enough (not achieved to date). • For estimating the coarse center c, 1) use the projected 3D center cw as the ground-truth for the branch estimating coarse center c and 2) force the model to learn features from 2D detection simultaneously. • Here adopting the projected 3D center cw as the ground-truth for the coarse center c, helps the branch for estimating the coarse center aware of 3D geometry and more related to the task of estimating 3D object center, which is the key of localization problem. • 2D detection serves as auxiliary task to learn better 3D aware features.
  • 18. Delving into Localization Errors for Monocular 3D Object Detection • Two schemes are proposed on how to generate the object level training weight for sample: • Hard coding: discard all samples over a certain distance • Soft coding: generate it using a reverse sigmoid-like function • A IoU oriented optimization for 3D size estimation: Specifically, suppose all prediction items except the 3D size s = [h,w,l]3D are completely correct, then
  • 19. Delving into Localization Errors for Monocular 3D Object Detection
  • 20. Delving into Localization Errors for Monocular 3D Object Detection
  • 21. Delving into Localization Errors for Monocular 3D Object Detection
  • 22. GrooMeD-NMS: Grouped Mathematically Differentiable NMS for Monocular 3D Object Detection • While there were attempts to include NMS in the training pipeline for tasks such as 2D object detection, they have been less widely • adopted due to a non-mathematical expression of the NMS. • It integrate GrooMeD-NMS – a Grouped Mathematically Differentiable NMS for monocular 3D object detection, such that the network is trained end-to- end with a loss on the boxes after NMS. • First formulate NMS as a matrix operation and then group and mask the boxes in an unsupervised manner to obtain a simple closed-form expression of the NMS. • GrooMeDNMS addresses the mismatch between training and inference pipelines and, therefore, forces the network to select the best 3D box in a differentiable manner.
  • 23. GrooMeD-NMS: Grouped Mathematically Differentiable NMS for Monocular 3D Object Detection (a) Conventional object detection has a mismatch between training and inference as it uses NMS only in inference. (b) To address this, propose a novel GrooMeD-NMS layer, such that the network is trained end-to-end with NMS applied. s and r denote the score of boxes B before and after the NMS respectively. O denotes the matrix containing IoU2D overlaps of B. Lbefore denotes the losses before the NMS, while Lafter denotes the loss after the NMS. (c) GrooMeD-NMS layer calculates r in a differentiable manner giving gradients from Lafter when the best-localized box corresponding to an object is not selected after NMS.
  • 24. GrooMeD-NMS: Grouped Mathematically Differentiable NMS for Monocular 3D Object Detection
• 25. GrooMeD-NMS: Grouped Mathematically Differentiable NMS for Monocular 3D Object Detection
The rescores r are written in a matrix formulation and expressed compactly in terms of P, called the Prune Matrix, which is obtained by applying the pruning function p element-wise to O.
The matrix formulation is recursive; to avoid the recursion, an approximate closed-form solution is used instead.
Boxes in an image are clustered in an unsupervised manner based on their IoU2D overlaps to obtain the groups G, and the rescoring is rewritten per group. Grouping thus mimics the grouping of classical NMS but does not rescore the boxes.
• 26. GrooMeD-NMS: Grouped Mathematically Differentiable NMS for Monocular 3D Object Detection
Classical NMS only considers the IoU2D of the top-scored box with the other boxes. This is equivalent to keeping only the column of O corresponding to the top box and setting the remaining columns to zero, which is implemented by masking PGk with a binary mask MGk for each group Gk.
Because the entries of MGk are 1 in the column corresponding to the top-scored box and 0 elsewhere, the element-wise (Frobenius) product simplifies the per-group rescoring to a closed form.
  • 27. GrooMeD-NMS: Grouped Mathematically Differentiable NMS for Monocular 3D Object Detection pruning function
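Putting these slides together, a rough PyTorch sketch of grouping, pruning, and masked per-group rescoring; the sigmoid pruning parameters and the assumed rescoring form r_i = clamp(s_i - p(o_i,top) * s_top, 0) are illustrative simplifications, not the paper's exact closed-form expression.

import torch

def sigmoid_prune(o, threshold=0.4, temperature=0.05):
    # Differentiable pruning function p(o): close to 0 for small IoU2D overlaps, close to 1
    # for large ones. Threshold and temperature are illustrative choices.
    return torch.sigmoid((o - threshold) / temperature)

def group_boxes(overlaps, scores, iou_thresh=0.4):
    # Unsupervised grouping on IoU2D overlaps that mimics classical-NMS clustering but does
    # not rescore: each unassigned box joins the group seeded by the highest-scored box it
    # overlaps with.
    order = torch.argsort(scores, descending=True).tolist()
    assigned = [False] * len(order)
    groups = []
    for i in order:
        if assigned[i]:
            continue
        group = [i]
        assigned[i] = True
        for j in order:
            if not assigned[j] and overlaps[i, j] >= iou_thresh:
                group.append(j)
                assigned[j] = True
        groups.append(group)
    return groups

def groomed_rescore(scores, overlaps, groups):
    # Masked per-group rescoring: within each group only the column of the top-scored box is
    # kept (the binary mask M), so every other box is penalized by its pruned overlap with
    # that top box.
    rescores = scores.clone()
    for group in groups:
        top = max(group, key=lambda i: scores[i].item())
        for i in group:
            if i != top:
                penalty = sigmoid_prune(overlaps[i, top]) * scores[top]
                rescores[i] = torch.clamp(scores[i] - penalty, min=0.0)
    return rescores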
• 28. GrooMeD-NMS: Grouped Mathematically Differentiable NMS for Monocular 3D Object Detection
The method builds on M3D-RPN and uses binning and self-balancing confidence. The boxes' self-balancing confidences are used as the scores s, which pass through the GrooMeD-NMS layer to obtain the rescores r. The rescores signal the network when the best box has not been selected for a particular object.
The target assignment for the rescores is based on gIoU3D. To calculate gIoU3D, first compute the volume V and the hull volume Vhull of the 3D boxes; Vhull is obtained as in gIoU2D in Bird's Eye View (BEV), with rotations removed, multiplied by the hull extent along the Y dimension.
Image-wise AP-Loss: if the best boxes are correctly ranked in one image but not in a second, the gradients only affect the boxes of the second image. This modified AP-Loss is used as the loss after the NMS, since AP-Loss does not suffer from class imbalance.
  • 29. GrooMeD-NMS: Grouped Mathematically Differentiable NMS for Monocular 3D Object Detection
  • 30. GrooMeD-NMS: Grouped Mathematically Differentiable NMS for Monocular 3D Object Detection
• 31. Objects are Different: Flexible Monocular 3D Object Detection
• Most existing methods adopt the same approach for all objects regardless of their diverse distributions, leading to limited performance on truncated objects.
• This is a flexible framework for monocular 3D object detection that explicitly decouples truncated objects and adaptively combines multiple approaches for object depth estimation.
• Specifically, the edge of the feature map is decoupled for predicting long-tail truncated objects, so that the optimization of normal objects is not influenced.
• Furthermore, object depth estimation is formulated as an uncertainty-guided ensemble of the directly regressed object depth and depths solved from different groups of keypoints.
• Code: https://github.com/zhangyp15/MonoFlex.
  • 32. Objects are Different: Flexible Monocular 3D Object Detection
• 33. Objects are Different: Flexible Monocular 3D Object Detection
• The framework is extended from CenterNet, where objects are identified by their representative points and predicted from peaks of the heatmap.
• First, the CNN backbone extracts feature maps from the monocular image as the input for multiple prediction heads.
• Multiple prediction branches are deployed on the shared backbone to regress objects' properties, including the 2D bounding box, dimension, orientation, keypoints, and depth.
• The image-level localization involves the heatmap and offsets, where edge fusion modules are used to decouple the feature learning and prediction of truncated objects.
• The final depth estimation is an uncertainty-guided combination of the regressed depth and the depths computed from estimated keypoints and dimensions.
• The adaptive depth ensemble adopts four methods for depth estimation and simultaneously predicts their uncertainties, which are used to form an uncertainty-weighted prediction.
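A minimal sketch of such an uncertainty-weighted combination; the inverse-uncertainty weighting below is one common choice and is an assumption here, not necessarily the exact formula of the released code.

import torch

def ensemble_depth(depths, log_sigmas):
    # depths:     (N, 4) candidate depths per object (direct regression + 3 keypoint groups).
    # log_sigmas: (N, 4) predicted log-uncertainties for the four candidates.
    # Inverse-uncertainty weighting: more confident estimators dominate the combination.
    inv_sigma = torch.exp(-log_sigmas)
    weights = inv_sigma / inv_sigma.sum(dim=1, keepdim=True)
    return (weights * depths).sum(dim=1)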
• 34. Objects are Different: Flexible Monocular 3D Object Detection
The dimension and orientation can be directly inferred from appearance-based clues, while the 3D location is converted to the projected 3D center xc = (uc, vc) and the object depth z.
Existing methods use a unified representation xr, the center of the 2D bounding box xb, for every object; in such cases the offset 𝛿c = xc - xb is regressed to derive the projected 3D center xc.
(a) The 3D location is converted to the projected center and the object depth. (b) The distribution of the offsets 𝛿c from 2D centers to projected 3D centers: inside and outside objects exhibit entirely different distributions.
Objects are therefore divided into two groups depending on whether their projected 3D centers are inside or outside the image. The joint learning of 𝛿c can suffer from long-tail offsets, which motivates decoupling the representations and the offset learning of inside and outside objects.
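The conversion between the projected 3D center plus depth and the full 3D location follows the standard pinhole model; a short sketch (assuming an intrinsic matrix K with zero skew):

import numpy as np

def unproject_center(uc, vc, z, K):
    # Pinhole back-projection: z * [uc, vc, 1]^T = K @ [x, y, z]^T, solved for (x, y).
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    x = (uc - cx) * z / fx
    y = (vc - cy) * z / fy
    return np.array([x, y, z])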
• 35. Objects are Different: Flexible Monocular 3D Object Detection
(Figure panels: discretization offset errors for inside objects vs. outside objects.)
(a) The intersection xI between the image edge and the line from xb to xc is used to represent the truncated object. (b) The edge heatmap is generated with a 1D Gaussian distribution. (c) The edge intersection xI (cyan) is a better representation than the 2D center xb (green) for heavily truncated objects.
Since 2D bounding boxes only capture the inside-image part of an object, the visual location of xb can be confusing and may even fall on other objects. By contrast, the intersection xI disentangles the edge area of the heatmap to focus on outside objects and offers a strong boundary prior that simplifies localization.
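A small geometric sketch of how such an intersection point can be obtained by clipping the segment from xb to xc against the image border; the clipping logic is illustrative, not taken from the released code.

import numpy as np

def edge_intersection(xb, xc, width, height):
    # xb lies inside the image; xc (projected 3D center of a truncated object) lies outside.
    # Find the largest t in [0, 1] such that xb + t * (xc - xb) stays inside the image.
    xb, xc = np.asarray(xb, dtype=float), np.asarray(xc, dtype=float)
    d = xc - xb
    t = 1.0
    if d[0] != 0:
        t = min(t, ((width - 1 if d[0] > 0 else 0) - xb[0]) / d[0])
    if d[1] != 0:
        t = min(t, ((height - 1 if d[1] > 0 else 0) - xb[1]) / d[1])
    return xb + t * d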
• 36. Objects are Different: Flexible Monocular 3D Object Detection
• An edge fusion module further decouples the feature learning and prediction of outside objects.
• The module first extracts the four boundaries of the feature map and concatenates them into an edge feature vector in clockwise order, which is then processed by two 1D convolutional layers to learn unique features for truncated objects.
• Finally, the processed vector is remapped to the four boundaries and added to the input feature map.
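A rough PyTorch sketch of such an edge fusion module; the kernel sizes, activation, and exact clockwise traversal are assumptions for illustration rather than the released implementation.

import torch
import torch.nn as nn

class EdgeFusion(nn.Module):
    # Gathers boundary features in clockwise order, processes them with two 1D convolutions,
    # and adds the result back onto the four boundaries of the feature map.
    def __init__(self, channels):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv1d(channels, channels, kernel_size=3, padding=1),
        )

    def forward(self, x):                       # x: (B, C, H, W)
        top    = x[:, :, 0, :]                  # (B, C, W), left -> right
        right  = x[:, :, :, -1]                 # (B, C, H), top -> bottom
        bottom = x[:, :, -1, :].flip(-1)        # (B, C, W), right -> left
        left   = x[:, :, :, 0].flip(-1)         # (B, C, H), bottom -> top
        edge = torch.cat([top, right, bottom, left], dim=-1)   # clockwise edge vector
        edge = self.conv(edge)

        H, W = x.shape[-2], x.shape[-1]
        out = x.clone()
        out[:, :, 0, :]  += edge[:, :, :W]                          # top
        out[:, :, :, -1] += edge[:, :, W:W + H]                     # right
        out[:, :, -1, :] += edge[:, :, W + H:2 * W + H].flip(-1)    # bottom
        out[:, :, :, 0]  += edge[:, :, 2 * W + H:].flip(-1)         # left
        return out  # corners receive contributions from two boundaries in this simplification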
• 37. Objects are Different: Flexible Monocular 3D Object Detection
Relation of the global orientation, the local orientation, and the viewing angle.
Keypoints include the projections of the eight vertices, the top center, and the bottom center of the 3D bounding box.
The depth of a supporting line of the 3D bounding box can be computed from the object height and the line's pixel height. The ten keypoints are split into three groups, each of which can produce the center depth independently.
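The keypoint-based depth follows the similar-triangles relation of the pinhole camera; a tiny sketch with illustrative numbers:

def depth_from_pixel_height(f_y, object_height_m, pixel_height):
    # A vertical 3D edge of height H metres spanning h pixels lies at depth z = f_y * H / h.
    return f_y * object_height_m / pixel_height

# Example (illustrative values): each keypoint group pairs top/bottom projections to obtain a
# pixel height, yielding an independent estimate of the object center depth.
z = depth_from_pixel_height(f_y=720.0, object_height_m=1.5, pixel_height=36.0)   # 30.0 m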
  • 38. Objects are Different: Flexible Monocular 3D Object Detection
  • 39. Objects are Different: Flexible Monocular 3D Object Detection
  • 40. Objects are Different: Flexible Monocular 3D Object Detection
  • 41. Objects are Different: Flexible Monocular 3D Object Detection